How exactly does Essbase work? Part I

This blog post started out as a (another?) blog post about database fragmentation and it quickly went down a completely different – and exciting – path.  Well, exciting to me.  I considered naming the title of this post “Okay, but why?” and “Attack of the B trees!” and a myriad of other thoughts going through my head during my research phase.  However, in my quest to defragment my database I was given a glimpse into how Essbase (BSO) databases work.  These series of posts are not absolute statements, nor are they ones in which I claim to know everything about the guts of Essbase. I am simply sharing what I have learned over the past week or so.  In fact, if others out there have more to share or find an incorrect statement in my post, please post a comment!  We will all be better for it.

My history of learning Essbase is very solitary.  I was not fortunate enough to be mentored by a sage admin for years.  I learned in bits and pieces and understood that certain Essbase truths existed and that I should follow them…but didn’t know why and didn’t know whom to ask why.  This blog is turning from a “here’s the error code…here’s how to fix it!” to a learning of the inner workings – and how they apply to Essbase “truths”.  This does lead in to errors and how to fix them…and why they are errors.  This will be evolving and should not be seen as an end all, be all for the guts.  Let’s dive in…

A couple weeks ago I had a horribly fragmented database.  To take the database down, export the data and reload the data was just not an option since it was prime data loading time.  Why was the time a factor?  To export the data at level 0 would take hours…I just knew it (and, in fact, once I was able to export the data, it took 9 hours) because the database was so fragmented.  So…how did I know the database was fragmented?  I was in EAS and paid attention to a certain statistic -> Average Clustering Ratio.  It was at .38.  POINT THREE EIGHT.  I also looked in the application’s database folder on the backend and saw that I had 3 index files and 12 page files.  We usually have 1 index and 4 page files.  Since I knew I could not export the data for a couple days, I decided to research fragmentation a bit more just to be more informed.  We have a script that causes pretty bad fragmentation no matter what I try so I was trying to research more of what I could do to lessen the fragmentation until we could go through the defragmentation process.

I have always been a MaxL person…never really have used Esscmd.  Until now.  In my research of fragmentation I came across a command in Esscmd that I have learned to love in the past week -> GETDBSTATS.  I ran this script in my test database and saw the following information about my application/database:

Calc_DBSTATS_1 I started looking through some of the statistics and saw items I expected to see: number of dimensions, block size, max number of blocks, etc.  I also saw something I had seen on my database statistics tab in EAS -> Average Clustering Ratio.  I started thinking…what exactly is the Average Clustering Ratio?  Well, I found out, this ratio is the Essbase indicator of how sequential your data is in your ind/pag files.  A ratio of “1.00” means that all of your blocks are ordered correctly per the outline.  Loading data and performing calculations on the data will make your sequence become out of order.  But, why?  When you write back to a block (think multiplying a block by a given rate), Essbase cannot always fit your new data back in to the old slot.  So, Essbase will, essentially, append the updated record to the end of the data record instead of putting it back in the original slot and/or removing the old block.  Ergo – fragmentation.

When fragmentation happens, it is important to defragment your database (duh).  The way to defragment your database is to perform a restructure of your database so that the index and page files get back in line.  There are 2 different kinds of restructures: Implicit and Explicit.  Implicit restructures result from changes to the outline and empty blocks are not removed.  Within implicit restructures lie 3 different types of restructures: Dense, Sparse and Outline-only.  Dense restructures occur when a member of a dense dimension in the outline is moved, deleted or added.  The index is recreated automatically so that the index points to the new data blocks.  This restructure takes the most time.  A sparse restructure is when a member of  a sparse dimension in the outline is moved, deleted or added.  Essbase will restructure the index and create new index files.  The final type of implicit restructure is the outline-only restructure.  In this type of restructure, the index and data files are not touched.  These result from member name changes, alias additions/changes and dynamic calc formula changes, etc…ie: items that hold no data.  Explicit restructures in a BSO cube happen when an admin physically initiates a restructure (think right-clicking on a database and selecting “Restructure…”).  By choosing this method, Essbase will perform a dense restructure and actually remove the empty blocks.  Similarly, you can choose to perform an explicit restructure in MaxL by executing the following:

alter database ‘Sample’.’Basic’ force restructure;

So let’s go to my test database.  My Average Clustering Ratio is a 0.2533565.  Ouch.  But (!), I did know this was going to happen given the calc I was running.  So what does this mean?  My database index and data files (pag files) are about 75% outta whack.  In fact, look at the screenshot of the files:

Calc_1_IndPag For comparison, here is how the structure normally looks:

Normal

With seeing what I was seeing and knowing what I had just learned, I decided to defragment the database the old fashioned, tried and true way – export the data as level 0, clear the data and reload the data back to the database.  Once I did this, my Average Clustering Ratio was back at 1 and my index and data files were back to normal.  Calcs were running the right amount of time and data exports took their normal time (not 9 hours!).

What ties into this ratio is the Average Fragmentation Quotient.  From my example, you can see that my number is 7.681853.  What is this quotient, you ask?  This quotient measures the amount of free space in your database.  If we go back to Essbase appending a data block that doesn’t fit anymore to the end of the data file, you have empty spaces.  This number tells you have fragmented your database is.  In my case, my database is 7.68% fragmented.

So…how can you prevent fragmentation?  Well, this will lead into my next article in the series.  You may have seen another statistic on the Esscmd screen – Index Type of B+ Tree.  I did not know what this was, so I started researching…and what I found made so many things click, including how to prevent fragmentation.  Not to speak too soon on the why’s, but you can prevent fragmentation by sorting your data loads based on your outline.  Sort your data starting from the bottom of your outline and moving to the top…or (generally speaking) the largest sparse dimension to the smallest followed by the smallest dense dimension to the largest.  This will force Essbase to only open a block of data once for data loading versus multiple times which is one cause of fragmentation.  Also, make the data source as small as possible; use headers in your data files whenever possible.  But…why?  Be sure to check out Part II.  🙂

So what’s Part II about?  B+ Tree indexes.  The fun is about to begin.

 

2 comments

  1. Sarah, love the post and can’t wait for the next one. Just a comment about your statement “When you write back to a block (think multiplying a block by a given rate), Essbase cannot always fit your new data back in to the old slot. So, Essbase will, essentially, append the updated record to the end of the data record instead of putting it back in the original slot and/or removing the old block. Ergo – fragmentation.”. This has been debated a lot and I finally got Cameron and others to agree with me. Except for Exalytics and version 11.1.2.3.500 Essbase never has written a block back to the same location. This is\was to prevent corruption in case the server crashed during an update. Essbase would always write a changed block to a new location and then flip a pointer in the index(.ind) file to the new location. The fact that they just added a special setting for Exalytics to write back to the same location proves it.
    What could happen is that Essbase keeps a file of the “openings” created in the disk the databasename.ind file, and will try to reuse that space if it can. I don’t think it is very good at it but it does reduce the additional disk used somewhat.

    1. Hi Glenn – Yes, Cameron informed me of this last week via email. 🙂 As I told him, it gets rather muddy when someone from the Oracle CEAL team says one thing, but reality is another. Thanks for replying; I do intend to do a follow up to this post next week with what I have been told.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s