May 10, 2010

Exadata Storage Server Hybrid Columnar Compression (EHCC)

Since I first wrote the article I'm about to publish, there have been several developments, culminating on 27th April, when I presented it publicly at the Extreme Performance Day in Lisbon, and internally at the "Exadata Top Guns" event in Berlin on 11th April.

So I decided I could make it public. It's the story of my first Exadata PoC, where I tried to prove out the EHCC compression ratio. I hope you all like it, learn something, and leave a comment.


It was the second day of my second week at Oracle as an employee. I've been cruising the Oracle hallways for two decades, but always wearing other hats. I've even represented Oracle when, as a freelance consultant, I worked as an Oracle University trainer. But this is it. This is the real deal. I've made it into the lion's den!
So I was trying to calmly settle into this huge organization that is Oracle Corp when I got the call: "You have to go and do a Proof-of-Concept [PoC] at this customer about how Exadata compresses data". No panic, just thrill. The customer, TIMw.e., a mobile content producer and seller based in Portugal but with offices on four continents, was one of my first clients back in 2006, when I was an independent consultant. So nothing new there. Doing a PoC? Fine. Nothing new either.
But the hard part was this: how would I return to a site where I had a track record, and use that asset to make them comfortable enough with Exadata Hybrid Columnar Compression (EHCC) to buy the DB Machine? The technical stuff was not a problem. I already had a CentOS 5 virtual machine set up with Oracle 11gR2, and I just had to apply a patch in order to be able to emulate the EHCC algorithm. I applied patch no. 8896202 on my box and ran a couple of scripts to get the DBMS_COMPRESSION package created. This package emulates the EHCC algorithm and tells you the magic number: the compression ratio. It does so through the following method (PL/SQL procedure):
You feed it parameters like username, table, tablespace and the desired compression method, and the procedure samples a million rows from your table and gives you back the magic number. The four compression methods are divided into two categories, and within each category there is a high and a low method. These are the combinations:
1.   For Query Low Compression
2.   For Query High Compression
3.   For Archive Low Compression
4.   For Archive High Compression
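For reference, this is the 11gR2 signature of the Advisor procedure, DBMS_COMPRESSION.GET_COMPRESSION_RATIO: the IN parameters describe what to sample (scratch tablespace, owner, table, optional partition, compression type), and the OUT parameters return block and row counts plus the ratio itself:

```sql
PROCEDURE GET_COMPRESSION_RATIO (
   scratchtbsname  IN  VARCHAR2,     -- scratch tablespace for the sample copy
   ownname         IN  VARCHAR2,     -- schema owning the table
   tabname         IN  VARCHAR2,     -- table to sample
   partname        IN  VARCHAR2,     -- partition name, or NULL
   comptype        IN  NUMBER,       -- e.g. DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH
   blkcnt_cmp      OUT PLS_INTEGER,  -- blocks used when compressed
   blkcnt_uncmp    OUT PLS_INTEGER,  -- blocks used uncompressed
   row_cmp         OUT PLS_INTEGER,  -- rows per block, compressed
   row_uncmp       OUT PLS_INTEGER,  -- rows per block, uncompressed
   cmp_ratio       OUT NUMBER,       -- the magic number
   comptype_str    OUT VARCHAR2);    -- human-readable compression type
```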
The "For Query" methods let you run OLTP without any compromise, while the "For Archive" methods can achieve astonishing ratios but are meant for historical data. I decided to have a number in hand before heading off to the customer site. And so I created a table with highly random data using the following SQL:
CREATE TABLE random_data -- table name is illustrative
AS SELECT dbms_random.normal ColNum1,
          dbms_random.random ColNum2,
          dbms_random.value ColNum3,
          dbms_random.string('a',40) ColStr1
FROM dual
CONNECT BY level <= 1000000;

And then I ran DBMS_COMPRESSION.GET_COMPRESSION_RATIO with method no. 2 ("For Query High Compression") and got a ratio of 1.5. Not exactly what I was expecting, but at least with this number I could better manage expectations and not commit myself or the corporation to any particular ratio.
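On an actual Exadata system, by the way, these same methods map directly onto CREATE TABLE syntax. A quick illustration (the table definition is made up; in 11gR2 these clauses only take effect on Exadata storage):

```sql
CREATE TABLE sales_hist (
  sale_id  NUMBER,
  payload  VARCHAR2(40)
) COMPRESS FOR QUERY HIGH;   -- or QUERY LOW / ARCHIVE LOW / ARCHIVE HIGH
```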
And so I went to the customer's office and sat down with the CTO and their DBA. The plan was to run the algorithm on a single partition of a table that receives about 20 million records each day.
We began late, so it was only by 07:20pm that their DBA handed me the data so I could load it into my 11gR2 box. While I was importing it all, I was curious to see what kind of ratio I could achieve. So when the import process had loaded about 2.5 million records, I launched the EHCC simulation on the table and got the following output:
SQL> set serverout on
SQL> exec get_ehcc_cr(2);
Compression Advisor self-check validation successful. select count(*) on both
Uncompressed and EHCC Compressed format = 1000001 rows
COMPRESSED_TYPE = "Compress For Query High"

PL/SQL procedure successfully completed.
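(get_ehcc_cr is not an Oracle-supplied routine but a small wrapper around DBMS_COMPRESSION.GET_COMPRESSION_RATIO. A minimal sketch of what such a wrapper could look like, assuming it maps the method numbers 1-4 from the list above onto the package constants; the schema, table and tablespace names are illustrative:)

```sql
CREATE OR REPLACE PROCEDURE get_ehcc_cr (p_method IN NUMBER) AS
  -- Map the 1-4 numbering used in the text onto the package constants.
  l_comptype      NUMBER :=
    CASE p_method
      WHEN 1 THEN DBMS_COMPRESSION.COMP_FOR_QUERY_LOW
      WHEN 2 THEN DBMS_COMPRESSION.COMP_FOR_QUERY_HIGH
      WHEN 3 THEN DBMS_COMPRESSION.COMP_FOR_ARCHIVE_LOW
      WHEN 4 THEN DBMS_COMPRESSION.COMP_FOR_ARCHIVE_HIGH
    END;
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',        -- scratch tablespace (illustrative)
    ownname        => USER,
    tabname        => 'RANDOM_DATA',  -- table under test (illustrative)
    partname       => NULL,
    comptype       => l_comptype,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  DBMS_OUTPUT.PUT_LINE('COMPRESSED_TYPE = "' || l_comptype_str || '"');
  DBMS_OUTPUT.PUT_LINE('Compression ratio = ' || ROUND(l_cmp_ratio, 1));
END;
/
```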

I could only see the last line saying the compression ratio was 8. Eight!

I freaked out and immediately punched a huge punching bag the customer has in the middle of the open space. The CTO jumped out of his chair and came over to see the (partial yet conclusive) result. He started talking some mumbo-jumbo about Oracle writing a statement of commitment in which we would guarantee a compression ratio of at least 5. I laughed and took his excitement as a positive sign. Of course Oracle wouldn't write such a document, but the customer was thrilled, and getting that much out of a PoC was enough for me. I ended up loading the whole 20 million records, arriving home after midnight and crunching the numbers again, this time with the whole 20 million. I got a solid 8.2 and went to bed happy with my first PoC at Oracle Corp.


It turned out this customer bought the box and is becoming a worldwide Oracle Exadata reference.


  1. You've done it once again! Great writing.

  2. Can you help me?
    Here's my case: I have a table with 70 million records. At the end of every month I do a new load via INSERT of over 2 million records. This table is partitioned and subpartitioned, but the subpartitions do not seem to have EHCC active the way the partitions do.
    I researched and found that I need to run, for each subpartition, something like: alter table vendas.tabela move subpartition unique_data compress for query high (plus the target tablespace).
    My question is: is there a faster way of doing this, since we have 130 subpartitions?
    Thank you

  3. Subpartitions and compression pose some restrictions. One of them is that you can't have subpartitions with different types of compression from the get-go. So it's either done manually (as you're already doing) or using these tips from the oakies:
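    One way to avoid hand-writing 130 statements is to generate them from the data dictionary. A sketch, assuming the owner/table names from your question and COMPRESS FOR QUERY HIGH as the target level:

```sql
-- Generate one ALTER ... MOVE SUBPARTITION statement per subpartition.
-- Spool the output and run it, or wrap it in EXECUTE IMMEDIATE.
SELECT 'ALTER TABLE ' || table_owner || '.' || table_name ||
       ' MOVE SUBPARTITION ' || subpartition_name ||
       ' COMPRESS FOR QUERY HIGH' AS stmt
  FROM dba_tab_subpartitions
 WHERE table_owner = 'VENDAS'   -- assumed from the question
   AND table_name  = 'TABELA';
```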