February 22, 2012

I've touched a Big Data Appliance... and I'm still the same

This week I'm in Reading, UK, to be hadoopized by Cloudera, and all of a sudden: wham! bam! The Oracle Big Data Appliance (BDA) arrives at the Oracle Solution Center! Thanks only to Mr. Bayliss's courtesy, I managed to get inside the data center with some colleagues and see (and touch) the latest and greatest Engineered System by Oracle. It's 36 rack units' worth of raw computing power, with 216 cores of Intel Xeon X5675 processors and 648 terabytes of capacity. Each node has 48 GB of RAM, and all the nodes are connected with InfiniBand networking technology, both inside the cluster and to the outside world. That means it can easily connect to an Exadata cluster and maintain the high throughput numbers.

Nevertheless, after touching it and photographing it, I managed to pull myself together :-). The magic is not in the hardware itself, of course, but in what runs inside! And that is Cloudera's Distribution of Hadoop version 3 (CDH3), plus Oracle's Big Data Connectors and, optionally, the Oracle NoSQL Database. I say optionally because it's not a mandatory component, whereas the connectors are.

Even more magic is what you can do with CDH3. We've been toying around with it, and so far I have to say that I totally subscribe to Cloudera's view that the biggest elephant in the room is not the yellow one, but the lack of manpower to master MapReduce programming. In the near future I'll drill down on this blog into this plethora of technology, which looks like a zoo or a Pokémon mise en scène. Stay tuned.

In the coming days I hope to accomplish a bit more than just counting words with CDH3! Although I still don't know what I will do with the invaluable information and insight that comes from knowing that Shakespeare mentions the word "Oracle" 27 times in his entire body of work. Welcome to Big Data.
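For readers who have never seen the word-count exercise, here is a minimal sketch of the idea in plain Python. It is not the Hadoop job we ran on CDH3, and it does not use any Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration. It only imitates the three steps of a MapReduce job on a single machine.

```python
import re
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one line of text."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: sum all the 1s emitted for a single word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(word, counts) for word, counts in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample text, not actual Shakespeare.
    sample = ["The oracle has spoken", "Consult the Oracle"]
    print(word_count(sample)["oracle"])  # prints 2
```

On a real cluster the map and reduce functions run in parallel on many nodes, and the grouping step is done by the framework; that distribution is the part this sketch leaves out.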

Here are some snaps:

The expression Big Data materialised onto a rack looks sci-fi, but it's not!

What might look sci-fi is a yellow elephant, and what is it doing in my tennis shoe?

** UPDATE **

In the meantime, Oracle has launched 2 more hardware versions of the BDA and a large number of software versions. The included software is also growing in terms of options. The BDA now has an exclusive software option called Big Data SQL (BDS). Big Data SQL gives you the power of Oracle SQL on top of data that sits in Hadoop. Moreover, BDS gives you offloading capabilities much like those of the Exadata storage software. This is a major breakthrough for companies that are bringing new data sets into their data centers but want to keep their traditional SQL-based tools and processes. Before BDS, data had to be moved around between Hadoop and the Oracle relational database. Now you can leave the data where it sits and apply not only logic to it, but also security policies.

Another new development since the early days of the BDA is the inclusion of the whole Cloudera ecosystem, comprising such tools as BDR, Navigator, Search and Impala. Shark was also added afterwards. The amount of software that comes packed in this system is immense. Oracle has also unlocked some direct connections to the Hadoop world by means of an enriched set of Big Data Connectors and an OBIEE Hadoop gateway. More Big Data Connector news since then: 1) you can now process XML data staged in Hadoop into XML DBs, and 2) you can have more Knowledge Modules (KMs) in Oracle Data Integrator, KMs that produce transformations based not just on Hive but also on Shark and Impala. The ability of Oracle GoldenGate to deliver data to Hadoop in parallel threads is also a groundbreaking development. I'm sure much more will come, making this a truly integrated ecosystem and an exciting platform to develop applications on.
