February 10, 2015

Relational to Hadoop by Oracle (CopyToBDA)

From the Oracle Product Development team comes a fantastic feature to offload data from RDBMS to Hadoop. It's called CopyToBDA!

Here's the description. Thanks to Melli Annamalai for the write-up:

"Copy to BDA is a feature of Oracle Big Data SQL. Copy to BDA enables copy of Oracle Database tables to Oracle Big Data Appliance for query with Hive. In this version, the high level steps are:

      - Create an external table with the ORACLE_DATAPUMP access driver. This access driver will create an external table and populate the external Data Pump format files with Oracle Database table data.
      - Copy the data pump files to Oracle Big Data Appliance.
      - Create a Hive table on the data pump files on the Big Data Appliance. The data pump files can be queried with Hive and any Hadoop application that can access a Hive table.

 Future plans include the ability to write external data pump files directly to HDFS (via FUSE) during external table creation."
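To make the three steps quoted above a bit more concrete, here is a minimal sketch in Python that only builds the statements as text; it does not connect to anything. The table, directory, and path names (orders, exp_dir, /u01/exp, /user/oracle/orders_dp) are hypothetical. The ORACLE_DATAPUMP clause is standard Oracle external table syntax; the Hive SerDe and input format class names are written as I recall them from Oracle's Copy to BDA documentation, so treat them as an assumption and check them against your Big Data SQL release.

```python
# Sketch only: builds the text for each Copy to BDA step so it can be reviewed.
# All object names and paths are hypothetical examples.

def datapump_export_ddl(source_table, ext_table, directory, dump_files):
    """Step 1: Oracle DDL that unloads a table into Data Pump format files."""
    locations = ", ".join(f"'{name}'" for name in dump_files)
    return (
        f"CREATE TABLE {ext_table}\n"
        "  ORGANIZATION EXTERNAL (\n"
        "    TYPE ORACLE_DATAPUMP\n"
        f"    DEFAULT DIRECTORY {directory}\n"
        f"    LOCATION ({locations})\n"
        "  )\n"
        f"  AS SELECT * FROM {source_table}"
    )


def hdfs_copy_command(local_dir, dump_files, hdfs_dir):
    """Step 2: argument list for copying the dump files into HDFS."""
    sources = [f"{local_dir}/{name}" for name in dump_files]
    return ["hadoop", "fs", "-put", *sources, hdfs_dir]


def hive_table_ddl(hive_table, hdfs_dir):
    """Step 3: Hive DDL over the copied files (class names are an assumption)."""
    return (
        f"CREATE EXTERNAL TABLE {hive_table}\n"
        "ROW FORMAT SERDE 'oracle.hadoop.hive.datapump.DPSerDe'\n"
        "STORED AS\n"
        "  INPUTFORMAT 'oracle.hadoop.hive.datapump.DPInputFormat'\n"
        "  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{hdfs_dir}'"
    )


files = ["orders1.dmp", "orders2.dmp"]
print(datapump_export_ddl("orders", "orders_dp", "exp_dir", files))
print(" ".join(hdfs_copy_command("/u01/exp", files, "/user/oracle/orders_dp")))
print(hive_table_ddl("orders_dp", "/user/oracle/orders_dp"))
```

Listing more than one file in LOCATION lets the unload be split across files, which is why the sketch takes a list rather than a single name.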

So what does this mean?

It means that Oracle is giving you the cake and the ability to slice and eat it! Big Data SQL is a technology that allows you to use HDFS (Hadoop Distributed File System) as yet another storage layer of your RDBMS. It means that you can offload some data (historical data, for example) to Hadoop and keep your code and processes intact. No code changes. How?

You decide which data to move to HDFS by using the CopyToBDA utility, and then, by using the power of Big Data SQL, you can run queries that combine data sitting both on Hadoop and on the RDBMS. There should be no room for religious discussions about which is better: let each piece of data reside where it belongs.
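As a rough illustration of such a combined query, the Python sketch below builds two pieces of text: an Oracle external table that points at a Hive table through the ORACLE_HIVE access driver of Big Data SQL, and a UNION ALL over current and offloaded rows. The table and column names (orders, orders_history_ext, default.orders_history) are hypothetical, and the access parameter is written as I understand it from the Big Data SQL documentation, so verify it for your version.

```python
# Sketch only: builds SQL text, nothing is executed.
# Table and column names are hypothetical examples.

def oracle_hive_table_ddl(ext_table, columns, hive_table):
    """Oracle external table that reads a Hive table via ORACLE_HIVE."""
    column_list = ",\n  ".join(f"{name} {sqltype}" for name, sqltype in columns)
    return (
        f"CREATE TABLE {ext_table} (\n  {column_list}\n)\n"
        "ORGANIZATION EXTERNAL (\n"
        "  TYPE ORACLE_HIVE\n"
        "  DEFAULT DIRECTORY DEFAULT_DIR\n"
        f"  ACCESS PARAMETERS (com.oracle.bigdata.tablename={hive_table})\n"
        ")\n"
        "REJECT LIMIT UNLIMITED"
    )


def combined_query(columns, current_table, history_table):
    """One query over rows kept in the RDBMS and rows offloaded to Hadoop."""
    names = ", ".join(name for name, _ in columns)
    return (
        f"SELECT {names} FROM {current_table}\n"
        "UNION ALL\n"
        f"SELECT {names} FROM {history_table}"
    )


columns = [("order_id", "NUMBER"), ("order_date", "DATE"), ("amount", "NUMBER")]
print(oracle_hive_table_ddl("orders_history_ext", columns, "default.orders_history"))
print(combined_query(columns, "orders", "orders_history_ext"))
```

One way to keep existing code untouched would be to wrap the combined query in a view that applications already select from, though whether that suits a given schema is a design decision for the DBA.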

What are the criteria for splitting? There is no rule of thumb, but low-level non-transactional data is a good fit for HDFS, as is historical transactional data. Large to medium online transactional data should reside on the RDBMS unless your apps were built with a different architecture.




