October 27, 2009

Some SQL*Loader tips about masks

I've always been a fan of SQL*Loader, if not for nothing else, for the fact that today along with SQL*Plus, this is one of the few products that still has the "SQL*" prefix, so famous in the late 80's, and throughout the 90's in Oracle product catalogue. Today SQL is a acronym that is no longer associated with Oracle. Unfortunately for Oracle, MicroSoft had done one of the most successfull brand spinnings in the history of IT, by taking over the SQL "word" as the name of their main brand of database products. Well but since Oracle had done the same with IBM, and probably MySQL will do the same with SQL Server, the good old Oracle utilities remain among the stainless steel products of modern IT.

But what is SQL*Loader for? Well, in fact it has very little to do with SQL but more with loading. It's a command line utility for loading data into an Oracle database. It has been around for years, and when someone hands over to me data in some weird format so I can process it, my immediate reaction is to load it into a table in Oracle with SQL*Loader and then take it from there. After information is inside a table inside an Oracle database, I feel I can do anything!

The base of SQL*Loader is the Control File. Sadly the name is the same as the database controlfile, but the contents are different. This is a text file where you "programme" how SQL*Loader will interpret the raw data.

I'll give an example using a TomCat logfile which has entries in the following format:
[09:11:25] INFO :SQL: SELECT column FROM table

Our goal here is to load into a table the SELECT statement. There are lots of ways of doing so, but I'll choose the character delimiter aproach.

First we create a table called MYTABLE with two columns: one for the rubbish ("[09:11:25] INFO :SQL") and another one for the SQL statement:
create table MYTABLE (col varchar2(20), command varchar2(4000)) tablespace whatever;

The delimiter string will be the two points and the space which is unique in the whole entry: ": "

After that let's build a SQL*Loader controlfile with the following code (we'll name the file "mytable.ctl"):
LOAD DATA

INFILE 'tomcat_logfile.dat'

REPLACE INTO TABLE mytable

fields terminated by ": "

trailing nullcols

(col,command)

After that you just have to build a shell script with this command:
$ sqlldr user/password CONTROL=mytable.ctl LOG=mytable.log BAD=mytable.bad

And your data should load into the table like a charm!

The problem arises when the SQL Statement from the TomCat logfile is big but still less than 4000 (much less. I had 347 chars long statement blowing in my face). In these situations you will get the following error (the rejected records go into the file "mytable.bad"):
Record 207: Rejected - Error on table MYTABLE, column COMANDO.
Field in data file exceeds maximum length

The 207 is just a number that references the order of the entries to be loaded into the database. The rest of the error is very very intriguing, since our column is 4000 long. But... does SQL*Loader knows that?

Does it have to know? Well, it does not make any sense but the answer is "yes". SQL*Loader will better know how long
your destiny columns are in order to load big entries sucessfully.

So the changes made to the controlfile are in bold:
LOAD DATA

INFILE 'tomcat_logfile.dat'

REPLACE INTO TABLE mytable

fields terminated by ": "

trailing nullcols

(col char(20),comando char(4000))

After this I've loaded and the log says everything is tip-top:
Table MYTABLE, loaded from every logical record.

Insert option in effect for this table: REPLACE

TRAILING NULLCOLS option in effect

Column Name                  Position   Len  Term Encl Datatype

------------------------------ ---------- ----- ---- ---- ---------------------

COL                                 FIRST     *           CHARACTER

Terminator string : ': '

COMANDO                              NEXT  4000           CHARACTER

Terminator string : ': '

value used for ROWS parameter changed from 64 to 60

Table MYTABLE:

5568 Rows successfully loaded.

0 Rows not loaded due to data errors.

0 Rows not loaded because all WHEN clauses were failed.

0 Rows not loaded because all fields were null.

The same thing applies to date columns, where you have to state the format in the controlfile.

Here's an example of a controlfile you should use when loading date data that will go into date columns:
LOAD DATA

(...)

(date_column DATE "YYYY-MM-DD",

(...)

In our TomCat example one might want to load the log timestamp, so we can still use a delimiter character ("]") and load it in two columns like this:

 



SQL> alter table mytable add col2 date;

Table altered.

Now load into the new column the time of the TomCat log entry:
LOAD DATA

INFILE 'tomcat_logfile.dat'

REPLACE INTO TABLE mytable

fields terminated by ": "

trailing nullcols

(col2 DATE "[HH24:MI:SS",comando char(4000))

Please note that I've added the open of the square backets inside the date mask and it works!! But this is not the only intriguing thing, since I haven't added the day-month-year part in the date mask the database assumed the first day of the current month: "1st October 2009".

Neat!

Actually the SQL*Loader controlfile syntax is very rich and has many more features than the ones I've stated, but with the controlfile examples I've gave here and a little imagination, you can load virtually everything into an Oracle database that comes in text format.

Just another tip: if you're on your client system (PC) and want to load stuff with SQL*Loader you have to install Oracle Client and guarantee that the "Oracle Utilities" are installed.

With a TNS connection and SQL*Loader on your PC you are then good to go and start loading data, or even build your own interface.

LMC.

No comments:

Post a Comment