October 23, 2009

Smart Flash Redo Logging and the Best way to deal with ARCHIVED LOGs

21 Oct 2011
Smart Flash Redo Logging: A small step for development, a giant leap in database performance
Since the early days of the Oracle database architecture, one of the most common bottlenecks in high-load transactional systems has been the speed of the disks where the redo log files are stored. One of the symptoms of this problem is an upsetting wait event called “log file sync”. Basically, hiccups in the redo log storage, no matter how good its average latency, create a cascading effect on the whole logging subsystem and end up hurting the responsiveness of the entire system.

An apparent solution would be to put the redo log files on SSDs or some other low-latency storage. But you will eventually hit the same wall, with the same cascading effects, if the storage can't cope with the volume of transactions it's supposed to handle. Another solution some might think of is to keep only one member per redo log group. But that's no solution at all: it just raises your risk, an unnecessary trade-off.

The solution that the Oracle Database development team came up with in 11.2.0.3 (also in 11.2.0.2 BP11) is so powerful and yet so simple!

Now, when the log writer (LGWR) picks up a batch of log entries from the log buffer to write them to the redo logs, it sends the write request to both disk and flash, and the first one to send back an acknowledgement is taken as confirmation that the write was done. By now you might be thinking: “Well, flash is always going to be the first one!”. You're almost right. But if flash is always faster, what happens to the disk files? They're written anyway; if the disk fails or lags behind, the log information is already in flash, and if flash fails, the log information is on another cell. So the hiccups are eliminated, and the only cost is that highly transactional systems wear out flash cards a little faster, something that just won't happen before some other component wears out first.
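The first-acknowledgement-wins race described above can be sketched conceptually in Python. This is just an illustrative model, nothing resembling Oracle internals: the media, the latencies, and the function names are all made-up assumptions chosen only to show the mechanism.

```python
import concurrent.futures
import time

# Simulated storage media. The sleep times are invented numbers meant
# only to illustrate the race: disk occasionally hiccups, flash stays fast.
def write_to_disk(entry):
    time.sleep(0.05)   # a slow, "hiccuping" disk write
    return ("disk", entry)

def write_to_flash(entry):
    time.sleep(0.001)  # a consistently fast flash write
    return ("flash", entry)

pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)

def log_write(entry):
    """Issue the write to both media and return as soon as ONE acknowledges.

    Both writes still run to completion in the background, so the redo
    also lands on disk -- the flash copy only hides the latency spikes."""
    futures = [pool.submit(write_to_disk, entry),
               pool.submit(write_to_flash, entry)]
    done, _ = concurrent.futures.wait(
        futures, return_when=concurrent.futures.FIRST_COMPLETED)
    return done.pop().result()

medium, entry = log_write("commit record")
print(medium)  # almost always "flash"
```

The point of the sketch is that the caller is unblocked by whichever medium answers first, while the slower write still completes, which is exactly why a disk hiccup no longer stalls the commit.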

Let's recap:

  1. Requirements:

    1. Version 11.2.2.4 at the cell level

    2. Version 11.2.0.3 at the DB level (when it comes out) or 11.2.0.2 BP11

  2. Smart Flash Logging has to be enabled (it's enabled by default, so only worry if you disabled it)

  3. In systems that are not highly transactional, like ODSs or DWs, you can disable this feature for all databases or for just one database (at the cellcli prompt):

    1. For all databases, use the command DROP FLASHLOG [FORCE]

    2. For just one, alter the default IORM plan: ALTER IORMPLAN dbplan=((name=test, flashLog=off))

  4. To monitor the behaviour of this mechanism, go to the cellcli console and use the following command:

    1. LIST FLASHLOG [DETAIL]

    2. or use the statistics that start with FL_*, for example FL_FLASH_IO_ERRS (the number of errors when writing to flash => the few times the disk saved the day).

This is groundbreaking stuff. It's the removal of an old restriction with an ingenious solution.

LMC

23 Oct 2009

Best way to deal with ARCHIVED LOGs

This article will try to address a common misunderstanding of the role that archived logs and their backups play in the recovery process.
Some notions first:

1) If you back up your archived logs to disk regularly, you should delete them after the backup using the DELETE INPUT clause

2) If you back up your archived logs to tape regularly, you might want to keep a few of them around (on disk), just in case...

What happens to the archived logs on disk when you back them up? If you don't use the DELETE INPUT clause, they stay there. Take a look:

-- Before the backup

SQL> select count(*) from  v$archived_log where deleted = 'NO';

COUNT(*)
----------
32

-- RMAN backup launched:

RMAN> backup archivelog all;
(...)
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:55
Finished backup at 23-OCT-09

-- During the first stage of the backup

SQL> r
1* select count(*) from  v$archived_log where deleted = 'NO'

COUNT(*)
----------
33

One more archived log was generated.

-- When the backup finishes

SQL> r
1* select count(*) from  v$archived_log where deleted = 'NO'

COUNT(*)
----------
33

The archived logs that were backed up are still around, because we haven't used the DELETE INPUT clause.
Now how can we get rid of these archived logs that have already been backed up?
What's the best way to clean them regularly?

If we back up the archived logs again, this time with the DELETE INPUT clause, RMAN will back up *again* the ones that haven't been deleted but are already in a backup:

RMAN> backup archivelog all delete input;

After this we are clean, but now we have duplicate archived logs in the backups!

SQL> select count(*) from  v$archived_log where deleted = 'NO';

COUNT(*)
----------
0

Now let's produce some redo:

SQL> create table hr.teste (n number, n2 number, n3 number, n4 number)
2  tablespace users;

Table created.

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     38
Next log sequence to archive   40
Current log sequence           40

SQL> begin
2  for i in 1..100000 loop
3  insert into hr.teste values (i,i,i,i);
4  commit;
5  end loop;
6  end;
7  /

PL/SQL procedure successfully completed.

SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     39
Next log sequence to archive   41
Current log sequence           41

SQL> select count(*) from  v$archived_log where deleted = 'NO';

COUNT(*)
----------
1

The following RMAN command will *ALWAYS* produce archived logs to back up:

RMAN> backup archivelog all;

because it does a "switch logfile" at the beginning of the operation. So if you run it hourly, you will always produce new archived logs.

To avoid backing up the same archived logs over and over, you should add the following clause:

RMAN> backup archivelog all not backed up 1 times;

This still does a "switch logfile" at the beginning of the operation, but it won't back up the same archived logs again.

And if you add DELETE INPUT to the mix, what happens?

RMAN> backup archivelog all not backed up 1 times delete input;

skipping archived log file /u01/app/oracle/fra/ORCL/archivelog/2009_10_23/o1_mf_1_40_5g32906q_.arc; already backed up 2 time(s)
skipping archived log file /u01/app/oracle/fra/ORCL/archivelog/2009_10_23/o1_mf_1_41_5g32gxdd_.arc; already backed up 2 time(s)
(...)
channel ORA_DISK_1: deleting archived log(s)
archived log file name=/u01/app/oracle/fra/ORCL/archivelog/2009_10_23/o1_mf_1_40_5g32906q_.arc RECID=36 STAMP=701005489
archived log file name=/u01/app/oracle/fra/ORCL/archivelog/2009_10_23/o1_mf_1_41_5g32gxdd_.arc RECID=37 STAMP=701005677
(...)
Finished backup at 23-OCT-09

So with DELETE INPUT mixed in, you not only back up just the logs that hadn't been backed up yet, but also erase from disk the ones that had already been backed up.

Confused?

Imagine you want to back up to tape every hour, but wish to keep a 24-hour window of archived logs in your recovery area (on disk).

Your hourly sbt_tape archivelog backup should be done with:

RMAN> backup archivelog all not backed up 1 times;

And your daily sbt_tape archivelog backup should be done with:

RMAN> backup archivelog all not backed up 1 times delete input;

If you're backing up to disk, just schedule one job, hourly or every 2 hours or at whatever frequency suits you, that executes this:

RMAN> backup archivelog all delete input;
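For the tape case, the two jobs could be wrapped into complete RMAN scripts along these lines. The sbt channel allocation shown here is an assumption for illustration; adjust it to your media manager configuration:

```
-- Hourly job: ship new archived logs to tape, keep them on disk
run {
  allocate channel t1 device type sbt;
  backup archivelog all not backed up 1 times;
}

-- Daily job: back up anything still missed and purge the 24-hour disk buffer
run {
  allocate channel t1 device type sbt;
  backup archivelog all not backed up 1 times delete input;
}
```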

Still Confused?

If you back up your archived logs to disk, skip this article and have a nice weekend.
If you back up your archived logs to tape, have you ever tried to do a restore? Do you know how painful it is to get them from tape? Ooh, it's not painful at all in your case? Then you can also skip this article and go do more useful or entertaining things.

But if you can relate to the idea that a kind of "buffer" of archived logs should be kept on disk when you back up to tape, please consider the backup architecture I've suggested above.

Didn't understand jack?

Maybe this Oracle DBA thing is not for you ;-)

LMC.

5 comments:

  1. I think instead of having two scripts, daily and hourly, you can run something like this, so you will always have 1 day of archivelogs at any time (still depends on the needs):

    run{
    BACKUP ARCHIVELOG ALL not backed up 1 times;
    DELETE NOPROMPT COPY OF ARCHIVELOG ALL COMPLETED BEFORE 'SYSDATE-1';
    }

  2. What a great piece of advice Coskan. Thanks a bunch.
    Still, there are some sites that don't want to "lose" time during the archivelog backup to tape deleting stuff every hour, and they will want some dedicated jobs just for the delete. Customers always come up with weird motives and myths for not doing things the right way!
    If I had to choose? I'd go with your approach blindfolded. But if the customer starts tripping (which unfortunately they do) I might suggest the two-job approach.

    Thanks again.

    LMC.

  3. I have a physical standby database in place. How can I ensure that RMAN removes the archives which are backed up and applied at my standby database? Please help me in this regard.

    regards
    sangeeth

  4. Awesome article. Awesome explanation... LOL, so this Oracle DBA thing is for me :)
