Over-the-Top Tales from the Trenches.
Motto: Bringing order to the chaos of every day DBA life.
Dear Diary,
Is that a whiff of decay in the air, the pungent odour of corruption? No… not me, the database.
Oracle prides itself on the robust nature of its database. Kill it off, or have a sudden power outage, and 99.9% of the time the db will pick itself up, clean off the dust of uncommitted transactions, and get back into the game.
There is nothing more nasty or dangerous to any database than a filesystem/disk with IO issues. To paraphase the bard, “Is the data to be or not to be? that is the question”. Did the block(s) get written on disk or did something else happen? Or did you get fries with that?
As a DBA, you are paid to wear the tin-foil hat occasionally to protect yourself from the soothing signals transmitted by disk caches and controllers, informing you that everything is ok… go back to sleep… your data is ok…
Is there an easy way to check for block corruptions and also logical corruptions (e.g., dodgy indexes) after the hint of filesystem IO issues?
The first thing that comes to mind is DBV. So you have 250+ datafiles. You hunt for a script to check each one, then trawl through the mountain of data DBV produces for each datafile.
On Unix you could run a shell script like:
#!/bin/bash BLOCKSIZE=$1 DATADIR=$2 cd $DATADIR ls -1 *.dbf | while read FILE do dbv file=$FILE blocksize=$BLOCKSIZE done
Call the shell script like this:
./dbv.sh 8192 /oracle/oradata/$ORACLE_SID >> dbv.log 2>&1
While that dbv.sh will work on most recent and no-so-recent versions of Oracle, there is another way, one which will populate a table in the db for you if there are any corruptions, and give you a reasonable ETA on the total amount of time required. This way is The Tao of RMAN.
RMAN is able to check the database for block and logical corruptions using a nice command called
BACKUP VALIDATE CHECK LOGICAL DATABASE
.
In this example command file, RMAN uses 4 channels to speed up the process and potentially thrash your disk. A good option would be to use iostat -x 5 5
, as described by Alex in this article, to determine when the disks are saturated, and modify the command file accordingly.
You could also use the RATE
option to limit the disk rate like this, ALLOCATE CHANNEL c1 DEVICE TYPE DISK RATE=1500K;
Here is the command file:
run { allocate channel c1 device type disk ; allocate channel c2 device type disk ; allocate channel c3 device type disk ; allocate channel c4 device type disk ; backup validate check logical database; }
To run the command file and have the rman output go to a logfile, call it like this:
rman target / cmdfile rman_check_corrupt.cmd log rman_check_corrupt.log 2>&1 &
While RMAN does its thing, you can jump into the database and check how long Oracle estimates the long-running process will take, using this SQL:
select sid,start_time,elapsed_seconds,time_remaining, round(time_remaining/60,2) "Min",message from v$session_longops where time_remaining > 0 /
As RMAN proceeds, you can perform some stationary exercise at your desk, increase your blood pressure and heart rate by running
select count(*) from v$database_block_corruption;
So was it six corruptions or only five, are you feeling lucky?
The best part of RMAN is that is will run on primary and standby databases equally effectively. However, neither the mightly dbv nor even RMAN will help you with the insidous data guard corruption bug.
Have Fun!
Paul
Related posts in the series:
Creating standby databases using RMAN duplicate
The mysterious world of shmmax and shmall
7 Comments. Leave new
Great article!
Thanks for this, just what I needed to know about checking blocks in RMAN without doing a full backup.
hi,
i would like to remark the following:
I was receiving ORA-0600 on update on a table …
I’m on Oracle 10.2.0.4 and backuped up with rman my database.
I did not get any row in v$database_block_corruption.
Afterwards, i used backup validate check logical.
This time also, i did not get any row in v$database_block_corruption.
We did a VALIDATE STRUCTURE cascade which gave the following:
ERROR at line 1:
ORA-01499: table/index cross reference failure – see trace file
The resolution applied was : drop/create of all indexes on the table.
Question:
1/ why was it not detected with RMAN ( normal or check logical ) ?
2/ we are reluctant to do ANALYZE VALIDATE for all tables in the database because, the performance can suffer. Is there any other way to check all the tables ?
Thank you for your reply.
Now that we know that we have block corruption..
How do we fix it?
Hi Sir,
I am SQL DBA, but have high interest in ORACLE, i used your steps to rename db name instead of NID utility.
It works, but when i check pmon process [ps -ef | grep pmon], it shows ora_pmon_prod[old one] as opposed to expected ora_pmon_prod1. Any thoughts and help much appreciated! Thank you Sir!
There is a typo to the URL to Marc’s article :-) One of these days, I wish to move to Pythian, not just yet … hate having interviews though :-)
Thank you. We had to remove the link because the original article seems to have moved. Thanks for your comment.