I’ve just been paged about a “corrupt block”. At first glance, it’s kind of scary to receive such a message 40 minutes before your shift ends. Right?
After examining the alert.log (see below) and searching the knowledge bases, I found an explanation that wasn’t as bad as I had feared. I liked it, so I decided to share it with the rest of the world.
Here you go, folks!
ALERT.LOG
...
Wed Nov 7 23:34:35 2012
Hex dump of (file 55, block 742286) in trace file /u01/db/tech_st/10.2.0/admin/PRDB_host/udump/PRDB_ora_10778.trc
Corrupt block relative dba: 0x0dcb538e (file 55, block 742286)
Fractured block found during backing up datafile
Data in bad block:
 type: 6 format: 2 rdba: 0x0dcb538e
 last change scn: 0x002c.5c0523dc seq: 0x2 flg: 0x04
 spare1: 0x0 spare2: 0x0 spare3: 0x0
 consistency value in tail: 0xc0b10601
 check value in block header: 0xafaf
 computed block checksum: 0xe36e
Reread of blocknum=742286, file=/u01/oradata/apps_index_9.dbf. found valid data
Wed Nov 7 23:39:11 2012
And a support analyst said:
“Looking at the error, it looks like your RMAN backups were running when the error was reported.
Say RMAN has read the first 8 OS blocks of an 8K Oracle block, and by the time it comes to read the remaining 8 OS blocks, it sees that the block has been modified by some server process.
Now, since the first 8 OS blocks are at one SCN and the next 8 OS blocks of the same Oracle block are at a different SCN, RMAN cannot back up the block, as it is fractured.
So it retries to get a consistent copy (not read-consistent, but a block whose head and tail information match), and when it finds one, it reports ‘found valid data’.
Please note RMAN tries to back up a consistent image of the block (the head and tail portions of the block should match). Oracle blocks are made of OS blocks. One 8K block is made of 16 OS blocks of 512 bytes each.
So if RMAN is unable to get a consistent image of the block, it reports an error saying it found a fractured block while trying to back up that block. RMAN will, however, retry and see if it is now able to get a consistent image of the block.
Clearly, in your case we can see it has found a valid block.
Please note it’s recommended to run RMAN backups when the load on the database is lower.
If the block that RMAN wants to back up is being modified very frequently, RMAN may not be able to get a consistent image of it.”
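The head/tail comparison described above can be checked by hand against the dump in the alert.log. The Python sketch below assumes the commonly documented layout of the 4-byte block tail: the low-order 2 bytes of the SCN base, then the block type, then the SCN sequence number. The function names are my own, for illustration only. This is not Oracle’s verification code.

```python
# Sketch: recompute the head/tail consistency check for the block in the alert.log.
# Assumed tail layout: (low 16 bits of SCN base) << 16 | block type << 8 | SCN seq.

def expected_tail(scn_base: int, block_type: int, seq: int) -> int:
    """Tail value a consistent block would carry, derived from its header fields."""
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)


def decode_tail(tail: int) -> dict:
    """Split an observed tail back into its three assumed components."""
    return {
        "scn_base_low16": tail >> 16,
        "type": (tail >> 8) & 0xFF,
        "seq": tail & 0xFF,
    }


# Values copied from the "Data in bad block" section of the alert.log.
# The SCN wrap (0x002c) is not part of the tail, so only the base is used.
scn_base = 0x5C0523DC          # last change scn: 0x002c.5c0523dc
block_type, seq = 6, 0x2       # type: 6, seq: 0x2
observed_tail = 0xC0B10601     # consistency value in tail

want = expected_tail(scn_base, block_type, seq)
print(f"expected tail: 0x{want:08x}")                   # expected tail: 0x23dc0602
print(f"observed tail: 0x{observed_tail:08x}")          # observed tail: 0xc0b10601
print("head/tail match:", want == observed_tail)        # head/tail match: False
print("tail decodes to:", {k: hex(v) for k, v in decode_tail(observed_tail).items()})
```

Under that assumption, the header describes a change at SCN base ...23dc, seq 2. The tail decodes to SCN base ...c0b1, seq 1. The two ends of the block were captured from different changes, which is exactly the fractured read the analyst describes.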
It’s a nice explanation, right? I learned something today. And in the end, my shift finished well. :)
Have a good shift, folks. :)
Nice talking to you on your SQL vs. srvctl blog post, though I didn’t hear any feedback from Christo Kutrovsky, the author of the “OPT_ESTIMATE hint: Usage Guide” post. :)
May I be so bold as to ask you to share the slides from one of your (RMAN) backup and recovery presentations? That would save me time, and I could use them to share knowledge with other DBAs.
Thanks and enjoy the rest after a good shift,
Similarly, when validating a datafile with DBV, DBV increments the “Total Pages Influx” count in its report when it re-reads a block that was found fractured on the first read.
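The “read, detect a fracture, re-read” sequence shared by RMAN and DBV can be made concrete with a toy model in Python. It is not how either tool is implemented. The two-halves read, the per-sector version stamps, the hypothetical `max_reads` retry limit and the `influx` counter (loosely named after the DBV influx count) are all simplifications chosen for illustration. The only numbers taken from the discussion are the 8K block and the 512-byte OS blocks (8192 // 512 == 16).

```python
# Toy model of a fractured read followed by a successful re-read.
# Assumptions (not Oracle internals): a block is 16 sectors, sector 0 holds the
# "head" stamp, sector 15 holds the "tail" stamp, and a reader reads the block
# in two halves, so a writer can slip in between them.
from dataclasses import dataclass, field

BLOCK_SIZE = 8192
SECTOR_SIZE = 512
SECTORS = BLOCK_SIZE // SECTOR_SIZE  # 16


@dataclass
class Disk:
    """One block, stored as a list of per-sector version stamps."""
    sectors: list = field(default_factory=lambda: [1] * SECTORS)

    def write_block(self, version: int) -> None:
        self.sectors = [version] * SECTORS


@dataclass
class Verifier:
    max_reads: int = 3  # hypothetical retry limit, chosen for illustration
    influx: int = 0     # blocks fractured on the first read but valid on a re-read

    def read(self, disk: Disk, writer_between_halves=None) -> list:
        """Read the block in two halves, optionally letting a writer run in between."""
        half = SECTORS // 2
        first = disk.sectors[:half]
        if writer_between_halves is not None:
            writer_between_halves()
        return first + disk.sectors[half:]

    def verify(self, disk: Disk, writer_on_first_read=None) -> bool:
        for attempt in range(self.max_reads):
            writer = writer_on_first_read if attempt == 0 else None
            image = self.read(disk, writer)
            if image[0] == image[-1]:  # head matches tail: consistent image
                if attempt > 0:
                    self.influx += 1   # fractured at first, valid on re-read
                return True
        return False                   # still fractured after every retry


disk = Disk()
v = Verifier()
# A "server process" rewrites the block while the first read is half done.
ok = v.verify(disk, writer_on_first_read=lambda: disk.write_block(2))
print("valid:", ok, "influx:", v.influx)  # valid: True influx: 1
```

The first read returns a head stamped 1 and a tail stamped 2, so it is rejected. The second read sees version 2 throughout and is accepted, and the toy counter records one influx block.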
You can find my RMAN and other presentations on SlideShare.
Well, that’s all good. I had another question, though: is it possible to suppress these warnings? They are noise and should be ignored, but our monitoring system goes off on them. I think there is a retry count, and if RMAN hits it, another message is printed (I need to check; I’m not sure at the moment).
Any ideas of hidden parameters etc that can suppress these fractured block warnings during RMAN backups?
>> Any ideas of hidden parameters etc that can suppress these fractured block warnings during RMAN backups?
We have adjusted our monitoring to ignore these errors when it can confirm that the re-read succeeded. I would suggest looking at how to adjust your monitoring to ignore them too.
I wouldn’t expect Oracle to have any hidden parameter for disabling those messages. I’d like to be proven wrong, though, so if someone knows a way, please feel free to contribute :)
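The following is a minimal Python sketch of such a monitoring rule, not the filter we actually run. It pages on a “Corrupt block” report unless that report is followed by a “Fractured block” line and a “Reread ... found valid data” line for the same block number. It treats the next timestamp line, the next “Corrupt block” line, or the end of the log as “no successful re-read”. The message patterns are copied from the 10.2 excerpt above. Other versions may word them differently. The function and variable names are hypothetical.

```python
import re

# Patterns copied from the 10.2 alert.log excerpt in this post; other versions may differ.
CORRUPT = re.compile(r"Corrupt block relative dba: 0x[0-9a-fA-F]+ \(file \d+, block (\d+)\)")
FRACTURED = re.compile(r"Fractured block found during backing up datafile")
REREAD_OK = re.compile(r"Reread of blocknum=(\d+), file=\S+ found valid data")
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")


def lines_to_page(log_lines):
    """Return the 'Corrupt block' lines NOT resolved by a successful re-read."""
    to_page = []
    open_report = None  # the corrupt-block report currently being tracked
    for raw in log_lines:
        line = raw.strip()
        corrupt = CORRUPT.search(line)
        if corrupt or TIMESTAMP.match(line):
            # A new report or a new timestamp closes the previous report unresolved.
            if open_report is not None:
                to_page.append(open_report["line"])
                open_report = None
            if corrupt:
                open_report = {"block": corrupt.group(1), "line": line, "fractured": False}
            continue
        if open_report is None:
            continue
        if FRACTURED.search(line):
            open_report["fractured"] = True
        reread = REREAD_OK.search(line)
        # The re-read line names the file by path, the report by number,
        # so only the block numbers are compared here.
        if reread and open_report["fractured"] and reread.group(1) == open_report["block"]:
            open_report = None  # fractured, then valid on re-read: safe to ignore
    if open_report is not None:
        to_page.append(open_report["line"])  # log ended before any re-read message
    return to_page


SAMPLE_LOG = """Wed Nov 7 23:34:35 2012
Hex dump of (file 55, block 742286) in trace file /u01/db/tech_st/10.2.0/admin/PRDB_host/udump/PRDB_ora_10778.trc
Corrupt block relative dba: 0x0dcb538e (file 55, block 742286)
Fractured block found during backing up datafile
Data in bad block:
 type: 6 format: 2 rdba: 0x0dcb538e
 last change scn: 0x002c.5c0523dc seq: 0x2 flg: 0x04
 consistency value in tail: 0xc0b10601
Reread of blocknum=742286, file=/u01/oradata/apps_index_9.dbf. found valid data
Wed Nov 7 23:39:11 2012"""

if __name__ == "__main__":
    print(lines_to_page(SAMPLE_LOG.splitlines()))  # []
```

With the “Reread” line removed from the sample, the same function returns the “Corrupt block” line, so real corruption would still page someone.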
I disagree with this: “Oracle blocks are made of 0S blocks.One 8 k block is made of 16 (512 bytes) OS blocks.”
Oracle blocks are DB_BLOCK_SIZE bytes and are broken into OS-block-sized pieces by the host OS. The re-read is actually a protection built into Oracle to guard against fractured block writes or simple write-timing issues, which is what you are seeing with your stack. There are no hidden parameters to turn this off and why would you? If the block is re-read to get the entire Oracle block, that’s better than a multi-terabyte backup failing because Oracle did not re-read.
Thanks for stopping by my blog post and leaving a comment.
>> There are no hidden parameters to turn this off and why would you?
I don’t think anyone would want to switch off the protection. A front-line DBA, however, may want to disable the alert log message. You see, many of us have alert.log monitoring, and a page is sent each time “ORA-” or some other keyword (in this case “Corrupt block”) is found. Many have an ignore list built into the monitoring, where we can say ignore “XXX”. However, it wouldn’t be good to add “Corrupt block” to the ignore list, as we really want to know about any “real” corruption. In this particular case, though, the error could be ignored, as it is followed by “Reread of blocknum=YYYYYY, file=ZZZZZZZZZZZZZZZZZ. found valid data”. The fact that the message spans several lines makes it difficult to code the ignore condition. This is why a DBA may want to switch off the warning MESSAGE (as opposed to the protection itself) if and only if the re-read operation was successful.
Does my explanation make sense?
Thanks once again, and I hope to hear more from you ;)
Isn’t it a bug?