When we talk about logical corruption, there are two different failure states that fall under this label:
- Accidental or incorrect modification of application data by a user or application.
In this scenario, a user or application, either by misadventure or resulting from an application bug, changes data in a database to incorrect or inappropriate values. An example would be an engineer who performs an update, but forgets to formulate the predicate such that it updates only a single record, and instead accidentally updates (and commits) changes to thousands of records. When we perform an assessment of a client’s systems, we look carefully at how the client is managing retention of database undo data, archived redo logs and the recycle bin. Many clients assume that physical backups serve all aspects of recoverability for Oracle. On the contrary, effective management of these components can greatly reduce the complexity, RPO and RTO in repairing this type of fault.
- Logical (and physical) corruption of data blocks. Block corruptions come in two types:
Physical corruptions (media corrupt blocks) are blocks that have sustained obvious physical damage. When Oracle detects an inconsistency between the CSN in the block header and the CSN in the block footer, or the expected header and footer structures are not present or are mangled, then the Oracle session raises an exception upon read of the block (ORA-01578: ORACLE data block corrupted…). The call to Oracle fails, and the exception is written to the Oracle alert log and trace files. Physical corruptions are generally the result of infrastructure problems, and can be introduced in a variety of ways. Some possible sources of physical corruption are storage array cache corruption, array firmware bugs, filesystem bugs and array controller battery failure combined with power outage. One can imagine at least a dozen other possible sources of such corruption. Physically corrupt blocks can be repaired using Oracle Recovery Manager’s BLOCKRECOVER command. This operation restores and recovers the block in place in the file without interrupting any other sessions operating against the database.
Logically corrupt blocks are blocks that have good header and footer CSNs, but that have some other kind of internal inconsistency. For instance, one of the block header structures, which tracks the number of locks associated with rows in the block, differs from the actual number of locks present. Another example would be if the header information on available space differs from the true available space on the block. Upon encountering these types of faults, the calling session generally will raise ORA-00600 (“internal error”) with additional arguments that allow us to diagnose the specific type of defect, and the call will fail. The exception will be written to the alert log and trace files. Like physical corruption, there are a wide range of possible ways that the fault could have been introduced, including all of the ways listed above for physical corruption. However, logically corrupt blocks are much more likely to have been introduced as a result of a failure in the Oracle software, or as a result of an Oracle bug or cache corruption.
By default, Oracle has features that seek to perform sanity checks on blocks before they are written. However, for highly risk-averse enterprises, additional checks, including checks for logical inconsistencies and block checksum verification can be enabled. These features consume additional resources, so should be used judiciously.