Oracle 10.2.0.3 coming soon, and a Data Guard corruption bug

It looks like Oracle has started testing the 10.2.0.3 patchset. A preliminary list of bugs fixed is in MetaLink note 391116.1. The “important” bug fixes are:

  • corruption in NOCACHE LOBs (bug 5212539, also fixed in 9.2.0.8 and the upcoming 10.1.0.x release)
  • wrong results in aggregate functions using the “hash group by” access path (bug 4604970)
  • PGA corruption when using shared server (bug 5114396)
  • Server handle leak in Windows (bug 5077897)
  • workaround for changed locking behavior of SELECT FOR UPDATE queries in 9.2.0.6/10.1.0.4/10.2.0.1 (bug 4969880)

But the most serious issue is index corruption on databases upgraded to 10.1.0.5 through 10.2.0.2 when using Data Guard in redo apply mode. Paraphrasing note 386830.1: bad redo metadata for index blocks gets written, and it is not detectable by standard corruption checks. If the same block is later used to generate redo of its own (after a state change or instance recovery, for example), the block may become corrupted. Errors then occur when querying or updating such blocks on the standby in read-only mode, or after the standby becomes a primary site. If the corrupt block belongs to a bootstrap index, the database won’t start up at all.

For index corruption to occur, the following things must happen, in order:

  1. The database is upgraded from a pre-10.1.0.5 version to a version between 10.1.0.5 and 10.2.0.2
  2. Redo from an index block is applied elsewhere (typically a physical standby using Data Guard redo apply)
  3. The location where the redo was applied is modified and generates redo of its own (typically after a role change)
  4. This newly generated redo is applied, which results in corruption (typically on the former primary database after a role change)
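
As a rough illustration of the first condition only, here is a minimal Python sketch that checks whether an upgrade path falls into the exposed range. The function names and the way versions are passed in are assumptions made for this example; they are not part of any Oracle tool, and the other three conditions depend on how your standby is used, so they cannot be checked this way.

```python
# Hypothetical helper: checks only condition 1 (the upgrade path).
# The affected range is taken from the text above: 10.1.0.5 through 10.2.0.2.
AFFECTED_MIN = (10, 1, 0, 5)
AFFECTED_MAX = (10, 2, 0, 2)


def parse_version(text):
    """Turn a dotted version such as '10.2.0.2' into a 4-part tuple of ints."""
    parts = [int(part) for part in text.strip().split(".")]
    parts = parts[:4]                      # ignore a fifth, platform-specific digit
    parts += [0] * (4 - len(parts))        # pad short versions such as '10.2'
    return tuple(parts)


def upgrade_is_exposed(old_version, new_version):
    """True if the database went from pre-10.1.0.5 to 10.1.0.5 .. 10.2.0.2."""
    old = parse_version(old_version)
    new = parse_version(new_version)
    return old < AFFECTED_MIN and AFFECTED_MIN <= new <= AFFECTED_MAX


if __name__ == "__main__":
    for old, new in [("9.2.0.8", "10.2.0.2"), ("10.1.0.4", "10.2.0.3")]:
        print(old, "->", new, "exposed:", upgrade_is_exposed(old, new))
```

Tuples compare element by element, which keeps the range check short. A database that passes this check is only potentially exposed; whether corruption actually occurs depends on the remaining steps in the list.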

To fix:

  • Apply the one-off patch for bug 5380055 on your platform
  • If you have a database that has already applied version 10.1.0.5+ redo (typically a physical standby), there are additional steps in note 386830.1 to “bump” the database SCN. This operation must be done in restricted mode, will require downtime, and can be dangerous, so be careful!

About the Author

Marc is a passionate and creative problem solver, drawing on deep understanding of the full enterprise application stack to identify the root cause of problems and to deploy sustainable solutions. Marc has a strong background in performance tuning and high availability, developing many of the tools and processes used to monitor and manage critical production databases at Pythian. He is proud to be the very first DataStax Platinum Certified Administrator for Apache Cassandra.
