Which Risks Are You Protected From?

Kevin recently mentioned a very nice blog. I was going through some posts there, and one entry reminded me of a story. I’m sure many of you can recall similar cases.

I worked at one site for a while, and in 2.5 years it didn’t see a single media corruption of Oracle datafiles. Not that it was a low-profile site. Quite the opposite, and the storage infrastructure was set up very well: no SPOF, mirroring inside SAN boxes and between boxes, redundant switches, HBAs, controllers, you name it. A DBA’s dream. Even change management procedures were followed thoroughly.

But one day a fellow DBA (who is usually extremely cautious and reviews his actions at least twice) overwrote a controlfile with some crap. Even the fact that the controlfiles were on raw devices didn’t prevent the disaster. It was a trivial error, as we found out later: the DBA had mistakenly swapped the arguments of a tar command (something like “tar cvf * file.tar” instead of “tar cvf file.tar *”), and tar happily used the controlfile as its tape device. :) End result: a 10-minute outage while I figured out what had happened, dd’ed a controlfile image from another mirror, and started the instance. By the way, it was a RAC database and, of course, RAC didn’t help, which came as a surprise to some managers.
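Why does the swapped form bite? With “tar cvf”, the f option takes the next argument as the archive, and the shell expands * before tar ever sees it, so the first expanded name becomes the archive and gets overwritten. Here is a minimal sketch in Python of a pre-flight check that a wrapper could run before calling tar. It is only an illustration, not what that site used: the function names and the file name control01.ctl are hypothetical, and it assumes f is the only letter in the option bundle that takes an argument.

```python
import os
import stat
import tarfile


def archive_target(args):
    """Return the argument tar would use as the archive.

    `args` is the already-expanded argument list after `tar`, e.g.
    ['cvf', 'control01.ctl', 'file.tar'].
    Assumption: old-style bundled options where `f` is the only
    letter that consumes an argument.
    """
    if len(args) < 2 or "f" not in args[0]:
        return None
    return args[1]


def is_safe_archive_target(path):
    """Return True if creating an archive at `path` looks harmless."""
    if not os.path.exists(path):
        return True  # a new file, nothing to clobber
    mode = os.stat(path).st_mode
    if stat.S_ISBLK(mode) or stat.S_ISCHR(mode):
        # Raw devices are refused outright in this sketch; a real
        # wrapper would need an allow-list for genuine tape drives.
        return False
    # An existing regular file is acceptable only if it already is a tar archive.
    return stat.S_ISREG(mode) and tarfile.is_tarfile(path)
```

With the swapped arguments, archive_target(['cvf', 'control01.ctl', 'file.tar']) returns 'control01.ctl', and the check refuses it because that file exists and is not a tar archive. It is a cheap guard and it won’t catch everything, but it costs far less than another layer of mirroring.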

So they were more or less protected by multiplexed controlfiles, even though recovery wasn’t transparent (wouldn’t it be nice if Oracle could survive the loss of a minority of multiplexed controlfiles, just like CRS does with voting disks?). Interestingly, the online redo logs were not multiplexed, and recovery could have been a bit trickier had the current redo log been overwritten. The reason was that they already had quadruple mirroring, and people were blindly ignoring the human factor and Mr. Murphy: “it must be enough if we’ve already mirrored it 4 times”.

What do we see? Well-implemented protection against one class of problems, while obvious threats from another side are ignored. Perhaps that’s because all kinds of vendors make a fuss about their technology and its importance, while nobody draws attention to the areas that require little investment but are just as important, or even more so.

In my experience, human-factor risk is one of the areas that is heavily underestimated most of the time.

So what are your stories?

About the Author

What does it take to be chief technology officer at a company of technology experts? Experience. Imagination. Passion. Alex Gorbachev has all three. He’s played a key role in taking the company global, having set up Pythian’s Asia Pacific operations. Today, the CTO office is an incubator of new services and technologies – a mini-startup inside Pythian. Most recently, Alex built a Big Data Engineering services team and established a Data Science practice. Highly sought after for his deep expertise and interest in emerging trends, Alex routinely speaks at industry events as a member of the OakTable.
