A couple of weeks ago I wrote a short blog post about SAN storage failures and how people are blinded by all the bells and whistles that are supposed to make storage arrays 100% reliable and failsafe. My conclusion was that there is no way to avoid storage failures entirely; a better approach is to anticipate them and be ready to handle them with minimal service impact.
I referenced a wake-up call from the CTO of an Australian hosting company. Let me quote it again:
The outage, blamed on an IBM storage array, saw the company’s chief technology officer promise “significant changes to the way we deploy and manage our storage environment”.
Today I stumbled across another article that shows their solution to the storage reliability problem. From Melbourne IT on $18m Oracle revamp:
… to improve the reliability of its operational support systems at a cost of $7 million over three years, which has also seen it switch storage vendors from IBM to EMC. Data corruption that had occurred on its IBM storage systems were blamed for a several day outage experienced at the company’s WebCentral web-hosting business.
So we see that, instead of learning the right lesson, they concluded, “This IBM storage stuff isn’t reliable; the EMC sales folks convinced me that theirs is better. Now my storage will not fail.” The “significant changes to the way we deploy and manage our storage environment” amounted to nothing more than a vendor change.
Well, data recovery services will be flourishing!