How many times have we heard the assurance of storage administrators (fueled by the SAN vendor’s claims) that their top-of-the-line SAN arrays simply cannot fail? Unfortunately, reality proves this wrong, and we see it regularly with our customers.
At the moment of this writing, one of our DBA teams has just completed a failover to the standby database after a database crash caused by a SAN issue. A few hours have passed, and parts of the database are still not available on the former primary host, but traffic is being handled just fine on the standby. This customer provides SaaS-type services. Imagine what hours of downtime would do to them and their clients.
Unfortunately, people get bitten by this overestimated (god-like, I’d say) SAN reliability. It must, however, be said: SANs do fail!
Do you want such a wake-up call for your executives?
The outage, blamed on an IBM storage array, saw the company’s chief technology officer promise “significant changes to the way we deploy and manage our storage environment”.
Since I mentioned one Australian example, here is one more storage failure scenario, described by our friends at Open Query. There are many such cases from virtually every industry; some of them are rather complicated, while others are just plain obvious.
Is there a silver bullet? Well, not as a solution, but as a concept, yes: simply admit that SANs do fail. That admission is what should drive infrastructure design for business continuity. Actually, I should extrapolate it to a broader design principle, "everything fails", but that’s another story.
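To make the "SANs do fail" principle concrete, here is a minimal sketch, in Python, of the idea behind the failover described above: application code (or a proxy layer) that assumes the primary can vanish and falls back to a standby. The names `query_with_failover`, `primary`, and `standby` are hypothetical, and the stub "connections" stand in for real database handles; this illustrates the design principle, not any particular customer's setup.

```python
def query_with_failover(primary, standby, run_query):
    """Try the primary first; on any error, serve from the standby.

    `primary` and `standby` are hypothetical connection handles;
    `run_query` executes a query against one of them and raises on failure.
    """
    try:
        return run_query(primary)
    except Exception:
        # The primary (or the SAN behind it) failed: fall back to the standby.
        return run_query(standby)


# Stub demonstration: simulate a SAN outage on the primary.
def run_query(conn):
    if conn["healthy"]:
        return "rows from " + conn["name"]
    raise RuntimeError(conn["name"] + " unavailable")

primary = {"name": "primary", "healthy": False}
standby = {"name": "standby", "healthy": True}

print(query_with_failover(primary, standby, run_query))  # rows from standby
```

In practice this logic usually lives in a failover manager or proxy rather than in every application, but the point stands: the fallback path has to exist, and be tested, before the storage fails.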