If your answer is anything other than something like “last month and every month before that”, you are probably in trouble. Learn from Wikipedia’s data center overheating incident.
That doesn’t necessarily mean they never tested their disaster recovery process. Maybe they did, but the failover mechanism broke after the last test.
Regular validation of the DR procedure is meant to minimize the risk that a broken process goes unnoticed. If the failure is detected during a planned switchover, you are far better prepared to handle it (or you can simply leave services running on the current primary site) than during an emergency failover, when you hit the “Oh shit!” moment under tremendous pressure to get services back.
The business has to find a balance between switchover frequency, the risk it is prepared to take, and its change management processes. The more cowboy-style your operations, the higher the risk and the more often you need to test your DR scenario.
Most of our customers who run regular DR switchovers do so every month or two, and then run on either site as the primary for a while. This is the ideal scenario, and I wish everyone could adopt similar business continuity strategies. Do you?