We get it. Preparing for the worst can be a bit of a drag. That’s likely why, according to some surveys, up to 40% of organizations don’t even bother to test their IT systems’ backup and disaster recovery (BDR) capabilities.
However, as the current COVID-19 crisis has demonstrated, business continuity and a strong BDR plan are more important than ever. It’s not just about preparing for big national or international emergencies, either. Your data can be endangered by a veritable cocktail of seemingly pedestrian but very real threats. These include fires, broken water pipes, faulty cooling systems, and outright device failure. The list also extends to much more significant events such as regional natural disasters and global pandemics.
Although it’s nearly impossible to prepare for every possible contingency, a good BDR plan should be flexible and broad enough to deal with almost anything that threatens your data. To that end, here are four “must-do” action items for every enterprise deploying a BDR plan.
1. Take Stock and Set Goals
Start by taking an inventory of your entire data and IT estate. Document the number of devices (and types), hardware details, and applications. Note that some experts recommend separating your applications into “priority tiers,” from business-critical applications to less important items. Tiering tells the recovery team what to restore first, which matters when minutes and seconds count.
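As a rough illustration of priority tiers, here is a minimal Python sketch that sorts an application inventory into a restore order. The application names and tier numbers are invented for the example; a real inventory would come from your own asset records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    tier: int  # 1 = business-critical; larger numbers are less urgent


def recovery_order(apps):
    """Return applications sorted so the most critical tier is restored first."""
    return sorted(apps, key=lambda app: (app.tier, app.name))


# Hypothetical inventory, for illustration only.
inventory = [
    Application("internal-wiki", tier=3),
    Application("order-database", tier=1),
    Application("reporting", tier=2),
    Application("payment-gateway", tier=1),
]

for app in recovery_order(inventory):
    print(f"tier {app.tier}: {app.name}")
```

Sorting by tier first and name second keeps the order stable and predictable, so the same runbook comes out every time the list is generated.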
To understand your strengths and limitations, document your capabilities. Make sure to note which data types need to be backed up, and whether they include any sensitive data, such as personally identifiable information (PII), protected health information (PHI), or payment card data. Such data is governed by regulations and standards such as GDPR, CCPA, HIPAA, and PCI DSS.
Enterprises must also assess their tolerance to downtime well in advance of an emergency. Some companies can deal with being down for a day or so. Others simply can’t. Devise an RTO (recovery time objective) and an RPO (recovery point objective): respectively, a target recovery time and a tolerance for data loss that fit your business. For example, a recent Pythian client set an objective of taking no more than 12 hours to recover the last 12 months’ worth of data.
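To make the two objectives concrete, here is a small Python sketch that checks whether a backup schedule can satisfy an RPO. It assumes simple periodic backups, where a failure just before the next backup loses about one full interval of data, and it ignores the time a backup takes to complete. The 12-hour RTO echoes the client example above; the one-hour RPO and the backup intervals are made-up values.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable time to restore service
    rpo: timedelta  # maximum acceptable window of lost data


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumption: with periodic backups, the worst case is losing
    everything written since the previous backup, i.e. one interval."""
    return backup_interval


def schedule_meets_rpo(objectives: RecoveryObjectives,
                       backup_interval: timedelta) -> bool:
    return worst_case_data_loss(backup_interval) <= objectives.rpo


# Hypothetical targets: 12 hours to recover, at most 1 hour of lost data.
objectives = RecoveryObjectives(rto=timedelta(hours=12), rpo=timedelta(hours=1))

print(schedule_meets_rpo(objectives, timedelta(hours=24)))    # nightly backups
print(schedule_meets_rpo(objectives, timedelta(minutes=15)))  # 15-minute backups
```

Under these assumptions a nightly backup cannot meet a one-hour RPO, while a 15-minute schedule can. The RTO, by contrast, can only be verified by actually timing a restore.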
2. Decide on the Technology
The BDR technology you implement depends largely on where your data lives: on-premise, in the cloud, or both? Organizations with classified or top-secret data might prefer staying on-premise. However, cloud services such as AWS Backup or Cloud Bigtable allow for convenient and centralized backup automation that you can run (more or less) continuously.
Note: You can also set up automated backups in your on-premise system, but you’ll need to integrate third-party server backup software or services.
One advantage of cloud services is built-in multi-region redundancy. Google, for example, has 13 data centers dispersed in different regions across the U.S. So if there’s a major natural disaster affecting the entire East Coast, Google’s BDR services are still available through data centers in the Midwest and the West Coast. Organizations that want to cover all their bases should also consider a multi-cloud approach to BDR, in case one cloud provider experiences a complete outage.
If you’re committed to staying on-premise, make sure to create redundancy by distributing your backups across at least two data centers. For example, if you have an on-premise Hadoop cluster, you can create a separate cluster in a different data center that’s supplied data by your on-premise backups. However, like most on-premise solutions, this can get expensive in a hurry.
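One way to keep an eye on that redundancy is to audit where backup copies actually live. The Python sketch below flags any dataset whose copies all sit in a single data center. The dataset and data center names are hypothetical, and the list of copies is assumed to come from whatever backup catalog you maintain.

```python
from collections import defaultdict


def under_replicated(backup_copies, minimum_sites=2):
    """Return datasets whose backups span fewer than `minimum_sites`
    distinct data centers. `backup_copies` is a list of
    (dataset, data_center) pairs."""
    sites = defaultdict(set)
    for dataset, data_center in backup_copies:
        sites[dataset].add(data_center)
    return sorted(name for name, found in sites.items()
                  if len(found) < minimum_sites)


# Hypothetical backup catalog, for illustration only.
copies = [
    ("customer-db", "dc-east"),
    ("customer-db", "dc-west"),
    ("hadoop-warehouse", "dc-east"),
    ("hadoop-warehouse", "dc-east"),  # a second copy, but at the same site
]

print(under_replicated(copies))
```

Counting distinct sites rather than copies is the point here: two copies in the same building share the same fire, flood, and power risks.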
3. Define Key Roles and Responsibilities
A less technical but still very important element of BDR planning is to ensure people know what they’re doing if and when disaster strikes. Everyone (not just the IT group) should know their role and responsibilities during a disaster recovery event. BDR plans should clearly define any key roles among staff, along with who exactly is responsible for managing backups in normal times. This should also include internal and external communications plans: the former to ensure that all your staff and key people are on the same page, and the latter to keep your prospects and clients informed in the event the issue becomes public.
If you outsource to an IT or managed services firm, you should also ensure your service-level agreements (SLAs) define the level of service your team can expect during a disaster.
4. Test, Test, Test… and then Test Some More
It’s a well-known saying in DR circles that any given system is only as good as its last test. Indeed, all that great BDR planning you’ve done will mostly be for naught if you don’t also set up a disciplined testing regime. This includes how you’ll test, what you’ll test, and how often you’ll test. Regular testing should ideally expose any possible weak links in the system. It should also test staff and cloud providers to make sure everyone’s on their toes.
The best part? The results of all that testing can then be compared to your defined RTO and RPO. This helps ensure that you’re on track to meet your DR goals in case of a disaster.
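Comparing test results against your objectives can be as simple as the Python sketch below. The RTO and RPO values and the drill measurements are invented for illustration; in practice they would come from your own plan and from timing real recovery drills.

```python
from datetime import timedelta

# Hypothetical objectives, for illustration only.
RTO = timedelta(hours=12)
RPO = timedelta(hours=1)


def evaluate_drill(recovery_time, data_loss_window, rto=RTO, rpo=RPO):
    """Return a list describing each objective the drill failed to meet.
    An empty list means the drill met both RTO and RPO."""
    misses = []
    if recovery_time > rto:
        misses.append(f"RTO missed by {recovery_time - rto}")
    if data_loss_window > rpo:
        misses.append(f"RPO missed by {data_loss_window - rpo}")
    return misses


# Made-up drill results: (time to recover, window of data lost).
drills = {
    "drill-1": (timedelta(hours=14), timedelta(minutes=30)),
    "drill-2": (timedelta(hours=9), timedelta(minutes=45)),
}

for name, (recovery_time, data_loss) in drills.items():
    misses = evaluate_drill(recovery_time, data_loss)
    print(name, "OK" if not misses else "; ".join(misses))
```

Tracking these results from drill to drill shows whether recovery is getting faster or slower over time, long before a real disaster forces the question.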
CASE STUDY: Pythian provides invaluable Backup & Disaster Recovery (BDR) consulting for a company with no backup or recovery capabilities.
Pythian has years of experience helping clients select and implement on-premise, cloud, and hybrid BDR solutions. Interested in talking to a technical expert? Schedule a tech call with our team to get the conversation started.
Interested in working with Alifiya? Schedule a tech call.