For years, IT infrastructure teams have been designing, building, and maintaining a wide spectrum of disaster recovery (DR) solutions to recover critical information systems after an outage. Many of these do-it-yourself disaster recovery (DIYDR) approaches carry hidden risks that come to light at the worst possible time. Perhaps the biggest problem with a DIYDR approach is that it often gives the illusion that the business is protected from an outage. How?
- DIYDR is often treated as a low-value project until an organization is in a full-blown outage or disaster – DR is on every IT department’s list of ongoing projects to maintain, but over time it often takes a back seat to higher-profile business projects like application upgrades or digital transformation efforts.
- Effective disaster recovery requires significant IT staff time and effort – Most DIYDR solutions involve multiple technologies along with automation, scripting, and well-documented procedures. Maintaining these components would be difficult enough in a static IT environment, and most IT environments are under constant change. DIYDR often starts with great intentions but with little understanding of the ongoing time commitment needed to keep DR up to date and effective.
- DIYDR is rarely tested properly – Few IT organizations are equipped to do a full-blown outage simulation with complete failover to a disaster recovery environment. Tests often include small subsets of the environment. Failures in testing are often documented and improved upon, but without full environment failover simulations, the risk of failure remains high.
- DIYDR often depends on heroic effort from IT staff, vendors, and technologies – Your IT team may be exceptionally talented, but expecting them to make dozens of good decisions in the heat of a 36-hour, no-sleep recovery is a bad business decision.
- DIYDR often diverts business focus – When asked, the IT team will almost always say, “we can recover.” Does that reassurance keep the business from fully evaluating the cost of downtime and its impact? Does it lead business and IT leadership to postpone or even skip full DR simulation testing until it is too late?
How can business and IT leaders reduce this DIYDR risk?
- Do the math on downtime – Both the business and IT need to evaluate the true cost of downtime to align the disaster recovery budget to the cost of an outage. Get started with a cost-of-downtime calculator.
- Understand how applications and organizational data are being used. Which employees access what applications? How, when, and where do they access them?
- Inventory and prioritize all data, applications, network elements, and access points. Once all these IT elements are properly understood and defined, prioritize them into recovery tiers and collaborate with executive sponsors to assign a Recovery Point Objective (RPO) and Recovery Time Objective (RTO) for each tier that is aligned with overall business strategy.
- Create a DR budget that covers every aspect of your DR plan: the required hardware, software, and hosting site; the cost of hiring full-time employees or consultants to implement the plan; and the recurring costs required to keep the plan viable, including training, testing, hosting fees, and software licenses.
- Test, test, test – Testing is the only way to prove that your DIYDR solution works. Conduct a full simulation at least once a year, along with multiple simulations of smaller data sets during the year.
- Evaluate the team’s ability to maintain the DR workload – If the DR tests fail, both IT and business leadership need to examine the ongoing DR workload, comparing the cost of downtime to the effort required to maintain the DR solution. And don’t forget that adding more IT staff may not be the answer.
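To make the “do the math” and recovery-tier steps above concrete, here is a minimal sketch of a cost-of-downtime estimate per tier. Every name and figure in it is a hypothetical placeholder, and the formula (lost revenue plus lost staff productivity over the length of the outage) is one simple assumption; a real calculation would also account for factors such as penalties, recovery labor, and reputational damage.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTier:
    """One recovery tier. All values below are illustrative assumptions."""
    name: str
    rto_hours: float            # Recovery Time Objective agreed for the tier
    revenue_per_hour: float     # revenue lost per hour while the tier is down
    staff_affected: int         # employees who cannot work during the outage
    staff_cost_per_hour: float  # loaded hourly cost per affected employee


def downtime_cost(tier: RecoveryTier, outage_hours: float) -> float:
    """Estimate outage cost as lost revenue plus lost staff productivity."""
    lost_revenue = tier.revenue_per_hour * outage_hours
    lost_productivity = (
        tier.staff_affected * tier.staff_cost_per_hour * outage_hours
    )
    return lost_revenue + lost_productivity


# Hypothetical tiers; replace with your own inventory and numbers.
tiers = [
    RecoveryTier("Tier 1: order processing", 4, 10_000, 50, 40),
    RecoveryTier("Tier 2: email", 24, 0, 200, 40),
]

for tier in tiers:
    # Cost of an outage that lasts exactly as long as the tier's RTO.
    cost = downtime_cost(tier, tier.rto_hours)
    print(f"{tier.name}: ${cost:,.0f} for a {tier.rto_hours}-hour outage")
```

Comparing these per-tier figures with the annual cost of maintaining and testing DR for each tier gives business and IT leadership a shared starting point for the budget conversation.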
There is another way – outsource the heavy lifting associated with DIYDR
After reading this list of risk mitigation strategies, you may be thinking that a DIY approach to DR may not be the best use of your team’s time. There is no getting around the fact that a thorough and well-aligned DR plan that provides the requisite IT resiliency demands a substantial upfront investment in terms of staff time and resources, as well as ongoing commitments that aren’t minor.
As IT teams in every industry are being asked to do more with less, expecting your internal team to devise, deploy, test, and maintain a proper DR plan just isn’t realistic, especially if your team is being tasked with strategic initiatives that drive long-term competitive value. To enable your team to be more strategic, consider implementing DR as a Service (DRaaS) as an alternative to DIYDR.
Cloud-based DRaaS solutions are now widely used by organizations with high downtime costs and focused IT teams to eliminate the traditional “heavy lifting” associated with DR. The best DRaaS solutions combine data replication, full network failover, and high levels of automation to minimize the ongoing maintenance commitment from the IT staff. Ideally, all IT needs to do is keep the DRaaS provider up-to-date on which systems need to be protected, and to test the DRaaS solution regularly. DRaaS frees up IT staff to focus on high-value business projects.
Could DRaaS be a good fit for your organization? Read here about how Bouchard Insurance leveraged DRaaS to protect against a hurricane without business losses from downtime and without impossible expectations of the IT team. Got questions? Contact me.
Doug Theis is the Director of Market Strategy in Expedient’s Indianapolis market focused on engaging with and improving the regional IT community through planning, sponsoring and attending community events, facilitating IT-focused continuing education opportunities, and sharing strategies, tactics, and research to help IT professionals stay abreast of best practices and industry trends. Connect with Doug at doug.theis@expedient.com and follow him on Twitter.