A Product Is Not a Plan, and a Plan Must Be Documented
Having a strong backup product in place — or even a solid DR service — is not the same as having a proper business continuity and disaster recovery plan. A true BCDR plan requires coordination, communication, regular testing and continuity, and it must be formally documented.
With this in mind, let’s break down some of the elements that a BCDR plan should include. This is by no means an exhaustive list, but it covers a lot of important ground.
Map Your Assets and Dependencies
Understand your mission-critical workflows and the systems that sustain them. Map out how different systems — including authentication services, applications, databases and network configurations — rely on each other.
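One way to make a dependency map actionable is to turn it into a recovery order. The sketch below is only an illustration: the system names and their relationships are hypothetical, and it assumes each system can be listed alongside the systems it relies on. It uses Python's standard `graphlib` module (Python 3.9 or later) to sort systems so that nothing is restored before its dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists the systems it relies on.
dependencies = {
    "network": set(),
    "authentication": {"network"},
    "database": {"network"},
    "permit_app": {"authentication", "database"},
}


def recovery_order(deps):
    """Return systems ordered so each one follows everything it depends on."""
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

If the map contains a circular dependency, `TopologicalSorter` raises a `CycleError`, which is itself a useful finding to resolve before a real outage.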
Clearly Define RPOs and RTOs
Recovery point objective, or RPO, answers the question, “How much data can we afford to lose, measured in time?” The answer determines how frequently you run backups. Traditional systems run them every 24 hours, which can leave up to a day of data unrecoverable and might not be enough.
Recovery time objective, or RTO, is the maximum amount of system or application downtime an organization can tolerate after a disruption before there is an unacceptable impact. RTO is central to the discussion of business continuity. Even the best backup in the world isn’t guaranteed to meet your RTO, because a backup won’t necessarily restore full-system functionality quickly enough. That’s the domain of DR.
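The arithmetic behind these two objectives can be sketched in a few lines. The example below is a simplification under stated assumptions: worst-case data loss is treated as one full backup interval, and the four-hour RPO, two-hour RTO and restore-time estimate are hypothetical values chosen purely for illustration.

```python
from datetime import timedelta


def meets_rpo(backup_interval, rpo):
    """Assumes worst-case data loss equals one full backup interval."""
    return backup_interval <= rpo


def meets_rto(estimated_restore_time, rto):
    """Compares an estimated restore time against the tolerated downtime."""
    return estimated_restore_time <= rto


# Hypothetical targets for a single application.
rpo = timedelta(hours=4)
rto = timedelta(hours=2)

daily_backup = timedelta(hours=24)
hourly_backup = timedelta(hours=1)

print(meets_rpo(daily_backup, rpo))        # False: up to 24 hours could be lost
print(meets_rpo(hourly_backup, rpo))       # True
print(meets_rto(timedelta(hours=6), rto))  # False: restore takes too long
```

In practice, the restore-time estimate should come from an actual recovery test rather than a guess.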
Set a Replication Strategy
Your RPOs and RTOs will influence whether you must continuously or periodically copy data from primary systems to a secondary environment. Synchronous replication writes data to the primary and secondary environments at the same time. Asynchronous replication writes it to the primary system first and to the secondary later, which means the secondary can briefly lag behind.
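The difference between the two modes can be shown with a toy in-memory model. This is not how any particular replication product works; the `Replicator` class and its method names are hypothetical, and real systems also handle network failures, ordering and acknowledgments.

```python
from collections import deque


class Replicator:
    """Toy model of a primary and a secondary data store."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet copied to the secondary

    def write_sync(self, key, value):
        # Synchronous: both copies are written before the write completes.
        self.primary[key] = value
        self.secondary[key] = value

    def write_async(self, key, value):
        # Asynchronous: the primary is written now, the secondary later.
        self.primary[key] = value
        self.pending.append((key, value))

    def flush(self):
        # Copy queued writes to the secondary in the order they arrived.
        while self.pending:
            key, value = self.pending.popleft()
            self.secondary[key] = value


replicator = Replicator()
replicator.write_sync("record-1", "saved")
replicator.write_async("record-2", "saved")
print("record-2" in replicator.secondary)  # False until flush() runs
replicator.flush()
print("record-2" in replicator.secondary)  # True
```

Anything still in the pending queue when the primary fails is the data an asynchronous setup would lose, which is why the choice ties back to your RPO.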
Geographically Separate Your DR Site
FEMA recommends that DR locations be far enough from your primary site to avoid shared disaster zones, but close enough for replication. The sweet spot is typically within 500 miles. A general rule of thumb is that it should be a short plane ride (or a long drive) away.
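A quick distance check between candidate sites can be sketched with the haversine formula, which gives the great-circle distance between two coordinates. The coordinates below are hypothetical examples, and the 100-mile minimum is an assumption made for illustration; only the 500-mile upper bound comes from the guidance above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def in_sweet_spot(distance_miles, min_miles=100, max_miles=500):
    """min_miles is an assumed lower bound; max_miles follows the text."""
    return min_miles <= distance_miles <= max_miles


# Hypothetical primary and DR site coordinates.
primary_site = (38.5816, -121.4944)
dr_site = (36.1699, -115.1398)

distance = haversine_miles(*primary_site, *dr_site)
print(round(distance), in_sweet_spot(distance))
```

Distance alone does not prove two sites sit in different disaster zones, so a check like this is a first filter, not a substitute for reviewing regional hazards.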
Implement Secure Access Controls
Your security controls must be mirrored in your DR environment. Everything from your multifactor authentication to your VPN and firewall configurations needs to function seamlessly in your secondary environment.
Regularly Test and Validate
If you don’t test your DR plan, how can you know it’s working? Run DR failover drills annually, at a minimum. Facilitate tabletop exercises with key stakeholders to rehearse communication, recovery steps and staff roles. Panic can set in quickly during a disruption, so it’s important to be able to fall back on well-drilled procedures.
Train Staff for Crisis Mode
In that same vein, everyone acts differently under pressure. Individual stakeholders may fixate on the systems they care about most, so a mentality of sticking to the plan must prevail.
Create a Communication Plan
Staff may have trouble connecting if phones, email or messaging apps go down. Make sure you have updated contact lists and have planned out backup communication methods, such as two-way radio.
Document Everything Offline
If your plan lives on a server that crashes, it isn’t much good to you during a crisis. Keep printed or offline-accessible documentation in a secure area and use playbooks organized by disaster type.
Update Your BCDR Plan Regularly
Your BCDR strategy must evolve with your infrastructure. Review it annually or after major changes, and perform post-mortems after tests and real-world incidents.
Remember: This isn’t about getting back to normal. It’s about staying operational no matter what. As operational conditions change, your BCDR plan must change with them.