Disaster Recovery: Ensure Business Continuity

A company or organization needs to be well prepared for a variety of disruptions so that it can continue operating without significant loss of revenue or decline in business operations. Disasters can include a natural disaster, such as a fire or earthquake, or a deliberate event such as a cyber attack.

What is Disaster Recovery?

Disaster recovery (DR) is a business’s method of regaining the functionality of its IT infrastructure after a disruption (typically a natural disaster or cyber attack). A disaster recovery plan can combine a variety of methods, all of which should help ensure the continuity of business operations.

The cost of disasters

Disasters can be costly, especially for a business that is not prepared. According to the IT Disaster Recovery Preparedness (DRP) Council’s 2015 report, a single hour of downtime can cost a small operation roughly $8,000, a midsize company approximately $74,000, and a large enterprise up to $700,000. A survey conducted by Zetta found that over half of the companies surveyed (54%) had experienced downtime lasting more than 8 hours within the past 5 years. Of those companies, roughly 66% reported losses exceeding $20,000 per day of downtime.
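
As a rough illustration, the per-hour estimates can be turned into a quick calculation. This is only a sketch: the hourly rates are the DRP Council figures quoted above, while the function name and the size labels are hypothetical choices made for this example.

```python
# Hourly downtime cost estimates quoted above (IT DRP Council, 2015 report).
# The size labels are illustrative, not official categories.
HOURLY_DOWNTIME_COST_USD = {
    "small": 8_000,
    "midsize": 74_000,
    "large": 700_000,  # upper-end estimate ("up to")
}


def estimate_downtime_cost(company_size: str, hours_down: float) -> float:
    """Return a rough downtime cost in USD for an outage of `hours_down` hours."""
    if hours_down < 0:
        raise ValueError("hours_down must be non-negative")
    try:
        hourly_rate = HOURLY_DOWNTIME_COST_USD[company_size]
    except KeyError:
        raise ValueError(f"unknown company size: {company_size!r}") from None
    return hourly_rate * hours_down


if __name__ == "__main__":
    # An 8-hour outage, the threshold mentioned in the Zetta survey.
    for size in HOURLY_DOWNTIME_COST_USD:
        print(f"{size}: ${estimate_downtime_cost(size, 8):,.0f}")
```

Under these assumptions, an 8-hour outage works out to roughly $64,000 for a small operation and $592,000 for a midsize company. Real costs will vary widely by industry and by which systems are down.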

How does a disaster recovery plan work?

Disaster recovery works by replicating data and processing to an off-premises site that is unaffected by the disaster. When a server fails due to a disruption, the business needs to be able to recover the lost data from wherever it is backed up. The backups are key: if they are lost, the business will suffer severe losses in revenue and operations. In an ideal scenario, an organization could transfer its computer processing to a remote site and continue its operations from there.
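
The replicate-then-restore idea can be sketched in a few lines of Python. This is a simplified illustration, not a production backup tool: it assumes both sites are reachable as local directories, and the function names (`back_up`, `restore`, `copy_tree_verified`) are hypothetical. Each copy is verified with a checksum so that a corrupt backup is noticed before it is needed.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree_verified(source_root: Path, target_root: Path) -> int:
    """Copy every file under source_root into target_root, verifying each copy.

    Returns the number of files copied and raises OSError on a mismatch.
    """
    copied = 0
    for source in sorted(source_root.rglob("*")):
        if not source.is_file():
            continue
        target = target_root / source.relative_to(source_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        if file_digest(source) != file_digest(target):
            raise OSError(f"verification failed for {target}")
        copied += 1
    return copied


def back_up(primary: Path, offsite: Path) -> int:
    """Replicate the primary site's files to the off-site location."""
    return copy_tree_verified(primary, offsite)


def restore(offsite: Path, primary: Path) -> int:
    """Rebuild the primary site's files from the off-site copy."""
    return copy_tree_verified(offsite, primary)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        primary, offsite = Path(tmp) / "primary", Path(tmp) / "offsite"
        primary.mkdir()
        (primary / "orders.csv").write_text("id,total\n1,9.99\n")
        print("backed up:", back_up(primary, offsite))
        shutil.rmtree(primary)  # simulate losing the primary site
        print("restored:", restore(offsite, primary))
```

In practice the off-site location would be a separate facility or a cloud provider rather than a neighboring directory, but the sequence is the same: copy, verify, and restore from the verified copy.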

Benefits of disaster recovery

A disaster recovery plan offers the following benefits:

  • Minimizing the risk of delays
  • Guaranteeing the reliability of standby systems
  • Providing a standard for testing the plan
  • Minimizing decision-making during a disaster
  • Reducing potential legal liabilities and other costs, such as loss of revenue and operation
  • Lowering unnecessary stress in the work environment

Key elements for an effective disaster recovery plan

Some of the key elements for an effective disaster recovery plan include a disaster recovery team, risk evaluation, business-critical asset identification, backups, and testing and optimization.

  • Disaster recovery team: This group is responsible for creating, implementing, and managing the disaster recovery plan. They should be able to clarify everyone’s roles. In the case of a disaster, the group should be able to reach their peers, other members of the company, vendors, and even customers.
  • Risk evaluation: The team assesses the probable hazards that could pose a threat to the organization and works out what is needed for the business to keep moving forward.
  • Business-critical asset identification: A solid disaster recovery plan should include specific documentation of which systems, applications, data, and other resources are deemed vital for business continuity. It should also note the steps required to recover the data.
  • Backups: The team needs to determine what must be backed up, how the backups will be performed, and who will be responsible for them. Many experts, including myself, deem backups the most valuable piece of a disaster recovery plan. The recovery point objective (RPO) and recovery time objective (RTO) are the key metrics here. How long an organization can tolerate downtime and how regularly it backs up its data should both be noted in the disaster recovery strategy.
  • Testing and optimization: A recovery team should regularly test and update its strategies in order to withstand potential threats. Continually verifying that the company is prepared to shield itself against a disaster serves as a measure of success. To plan for and respond to a threat such as a cyber attack, it is crucial that a company or organization continuously test and optimize its security protocols and data protection strategies, and that it have measures in place to detect a possible breach.
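
The asset-identification element above lends itself to a small data structure. The following is a minimal sketch, assuming a hypothetical `Asset` record and made-up example values: it lists business-critical assets in order of urgency (shortest RTO first) and flags critical assets that have no documented recovery steps.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One entry in a (hypothetical) disaster recovery inventory."""
    name: str
    business_critical: bool
    rto_hours: float  # how long the business can tolerate this asset being down
    rpo_hours: float  # how many hours of recent data the business can afford to lose
    recovery_steps: list[str] = field(default_factory=list)


def recovery_order(assets: list[Asset]) -> list[Asset]:
    """Return business-critical assets, most urgent (shortest RTO) first."""
    critical = [asset for asset in assets if asset.business_critical]
    return sorted(critical, key=lambda asset: asset.rto_hours)


def undocumented(assets: list[Asset]) -> list[str]:
    """Return names of business-critical assets with no written recovery steps."""
    return [
        asset.name
        for asset in assets
        if asset.business_critical and not asset.recovery_steps
    ]


if __name__ == "__main__":
    # Example values are invented for illustration only.
    inventory = [
        Asset("order database", True, rto_hours=1, rpo_hours=0.25,
              recovery_steps=["restore latest snapshot", "replay logs"]),
        Asset("internal wiki", False, rto_hours=72, rpo_hours=24),
        Asset("payment gateway", True, rto_hours=0.5, rpo_hours=0),
    ]
    print([asset.name for asset in recovery_order(inventory)])
    print("missing recovery steps:", undocumented(inventory))
```

Keeping the inventory in a structured form like this makes it easier to review during the regular testing cycle, since gaps in the documentation show up as data rather than as surprises during an outage.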

How to establish a disaster recovery team

Regardless of the disaster, assembling the proper team is crucial and should always be the first step. Within an IT department, the following key areas should be covered in the event of a disaster:

  • Crisis management: This role is responsible for facilitating recovery plans, coordinating actions throughout the recovery phase, and resolving any concerns that arise.
  • Business continuity: This group is responsible for making sure the recovery plan matches the company or organization’s demands, which will depend on the impact analysis conducted in the wake of the disaster.
  • Impact assessment and recovery: This team is responsible for recovering the IT infrastructure (servers, databases, networks, etc.).
  • IT applications: This role is responsible for determining which applications should be restored, based on what is written in the recovery plan. Its tasks include configuring applications, integrations, and settings, and ensuring the consistency of data.

For those outside the IT department, the following roles should be considered:

  • Executive management: This team is responsible for policy compliance, budgets, and approval of each phase of the recovery plan.
  • Critical business units: This group offers input on the disaster recovery planning so that its specific concerns are addressed.

Types of disaster recovery

The following are methods of disaster recovery:

  • Back-up: This is the most common type of disaster recovery and involves storing data either on a removable drive or in an off-site location. Despite being the most popular method, it covers only the bare minimum of a business continuity plan, because the IT infrastructure itself is typically not backed up.
  • Cold Site: An organization establishes a minimal infrastructure at a second location. The site is rarely used, because it is typically activated only in the wake of a disaster so that the business can continue its operations. This supports a business continuity plan because operations can resume, but it does not ensure the recovery and protection of data. A cold site typically has to be combined with other methods to work effectively.
  • Hot Site: A hot site maintains up-to-date copies of data at all times. Compared to a cold site, a hot site is harder to establish and much more expensive, but it significantly reduces downtime.
  • Disaster Recovery as a Service (DRaaS): A DRaaS provider (such as VMware, IBM, Microsoft, AWS, Druva, etc.) takes over the organization’s computer processing and runs it on its own cloud infrastructure. This lets a business continue its operations, because its data is backed up at the vendor’s location regardless of whether the business’s own systems are down. DRaaS is typically offered under 2 main payment models: subscription and pay-per-use (PPU). The benefit of selecting a DRaaS provider is that it minimizes the delay in resuming operations when an organization’s servers are down. However, a natural disaster such as an earthquake could potentially affect the DRaaS provider as well.
  • Backup as a Service: Similar to backing up data at a remote location, Backup as a Service involves a third party backing up an organization’s data. However, the provider does not back up the IT infrastructure.
  • Datacenter disaster recovery: This method focuses on protecting data and equipment against physical hazards. For example, fire suppression tools help equipment withstand a fire, and a backup power source keeps systems running during a power outage. However, because these tools address physical disasters, they offer no protection against a cyber attack.
  • Virtualization: An organization can back up its data and other resources to off-site virtual machines that remain unaffected by a physical disaster. This method allows businesses to automate some processes of a disaster recovery plan, so that things get back online sooner. However, for virtualization to be effective, the data and workloads need to be updated regularly. Companies also need to ensure that their IT team knows how many virtual machines are running within the organization at any given time.
  • Point-in-time copies: This process takes a copy of an entire database as it existed at a given moment. Data can be restored from such a copy, provided the backups are stored in places the disaster did not affect. This method is also called a point-in-time snapshot.
  • Instant recovery: This method is similar to point-in-time copies, except that it takes snapshots of an entire virtual machine (VM) rather than copying a database.
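
For the snapshot-based methods above, restoring usually means choosing which snapshot to roll back to. The sketch below is a simplified illustration with a hypothetical function name: given a sorted list of snapshot timestamps, it picks the most recent one taken at or before the requested point in time. It does not model any particular vendor's snapshot tooling.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional


def latest_snapshot_before(
    snapshot_times: list[datetime], restore_to: datetime
) -> Optional[datetime]:
    """Return the most recent snapshot taken at or before `restore_to`.

    `snapshot_times` must be sorted in ascending order. Returns None when
    every snapshot is newer than the requested point in time.
    """
    index = bisect_right(snapshot_times, restore_to)
    return snapshot_times[index - 1] if index else None


if __name__ == "__main__":
    # Hypothetical snapshots taken every six hours.
    snapshots = [
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 6, 0),
        datetime(2024, 1, 1, 12, 0),
    ]
    # Roll back to just before an incident noticed at 09:30.
    print(latest_snapshot_before(snapshots, datetime(2024, 1, 1, 9, 30)))
```

Anything written between the chosen snapshot and the incident is not covered by that snapshot, which is why the snapshot interval matters when setting recovery objectives.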

Industry leaders for DRaaS

  • Microsoft Azure Site Recovery
  • AWS CloudEndure
  • VMware Business Continuity & Disaster Recovery
  • Druva Phoenix
  • Sungard Availability Services DRaaS
  • Acronis AnyData Engine
  • Zerto IT Resilience Platform
  • IBM Disaster Recovery Services
  • Ekco Disaster Recovery
  • CenturyLink Managed Services
  • Daisy Business Continuity

10-Step Disaster Recovery Plan

The following is an example of a step-by-step process for a Disaster Recovery Plan.

  1. Create an inventory: A company should know what resources it uses to operate its business. An accurate inventory aids the recovery process, because it makes clear what was and was not affected.
  2. Establish a recovery timeline: Once the inventory is documented, one can decide which timeframes to set as benchmarks for resuming operations. Industries differ in their tolerance (for example, a healthcare facility may need a timeframe of minutes or less, while other industries can withstand a longer one). The key metrics used for this process are Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
  3. Communicate: Everyone, at all times (even before a disaster), should know which IT operations could be affected, what would happen after a disaster strikes, and who would be responsible for handling the situation. One should have an account of how specific employees would be impacted by downtime. There should also be an effective means of communicating with peers during a power or Internet outage.
  4. Backup: There are various options for backing up data, as mentioned throughout the article. Keeping backups physically on-premises is a vulnerability in the wake of a natural disaster. Regardless of how data is backed up, there is always some risk. Note that not all data needs to be backed up; the inventory phase helps decide what is necessary and what is not.
  5. Factor in physical damage: If a disaster strikes, hardware and other physical resources may be damaged in ways that threaten business continuity (from something as small as a cut cable to a failed server). One should have a plan for assessing physical damage so that repairs or replacements can be arranged.
  6. Consider the human factor: Unfortunately, people can be the cause of a disaster, whether or not the intent is malicious. This matters because one should be able to control who does and does not have administrative rights and credentials to the systems in the case of a catastrophe. No one, including third-party vendors, should have permissions that go beyond their role. For example, Target suffered a major data breach in 2013 when cybercriminals used the credentials of an HVAC vendor to breach its network, costing the company millions of dollars. If someone is in sales, they should not have account privileges that allow them to access information outside of their role (payroll, networks, creating/deleting users, etc.). Also, in this day and age, everyone should be educated on the basics of cybersecurity and be able to identify things such as phishing, which is one of the most common causes of a breach.
  7. Insurance: Damages can be expensive when a natural disaster occurs, which is why an organization should consider a form of catastrophe insurance. This step is optional, but it is well worth considering as an investment.
  8. Test the Disaster Recovery Plan: At the bare minimum, the recovery plan should be tested on an annual basis (the more often, the better). If the plan is out of date, the repercussions could be much greater, especially in the case of a natural disaster that could render an organization’s data lost for good.
  9. Combine Disaster Recovery (DR) and Business Continuity (BC): IT is a pivotal piece that helps an organization operate and move forward, but it is only one piece of a Business Continuity Plan. One should develop and test an effective Business Continuity Plan to ensure that everyone remains confident in the solutions if a disaster were to occur at any point.
  10. Connect with the right partners: Disaster recovery is not a one-time effort; it needs to be actively maintained and updated. As technologies, procedures, and equipment advance over time, disaster recovery plans need to be adjusted to accommodate them. One should consider working with the right partners so that the disaster recovery plan makes sense and is actionable and reliable in the case of a disaster.
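
The least-privilege idea from step 6 can be checked mechanically. The following is a minimal sketch, assuming a hypothetical role-to-permission policy and invented account data: it reports any account whose granted permissions go beyond what its role allows. Accounts with an unrecognized role are treated as having no allowed permissions at all.

```python
# Hypothetical policy: which permissions each role is allowed to hold.
ALLOWED_PERMISSIONS = {
    "sales": {"crm:read", "crm:write"},
    "hvac_vendor": {"hvac:monitor"},
    "it_admin": {"crm:read", "network:admin", "users:create", "users:delete"},
}


def excess_permissions(role: str, granted: set[str]) -> set[str]:
    """Return the permissions an account holds beyond what its role allows."""
    return granted - ALLOWED_PERMISSIONS.get(role, set())


def audit(accounts: dict[str, tuple[str, set[str]]]) -> dict[str, set[str]]:
    """Map each over-privileged account name to its excess permissions."""
    findings = {}
    for account, (role, granted) in accounts.items():
        extra = excess_permissions(role, granted)
        if extra:
            findings[account] = extra
    return findings


if __name__ == "__main__":
    # Invented accounts: (role, permissions actually granted).
    accounts = {
        "alice": ("sales", {"crm:read", "payroll:read"}),
        "vendor-01": ("hvac_vendor", {"hvac:monitor", "network:admin"}),
        "carol": ("it_admin", {"network:admin", "users:create"}),
    }
    for account, extra in audit(accounts).items():
        print(f"{account}: remove {sorted(extra)}")
```

Running a check like this on a schedule, and again after any vendor onboarding, is one way to keep permissions from drifting beyond each role.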

Key metrics

The main goal is to protect an organization when all or part of an operation becomes unusable. How well downtime and data loss are minimized during a disaster recovery is measured using the concepts of Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

  • Recovery Time Objective (RTO): the maximum acceptable time until a system is back up and operational
  • Recovery Point Objective (RPO): the point in time to which data must be recoverable from the selected backup copy; in other words, the maximum amount of recent data, measured in time, that the organization can afford to lose
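
Both metrics are simple time differences, which a short sketch can make concrete. The function names and the timestamps below are hypothetical: the data loss window is the time between the last good backup and the incident (compared against the RPO), and the recovery time is the time between the incident and the moment service is restored (compared against the RTO).

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Return how much recent data (as a time span) is not covered by the backup."""
    if incident < last_backup:
        raise ValueError("last_backup must not be later than the incident")
    return incident - last_backup


def meets_rpo(last_backup: datetime, incident: datetime, rpo: timedelta) -> bool:
    """True when the data lost since the last backup is within the RPO."""
    return data_loss_window(last_backup, incident) <= rpo


def meets_rto(incident: datetime, restored: datetime, rto: timedelta) -> bool:
    """True when service was restored within the RTO."""
    return restored - incident <= rto


if __name__ == "__main__":
    # Invented timeline for illustration.
    last_backup = datetime(2024, 1, 1, 2, 0)
    incident = datetime(2024, 1, 1, 9, 30)
    restored = datetime(2024, 1, 1, 13, 0)
    print("RPO of 4h met:", meets_rpo(last_backup, incident, timedelta(hours=4)))
    print("RTO of 4h met:", meets_rto(incident, restored, timedelta(hours=4)))
```

In this invented timeline, 7.5 hours of data fall outside the last backup, so a 4-hour RPO is missed, while the 3.5-hour recovery fits within a 4-hour RTO. A tighter RPO generally calls for more frequent backups; a tighter RTO calls for faster restore paths such as a hot site.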

Example of disaster recovery plan utilized in industry

Back in 2016, Hyundai Heavy Industries (HHI) faced extensive damage when one of its facilities suffered the aftermath of a 5.8 magnitude earthquake. The company had a backup center in Ulsan City, Korea, and the event gave it an opportunity to assess its disaster recovery systems and determine whether it was prepared for a disruption. HHI partnered with IBM to implement a solution that would get it back in operation.

Relationship to Business Continuity Plan (BCP)

When working in IT or in any executive/management role, one should not mistake disaster recovery for a Business Continuity Plan (BCP). A Business Continuity Plan is a comprehensive plan for an organization, and the disaster recovery plan is one part of it. The Business Continuity Plan features 5 main components:

  • Business Resumption Plan (BRP)
  • Occupant Emergency Plan (OEP)
  • Continuity of Operations Plan (COP)
  • Incident Management Plan (IMP)
  • Disaster Recovery Plan (DR)

One may notice that the first three components do not involve the IT infrastructure. An Incident Management Plan does deal with IT infrastructure, because it sets the foundation for addressing a probable cyber attack. However, it does not act as a catalyst for activating the Disaster Recovery Plan, which leaves the Disaster Recovery Plan as the only Business Continuity component of direct interest to IT.