According to a survey conducted by my company, Info-Tech Research Group, after the 2003 blackout, more than 60 % of IT departments did not have formal plans and procedures in place to deal with the blackout. And although more than 76 % of the companies surveyed said that the blackout had affected their organisation, most of them admitted that they were not sufficiently prepared.
Another survey, by CSO magazine (a sister publication to Darwin), found that chief security officers lack confidence in the electric industry and the capabilities of the power grid, with 59 % saying it's likely that another major US blackout will happen in the next 12 months. Further, 76 % are not confident that within five years the electric industry will be modernized and free of its present vulnerabilities.
Clearly, business continuity, or disaster recovery, plans are called for. But are you among the 60 % lacking a formal plan?
A client recently told me that he had just finished his company's disaster recovery plan. I asked to see it. He was puzzled; he didn't know what I meant. He hadn't written a disaster recovery plan. He had formalized how his backups were done, arranged for the tapes to be stored off site (right next door, in an attached building) and bought a new tape backup system. He asked what type of plan I had developed for another client. I told him that our plan consisted of an operational analysis, a discussion of the tolerance for downtime, the prioritization of recovery, documentation of systems, development of teams, and the assignment of roles and responsibilities for each team. He was amazed that we had gone to "so much work."
The truth is, all of the information we gathered in that particular plan is information that you are going to need in a recovery situation anyhow. It makes sense to gather it up front, in a calm "normal" environment, rather than trying to gather information in the middle of a panic.
The concept isn't complex. The idea is to collect as much information as possible about the operating environment, applications, hardware and services. Determine what services are most critical to the daily operation of the organisation. Assign teams to handle various aspects of the recovery. And define the roles and responsibilities for each team.
The result is a document that outlines what needs to be done, who is doing it and in what order. Additional information is provided, letting the recovery team know who to contact for specific information regarding services, software, etc.
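To make "what needs to be done, who is doing it and in what order" concrete, here is a minimal Python sketch of the core of such a document. It assumes each service has already been given a tolerable downtime in hours, a responsible team and a contact person; the `Service` class, the `recovery_order` and `render_plan` functions, and the sample services and names are all hypothetical, chosen only for illustration. Ordering by least tolerance for downtime is one simple way to prioritise recovery, not the only one.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    max_downtime_hours: int  # how long the business can tolerate losing it (assumed known)
    team: str                # recovery team responsible for restoring it
    contact: str             # who to call for specifics about this service


def recovery_order(services):
    """Return services with the least downtime-tolerant ones first."""
    return sorted(services, key=lambda s: s.max_downtime_hours)


def render_plan(services):
    """Produce a numbered 'what, who, in what order' list for the plan document."""
    lines = []
    for step, s in enumerate(recovery_order(services), start=1):
        lines.append(
            f"{step}. Restore {s.name} (team: {s.team}, "
            f"contact: {s.contact}, tolerance: {s.max_downtime_hours} h)"
        )
    return "\n".join(lines)


# Hypothetical services, for illustration only.
services = [
    Service("File shares", 48, "Infrastructure", "J. Smith"),
    Service("Order entry", 4, "Applications", "A. Jones"),
    Service("E-mail", 24, "Infrastructure", "J. Smith"),
]
print(render_plan(services))
```

Keeping the contact alongside each service means the same record answers both "what comes next" and "who do I call about it" during a recovery.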
I am not suggesting that you put your organisation through an over-complicated, unnecessarily painful process. It is simply a matter of educating the organisation on the importance of being prepared.
The following outlines the steps required to develop a disaster recovery plan for your organisation.
Scope of the Plan
Before any significant amount of work can be done, it is essential to know exactly what the scope of the disaster recovery plan is. Gain consensus from the department managers and determine who should be on the disaster recovery planning team. At a minimum, there should be one person from each department, and as many as you can spare from IT. There should be a management-level person from IT on the team, and that person should keep the senior management team informed of the group's progress.
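The two minimum rules for the team are simple enough to check mechanically against a proposed roster. The sketch below is hypothetical: `check_team` and the roster format of (name, department, is-management) tuples are assumptions for illustration, and it tests only the minimums stated here, namely one representative per department and a management-level person from IT.

```python
def check_team(members, departments):
    """Return a list of problems with a proposed DRP team roster.

    members: list of (name, department, is_manager) tuples (assumed format).
    departments: every department in the organisation.
    """
    problems = []
    covered = {dept for _, dept, _ in members}
    # Rule 1: at least one person from each department.
    for dept in departments:
        if dept not in covered:
            problems.append(f"No representative from {dept}")
    # Rule 2: a management-level person from IT.
    if not any(dept == "IT" and is_manager for _, dept, is_manager in members):
        problems.append("No management-level person from IT")
    return problems


# Hypothetical roster, for illustration only.
departments = ["Finance", "Sales", "Operations", "IT"]
team = [
    ("R. Patel", "IT", True),
    ("L. Chen", "IT", False),
    ("M. Brown", "Finance", False),
    ("K. Diaz", "Sales", False),
]
print(check_team(team, departments))
```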
As with any major project, a kickoff meeting should be held to define the scope of the project. In the initial kickoff meeting, discuss the following:
- Identify what components of the organisation are to be included. At a minimum, one would think that the main data centre and its equipment should be included, but decisions have to be made regarding remote locations, non-essential software, desktops, services and the like.
- Introduce the disaster recovery planning (DRP) project manager. Depending on the culture in your organisation, this can be a senior IT person, or a DRP consultant brought in to work with the team. (If your IT department is frequently maligned and openly criticised, you may want to seriously consider an outside consultant. This is particularly valuable when a significant shift in culture is required to get other members of the organisation on board with the concept of disaster recovery planning.)
- Define the responsibility of each member of the team. IT will run the project, but critical input will be required from each department. Define who from each department will be involved in a recovery situation; later in the project you will define the role each person will play.
- Put a project timeline together. The worst thing the DRP project manager can do is fail to set a project timeline and a scheduled delivery date. DRP already has the stigma of being a project that can be put on the back burner. If no schedule is in place and the team is not held accountable for meeting delivery dates, the project will not get done in a timely fashion: something else will come up that causes it to be pushed back or put off. The "tyranny of the urgent" will reign supreme, and urgent day-to-day matters will overshadow the critical, bigger-picture task of developing the plan.
- Discuss the steps that the team will go through to develop the plan. The disaster recovery plan is a growing, living document that will be in a state of constant change. Lay out a "template" of the steps the team will go through, but leave it flexible enough to meet the specific needs of your organisation.
A basic template consists of the following steps:
- Identify Recovery Components (the scope of the plan)
- Perform an Operational Analysis
- Perform a Disaster Risk Analysis
- Document Systems and Applications
- Tolerance Analysis
- Prioritization of Recovery
- Organisational Chart
- Contact List
- Team Development
- Roles and Responsibilities
- Review of Plan
- Policy Changes
- Testing of Plan
- Plan Maintenance
- In the Event of an Emergency
- Using the Plan
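As a rough illustration of holding the team to a timeline, the following Python sketch attaches a due date to each template step and lists the steps that have slipped. The flat seven days per step (unless overridden), the start date, and the `build_schedule` and `overdue` functions are illustrative assumptions, not a recommended schedule; a real project would set durations step by step.

```python
from datetime import date, timedelta

# The template steps from the list above, in order.
TEMPLATE_STEPS = [
    "Identify Recovery Components",
    "Perform an Operational Analysis",
    "Perform a Disaster Risk Analysis",
    "Document Systems and Applications",
    "Tolerance Analysis",
    "Prioritization of Recovery",
    "Organisational Chart",
    "Contact List",
    "Team Development",
    "Roles and Responsibilities",
    "Review of Plan",
    "Policy Changes",
    "Testing of Plan",
    "Plan Maintenance",
    "In the Event of an Emergency",
    "Using the Plan",
]


def build_schedule(start, default_days=7, durations=None):
    """Give each step a due date, with steps running back to back from `start`.

    `durations` maps a step name to its length in days; any step not listed
    gets `default_days` (a placeholder assumption, not a recommendation).
    """
    durations = durations or {}
    schedule, due = [], start
    for step in TEMPLATE_STEPS:
        due += timedelta(days=durations.get(step, default_days))
        schedule.append((step, due))
    return schedule


def overdue(schedule, completed, today):
    """Steps past their due date that the team has not yet signed off."""
    return [step for step, due in schedule if step not in completed and due < today]


# Hypothetical start date and progress, for illustration only.
schedule = build_schedule(date(2004, 1, 5))
print(overdue(schedule, {"Identify Recovery Components"}, date(2004, 1, 20)))
```

Reviewing the `overdue` list at each team meeting is one way to keep the "tyranny of the urgent" from quietly pushing the plan back.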
Before you set this paper aside in a cold sweat, realise that this is not as intimidating as it looks.
Many of these steps can be handled in a single meeting and much of this information is already available. The development of the plan simply puts this information in a standardised format and keeps it in one place.
Disaster Risk Analysis
In a meeting held with the disaster recovery planning team, review the possible disaster risks that your organisation faces. This will depend on what you consider to be a disaster and the physical location of your facilities.
For some, any significant downtime of a main server could be considered a disaster. If this is the case, then the team must determine likely scenarios that could cause the server to be down. The idea is to prepare for those things you can control and plan a recovery for those you can't. The following is a partial list of potential sources of disaster that your organisation may want to consider.
- Hardware failure
- Power outages
- Physical security
- Civil unrest
- Labour disputes
The objective of this portion of the plan is to evaluate the likelihood of each type of disaster and list potential causes for each. Items such as hardware failure, power outages and physical security should be checked off already, because it is guaranteed that you will need some level of preparation for a potential disaster in these areas.
For each area that is appropriate for your organisation, identify potential risks. Determine whether there is any reasonable action that can be taken to mitigate the risk. For example, everyone needs to be concerned about a fire in the data centre. A reasonable action would be to install heat and smoke detectors and potentially an alarm system. This doesn't eliminate the chance of a fire, but it reduces the impact that a fire could have on the organisation.
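One common way to make the likelihood evaluation comparable across risks is to score each risk as likelihood multiplied by impact on small ordinal scales. This is a swapped-in technique rather than something the plan requires, and the 1-to-5 scales, the `Risk` class, the `rank` function and the sample ratings below are assumptions for illustration. Following the fire example, the sketch treats mitigation as lowering impact rather than likelihood.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Risk:
    name: str
    likelihood: int                          # 1 (rare) to 5 (almost certain), assumed scale
    impact: int                              # 1 (minor) to 5 (severe), before mitigation
    mitigated_impact: Optional[int] = None   # impact once a mitigation is in place, if any

    def score(self, mitigated=False):
        """Likelihood x impact; a mitigation lowers impact, not likelihood."""
        if mitigated and self.mitigated_impact is not None:
            return self.likelihood * self.mitigated_impact
        return self.likelihood * self.impact


def rank(risks, mitigated=False):
    """Highest-scoring risks first."""
    return sorted(risks, key=lambda r: r.score(mitigated), reverse=True)


# Hypothetical ratings, for illustration only.
risks = [
    Risk("Hardware failure", likelihood=4, impact=3, mitigated_impact=2),
    Risk("Fire in data centre", likelihood=3, impact=5, mitigated_impact=3),
    Risk("Power outage", likelihood=4, impact=4),  # no mitigation recorded yet
]
for r in rank(risks):
    print(f"{r.name}: {r.score()} -> {r.score(mitigated=True)}")
```

With these made-up ratings, the detectors cut the fire score from 15 to 9, while the power outage stays at the top because nothing has been recorded to reduce it; that gap is exactly where the team should look for its next action item.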
Evaluating these possible risk areas should generate a list of action items that should be included in the disaster recovery document. Risks that cannot be prevented or controlled, such as another blackout, will provide the organisation with a set of things to prepare for and include in the plan.
The team should document all identified risks, action items and concerns, and include them in the disaster recovery plan.
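A minimal sketch of such a risk register follows, assuming a simple split between controllable risks, which produce action items, and uncontrollable ones such as another blackout, which become things to prepare for. The `RegisterEntry` class, the `plan_sections` function and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegisterEntry:
    risk: str
    controllable: bool                                  # can we prevent or reduce it?
    actions: List[str] = field(default_factory=list)   # mitigation steps to carry out
    concerns: List[str] = field(default_factory=list)  # open questions to record


def plan_sections(register):
    """Sort register entries into the sections the plan document needs."""
    return {
        "action_items": [f"{e.risk}: {a}" for e in register
                         if e.controllable for a in e.actions],
        "prepare_for": [e.risk for e in register if not e.controllable],
        "concerns": [f"{e.risk}: {c}" for e in register for c in e.concerns],
    }


# Hypothetical entries, for illustration only.
register = [
    RegisterEntry("Fire in data centre", True,
                  actions=["Install heat and smoke detectors",
                           "Install an alarm system"]),
    RegisterEntry("Regional blackout", False,
                  concerns=["How long can backup power keep critical systems running?"]),
]
for section, items in plan_sections(register).items():
    print(section, items)
```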
Pilots say that any landing you walk away from is a good landing. The best disaster recovery plan you will ever develop is the one you never have to use. The secret to success is to plan for the absolute worst and be ready for anything. Be thorough in the development of the plan. Test it. Keep it up to date. Think about recovery during your development projects.
The next time the lights go out, may you not be among the companies ill-prepared to deal with the dark.