Drills vital for effective disaster recovery

Practice makes perfect when it comes to DR, experts say

In November, a fire broke out in one of the buildings on ISTA Pharmaceuticals' main campus, forcing about 50 employees to move to another location on the property. After the building's sprinklers kicked in, the entire network had to be shut down because the water threatened the equipment carrying the company's inbound data traffic.

Managers and employees at the pharmaceutical company handled the situation with composure, says IT Director Keith Bereskin. The company's network and core applications were back online within two hours, and only 10 of the affected employees had to stay away from their offices for more than three hours, according to Bereskin.

That wasn't bad, "considering it wasn't something we formally talked about", he says.

ISTA's "mini-disaster" happened to coincide with a disaster recovery gap analysis being conducted at the company. In that analysis, a consultant discovered that the IT department, which oversees disaster recovery coordination, and the business divisions needed to communicate more effectively, says Bereskin.

The fire and the subsequent analysis helped spur ongoing discussions between Bereskin and his peers in various departments to determine what their expectations would be during a recovery. Among other things, they're working to identify the data they would need right away and the systems and processes that would have to be restarted immediately.

"People have said to me that we should have had these discussions previously," says Bereskin. Now they do; Bereskin says he coordinates disaster planning meetings with his business peers several times a year.

The situation at ISTA highlights the types of communication problems that often exist among disaster recovery managers, business executives and line workers, according to disaster recovery experts. "The people side of disaster recovery planning is often overlooked," says John Linse, director of business continuity services at EMC. At many organisations, when it comes to communicating disaster recovery plans, "there's almost this 'shoot, ready, aim' kind of approach", says Linse.

For instance, one EMC customer didn't have an effective disaster recovery plan in place when it suffered a power outage in June, so a security guard ended up being the one who made the decision to send home the 1,300 affected employees. The outage lasted two days and cost the company US$1.3 million (NZ$1.65 million) in business, including estimated lost revenue for orders that couldn't be taken. Afterwards, EMC helped the company craft a business continuity plan that included identifying key business processes that need to stay up during a disaster — and which people are responsible for them.

At Austin Energy, CIO Andres Carvallo says the purchase of a disaster recovery planning tool was an essential element in bringing key decision-makers together to craft a recovery plan in late 2003. Using the Living Disaster Recovery Planning System (LDRPS) from Strohl Systems Group, Carvallo and Austin Energy's disaster recovery manager worked with supervisor-level business process owners to identify which processes needed to be recovered and when.

"As you go through this business by business, you populate the software with business processes and the people who need to be involved in the decision-making," says Carvallo. "In our case, 1,600 people are impacted by the tool."

Although LDRPS is only one component of Carvallo's effort to communicate the disaster recovery plan to his fellow Austin Energy employees, he says it has played a big role in helping the utility map a strategy and get the message to resonate with its staff.

Since Austin Energy deals with power outages on a regular basis, disaster recovery is already embedded into its culture, but Carvallo says that prior to his arrival at the utility in early 2003, business continuity "really wasn't understood as a responsibility of every line of business. So we had to drive this company-wide".

LDRPS has helped Carvallo achieve that goal because it can track the percentage of the disaster recovery process that each manager is responsible for. "It helps drive this whole notion of accountability," he says.

Carvallo's approach to engaging the decision-makers and line managers ultimately responsible for executing key business processes underscores the importance of spreading disaster recovery planning to all corners of an organisation.

One way to get the word out is by organising a field trip. Shortly after Vinny Licht became CIO and took over disaster recovery responsibilities at Tauck World Discovery five years ago, he arranged for employees to visit the tour operator's disaster recovery site.

The turnout and response "was huge", says Licht. "[Employees] know we have a site and [that] if there's a disaster, everyone should go there."

"To have a really effective plan, you have to wire it into the DNA of the organisation," says Rod Masney, chairman of the Americas' SAP Users' Group and global director of IT infrastructure services at Owens-Illinois, a glass container manufacturer.

Five years ago, when he was employed at a different company, Masney worked with business leaders to craft a disaster recovery plan that included creating recovery procedures for each business unit.

To engage some of the senior business managers who were "less passionate" about disaster recovery planning, Masney and other business leaders drew them into practice drills "so that they could see, hear and understand our objectives for key functional areas". Involving stragglers in the practice tests helped convince them of the need to document and test disaster recovery procedures within their areas of responsibility, says Masney.

To help make it easier for slow-to-respond managers to develop business continuity plans for their departments, Masney and other members of the disaster recovery planning group provided them with business continuity software templates that other business units had already developed. The templates included a guide to help managers identify which people in their organisations should respond to help get operations up and running again.

Most of the dawdlers "got on board very quickly", says Masney. But that response wasn't universal.

"We had one functional area where we had trouble getting those folks on board," he says. "They didn't really understand what we were trying to do. Perhaps we weren't providing the right type of education to them."

Some of the managerial and employee resistance to disaster recovery planning can be chalked up to the fact that business people face other day-to-day demands that often carry a stronger sense of immediacy, says Jim Michael, treasurer of Share, an IBM user group.

An effective way of communicating a disaster recovery plan to employees is to summarise the critical business processes that need to continue, and explain how they're being prioritised and why, says Michael, who is also an IT manager at a California state university. "You don't hand them a 140-page document and say, 'Go figure this out.' You're respecting the fact that this is a complicated process and that you're trying to make it clear to them," he says.

Like Carvallo, Michael has stressed the importance of engaging the line managers who are closest to the business processes being addressed.

Says Michael, "The plan is only going to be as effective as [business managers] help the plan to become."

Including front-line employees in practice drills not only ensures that the plan works; it shows people what to do.

Practice doesn't guarantee success, but test drills certainly help disaster recovery managers and project teams to identify gaps and areas for improvement in their organisations' disaster preparedness. Practitioners offer the following checklist of what to do (and what not to do) during a test run:

-- Do: Make sure that key decision-makers and rank-and-file employees alike have access to the disaster recovery plan, or a simple set of instructions they can keep in their purses or wallets.

-- Do: Before the drill starts, identify a single leader to communicate to employees what needs to be done.

-- Do: Establish clear objectives for the exercise. Understand what is meant by success.

-- Do: Make sure you have mission-critical data stored at a location away from your primary datacentre and pull that data into test drills.

-- Don't: Practise for just one type of event. Disasters come in all shapes and sizes. Practise for different scenarios (for example, a network outage or a pandemic) to help employees understand the impact of different types of disasters and what their roles are expected to be.

-- Don't: Use your test drill to figure out your communication plan. Testing communication should be a key part of your drill. Disaster recovery team members should have off-site contact information for key personnel, and they should keep that information both at work and at home.

-- Don't: Play the test drill as a low-key event. Even though it's only a drill, behave as though it's a real crisis. Practise the way you want it to play out in real life.
