Planning for the unthinkable

Working out what to do should a disaster occur used to be complicated, but automated systems have made it easier. Darren Greenwood finds some New Zealand examples.

Six steps to disaster-proofing

Iron Mountain says there are six fundamental areas that need to be addressed:

▪ Assign accountability, responsibility and authority

▪ Assess storage risk

▪ Protect information everywhere – at all times

▪ Copy your back-up tapes

▪ Encrypt all sensitive data

▪ Consider electronic-vaulting

Wellington is built on notoriously shaky ground, but the New Zealand Fire Service’s IT manager, Ian Scott, is confident that, in the event of a full-scale disaster, the service’s Auckland back-up datacentre will kick in quickly to deliver essential IT services.

Scott is confident because not only is the system’s design simple, but it has also been fully and repeatedly tested (see box).

For many New Zealand organisations, such strong failsafe recovery may not be necessary. But businesses are increasingly coming to depend on information technology, so it’s becoming critical to ensure IT services can be restored with minimal disruption.

Businesses are also being faced with legal requirements to store data for many years — and to ensure that data is readily available. And, on top of this, corporate governance regulations are shifting responsibility for business continuity and risk management back to directors and executives.

Fortunately, the cost of putting in place effective disaster-recovery and back-up technology is coming down, as this technology migrates down the food-chain — from government and corporates to small- and medium-sized enterprises.

Sam Mulholland, of Dunedin-based Standby Consulting, says the growing dependence on technology is leading to greater use of replication and resilient hardware. Hardware manufacturers now include features like hot swappable disk and power supplies — and software manufacturers include replicating features — in their standard offerings.

Virtual security

The other major trend is a move to virtual machines. This technology is particularly suitable for those with more than 15 servers. Users find virtualisation reliable. It offers excellent recovery, particularly when a whole server can be “dragged and dropped” from one location to another.

Auckland-based Commercial IT Services specialises in such virtualisation, saying virtual machines can be up and running within minutes, and are more cost-effective than replication as the need for extra hardware is avoided.

Voice-over-IP is another useful technology, as it allows secondary PABXs to step in should a system fail. Load-balancing of call centre work is similarly possible.

Tapes are still used, mainly for off-site storage, explains Mulholland, but tape-based systems can lose data, so replication and mirroring are finding favour. Some clients also split the load between two sites; having two centres allows for upgrades and repairs while keeping systems flowing.

Internet-based solutions are not proving as popular as expected because of unreliability and bandwidth limitations. They might work for small firms, but not for large ones. Some companies, such as Revera, based at Albany on Auckland’s North Shore, are also using storage vaults.

Standby Consulting does not itself supply the hardware and software for disaster-recovery projects, as it believes its independence has value in the market. Instead, it advises organisations on what to do.

Mulholland says disasters may only happen once every five to 20 years and some losses are acceptable. But organisations need to reduce their risk, see how they can do this, and analyse the impact of a disaster on their organisation. They then need to work out the recovery sequence of business processes, assessing what is critical and what isn’t.

“Then you can build an effective, cost-effective and cost-justified strategy. Management understands why you need it — the cost; the plans,” he says.

Typically, Standby will devise a business-impact assessment and a DR/BC (disaster recovery/business continuity) plan for customers. This starts with staff interviews regarding what systems are critical. These are then ranked in order of importance and tolerable downtime. Recommendations are then made, which may include a need for new technology.
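The ranking step described above can be illustrated with a minimal sketch. It is not Standby's actual method: the system names, importance scores and downtime figures are hypothetical, and it simply assumes each system is given an importance rank (1 = most critical) and a tolerable downtime in hours during the staff interviews.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemAssessment:
    name: str
    importance: int                  # 1 = most critical (hypothetical scale)
    tolerable_downtime_hours: float  # longest outage the business can absorb


def rank_for_recovery(systems):
    """Order systems by importance, then by shortest tolerable downtime."""
    return sorted(
        systems,
        key=lambda s: (s.importance, s.tolerable_downtime_hours),
    )


# Hypothetical interview results, for illustration only.
systems = [
    SystemAssessment("payroll", 3, 72),
    SystemAssessment("email", 2, 8),
    SystemAssessment("dispatch", 1, 0.5),
    SystemAssessment("intranet", 3, 120),
]

recovery_order = [s.name for s in rank_for_recovery(systems)]
print(recovery_order)
```

The resulting order is the recovery sequence: the systems at the top of the list are the ones that would justify spending on faster recovery technology.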

Health boards lower their DR risk

Hutt Valley DHB is now installing a storage area network in each of its two computer rooms and virtualising its servers.

“If one server breaks down, you can shift the processing capacity and storage to the second computer-room in real-time,” says CIO Tony Cooke.

The DHB has a BC plan, based on Standby’s templates, and, while it expects localised failures, if there is a major fire it knows where to find the relevant information, says Cooke.

Harry Barber, CIO of Gisborne-based Tairawhiti DHB, also used Standby to assess risks, identify critical components and rank them. Such a “useful piece of work” helped determine spending needs and left the health board with a DR/BC plan that is “a living document” which can be constantly updated.

“DR is only as good as its currency. It’s finding the time to stay on top of it,” Barber says.

The Aviation and Tourism Training organisation also enlisted Standby to help with its planning.

“They identified gaps and weaknesses, and strengths, and suggested how we fix the weaknesses. Some we did, some we didn’t due to cost issues. They simply wrote the plan,” says quality assurance officer Sharon Payne.

A raft of vendors, other consultants and systems integrators are happy to fill the local DR gap.

Iron Mountain, for example, offers a tape-exchange service, as well as an internet-based back-up offering called LiveVault. The company, formerly known as Pickfords, has remote sites around New Zealand and offers daily pick-ups of tapes for customers.

Customers include Fonterra and Air New Zealand, for the company’s “cradle to grave” service for records.

Auckland manager Steve Haythorne confirms that DR/BC is now more visible on company agendas and that companies realise they can’t let their lowest-paid staff take the tapes home.

Recruiter Hudson Global uses the tape pick-up service, with IT manager Tony McCarthy saying the service is reliable and that it’s easy to have tapes restored.

Mirror sites were too dear, as were internet-based alternatives, because of bandwidth costs.

McCarthy says it also helps knowing when Iron Mountain will pick up the tapes. “You can set your watch by them,” he says.

Allied Workforce is similarly satisfied, according to IT manager Saxon Harfell-Jones. He uses the tape-exchange service rather than just taking the tapes home, as his predecessor did.

Now, they are kept in a controlled environment and can be safely restored. Iron Mountain’s website also allows ad hoc or other changes in return times, he adds.

Shane Clark, operations manager for hosted infrastructure provider OneNet, says using the service made sense as it strengthened an existing arrangement with Iron Mountain. Its customers also required off-site storage with multiple copies being made.

“They are very responsive. People seeking such services should see how the company performs; what the access is to the tapes. You have to have a company you can trust,” says Clark.

‘Excellent’ storage

Hosting and storage company Revera offers back-up as part of its wider hosting service.

Mike Baker, director of Wellington-based Helium Worksearch, says the Revera service covers all of his company’s IT support, with all their systems loaded into Revera’s own datacentre.

“It’s all within their back-up structure. If we have a problem, they deal with it. Once we lost some documents and they restored them from their large back-up,” he says.

Revera also offers a web-based “back-up over wire” service, whose users include Transit New Zealand. However, the more generalist providers offer a wider range of options. For example, Axon recently created a Centre of Excellence for its storage offerings, promising “flexibility” of services and technology.

Centre head Craig Davies says firms are starting to bring services back in-house, but they still need guidance, especially when it comes to the challenge of measuring “criticality”.

Gen-i, which has eight “telco-standard” datacentres across New Zealand, reports a growing need for robust systems that meet the demands being made by legal storage requirements. Business continuity manager David Reason echoes others when he says that virtualisation is being used to meet this extra demand, through its “smaller footprint”.

When Datacom installed the government’s Log-on Service (GLS) it met the demand for security and availability by sharing loads both within and across datacentres.

“Clever design of a robust, high-availability environment will inherently be disaster-proof,” says director Steve Matheson.

But he has good news for those whose systems aren’t so robust at the moment.

“With the economies of scale created by shared infrastructures, and the cost of hardware and high-speed networking generally declining, more organisations are able to enjoy this (disaster-proof) security,” he says.

SIDEBAR

Slaughtering DR costs

Martin Wellesley likens his service to insurance, with monthly payments. The director of Albany-based Plan-B uses a service run by DMD Internet and claims to “slaughter DR costs by 95%”.

Plan-B offers a tape-based back-up and recovery service, coupled with offices and related services that customers can use should they face a disaster like a fire that prevents them from using their office.

Around 420 companies use Plan-B, including Lumley Insurance, BMW New Zealand, Ford and Daimler-Chrysler, and Pernod-Ricard (Montana).

Wellesley says his firm targets the financial controller or chief executive, as these are the people who are ultimately responsible for disaster recovery and business continuity.

Every business has a “nerve centre”, he explains, such as a head office and the technologies needed to run it – including the IT and phone systems – plus company records.

If access to the nerve centre is denied, say by fire, or there are computer problems, a business can suffer major damage if it cannot operate for several hours, never mind a few days or weeks. It is the length of the “downtime” that can turn a crisis into a disaster.

Plan-B assesses how soon the phones, computers, etcetera must be working again to avoid the business collapsing. After this risk assessment, Plan-B then provides guidelines for the company to follow.

Then, every day after this, Plan-B will pick up recovery tapes, which are stored securely off-site, and handle the DR/business continuity problem. There is no need for DR consultants or even a DR handbook for staff to learn.

In the event of a technical failure, say, Plan-B can restore and return copies of “lost files” or programs within a few hours. And, if the nerve centre is out of action because of a fire, Plan-B has alternative premises – with a 150-seat office space and equipment in Albany and Mt Wellington, and a mobile van that can visit customers. Plan-B technicians can deal with the transfer of phone numbers and software systems to alternative sites.

Wellesley says around 5% of customers activate Plan-B each year, with a couple needing alternative office space.

His service is unique, he says. But it doesn’t cover a major disaster, such as an earthquake or anything else that might destroy Auckland’s infrastructure.

Such coverage is not always necessary, because if Auckland was flattened by an earthquake, for example, everyone would be affected. Even if systems could be restored, there might well be no power, the roads might be closed and employees everywhere would be too concerned about themselves and their families to turn up for work.

When New Orleans was hit by Hurricane Katrina, companies failed not because they had no disaster-recovery systems but because they chose to leave the city, says Wellesley.

Gillian Fairhurst, IS manager for BMW New Zealand, uses Plan-B to ensure the 24/7 operation of BMW’s dealer management system, which supports the car company’s dealerships nationwide.

Plan-B recovers data from tape back-ups, and handles BMW technologies such as Unix, Linux and Microsoft operating systems and hardware. BMW has tested the plan but has not needed to activate it for real.

Fairhurst says the Plan-B system runs smoothly. She says firms looking at DR/BC systems need to involve all staff in identifying critical applications and be realistic about restoration times. Regular testing is also essential.

Simple systems safest

The New Zealand Fire Service previously had no business continuity plan, just back-up data tapes stored off-site, says IT manager Ian Scott.

But over the past five years, the number of its in-house business applications has increased from four to 10, making them “operationally important” for 24/7 service.

Working with its incumbent supplier, Gen-i, the fire service developed a replicated DR site in Auckland, with Exchange, data and Citrix servers, to work alongside the service’s Wellington production site.

Monthly tests disconnect the DR site from the production site to check the availability of systems and applications and to ensure all changes on the production site are replicated on the DR site. Every six months a full failover to the DR site is carried out.

“With the regular tests we can be confident that in the event of a disaster a full failover to Auckland will be successful,” says Scott.
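One way to picture the monthly replication check is as a comparison of content fingerprints between the two sites. The sketch below is an assumption for illustration, not the fire service's or Gen-i's actual tooling: the file names and contents are made up, and each site is modelled as a plain dictionary of name to bytes.

```python
import hashlib


def digest(data):
    """Return a SHA-256 fingerprint of some content."""
    return hashlib.sha256(data).hexdigest()


def replication_gaps(production, dr_site):
    """Report items that are missing or stale on the DR site."""
    prod = {name: digest(data) for name, data in production.items()}
    dr = {name: digest(data) for name, data in dr_site.items()}
    return {
        "missing_on_dr": sorted(n for n in prod if n not in dr),
        "out_of_date_on_dr": sorted(
            n for n in prod if n in dr and prod[n] != dr[n]
        ),
    }


# Hypothetical snapshots of the two sites, for illustration only.
production = {"roster.db": b"v2", "incidents.db": b"v5", "config.ini": b"v1"}
dr_site = {"roster.db": b"v2", "incidents.db": b"v4"}

gaps = replication_gaps(production, dr_site)
print(gaps)
```

An empty report on both counts would suggest the DR copy matches production; anything listed needs to be replicated before a failover could be trusted.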

The applications and technology using replication provide real-time updates of all information to the DR site, and the simple design minimises the effort involved in duplicating changes between the sites. Systems must be simple, and tests a regular, routine function, says Scott.

“Have up-to-date documentation, and ensure a number of IT staff have a good technical and operational understanding of how the systems work in the failover to DR. Make sure everyone understands the DR process. Involve your IT partners in the process [and] ensure senior management endorses the DR/BC plan,” Scott advises.
