The IBM datacentre responsible for Air New Zealand's computer failure is not best equipped to host mission-critical applications and may need to be replaced or retrofitted with extra generators to bring it up to best practice, sources say.
Air New Zealand's computer systems were crippled on the last day of the school holidays, delaying dozens of flights, after a power generator at IBM's Newton datacentre failed due to a faulty oil pressure sensor.
IBM and Air New Zealand are still investigating the Newton outage, but it is understood IBM was relying on a single generator to power the airline's mainframe during scheduled maintenance on an uninterruptible power supply (UPS) attached to the datacentre's main power supply.
A UPS is designed to smooth out brown-outs and spikes in the electricity supply and provide battery back-up in case of a power cut.
Ron Hughes, the president of California Data Centre Design Group, a US consulting firm that specialises in datacentre power systems, says IBM appears to have followed "standard operating procedure", but he questions the reliance on a single back-up generator.
"What they were doing in terms of maintenance practices was probably the way to go. The issue I would have was why there was only one generator.
"Current design practice is that if you have got anything that is mission-critical, you are going to run it in a 'tier 3' or 'tier 4' datacentre where you have redundancy built in. If you need one generator, you have two; if you need two, you have three. It sounds like, on that site, they had a big single point of failure."
Air New Zealand spokeswoman Tracy Mills would not comment on whether it would be reasonable to expect its mainframe to be protected with an additional back-up generator, saying that question should be directed to IBM "as they are the provider of this service". IBM declined to comment.
An industry source says the airline would have known about the back-up arrangements at the datacentre, which was originally owned by Air New Zealand.
"The reality is the site has been there for a long time and it has not been designed and built to the same standards that might exist if you were building a new datacentre. Retrofitting some of that resilience is something that boils down to the business case and issues of risk, and who wants to carry that risk.
"It would be fair to assume the disaster recovery provisions weren't initiated with the haste or focus that the issue might have deserved."
The source says Air New Zealand's main outsourcing contract with IBM is up for renewal in two years and it might prove hard for the airline to terminate the arrangement early. "Anything involving mainframes is both costly and prolonged to try and resolve."
Auckland University information technology director Stephen Whiteside says there has been under-investment in datacentre capacity in recent years.
IBM New Zealand is understood to have been seeking approval from its Australian parent for at least two years to build a more modern datacentre in Auckland at a cost of tens of millions of dollars, with funding not yet confirmed.
The university removed its back-up systems from the Newton datacentre three months ago because of capacity constraints. Mr Whiteside says the university got good service from IBM, but was disappointed by the range of alternative options. "We felt, for the Auckland region, there has not been a lot of choices."
The biggest recent investment in commercial datacentre capacity was made by privately owned Datacom. Prime Minister John Key opened its $30 million Orbit data centre on Auckland's North Shore in May.
Datacom chief operating officer Steve Matheson says the centre has three generators and two more are being installed. One would always be available as an extra back-up. "I think we are the only commercially available facility in Auckland that has an extra generator."
PULLING THE PLUG
November 4, 2007: Emergency services are forced to resort to whiteboards and paper to track call-outs after a power outage at Telecom's Auckland exchange brings down police and fire computers.
October 15, 2002: ETSL's eftpos network is offline for more than an hour after an operator accidentally triggers an emergency power-off switch at EDS' Auckland datacentre.
October 14, 2001: ETSL's eftpos network crashes for 35 minutes. EDS says an uninterruptible power supply at its Auckland datacentre failed as it was switching from generator to mains power following scheduled maintenance.
September 13, 2001: Thousands of workers go unpaid after a power failure at EDS' Auckland datacentre. The Reserve Bank is forced to delay overnight settlement transactions. There is no immediate explanation as to why back-up systems did not kick in.
December 13, 1998: The National Library's computer room is burned out, destroying millions of dollars' worth of Sun servers, after a new power supply is installed. The incident is not made public until three years later.