While Air New Zealand has a secondary datacentre in Auckland, it wasn't chosen as the most appropriate fallback on Sunday afternoon, when an outage hit check-in, booking and call centre functions, Air NZ CIO Julia Raue says.
The incident occurred while the main datacentre was deliberately running on generator power so that maintenance could be carried out on the uninterruptible power supply (UPS) system, Raue says.
“The intention was for the IBM team to bring down the UPS for maintenance and run all systems on generator power, deliberately bypassing the UPS during this maintenance window.
“One hour into the window, the generator failed, leaving all systems with no power,” says Raue.
The quickest fix was to switch the systems back to mains power, and this was done within “a matter of minutes”.
Unfortunately, there had been a “crude and unclean shutdown of all systems”, she says. “On restart, some data corruption and reboot issues were experienced across various platforms.” Some key systems were then brought up at the secondary site.
Asked whether damage to the systems from a “few minutes” of outage might reflect inadequate planning of service-level agreements (SLAs) and other terms in Air NZ’s contract with IBM, meaning the airline should shoulder some share of the blame, Raue suggests that case cannot be made.
“While the terms of our contract with IBM are explicitly confidential, responsibilities are clear and well understood by both parties. What was disappointing was the significantly extended outage that resulted, which was well outside those contracted SLAs, and the consequences to our inconvenienced customers.
“We will continue to work with IBM to assist them to identify cause, ongoing risk, full resolution and areas of improvement,” Raue says.