ANZ has two main processing centres in Australia -- one for production, the other for disaster recovery. In New Zealand there are four main sites, two in Auckland and two in Wellington.
The number of local sites may be reduced to two if infrastructure assessments deem such cuts necessary, but the bank says any such consolidation will not affect the reliability of its backup systems, because fewer sites will be easier to manage.
The sites run the central LAN server farms, Tandem switching platform (for Eftpos and ATM), the contact centre systems (telephone banking) and the communication links for New Zealand. Each site has its own dual UPS with backup and diesel generators, plus dual fibre links for telecommunications.
And despite concerns from David Tripe of the Massey University Centre for Banking Studies, ANZ planning and projects manager John Hughes says processing in Australia poses no extra risk to New Zealand data, as there are at least three telephone links across the Tasman.
ANZ, says Hughes, has several levels of backup systems depending on the importance of a particular business function. ANZ says its systems operate between 99.8 and 99.9% of the time. In February only one of 30 major applications missed its service level agreement, Hughes says.
The DASD disks are backed up mostly using mirroring: EMC systems mirror DASD disks between the mainframe sites for disaster recovery purposes. RAID "self-healing" storage is used extensively, including in LAN servers.
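The mirroring described above boils down to a simple rule: every write is applied to two copies, so a read survives the loss of either. A minimal toy sketch of that rule, purely illustrative (real EMC remote mirroring is far more involved):

```python
# Toy model of disk mirroring: each write goes to two "disks",
# so reads survive the loss of either copy.

class MirroredStore:
    def __init__(self):
        self.disk_a = {}
        self.disk_b = {}

    def write(self, key, value):
        # apply the write to both copies before acknowledging
        self.disk_a[key] = value
        self.disk_b[key] = value

    def read(self, key):
        # either copy can serve the read if the other disk is lost
        return self.disk_a.get(key, self.disk_b.get(key))

store = MirroredStore()
store.write("acct:123", 500)
store.disk_a.clear()           # simulate losing one disk
print(store.read("acct:123"))  # -> 500, served from the mirror
```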
Other backups are done to tape cartridge using automated tape libraries (ATLs), which on the mainframe are Storage Technology and IBM products. On the LAN these are Dell ATLs.
“The usual cycling of backups is done, complete with offsite storage of this media. The communications systems have dual feeds to all major sites -- including two separate high-speed links transtasman,” says Hughes.
To protect against outages and problems in internet and telephone banking ANZ uses dual production systems at separate sites, so if one system fails the other takes up the slack.
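The dual-production-system pattern can be sketched in a few lines: requests go to the primary site and fail over to the second if it is down. This is a hypothetical illustration of the pattern, not ANZ's implementation; the names and failure model are invented:

```python
# Sketch of dual production systems at separate sites: if one system
# fails, the request is retried against the other.

class SiteDown(Exception):
    """Raised when a production site cannot service the request."""

def handle(request, sites):
    """Try each production site in order, failing over on error."""
    for site in sites:
        try:
            return site(request)
        except SiteDown:
            continue  # this site failed; the other takes up the slack
    raise RuntimeError("all production sites unavailable")

def primary(request):
    raise SiteDown()  # simulate an outage at site one

def secondary(request):
    return "processed " + request

print(handle("balance-enquiry", [primary, secondary]))
# -> processed balance-enquiry
```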
Other systems like Eftpos and ATM need a very quick switch to a backup system, Hughes explains. This was successfully achieved in a test of ATM services by switching from production to a disaster recovery system.
Other systems, like the LAN network, still allow limited functionality if one processing centre goes down, he says. The bank has also recently implemented dual access around the main centres of New Zealand for all HQ desktops attached to the two main central LAN server farms.
The HQ LAN systems for New Zealand run from two central site server farms in data centre environments, says Hughes. This allows for central backup systems and the ability (in the future) to allow one system to backup the other should a server farm completely fail. The individual servers in the server farm have dynamic fail-over to a backup server. Each individual server also has dual power supply and RAID storage, as well as being attached to the building UPS, he says.
Power supplies include battery backups and ultimately diesel backups. Such backups for backups also extend to the Tandem systems (Compaq Himalaya NonStop systems).
“This is achieved using dual CPU and dual DASD and dual I/O paths to all devices. Thus, even if a CPU, DASD unit or I/O path fails, the back-up pair takes over and the system continues processing,” says Hughes.
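The duplicated-component design Hughes describes is the Tandem "process pair" idea: a primary continuously checkpoints its state to a backup, which takes over if the primary's CPU, disk unit or I/O path fails. A loose toy model of that takeover, not the NonStop implementation:

```python
# Toy model of a process pair: the primary checkpoints state to a
# backup, which takes over on failure so processing continues.

class ProcessPair:
    def __init__(self):
        self.primary_alive = True
        self.checkpoint = 0  # last state the backup has seen

    def process(self, txn):
        if self.primary_alive:
            self.checkpoint = txn  # checkpoint state to the backup
            return txn
        return self._backup(txn)

    def fail_primary(self):
        self.primary_alive = False  # simulate CPU/DASD/I-O path failure

    def _backup(self, txn):
        # backup resumes from the last checkpoint; processing continues
        return txn

pair = ProcessPair()
pair.process(1)
pair.fail_primary()
print(pair.process(2))  # -> 2, handled by the backup
```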
“In addition to this, ANZ runs a ‘twin system’ at the disaster recovery processing centre, in the event of whole-system or site failure,” he says.
For the main banking systems ANZ operates an enterprise test centre for the specific end-to-end testing of large applications like customer accounts, customer transactions and card processing. The bank has a dedicated team for user acceptance testing.
Critical systems are tested six-monthly, others on a yearly basis. Methods range from communications testing to fully turning off the production system and turning on disaster recovery. A recent DR test of the Bonus Bonds system involved a duplicate prize draw being run using a third party for some of the processing.
“Most of these tests involve only a component or single application, so that we can test our capability to switch to the DR facilities, and also give our staff the chance to practise these procedures,” explains Hughes.
ANZ claims its systems are “extremely successful”, with the telephone banking system allowing progressive upgrades during slack periods (typically overnight), while the other systems carry the full load.
The Tandem systems achieve an availability “much better than” the 99.93% demanded in the SLA, says Hughes.
“The HQ LAN environment now achieves very good reliability with the central server architecture and a standard build for the desktop, as opposed to the ‘bad old days of LAN’ when the system just grew with little or no planning, and suffered the consequent problems,” he says.
Monthly availability targets are set for all applications, again depending on the business requirements. For example, the customer account system (CAP), one of the main banking systems on the mainframe, carries a 99.85% availability SLA.
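To put figures like 99.85% in perspective, an availability percentage converts directly into permitted downtime. A back-of-envelope check (the percentages are from the article; the 30-day month is an assumption):

```python
# Convert an availability percentage into allowed downtime per month,
# assuming a 30-day month.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes

def downtime_minutes(availability_pct):
    """Monthly downtime (minutes) a given availability permits."""
    return MINUTES_PER_MONTH * (1 - availability_pct / 100)

for pct in (99.8, 99.85, 99.9):
    print(f"{pct}% availability -> {downtime_minutes(pct):.1f} min/month")
```

So the quoted 99.8–99.9% range corresponds to roughly 43 to 86 minutes of downtime a month, and the CAP target of 99.85% to about 65 minutes.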
Hughes says 24/7 banking presents extra problems even though the front-end systems are architected for such continuity.
“Some of the back-end systems still have large batch processing components. As such, this then limits functionality to read-only for certain times. In addition, back-end mainframe systems typically require some down-time each month for system changes, application upgrades, or hardware upgrades,” he says.
Hughes adds that ANZ also has an extensive business continuity plan for IT, so it can carry on business if it is unable to use its current buildings.