Stress-testing a network usually consists of letting the network boys stay late to play games, but when your network belongs to a telecommunications company, stress testing and disaster recovery take on a whole new meaning.
Late in August, TelstraClear took down its entire network in a simulated in-house outage that lasted for two full days and took in a residential bill-run as well.
Chief operating officer Luigi Sorbello ran the exercise.
“It involved all our front-of-house products, CRM, billing and residential order management systems.”
Sorbello, who is based in Melbourne, runs TelstraClear’s system integration unit, which was built around its purchase of Sytec in late 2004. Sytec staff, along with representatives from Sybase and Oracle, were on hand to oversee the test.
TelstraClear bought a new Sun E25K server as the heart of its new $2.3 million datacentre, housed on Auckland’s North Shore. The centre consists of the Sun box and a Cisco/Hitachi storage area network with around 1.8TB of storage for each of the three core applications. The SAN stores up to 36TB in total.
The entire system is mirrored in Wellington, and data is automatically replicated from the production site to the DR site in real time, which Sorbello says is one of the advantages of owning a telco. Over a 1Gbit/s link, it took under three hours to populate the Wellington database with 2TB of information.
The shutdown of the Auckland centre started at midday on a Thursday.
“We took down the Oracle and Sybase databases, brought down the Wellington centre, rebooted Wellington and brought the apps back online from there.” The total transition time was around four hours.
“All call centre operations, billing and order enquiries worked on with no noticeable change,” says Sorbello. The Wellington centre continued to run as primary site until 3am on the Sunday when the process was reversed and Auckland took over again.
This was the first time TelstraClear had taken down its entire system like this.
“We wanted to prove we had the processes and disciplines to survive.” With customers demanding greater uptime and service-level agreements becoming tighter, Sorbello says it was important not only to TelstraClear management but also to front-line staff that the company knew it could deliver on its promises.
“We wanted to go beyond the level of compliance demanded by regulations like ... [Sarbanes Oxley].”
TelstraClear plans future tests of its network. “While the technology is good, the important thing is that we have the people, the processes and the systems to deliver what we say we will,” he says. Sorbello says he takes great pride in the way the team performed under pressure.
“We wanted to prove to ourselves and to our customers that we could function with one or the other datacentre out of action,” he says.