RIM: System upgrade snafu led to BlackBerry email outage

There appears to have been a problem with this specific upgrade that caused the intermittent service delays, says RIM

One day after a service outage temporarily left BlackBerry users in North America without access to their email, Research In Motion said an initial investigation indicated that the outage was caused by problems with an internal data routing system that recently had been upgraded.

The upgrade was part of an ongoing effort to increase network capacity, "but there appears to have been a problem with this specific upgrade that caused the intermittent service delays," RIM said in a statement. No further explanation was provided.

RIM repeated earlier comments that the BlackBerry service was restored quickly and that no messages were lost during the outage, which started at about 3:30pm EST on Monday.

"RIM continues to focus on providing industry-leading reliability in its products and services, and continues to invest in its infrastructure and processes," the company added in its statement. It concluded by apologising to customers for any inconveniences that they experienced as a result of the problems.

Monday's outage was the second in less than a year that RIM blamed on an upgrade to the systems that support the BlackBerry service, which now has about 12 million subscribers.

Last April, the Waterloo, Ontario-based company said the flawed installation of cache optimisation software led to a half-day outage, which was worsened by a failure to switch the service over to a backup system. At the time, RIM promised that it would bolster some of its testing, monitoring and recovery processes in an effort to prevent repeat episodes.

It wasn't clear how many users were affected by the latest snafu, although a Verizon Wireless spokesman said that only data transmissions, not phone calls, were affected. Email access appeared to be hampered for customers of all US wireless carriers, including Verizon Wireless, AT&T and Sprint Nextel, according to users and analysts.

John Halamka, CIO at CareGroup Healthcare System and Harvard Medical School in Boston, said the outage affected hundreds of BlackBerry users at those medical organisations for about four hours, lasting until 7:20pm EST on Monday. RIM notified him of the outage via email about an hour after it began, Halamka said, adding that the vendor estimated that about half of its subscribers were susceptible to the email problems.

"Luckily, the outage was at the end of the day, so my users on the East Coast were not vocal about the outage," Halamka said. "We do depend on BlackBerry services for many mission-critical functions, so I hope there were lessons learned [by RIM] to prevent future outages."

Asked whether he might seek alternatives to the BlackBerry service, considering this was the second outage in less than a year, Halamka was sympathetic to RIM. "I know the angst those inside RIM are feeling now," he said. "I suspect this outage will be the catalyst that results in more redundancy, leading to fewer [and] shorter outages in the future."

Halamka said he targets 99.9% uptime for all of the systems he oversees, which allows less than nine hours of downtime per year. He has a more exacting standard of 99.99% uptime for the most mission-critical systems. But he didn't say how many hours of outages would cause real concerns about the continued use of the BlackBerry service at CareGroup and Harvard Medical School.
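For readers who want to check the arithmetic behind those uptime targets, the following is a minimal sketch in Python. It assumes a 365-day year, and the function name `downtime_budget_hours` is purely illustrative; it is not drawn from RIM, CareGroup or any tool Halamka described.

```python
# Illustrative only: converts an uptime target into a yearly downtime allowance.
# Assumes a 365-day year (8,760 hours); leap years are ignored.
HOURS_PER_YEAR = 365 * 24


def downtime_budget_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year permitted by an uptime target."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # The two targets mentioned in the article.
    for target in (99.9, 99.99):
        hours = downtime_budget_hours(target)
        print(f"{target}% uptime allows {hours:.2f} hours "
              f"({hours * 60:.0f} minutes) of downtime per year")
```

Under that assumption, a 99.9% target works out to roughly 8.8 hours a year and a 99.99% target to a little under an hour, so a single outage of about four hours would consume close to half of the looser allowance.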

Phillip Redman, an analyst at Gartner, said it will be "critical to keep the highest service levels" possible for the service, especially as the number of BlackBerry users grows even further. If outages at RIM become a trend, they "will push users toward other options that don't have a single point of failure," Redman said.
