Nationwide Mutual, a US insurance company, has consolidated 14 distributed SANs (storage area networks) into four, housed in two production datacentres, in just 90 days. To add to its achievement, the company used only four full-time storage experts for the actual migration. The result: by simplifying its SAN architecture, Nationwide doubled the number of terabytes its administrators can manage.
Alan Grantham, a storage architect at the company, told users at the Storage Networking World conference earlier this month that the massive storage consolidation, which was completed a year ago, came off without a glitch because of one thing — planning.
Grantham didn’t reveal the project’s cost or discuss return on investment, but he spoke candidly about barring senior IT management from his datacentre during the project and about calculating the risk involved in consolidating storage infrastructure. He also stressed the importance of testing, on the new SAN and before going live, every failure the company had ever experienced. The project also involved consolidating 102 Fibre Channel switches to 20 Cisco director-class switches.
Grantham says there are three paths for going live on a new storage architecture: a live migration, a partial migration and an off-line migration.
Live migrations represent a very high risk to businesses because any operator error can create an outage, he told the conference audience.
A partial migration uses multi-path software to disable one data path at a time in order to redirect data flow to the new SAN. It requires a well-documented environment and is also very high risk, because a single error that results in a disabled path creates an outage on a host.
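To illustrate why a partial migration depends on good documentation, here is a minimal Python sketch of a pre-check that might be run before one path is disabled. It is not Nationwide's tooling: the host names, fabric names, and the `hosts_at_risk` helper are all hypothetical, and a real environment would take path state from its multi-path software rather than from a hand-written dictionary.

```python
from typing import Dict, List

# Hypothetical inventory: host name -> state of each data path, keyed by fabric.
# True means the path is currently active.
PathMap = Dict[str, Dict[str, bool]]


def hosts_at_risk(paths: PathMap, fabric_to_disable: str) -> List[str]:
    """Return hosts that would have no active path left if the given fabric were disabled."""
    at_risk = []
    for host, fabrics in paths.items():
        # Paths that stay usable once the chosen fabric is taken down.
        remaining = [
            name for name, active in fabrics.items()
            if active and name != fabric_to_disable
        ]
        if not remaining:
            at_risk.append(host)
    return sorted(at_risk)


# Illustrative data only; these hosts do not come from the article.
inventory: PathMap = {
    "db01": {"fabric_a": True, "fabric_b": True},
    "app07": {"fabric_a": True, "fabric_b": False},  # second path already down
    "web12": {"fabric_a": True},                     # single-path host
}

print(hosts_at_risk(inventory, "fabric_a"))  # ['app07', 'web12']
```

A host missing from the inventory would never be checked at all, which is one way an undocumented server could turn a routine path change into an outage.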
Nationwide chose to perform an offline migration, where host servers are brought down and then reconnected to the new SAN. While it requires a business outage, “it’s the least complex and it’s often the fastest”, Grantham says.
“We were doing 100 servers in five hours — most environments want to be at that stage.”
However, in order to perform an offline migration, business unit leaders must be sold on the benefits and reduced risks, he says.
“If you do not sell [to] your business units, consolidation is incredibly hard. I probably went out and met with 60 large-scale business units and said, ‘This is what we’re doing; this is why; this is [to] your benefit, and this is what happens if you don’t do it’.”
Grantham also strongly suggested creating a datacentre cable strategy to ensure resiliency, so that no single accident can take out an entire network. He also recommended documenting the entire storage infrastructure.
“I have a very good SRM tool. However, I still have my guys walk through racks of servers counting fibre cables and saying this server is not on our list,” he says. “We found 11 servers not documented anywhere on our SAN and some of them were 64-bit processor servers running critical business [processes].”
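The walk-through Grantham describes amounts to reconciling two lists: what the SRM tool reports and what is physically cabled in the racks. A minimal Python sketch of that comparison follows. The function name and the sample server names are hypothetical, chosen only for illustration.

```python
from typing import Iterable, List


def undocumented_servers(srm_inventory: Iterable[str],
                         physical_count: Iterable[str]) -> List[str]:
    """Return servers found on the datacentre floor that the SRM inventory does not list."""
    return sorted(set(physical_count) - set(srm_inventory))


# Illustrative data only; these names do not come from the article.
srm_list = ["db01", "db02", "app07"]
floor_count = ["db01", "db02", "app07", "legacy03", "batch11"]

print(undocumented_servers(srm_list, floor_count))  # ['batch11', 'legacy03']
```

Swapping the two arguments gives the opposite check: servers the tool lists that nobody could find in the racks.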
The number one killer of all projects is lack of funding, Grantham says. If, as an IT manager, you think it’s going to cost a certain amount, fight for that amount.
Grantham also echoed a common mantra of IT project managers: “Communications, communications, communications.”
Make sure you have a clear division of responsibilities and form a small dedicated team, he says. At Nationwide, a team made up of Grantham, two engineers and one vendor contractor performed the migration.
“Break migration down to monkey-friendly tasks. Have easy-to-follow processes, checklists, and you have plug-in cable,” Grantham says. “Yes, it’s a pain. But, on the night of the migration, they won’t be questioning what they’re doing.”