Combine open-source software, distributed storage running on low-cost hardware and the World Wide Web, and what do you get? Storage for as little as 15c per gigabyte per month, and another 10-20c for each gigabyte users upload or download.
That’s a pretty good deal, especially considering that Andrew Reichman, an analyst at Forrester Research, estimates it costs US$15-$25 (NZ$20-$33) per gigabyte just to buy the hardware and software needed for secondary (backup or archival) storage, and US$50-plus per gigabyte for the primary storage needed for business-critical applications such as stock trading or airline reservations. Neither of these prices takes into account ongoing management costs.
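To put those figures side by side, here is a rough back-of-the-envelope sketch of how many months of rental it takes for cumulative online fees to match the purchase prices Reichman cites. It assumes the 15c rate is in US currency and, like the purchase estimates, ignores transfer fees and management costs; the function name is purely illustrative.

```python
def breakeven_months(purchase_per_gb: float, rent_per_gb_month: float) -> float:
    """Months of rental after which cumulative fees equal the purchase price."""
    if rent_per_gb_month <= 0:
        raise ValueError("rental rate must be positive")
    return purchase_per_gb / rent_per_gb_month


ONLINE_RATE = 0.15  # US$ per GB per month (assumed to be US currency)

# Purchase prices per gigabyte, taken from the Forrester estimates above.
scenarios = [
    ("secondary storage, low estimate", 15.0),
    ("secondary storage, high estimate", 25.0),
    ("primary storage", 50.0),
]

for label, price in scenarios:
    print(f"{label}: {breakeven_months(price, ONLINE_RATE):.0f} months")
```

On these assumptions, even the cheapest secondary storage takes about 100 months of rental, a little over eight years, to cost as much as buying it outright.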
But don’t throw away your fibre channel storage-area network (SAN) yet. These web-based services lack the performance required for online transactional applications or giant database queries. Then there’s the question of security and how much of their data companies will trust to a node somewhere in the internet “cloud”.
Still, if promising new technologies deliver, they could reduce corporate reliance on the proprietary, higher-priced storage hardware and software sold by industry giants such as EMC, IBM and Hitachi Data Systems, and a host of smaller players.
The first technology enabling this new storage platform is open-source storage software. This can be in the form of tools for specific storage functions, such as the Amanda open-source backup tool and the Darik’s Boot and Nuke (DBAN) disk-wiping utility. It also includes network file systems such as Lustre, OpenAFS and Samba, which can form the foundations of entire storage infrastructures.
The second technology is distributed grid- or cluster-based storage architectures from start-ups such as Cleversafe and established services such as MozyPro from Berkeley Data Systems.
The third enabling technology is the use of industry-standard servers and disk drives in lieu of high-end storage arrays in these architectures.
Berkeley Data Systems, for example, bases its MozyPro online backup service on its own storage-clustering and file-serving software, which runs on “white box” (unbranded) servers in the Berkeley Data Systems datacentre and stores data on the servers’ internal drives. The price: US$4 per month for each desktop or server using the service and 50c per month for each gigabyte of data stored.
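As a quick illustration of how that two-part tariff adds up, the following minimal sketch computes a monthly bill from the quoted rates. The function name and the example workload are hypothetical; any minimums, discounts or taxes are not covered by the article and are left out.

```python
def mozypro_monthly_cost(machines: int, stored_gb: float,
                         per_machine: float = 4.00,
                         per_gb: float = 0.50) -> float:
    """Monthly bill in US$: a flat fee per machine plus a fee per stored gigabyte."""
    if machines < 0 or stored_gb < 0:
        raise ValueError("machines and stored_gb must be non-negative")
    return machines * per_machine + stored_gb * per_gb


# Hypothetical small office: 10 machines backing up 200 GB in total.
print(f"US${mozypro_monthly_cost(10, 200):.2f} per month")
```

For the hypothetical office above, that works out to US$40 in per-machine fees plus US$100 for storage.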
Unlike other online storage providers that safeguard customers’ data by storing multiple copies, Berkeley’s software saves extra recovery data equal to 33% of the original, which it can use to restore the complete original if needed. This means it must store only 33% more data than a customer sends it, whereas other storage providers must store 300% of the original data, says Vance Checketts, Berkeley’s vice president for products.
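The difference is easier to see with numbers. The sketch below assumes that “33% more” means storing 133% of the customer’s data in total, and that “300%” means three full copies; both readings are interpretations of Checketts’ figures, and the 1,000GB workload is made up.

```python
def raw_capacity_gb(customer_gb: float, stored_percent: int) -> float:
    """Disk space consumed when stored_percent of the customer's data is kept."""
    return customer_gb * stored_percent / 100


customer_gb = 1000  # hypothetical amount of customer data

parity_style = raw_capacity_gb(customer_gb, 133)  # original plus 33% extra
triple_copy = raw_capacity_gb(customer_gb, 300)   # three full copies

print(f"33% extra:    {parity_style:.0f} GB of disk")
print(f"three copies: {triple_copy:.0f} GB of disk")
print(f"disk saved:   {triple_copy - parity_style:.0f} GB")
```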
Cleversafe, a start-up that is alpha-testing software it will offer to other companies to build open-source, web-based distributed storage architectures, goes further. Its software uses algorithms to split encrypted data into 11 “slices”, which are stored on distributed servers and must be combined to yield any usable information.
Using the same algorithms, the software can recreate the original data from any six of the 11 slices. By eliminating the backup, archiving and restoration of entire files, Cleversafe reduces the amount of “extra” data a company must store to protect critical information from the current 300% or more of actual data to 130%, according to Chris Gladwin, the company’s CEO.
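The article does not describe Cleversafe’s algorithms, so the following is only a generic sketch of the underlying idea, known as information dispersal: data is cut into 11 slices so that any six are enough to rebuild it. The polynomial scheme over the prime field of size 257, the function names and the six-slice threshold are all assumptions made for illustration, and the encryption step the company describes is omitted.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points)
    that passes through the given (x, y) points, working modulo P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def split(data: bytes, n: int = 11, k: int = 6) -> dict:
    """Return {slice_number: list_of_ints}; any k slices can rebuild data.
    Slice values may be 256, so they are kept as ints rather than bytes."""
    padded_len = -(-len(data) // k) * k  # round up to a multiple of k
    padded = data.ljust(padded_len, b"\0")
    slices = {x: [] for x in range(1, n + 1)}
    for start in range(0, padded_len, k):
        chunk = padded[start:start + k]
        # Treat the k bytes as the polynomial's values at x = 1..k.
        points = list(zip(range(1, k + 1), chunk))
        for x in range(1, n + 1):
            slices[x].append(lagrange_eval(points, x))
    return slices


def recover(subset: dict, length: int, k: int = 6) -> bytes:
    """Rebuild the original bytes from any k surviving slices."""
    xs = sorted(subset)[:k]
    if len(xs) < k:
        raise ValueError(f"need at least {k} slices, got {len(xs)}")
    out = bytearray()
    for col in range(len(subset[xs[0]])):
        points = [(x, subset[x][col]) for x in xs]
        # Re-evaluate the polynomial at x = 1..k to get the chunk back.
        out.extend(lagrange_eval(points, x) for x in range(1, k + 1))
    return bytes(out[:length])


message = b"critical business data"
slices = split(message)
survivors = {x: slices[x] for x in (2, 4, 5, 8, 9, 11)}  # five slices lost
print(recover(survivors, len(message)) == message)
```

In this simplified scheme the first six slices hold the data bytes unchanged, which is one reason a real system would encrypt the data before slicing it, as Cleversafe says it does.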
He also claims the data-slicing is inherently secure because no one storage node contains an entire copy of any file, making it harder to steal or corrupt. Availability is also assured because any five of the 11 nodes can fail, and the software can still recover the data, he says.
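How much availability that failure tolerance buys depends on how reliable each node is. As a rough sketch, assuming nodes fail independently and each is reachable with the same probability (neither assumption comes from Cleversafe), the chance that at least six of 11 nodes are up at a given moment is a binomial sum:

```python
from math import comb  # Python 3.8+


def availability(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent nodes are up,
    where each node is up with probability p."""
    return sum(comb(n, up) * p**up * (1 - p)**(n - up)
               for up in range(k, n + 1))


# Hypothetical per-node availabilities, for illustration only.
for p in (0.90, 0.95, 0.99):
    print(f"node availability {p:.2f} -> data availability "
          f"{availability(11, 6, p):.6f}")
```

Real-world failures are often correlated (a power cut or network outage can take out several nodes at once), so figures from a model like this should be read as an upper bound rather than a guarantee.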
Perhaps the biggest online player is Amazon.com. Adam Selipsky, vice president of product management and developer relations for Amazon Web Services, says its S3 service is provided by “multiple arrays of storage servers at multiple locations, storing multiple copies [of customers’ data]”. It is aimed at developers who can experiment with building innovative applications because of its low cost: 15c per month for each gigabyte of data stored, 10c for each gigabyte uploaded, and 13-18c for each gigabyte downloaded. Selipsky won’t describe the technology used in S3 beyond that, except to say that Amazon “predominantly uses open-source software” throughout its infrastructure.
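For a developer sizing up an experiment, those three rates translate into a monthly bill roughly as follows. This is a minimal sketch: the article does not say what determines where in the 13-18c range a download falls, so the function simply returns a low and a high estimate, and the workload is hypothetical.

```python
# Rates in US cents per gigabyte, as quoted in the article.
STORAGE_CENTS = 15
UPLOAD_CENTS = 10
DOWNLOAD_CENTS_LOW = 13
DOWNLOAD_CENTS_HIGH = 18


def s3_monthly_cost(stored_gb: float, uploaded_gb: float,
                    downloaded_gb: float) -> tuple:
    """Return (low, high) monthly cost estimates in US dollars."""
    base = stored_gb * STORAGE_CENTS + uploaded_gb * UPLOAD_CENTS
    low = base + downloaded_gb * DOWNLOAD_CENTS_LOW
    high = base + downloaded_gb * DOWNLOAD_CENTS_HIGH
    return low / 100, high / 100


# Hypothetical prototype: 100 GB stored, 20 GB uploaded, 50 GB downloaded.
low, high = s3_monthly_cost(100, 20, 50)
print(f"between US${low:.2f} and US${high:.2f} per month")
```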
John Webster, an analyst at Illuminata, says the combination of open-source software and grid storage technologies could pose a real risk to vendors of copy, backup and disaster recovery software.
Others, however, predict users will keep buying proprietary products for their most critical applications.
One reason is the inherent latency and unpredictability of the internet, which a storage manager cannot tweak for rock-solid reliability and predictable response times. Security is another concern. Jeff Pieper, president of Pieper & Associates, a marketing design firm, is the type of SMB customer being courted by the online storage vendors. However, he says he has to sign a multi-page non-disclosure form with many of his customers and plans to keep their data on his 4TB SAN from Hitachi to be sure it’s safe.
Customers who build their own grids would have control over their networks, and thus might be able to use them even for primary storage, says Webster.
Then there is the question of actual savings. Reichman says upfront costs for distributed storage are undoubtedly far lower than for in-house storage hardware, but it’s still unclear how long-term management costs will compare. Gladwin says it’s too early to discuss specific pricing for Cleversafe grids, but he says customers should see savings “at least proportional” to the reduced disk space, power, floor space and management they will require.
Reichman says the major storage hardware vendors will inevitably lose some business as customers move storage from in-house hardware to web-based providers. But he says vendors that also sell servers could “make up some of the revenue” by selling low-priced servers and other “building blocks for the grid”.
Stelyos believes grid-based storage could even be a boost for those vendors. “Even though Cleversafe allows you to use less expensive hardware, the reality is that big companies building grids in their IT departments will not tolerate buying cheap disks. The corporations who are buying EMC now and want to build on a grid model, who are they going to buy their disks from?” he asks.
Like his counterparts at other online vendors, Berkeley Data Systems founder and CEO Josh Coates sees MozyPro replacing tape-based backup more frequently than high-end disk. Cleversafe’s Gladwin sees his company’s products as a complement to, rather than a replacement for, current storage offerings. While backup is built into a Cleversafe grid by virtue of how data is stored, he still expects many customers to continue to take, for example, snapshots to capture the state of their data at a given point in time.
Reichman predicts that small to medium-sized businesses will likely be the first to use such services, to avoid the “tremendously difficult” job of managing their own storage. As these new technologies are proven, Reichman sees larger companies moving more secondary storage to such third-party vendors. Others may adopt such technologies internally, he says, allowing them to reap the cost savings while maintaining control over their own storage. Some banks are already evaluating such a move, he says.
Amazon’s Selipsky argues there’s a place for Amazon S3 in the enterprise because, like smaller organisations, enterprises “want very simple, very easy to interact with, very easy to integrate, highly reliable services”. He also says many departments or groups within large companies lack the budget or organisational ability to fund large infrastructure projects, but “might have $500 or $5,000 or $50,000 to mess with during a quarter, to prove a concept, [or] to try something out”.
Any move to grid storage won’t happen overnight, nor does it have to. A dramatically new approach such as Cleversafe’s needs evangelisation, says Stelyos, as well as “understanding the technology to a certain extent”.
“IT organisations generally replace hardware every four years or so. If someone just bought a brand new architecture... they’re not going to scrap it six months later,” Gladwin adds. In two to three years, though, he expects that “distributed architectures will [be used often] for large data-archival applications.”
By then the pioneers, on both the customer and the developer side, will have a much better idea of how big a storage revolution they have on their hands.