Combine open-source software, distributed storage running on low-cost hardware and the World Wide Web, and what do you get? Storage for as little as 15 cents per gigabyte per month, and 10-20 cents more for each gigabyte users upload or download.
That's a pretty good deal, especially when it costs US$15-$25 (NZ$19-$33) per gigabyte just to buy the hardware and software needed for secondary (backup or archival) storage, and US$50-plus per gigabyte for the primary storage needed for business-critical applications such as share trading or airline reservations systems, says Andrew Reichman, an analyst at Forrester Research. Also, neither of the prices quoted above takes into account ongoing management costs.
But don't throw away your Fibre Channel storage-area network (SAN) just yet. These web-based services lack the performance required for online transactional applications or giant database queries. There are also questions about security and how much data companies are willing to trust to a node somewhere on the internet.
Still, a combination of three technologies could reduce reliance on higher-priced, proprietary storage hardware and software from industry giants like EMC, IBM, Hewlett Packard and Hitachi Data Systems, not to mention a host of smaller vendors. And the established companies seem to be aware of the trend: In October, EMC acquired Berkeley Data Systems, whose Mozy services provide web-based backup to consumers and businesses.
The first piece of the technology puzzle is open-source storage software. Examples of promising open-source applications include tools for specific storage functions, such as the Amanda open-source backup system and the Darik's Boot and Nuke disk-wiping utility. Others include network file systems such as Lustre, OpenAFS and Samba, which can form the foundations of entire storage infrastructures.
Next on the list are distributed grid- or cluster-based storage architectures from start-ups like Cleversafe and established services such as Mozy.
And those architectures are built using the third technology: industry-standard servers and disk drives, which supplant high-end storage arrays.
For example, the MozyPro online backup service uses Berkeley storage-clustering and file-serving software running on "white box" (unbranded) servers that store data on their internal drives. The price: US$3.95 per month for each desktop or server using the service, and 50 cents per month for each gigabyte of data stored. Unlike other online storage systems that store multiple copies of customers' data, Berkeley's software saves additional recovery data amounting to 33% of the original, which it can use to restore the complete original if needed. This means it must store only 33% more data than a customer sends it. In contrast, other storage providers must store 300% of the original data, says Vance Checketts, EMC's vice president for Mozy products.
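To put Checketts' two percentages side by side, here is a rough back-of-the-envelope sketch. The 1,000GB data set and the function name are made up for illustration; only the 33% and 300% figures come from the article.

```python
def raw_capacity_gb(customer_gb: float, stored_percent: int) -> float:
    """Raw disk space a provider needs when it stores stored_percent
    of every gigabyte a customer sends (100 = one plain copy)."""
    return customer_gb * stored_percent / 100


customer_gb = 1000  # hypothetical data set, not a figure from the article

with_recovery_data = raw_capacity_gb(customer_gb, 133)  # original plus 33%
with_full_copies = raw_capacity_gb(customer_gb, 300)    # 300% of the original

print(f"33% extra:  {with_recovery_data:.0f} GB of raw disk")
print(f"300% total: {with_full_copies:.0f} GB of raw disk")
print(f"ratio:      {with_full_copies / with_recovery_data:.2f}x")
```

On those assumptions, the multiple-copy approach needs a little more than twice the raw disk space for the same customer data.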
Cleversafe, a start-up that is beta-testing software it will offer to other companies to build open-source, web-based distributed storage architectures, goes further. Its software uses algorithms to split encrypted data into "slices" that are stored on distributed servers and must be combined to yield any usable information. CEO Chris Gladwin says that the data-slicing is inherently secure because no one storage node contains an entire copy of any file, making the data harder to steal or corrupt. Availability is also assured because the software can recover the data if some of the nodes fail.
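The general idea of slicing data so that it survives a failed node can be sketched with simple XOR parity. This is not Cleversafe's algorithm, which the article does not describe; it is a deliberately simplified stand-in that skips encryption and can repair only one lost slice. The function names and the choice of three data slices are assumptions for illustration.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_slices(data: bytes, k: int = 3) -> List[bytes]:
    """Split data into k equal-length slices plus one XOR parity slice."""
    size = -(-len(data) // k)                # ceiling division
    padded = data.ljust(size * k, b"\0")     # pad so every slice is full
    slices = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def recover(slices: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the original data when at most one slice (node) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost slice")
    repaired = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        # XOR of all surviving slices equals the one that was lost.
        repaired[missing[0]] = reduce(xor_bytes, present)
    return b"".join(repaired[:-1])[:length]  # drop parity and padding


data = b"distributed storage on commodity servers"
stored = make_slices(data)   # imagine one slice per storage node
stored[1] = None             # simulate a failed node
print(recover(stored, len(data)) == data)
```

With three data slices and one parity slice, the extra storage works out to one-third of the original. A production system would encrypt the data first and use a scheme that tolerates more than one failure.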
ThePlanet.com Internet Services, a Houston-based hosting firm, is investigating Cleversafe's Dispersed Storage software as a way to use older servers to create low-cost storage grids. "Instead of going for three years or four years, with the proper upgrades in disk drives, we could get five to six years of life out of them and at the same time offer storage to our customers," says ThePlanet.com chairman and CEO Doug Erwin.
Perhaps the biggest provider of online storage services is Amazon.com Inc. Adam Selipsky, vice president of product management and developer relations for Amazon Web Services, says the company's S3 service is supported by "multiple arrays of storage servers at multiple locations, storing multiple copies" of customers' data. Designed for software developers who could benefit from using low-cost storage as they experiment with building new applications, the service costs 15 cents per month for each gigabyte of data stored, 10 cents for each gigabyte uploaded, and 13-18 cents for each gigabyte downloaded. Selipsky declines to describe the technology used in S3 in further detail, except to say that Amazon "predominantly uses open-source software" throughout its infrastructure.
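For a sense of what those S3 rates add up to, the following sketch estimates a monthly bill. The usage figures and the function name are hypothetical, and because the article does not say when the 13-cent or the 18-cent download rate applies, the sketch simply reports a low and a high estimate.

```python
# Per-gigabyte S3 prices quoted in the article, in US cents.
STORAGE_CENTS = 15         # per gigabyte stored per month
UPLOAD_CENTS = 10          # per gigabyte uploaded
DOWNLOAD_CENTS = (13, 18)  # lowest and highest quoted download price


def s3_monthly_cost_range(stored_gb, uploaded_gb, downloaded_gb):
    """Return a (low, high) estimate of the monthly cost in US dollars."""
    fixed = stored_gb * STORAGE_CENTS + uploaded_gb * UPLOAD_CENTS
    low = fixed + downloaded_gb * DOWNLOAD_CENTS[0]
    high = fixed + downloaded_gb * DOWNLOAD_CENTS[1]
    return low / 100, high / 100


# Hypothetical developer workload: 500GB stored, 50GB up, 20GB down.
low, high = s3_monthly_cost_range(stored_gb=500, uploaded_gb=50, downloaded_gb=20)
print(f"Estimated monthly bill: US${low:.2f} to US${high:.2f}")
```

Working in whole cents and converting to dollars only at the end keeps the arithmetic free of floating-point rounding surprises.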
John Webster, an analyst at Illuminata, says the combination of open-source software and grid storage technologies could pose a real threat to vendors of copy, backup and disaster recovery software. "If this approach really works, it's a game-changer", because it could fundamentally simplify storage management, he says.
By selling "storage devices that are grid-ready, [Cleversafe] completely disrupts the market", says Stelios Valavanis, president of onShore Networks, a networking consulting firm. "This affects the big storage makers such as EMC. That's whose lunch Cleversafe will eat."
Some other observers, however, predict that users will keep buying proprietary products for their most critical applications — in part because they're concerned about the inherent latency and unpredictability of the internet.
Security is another concern. Jeff Pieper, president of Pieper & Associates, a marketing design firm, is the type of small business customer the online storage vendors are courting. But he says he has to sign multipage non-disclosure forms with many of his customers, so he plans to keep their data on his 4TB SAN from Hitachi to be sure it's safe.
However, users who choose to build their own grids in-house can maintain control over their networks, and they might be more willing to use them for primary storage, says Webster.
Then there is the question of whether users of web-based storage services actually save money. Reichman says upfront costs for distributed storage are undoubtedly far lower than they would be for in-house storage hardware, but it's still unclear how long-term management costs will compare.
Reichman predicts that small to mid-size businesses will likely be the first to use such services, to avoid the "tremendously difficult" job of managing their own storage. As these new technologies are proved, he sees larger companies moving more secondary storage to these third-party vendors. Others may adopt such technologies internally, he says, allowing them to reap the cost savings while maintaining control over their own storage. Some banks are already evaluating such moves, he says.
Any move to grid storage won't happen overnight, nor does it have to. A dramatically new approach such as Cleversafe's needs marketing, says Valavanis, as well as time for the technology to mature.
Gladwin also points out that "IT organisations generally replace hardware every four years or so. If someone just bought a brand-new architecture, they're not going to scrap it six months later." In two to three years, though, Gladwin expects that "distributed architectures will become often used for large data-archival applications".
By that time, the pioneers on both the customer and the developer sides will have a much better idea of how big a storage revolution they have on their hands.