IBM is walking a mile in your shoes. In building the advanced Web site for the Olympic Games -- touted as the largest site ever built -- the company is tackling a host of problems that companies like yours face in deploying intranets on a large scale. The site, actually a group of sites spanning the globe, is online now, but the bulk of the Olympic information -- athletes' biographies, event results, photos, video clips and more -- will appear this Friday with the opening of the games. The site is intended as a showpiece for IBM's Internet/World Wide Web technologies and expertise, and its performance will be judged as ruthlessly and immediately as that of any athlete.
IBM software architects have been wrestling with a number of the critical issues that will face corporate MIS groups in the future: how to handle unpredictable, and unpredictably large, traffic loads and vast amounts of data; how to make it easy for users to access just the data they want; how to manage Web-based information spread over multiple servers in different locations; and how to automate the work of updating Web information and ensuring HTML links are current and unbroken.
"Corporate sites will run into these issues in spades when they start rolling out intranets in volume," says Don DePalma, senior analyst for Forrester Research, a Cambridge, Massachusetts, market research company.
"We've concluded there's little off-the-shelf software for such a project," says David Grossman, technology wizard with IBM's Internet division. "You can't just slap in HyperText Transfer Protocol daemon (HTTPd) Web servers and the File Transfer Protocol and expect that to scale into a reliable service."
The hardware foundation for the Web site is IBM's SP2 parallel computer, based on Reduced Instruction Set Computing processors and running multiple copies of the IBM DB2 database. Information is transmitted from the Olympic operations systems in Atlanta to two Web sites, the primary site in Connecticut with 53 SP2 processors and a secondary site in New York with 16 nodes.
The ability to create such a highly scalable site is one of IBM's most powerful innovations, says Bruce Gill, vice-president and research director for Internet strategies at Gartner Group, a Stamford, Connecticut, market research company. At the Web site, the data transmitted from the Atlanta systems is processed on workstations by a special program that classifies the contents of the files. These classifications are reviewed by teams of human editors and the data is then copied to three mirror Web sites, also based on SP2 computers, around the world. Then, anyone with a Web browser can access the home page and search for information.
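The classify-review-mirror flow described above can be sketched in a few lines. This is a hypothetical illustration, not IBM's actual software: the file types, mirror names and the extension-based classifier are all invented stand-ins for the real classification program and editorial workflow.

```python
from dataclasses import dataclass

@dataclass
class IncomingFile:
    """A file arriving from the Atlanta operations systems (hypothetical)."""
    name: str
    payload: bytes

def classify(f: IncomingFile) -> str:
    """Classify a file's contents -- here crudely, by extension, as a
    stand-in for IBM's real content-classification program."""
    ext = f.name.rsplit(".", 1)[-1].lower()
    return {"htm": "page", "html": "page", "gif": "image",
            "jpg": "image", "mov": "video"}.get(ext, "unknown")

MIRRORS = ["mirror-us", "mirror-eu", "mirror-asia"]  # invented site names

def process(files, approve):
    """Classify each file, pass it through editorial review, then fan
    approved data out to every mirror site."""
    published = {m: [] for m in MIRRORS}
    for f in files:
        kind = classify(f)
        if approve(f, kind):          # human editors review the classification
            for m in MIRRORS:         # approved data is copied to all mirrors
                published[m].append((f.name, kind))
    return published
```

The key property the sketch preserves is that nothing reaches a mirror without first being classified and approved, and that every mirror receives an identical copy.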
One of the key IBM goals is to make the site as interactive as possible and let users access data almost as it changes, says Jose-Luis Iribarren, IBM's manager of Olympic and sports Internet systems. "Interactivity means making information easy for people to access," he says. "But the downside is that you need a lot of programming support."
One big chunk of programming went into building software that fully exploits the capabilities of the underlying parallel computer. "Traditional Web server software running in a multinode environment runs in sequential or serial mode," Iribarren says. One server handles all of the incoming load until it is saturated and then the next server takes over. By contrast, all the SP2 nodes can be linked and the traffic load spread automatically and evenly across them. Additional nodes can be plugged in while the site continues running.
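The contrast Iribarren draws -- sequential failover versus spreading load across all nodes -- can be made concrete with a small sketch. The dispatchers below are hypothetical Python illustrations of the two strategies, not the SP2's actual scheduling code.

```python
from collections import deque

class SequentialDispatcher:
    """Traditional mode: one server takes all traffic until saturated,
    then the next server takes over."""
    def __init__(self, nodes, capacity):
        self.nodes, self.capacity = list(nodes), capacity
        self.load = {n: 0 for n in self.nodes}

    def route(self):
        for n in self.nodes:
            if self.load[n] < self.capacity:
                self.load[n] += 1
                return n
        raise RuntimeError("all nodes saturated")

class SpreadDispatcher:
    """SP2-style mode: all linked nodes share the load evenly, and
    additional nodes can be plugged in while the site keeps running."""
    def __init__(self, nodes):
        self.ring = deque(nodes)
        self.load = {n: 0 for n in nodes}

    def route(self):
        n = self.ring[0]              # take the node at the front...
        self.ring.rotate(-1)          # ...and rotate it to the back
        self.load[n] += 1
        return n

    def add_node(self, node):         # add capacity without downtime
        self.ring.append(node)
        self.load[node] = 0
```

Under the sequential scheme six requests pile onto the first node until it fills; under the spread scheme the same six requests land two apiece on three nodes.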
Another key programming task has been creating near-real-time links between the Atlanta operations systems and the Web, storing data in DB2 databases and building Web pages on demand. This is handled by IBM's emerging Web Objects Management (WOM) architecture. WOM is a set of software interfaces, services and tools that creates a consistent layer across data and applications on the corporate net and across Web servers.
Typical Web applications today are based on HTML pages -- flat files -- on a Web server. But such applications "run out of gas very quickly in a large, complex site", Grossman says. By contrast, WOM, in effect, breaks the Web page into components, or objects, and stores information (called metadata) about these objects in a relational database.
Then, in response to a browser request, WOM refers to the databases to assemble the objects into a finished Web page. Except for the event results information, WOM will manage all the Web-based information.
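The idea of storing page components and their metadata in a relational database, then assembling a finished page per browser request, can be sketched as follows. This is a minimal hypothetical model -- the schema, table names and sample content are invented, and SQLite stands in for DB2 -- but the assembly step mirrors what the article describes WOM doing.

```python
import sqlite3

# Hypothetical schema: a table of component objects (headline, image
# reference, body text) and a metadata table recording which objects
# make up which page, and in what order.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE objects (id TEXT PRIMARY KEY, html TEXT)")
db.execute("CREATE TABLE pages (page TEXT, position INTEGER, object_id TEXT)")

db.executemany("INSERT INTO objects VALUES (?, ?)", [
    ("hdr-athletics", "<h1>Athletics</h1>"),
    ("img-stadium",   "<img src='stadium.gif'>"),
    ("txt-schedule",  "<p>Events begin Friday.</p>"),
])
db.executemany("INSERT INTO pages VALUES (?, ?, ?)", [
    ("athletics", 0, "hdr-athletics"),
    ("athletics", 1, "img-stadium"),
    ("athletics", 2, "txt-schedule"),
])

def render(page: str) -> str:
    """Assemble a finished Web page from its component objects on
    demand, consulting the metadata to find and order the pieces."""
    rows = db.execute(
        """SELECT o.html FROM pages p
           JOIN objects o ON o.id = p.object_id
           WHERE p.page = ? ORDER BY p.position""", (page,))
    return "\n".join(html for (html,) in rows)
```

Because the page exists only as database rows until a request arrives, updating one component (a photo, a headline) updates every page that references it -- the advantage over flat HTML files that Grossman points to.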
According to Gill, the IBM architecture can work as well with Oracle or Sybase databases as with DB2. The implications are far-reaching because WOM lets MIS groups open controlled doorways into existing corporate networks. "The Web becomes, in effect, a terminal into the glass house transaction systems and data," Gill says. "This point is not lost on IBM. It is saying the Olympic Web represents the next couple of years of IBM technology for the Web."
The actual Web page components -- the GIF and HTML files and so on -- that WOM uses are all managed by the Distributed File System (DFS), an implementation of a key part of the Open Software Foundation's Distributed Computing Environment. DFS tracks all files on all disks. Administrators and developers -- and WOM -- see a single tree of files, which can physically reside anywhere.
Through DFS, WOM can drop or update the hypertext links among Web page components automatically, a critical issue in a complex and fast-changing Web site.
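What automatic link maintenance amounts to can be sketched briefly. The function below is a hypothetical illustration of the idea -- rewriting hypertext links when a component moves and dropping links whose target no longer exists -- not IBM's WOM or DFS code, and the simple regular expression stands in for real HTML parsing.

```python
import re

def update_links(pages: dict, moved: dict, removed: set) -> dict:
    """Given a map of page name -> HTML, rewrite links to components
    that have moved and drop links to components that were removed
    (keeping the visible anchor text)."""
    href = re.compile(r'<a href="([^"]+)">([^<]*)</a>')

    def fix(m):
        target, text = m.group(1), m.group(2)
        if target in removed:
            return text               # dead link: keep the text, drop the link
        # moved file: substitute the new location, else leave unchanged
        return f'<a href="{moved.get(target, target)}">{text}</a>'

    return {name: href.sub(fix, html) for name, html in pages.items()}
```

In a site whose content changes hourly, running a pass like this automatically -- rather than asking editors to hunt down stale links by hand -- is exactly the kind of chore the article says WOM and DFS take over.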