TORONTO (10/10/2003) - "Cheap and fast" hardware is the way to go, according to Craig Nevill-Manning, a senior research scientist with Google Inc.
Speaking at the 13th annual IBM Centres for Advanced Studies Conference (CASCON) on Tuesday in Toronto, Nevill-Manning offered some insight into the search engine company's processes -- including its lack of spending on hardware.
"Cheap hardware allows more computation per query," he said. The trade-off, he explained, is that the software has to be written with the assumption that the hardware will fail, so Google tries to keep the hardware simple.
"We want to exploit the processing and the power of the off-the-shelf hardware," he explained as part of a keynote address at the software developer conference.
This 'fast and cheap' mantra at Mountain View, Calif.-based Google extends across the more than 10,000 servers that answer searches worldwide, handling more than 200 million queries a day across four billion Web documents.
By using commodity PC hardware, which is similar to that of home PCs, Google buys cheap and builds high levels of redundancy into its system in an effort to compensate for the fact that one full day of Google use on a server is the equivalent of 40 machine years, Nevill-Manning said.
"Each server has many twins," he said. "Replication is needed for scalability."
This is how Google compensates for its use of inexpensive hardware: it replicates computers at many different levels and across the various types of servers used in its data centers. As a result, there are many identical data centers around the world, Nevill-Manning added.
Because of the replication, maintenance on the servers can proceed at a slower pace, and some failed computers might not be back online for a week or more. The company also uses a monitoring system that watches the health of both computers and applications, so it knows instantly when a machine fails.
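The replication and slow-paced repair described above can be illustrated with a short sketch. This is not Google's code: the `ReplicaPool` class, its method names and the server names are all hypothetical, and the only assumption is that any healthy twin can answer a query while failed machines wait in a queue.

```python
import random


class ReplicaPool:
    """A group of identical ("twin") servers holding the same data (hypothetical)."""

    def __init__(self, names):
        self.healthy = set(names)   # every replica starts out healthy
        self.repair_queue = []      # failed machines awaiting unhurried repair

    def mark_failed(self, name):
        # Would be called by a monitoring system when it detects a failure.
        if name in self.healthy:
            self.healthy.remove(name)
            self.repair_queue.append(name)

    def repair_next(self):
        # Lazy repair: bring back the oldest failed machine when convenient.
        if self.repair_queue:
            self.healthy.add(self.repair_queue.pop(0))

    def route(self, query):
        # Any healthy twin can answer, so one failure does not block queries.
        if not self.healthy:
            raise RuntimeError("no healthy replicas left")
        server = random.choice(sorted(self.healthy))
        return server, f"results for {query!r}"


pool = ReplicaPool(["idx-a", "idx-b", "idx-c"])
pool.mark_failed("idx-a")            # the machine goes down
server, _ = pool.route("cascon")     # the query is still answered by a twin
pool.repair_next()                   # the repair happens later, at leisure
```

Because queries only ever go to healthy twins, the repair queue can be worked through on whatever schedule is cheapest.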
"Because the system is built this way, if a machine goes down, it doesn't have to be repaired right away," he said. "We can save money by doing this in a lazy fashion."
While the days of its garage operations and its cork-insulated aluminium data center racks are over, the company still believes in "fast and cheap," and continues to follow its mission of organizing the world's information, making it universally accessible and useful.
The mandate might seem broad, but keeping search results relevant while remaining "fast and good" comes with many challenges.
Nevill-Manning said that keeping the index server updated -- where there are over 390 million images stored -- is probably the most time-intensive task, which is why the company hires a large team of people to research the quality of information retrieval.
When a query is entered into Google, it flows through several different servers, including the index server and the document servers, where a short summary of the search results is provided for the searcher. All these servers are replicated.
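The query path described above can be sketched in a few lines. The sketch assumes the index server maps words to document IDs and the document server turns each ID into a short summary; the function names, the sample documents and the 30-character summary width are illustrative and do not come from the talk.

```python
from collections import defaultdict

# A tiny stand-in for the Web: document ID -> text (illustrative data).
DOCS = {
    1: "cheap hardware allows more computation per query",
    2: "replication is needed for scalability",
    3: "cheap and fast hardware with replication",
}


def build_index(docs):
    """Build an inverted index: word -> set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def index_server(index, query):
    """Return the sorted IDs of documents that contain every query word."""
    words = query.lower().split()
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


def document_server(docs, doc_id, width=30):
    """Return a short summary (here, simply a truncated copy) of one document."""
    text = docs[doc_id]
    return text if len(text) <= width else text[:width].rstrip() + "..."


def search(query):
    """Run a query through the index server, then the document server."""
    index = build_index(DOCS)
    return [(d, document_server(DOCS, d)) for d in index_server(index, query)]


for doc_id, summary in search("cheap hardware"):
    print(doc_id, summary)
```

Splitting the lookup from the summary step means each stage could be replicated separately, as the article says of the real servers.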
The system is based on algorithms that look for common links to Web sites, Nevill-Manning explained, and that in turn delivers quick results.
As for the speed of query results, he said that over time people using the search engine have come to expect searches to take less and less time.
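The article does not name the link-based algorithm, so the following is only a generic illustration of ranking pages by the links that point to them, in the style of a PageRank power iteration. The `link_rank` function, the damping factor of 0.85 and the three-page example graph are assumptions made for the sketch.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by inbound links using a simple power iteration.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    if not pages:
        return {}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page receives a small baseline share each round.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score to all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Pages "a" and "b" both link to "c", so "c" ends up with the top score.
scores = link_rank({"a": ["c"], "b": ["c"], "c": ["a"]})
best = max(scores, key=scores.get)
```

Scores of this kind can be computed ahead of time, which fits the article's point that link analysis helps deliver results quickly.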
"The amount of time a query spends in Google is small," he said. "Search in five years will be even more accurate and more user-centered."
The CASCON software development conference -- also known as the 'Meeting of the Minds' -- is co-sponsored by IBM Centers for Advanced Studies and the National Research Council Canada.