Google searches for an enterprise space

Google's search appliances give good results but could use some polish

The Google Search Appliance packages up the company's famously accurate technology into an easy-to-use search engine for intranets and public-facing corporate sites. In our Clear Choice test of the GB-1001 model, we found that while the searching and indexing features live up to the Google name, the product lacks polish and advanced management features.

The appliance's honeycomb case caught our eye, but the gloss wore off as we began to notice occasional unevenness in the appliance. For example, the appliance takes a number of minutes to start up and run its various system checks. To alert the user when it has finished, it plays a little tune. In testing in our server room and at a nearby facility, we couldn't hear the tune over the dull roar typical of such environments, and so had to manually probe for the system's state.

The GB-1001 does not provide obvious light indicators or a small LCD screen on the unit. No on-off switch is provided, as the designer likely intended users to go through the proper shutdown procedure. We experienced an unplanned UPS failure and, upon power restoration, the box recovered properly once it had performed an automated rebuild of its RAID system that lasted several hours. After triggering a shutdown through the web administration system, users will need to be careful not to cut power too early; otherwise they will face another wait for a RAID rebuild.

We also found other polish points to be lacking. Within the administration system, confirmations of configuration changes didn't appear in a logical place, form fields were slightly misaligned or oddly arranged, warning messages did not appear reliably, help information was too concise or lacked good examples, result output previews didn't always work and, in some cases, error messages lacked detail.

There were some bright spots, including clear installation documentation, colour-coded cables and a built-in DHCP server that allowed us to plug in a laptop and quickly configure the network settings.

The first step after installation is likely to be defining a search index through the web-based GUI, by indicating the starting URLs, URL patterns and file types that should be recorded or discarded by the crawler.
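The record-or-discard decision described above can be pictured with a short sketch. This is only an illustration in Python, assuming shell-style wildcard patterns; the appliance's own pattern syntax is not covered in this review, and the names START_URLS, FOLLOW_PATTERNS, IGNORE_PATTERNS and should_index are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical crawl configuration; none of these values come from the appliance.
START_URLS = ["http://intranet.example.com/"]
FOLLOW_PATTERNS = ["http://intranet.example.com/*"]
IGNORE_PATTERNS = ["*.zip", "*/private/*"]


def should_index(url: str) -> bool:
    """Return True if the URL matches a follow pattern and no ignore pattern."""
    followed = any(fnmatchcase(url, p) for p in FOLLOW_PATTERNS)
    ignored = any(fnmatchcase(url, p) for p in IGNORE_PATTERNS)
    return followed and not ignored


if __name__ == "__main__":
    for url in (
        "http://intranet.example.com/docs/report.pdf",
        "http://intranet.example.com/archive.zip",
        "http://intranet.example.com/private/notes.html",
    ):
        print(url, should_index(url))
```

Giving the ignore list the final say is a design assumption made for this sketch, so that a broad follow pattern can be narrowed without rewriting it.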

According to Google, the crawler is capable of indexing 220 types of content. In our test we found no limitation in the crawler, and also found that the device tended to discover files we were not aware of in some test data sets. The user will likely want to break up the indexed documents into different collections based upon a URL pattern. The GB-1001 allows for an unlimited number of collections.

The crawler is quite adept at dealing with secured content. It handles Secure-HTTP connections and can negotiate basic authentication, NT LAN Manager authentication, and custom cookie and form-based access. The GB-1001 can crawl content from databases, including Oracle, SQL Server, MySQL, IBM DB2 and Sybase. If the user happens upon a data type the crawler cannot access, he or she can feed it directly to the device in an XML format.

Google does limit its appliances by document count, starting with 500,000 for the base unit (for smaller deployments, use the Google Mini; see story below). The user can, of course, increase licence limits, and associated hardware, to build out a search infrastructure that could support millions of documents. In sizing the appliance, be aware that if you plan on doing direct database indexing, Google will count each record as a document, so you might chew up a licence very quickly.
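A rough sizing check makes the database warning concrete. The sketch below is a back-of-the-envelope calculation in Python, assuming that every crawled page, file and database record counts as one document; the function names and the sample figures are hypothetical, and only the 500,000-document base limit comes from the review.

```python
BASE_LICENCE_LIMIT = 500_000  # base unit document limit quoted in the review


def documents_needed(web_pages: int, files: int, database_records: int) -> int:
    """Total document count, assuming one document per page, file or record."""
    return web_pages + files + database_records


def fits_licence(total_documents: int, licence_limit: int = BASE_LICENCE_LIMIT) -> bool:
    """Return True if the estimated total stays within the licence limit."""
    return total_documents <= licence_limit


if __name__ == "__main__":
    # Hypothetical estate: a modest site plus one large database table.
    without_db = documents_needed(web_pages=120_000, files=80_000, database_records=0)
    with_db = documents_needed(web_pages=120_000, files=80_000, database_records=450_000)
    print(without_db, fits_licence(without_db))
    print(with_db, fits_licence(with_db))
```

In this made-up case the site alone fits comfortably, while a single large table pushes the total past the base licence.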

One aspect of the crawl process that we especially liked was the diagnostics facility, which was useful not only in understanding what the crawler was doing but also in isolating such indexing problems as broken links, server issues and access-denied problems.

The GB-1001 provides a great deal of flexibility for the search page and result listings. Some administrators may be happy to use the page layout helper and modify the logo and basic aspects of the search page. However, most folks will probably want to modify the results to fully integrate them into the look and feel of the site. If you are familiar with Extensible Stylesheet Language Transformations (XSLT), you can modify a nearly 3,000-line template that controls just about every aspect of the search form and results. If this doesn't suit, just use the raw XML returned from the appliance and do whatever you like, including putting it into another system.

Google's approach is to implement searches in an easy-to-use black box fashion, which could place constraints on a private search. You turn the appliance loose and it ranks based upon the Google algorithm. We were pleased that the accuracy of the test search lived up to what we see in everyday use of Google. It easily found buried test phrases and correctly identified primary documents.

The GB-1001 provides features to massage the results. Unfortunately, some of them are a bit limited or not well documented. The most valuable feature for search customisation is the KeyMatch configuration, which allows the user to define keywords, phrases and exact queries. The latter returns up to three matches, or five if you dig to find out about a setting change. The "Synonym" setting provides a useful way to suggest alternate search terms triggered by the original query. It is also possible to create filters against the domain in which a document is found, the language a document is written in, the file type it was created in or the meta tag it was given. The meta tag facility, if carefully applied, can provide a rich system for slicing indexed data in a variety of ways: by author, owner or rating, for example.
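To show how keyword, phrase and exact-query rules might differ in practice, here is a minimal Python sketch. It is one plausible reading of the three rule types, not the appliance's documented behaviour: the KeyMatch class, the matching logic and the sample rules are all hypothetical, and only the default cap of three promoted matches is taken from the text above.

```python
from dataclasses import dataclass

MAX_KEYMATCHES = 3  # default cap mentioned in the review


@dataclass
class KeyMatch:
    text: str  # keyword, phrase or full query to match
    kind: str  # "keyword", "phrase" or "exact"
    url: str   # page to promote when the rule fires


def matches(rule: KeyMatch, query: str) -> bool:
    """Assumed semantics: keyword = word in query, phrase = contiguous words, exact = whole query."""
    q_words = query.lower().split()
    r_words = rule.text.lower().split()
    if not r_words:
        return False
    if rule.kind == "exact":
        return q_words == r_words
    if rule.kind == "keyword":
        return len(r_words) == 1 and r_words[0] in q_words
    if rule.kind == "phrase":
        n = len(r_words)
        return any(q_words[i:i + n] == r_words for i in range(len(q_words) - n + 1))
    return False


def promoted(rules: list, query: str) -> list:
    """Return the promoted URLs for a query, capped at MAX_KEYMATCHES."""
    return [r.url for r in rules if matches(r, query)][:MAX_KEYMATCHES]


if __name__ == "__main__":
    rules = [
        KeyMatch("holiday", "keyword", "/hr/leave"),
        KeyMatch("expense report", "phrase", "/finance/expenses"),
        KeyMatch("it help", "exact", "/helpdesk"),
    ]
    for query in ("holiday policy", "submit expense report form", "it help desk"):
        print(query, promoted(rules, query))
```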

Various front-end and search-result features we tested took an unpredictable length of time to register our changes. If you add synonyms, keyword matches or a variety of other template changes, you typically can't see the result right away. You must be patient if you like to tinker.

In terms of performance, the GB-1001 appliances start at around 300 queries per minute (versus the Mini's rate of 60 queries per minute; see below). Our test verified that the Google Search Appliance unit was roughly four times faster than the lower-end unit. We were able to push response times past one second per query under heavy load (well beyond 300 queries per minute), but we did not see any drop-off that would suggest the device did not perform to specification.

The GB-1001 provides monitoring facilities (including graphs on queries per second), an event log detailing basic system activity, and a device health report. The device is also SNMP-capable and provides a MIB for basic monitoring of device health, crawler status, index size and query rates.

The most valuable reports we found outlined the number of searches over time and the common keywords and queries. Many corporate webmasters pay a surprising lack of attention to search activity, despite the great insight it provides into customer intention, so we are glad to see Google making this data easily available to its appliance customers. For those looking for more than these standard reports, the GB-1001 offers search logs in a common log format, useful for crunching in web log analysis or standard reporting systems. We would add in this category some indication of user click rates on various search terms, although with a little bit of work you could collect that data.
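For readers who want to crunch the search logs themselves, the sketch below counts the most common queries from lines in common log format. It is a minimal Python illustration under stated assumptions: the query is assumed to travel in a URL parameter named q, the sample log lines are invented, and top_queries is a hypothetical helper rather than anything shipped with the appliance.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Pulls the request path out of the quoted request field of a common log format line.
REQUEST_RE = re.compile(r'"(?:GET|POST) (\S+) [^"]*"')


def top_queries(log_lines, n=10, param="q"):
    """Count search terms found in the assumed query parameter, most frequent first."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that are not well-formed requests
        params = parse_qs(urlsplit(match.group(1)).query)
        for term in params.get(param, []):
            counts[term.strip().lower()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Invented sample lines for illustration only.
    sample = [
        '10.0.0.5 - - [01/Jan/2000:10:01:22 +0000] "GET /search?q=holiday+policy&site=default HTTP/1.1" 200 5120',
        '10.0.0.7 - - [01/Jan/2000:10:02:10 +0000] "GET /search?q=Holiday+Policy HTTP/1.1" 200 4980',
        '10.0.0.9 - - [01/Jan/2000:10:03:45 +0000] "GET /search?q=expenses HTTP/1.1" 200 3011',
    ]
    for term, count in top_queries(sample):
        print(count, term)
```

Lower-casing the terms before counting is a choice made here so that "Holiday Policy" and "holiday policy" are reported as one query.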

Security on the GB-1001 is a mixed bag. Google states emphatically that the box is secured because it comes with a built-in firewall, allowing access on permitted ports only. Beyond this lone measure, we found a disturbing lack of security.

The security set-up for the GB-1001's administration environment is weak. It's strange that the device allows you to create users and delegate administrative authority, yet the web-based administration system does not enforce password strength or length, even allowing single-letter passwords. Couple this with the fact that the appliance does not limit password attempts, and it is vulnerable to brute-force password-guessing tools. The GB-1001 will note logon failures in its event log, but provides little to work with other than IP address and full-event logging. There are no SSL requirements for accessing the administrative back-end and no restrictions to IP range or domain.
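The two controls we found missing, a password policy and a cap on failed logons, are simple to picture. The Python sketch below shows the general idea only; it does not describe anything in the appliance, and the thresholds, the is_acceptable_password function and the LoginThrottle class are all hypothetical.

```python
import time

MIN_LENGTH = 10        # assumed minimum password length
MAX_FAILURES = 5       # assumed failed attempts allowed before lockout
LOCKOUT_SECONDS = 300  # assumed lockout window


def is_acceptable_password(password: str) -> bool:
    """Reject short passwords and those lacking both a letter and a digit."""
    has_letter = any(c.isalpha() for c in password)
    has_digit = any(c.isdigit() for c in password)
    return len(password) >= MIN_LENGTH and has_letter and has_digit


class LoginThrottle:
    """Tracks failed logons per user and locks the account for a fixed window."""

    def __init__(self):
        self._failures = {}  # user -> (failure count, time of last failure)

    def is_locked(self, user, now=None):
        now = time.time() if now is None else now
        count, last = self._failures.get(user, (0, 0.0))
        return count >= MAX_FAILURES and now - last < LOCKOUT_SECONDS

    def record_failure(self, user, now=None):
        now = time.time() if now is None else now
        count, last = self._failures.get(user, (0, 0.0))
        if now - last >= LOCKOUT_SECONDS:
            count = 0  # earlier failures have expired
        self._failures[user] = (count + 1, now)

    def record_success(self, user):
        self._failures.pop(user, None)


if __name__ == "__main__":
    print(is_acceptable_password("a"))
    throttle = LoginThrottle()
    for _ in range(MAX_FAILURES):
        throttle.record_failure("admin")
    print(throttle.is_locked("admin"))
```

Even a modest lockout window like this slows a brute-force tool from thousands of guesses a minute to a handful.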

The GB-1001 has its rough edges, notably in hardware design, administration and security. However, the overall ease of use and the power of the Google search algorithm dwarf the limitations of the appliance. For companies looking for a powerful yet easy-to-administer search facility, the Google Search Appliance gets a fairly high ranking.

Google Mini — a cheap GSA?

To many, it would seem the Google Mini appliance, at US$3,000 (NZ$4,258), offers many of the same features as the Google Search Appliance, but at about a tenth of the price. But, be warned: the Mini is limited in some pretty important areas.

The first difference lies in what the Mini can index. The device is limited to only 100,000 documents. In terms of crawling, the Mini uses the same Google algorithm as its big brother and can index the same 220 file types. However, the Mini is not able to negotiate nearly as many authentication schemes as the Google Search Appliance. It is limited to Basic Authentication and NT LAN Manager, so it might not be adequate for some intranet duties. It has no database integration or feed support.

In addition, the Mini does not support numerous collections. Instead, it supports sub-collections, which do not easily provide for different result pages. Results with sub-collections are calculated differently from collections under the Google Search Appliance. However, during testing we didn't find the results to be tremendously different, although this might vary depending on the document used and the degree of overlap of terms and content.

The Mini is not terribly fast or fault-tolerant. The device handles roughly one query per second, and you don't get a fault-tolerant RAID array. The Mini's snazzy blue paint job doesn't hide what appears to be a stock 1U clone, complete with a CD-ROM drive blocked by its faceplate. As with its big brother, we see hardware polish problems, such as the lack of a visible light on the front of the device to indicate the Mini is on.

Finally, the Mini also lacks most of the administration features of its more powerful sibling, including SNMP monitoring, and health and performance logging.

However, for all its differences, the Mini is similar to the Google Search Appliance. The device provides much the same degree of customisation, including KeyMatch, custom output formats, Synonyms and search result reporting.

Given its limitations, the Mini is a likely candidate for public sites and basic intranets. At this price, you can hardly buy a rack-mounted server, let alone get a nice turnkey search facility. Even as a proof-of-concept project, the Mini might merit an evaluation by organisations looking to experiment with improved search, and it provides a great introduction to the technology contained in the more powerful Google Search Appliance.

Powell is the founder of PINT, a San Diego web development and consulting firm. He is also the author of numerous books on web development, including JavaScript: The Complete Reference, and Web Design: The Complete Reference. He can be reached at tpowell@pint.com
