Column: Staple of GIS world gains new significance

Metadata standards are the key to improving the usefulness of Internet search engines.

Metadata is information about information. More specifically, metadata is the term for the descriptors attached to electronic documents, databases, spreadsheets, digital map layers and images. These descriptors can include key words, content, date of last update, who produced the original product, who updated it, a summary, or other pertinent information. Metadata, especially accurate and standardised metadata, is extremely important as companies start down the path to intranet implementation. Without a well-thought-out metadata programme, an intranet can become a data junkyard.
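To make the idea concrete, here is a minimal sketch in Python of what one metadata record might look like. The field names and sample values are assumptions chosen to mirror the descriptors listed above; they are not drawn from any particular standard or product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class MetadataRecord:
    """Descriptors for one information resource (hypothetical field names)."""
    title: str
    resource_type: str      # e.g. "document", "spreadsheet", "map layer", "image"
    keywords: list[str]
    summary: str
    produced_by: str        # who produced the original product
    updated_by: str         # who last updated it
    last_updated: date      # date of last update


# Illustrative values only; not taken from any real system.
record = MetadataRecord(
    title="Street centrelines",
    resource_type="map layer",
    keywords=["roads", "streets", "GIS"],
    summary="Digital map layer of street centrelines.",
    produced_by="Survey team",
    updated_by="IS department",
    last_updated=date(1997, 3, 1),
)
```

The point of fixing the fields in advance is the standardisation: if every resource carries the same descriptors, every resource can be searched the same way.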

This point was well developed by Dr Lougie Anderson, director of Internet technologies engineering for Sequent Computer Systems, at a presentation in Auckland. Anderson noted that while it is all well and good to have access to corporate data over an intranet--or any type of network, for that matter--without a means to search and access the information, the system is essentially a waste of time and effort.

Metadata has long been a staple of geographic information system (GIS) design, where data from different sources are combined to create new information. Only recently, with the maturation of intranet technology that allows the integration of disparate digital data resources, has the issue of metadata spread outside the GIS community.

Anderson pointed out that when Sequent started to build its intranet (even before the term was invented), it had no idea what information resources it held and no method for tracking them. Even if it had known what it had, nobody was quite sure whether it was useful. In short, without metadata, it was lost.

The solution, developed over two years, is a system called the Sequent Corporate Electronic Library (SCEL) which is now used by more than 1500 staff in 53 offices worldwide. In fact, the system has been so successful internally that Sequent has documented it, improved it and made it into a product called Knowledge Depot.

Knowledge Depot stores information about corporate data resources (metadata) and is based on HTML standards. Users access the system through a front-end Web browser, query the metadata database and receive a summary of the information resource, whether it is a Word document, Excel spreadsheet, image, or marketing brochure. The key, of course, is accurate descriptions--metadata.
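The query-and-summary pattern can be sketched in a few lines of Python. This is an illustration of the general idea only, not Knowledge Depot's actual design; the index structure, the `search` function and the sample records are all hypothetical.

```python
# Hypothetical metadata index: one dict per resource, illustrative values only.
INDEX = [
    {"title": "Q3 sales forecast", "type": "Excel spreadsheet",
     "keywords": ["sales", "forecast"],
     "summary": "Quarterly sales projections by region."},
    {"title": "Product overview", "type": "marketing brochure",
     "keywords": ["product", "sales"],
     "summary": "Two-page overview for customers."},
    {"title": "Office floor plan", "type": "image",
     "keywords": ["facilities"],
     "summary": "Scanned floor plan."},
]


def search(index, term):
    """Return (title, type, summary) for every record whose keywords include term."""
    wanted = term.strip().lower()
    return [
        (rec["title"], rec["type"], rec["summary"])
        for rec in index
        if wanted in (kw.lower() for kw in rec["keywords"])
    ]


# The user never opens the underlying files; the metadata answers the query.
results = search(INDEX, "Sales")
```

Note that the search touches only the descriptors, which is why their accuracy matters: a resource with missing or wrong key words simply never turns up.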

Knowledge Depot is not far removed from Dunedin City's intranet, called KnowledgeBase. Unlike Sequent, Dunedin had a good handle on its database resources (probably due to IS manager Mike Harte's experiences with GIS) and was able to build a metadata index for its data holdings quickly and relatively easily. According to Harte, however, shifting the whole model of information value was a major undertaking. "In the past," says Harte, "information was seen as something to be kept as a vital asset. With our new approach, the dissemination of knowledge is seen as the goal, not the storage of that knowledge."

Oracle is also entering the metadata market with its ConText Option Linguistic Service for Oracle Universal Server. ConText can take a full article and reduce it to key words, a one-sentence summary, or a paragraph, regardless of the length of the original. For a data-intensive organisation (Oracle points out that the paper documentation for the Boeing 747 weighs more than the airplane itself), ConText could be a godsend.

This all ties into the World Wide Web as well. With more and more pages coming online every minute, indexing and searching become more problematic. When Web page authors begin to adopt standards for metadata (that'll be the day), search engines will be able to do a better job and the usefulness of the Net will be enhanced. But without clear guidelines on metadata standards, this is probably just a pipe dream. In any event, for a corporate intranet, metadata standards can and should be adopted.
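One existing hook for page-level metadata is the HTML `<meta>` tag, which lets an author attach named descriptors such as key words and a description to a page. The Python sketch below shows how an indexer might read them using the standard library's `html.parser`; the sample page and the `extract_metadata` helper are hypothetical, and real search engines may weigh or ignore these tags as they see fit.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collects name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def extract_metadata(html):
    """Return a dict of the named <meta> descriptors found in a page."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


# Hypothetical page, for illustration only.
PAGE = """<html><head>
<title>Street map layers</title>
<meta name="keywords" content="GIS, map layers, streets">
<meta name="description" content="Index of digital street map layers.">
</head><body></body></html>"""

page_metadata = extract_metadata(PAGE)
```

The mechanics are trivial; the hard part is the one the column keeps returning to, which is getting authors to agree on which names to use and to fill them in accurately.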

