It's all just semantics really

Words are funny things. Having carefully made sure I used 'criteria' as a plural, it took a learned and astute reader to point out I'd used it in the wrong context. I won't tell you his name because I don't want to encourage this kind of behaviour in my readers.

But it got me thinking about words and meanings and understanding and the problems I have online with search engines. Former colleague Aimee McClinchy (now swanning around in Europe on her OE) used to mock my expressions of delight whenever Google turned up a positive return, which was frequently. She’s too young to remember the bad old days of “your search has returned 1,225,000 hits” from Alta Vista or Lycos, none of which matched what you wanted.

I generally use two or three different phrases when testing a search engine: my name (ego surfing is what I live for), the All Blacks (with or without brackets, speech marks or capitals) and whatever else takes my fancy on the day. Most search engines demand a fairly broad starting point that you then narrow down, but many of them still don’t allow searches within results, which kind of defeats the purpose really. I look at a lot of news sites from New Zealand and around the world and most times it’s quicker to find a particular story I’m looking for using Google than it is to use the site’s own engine.

No sooner had I finished replying to the above-mentioned pedant than I got this month’s issue of Scientific American in the mail. The cover story is all about the semantic web, a future version of the WWW that has a great deal more in the way of smarts built in. If this comes to fruition, and I see no reason why it won’t, searching the internet will become as easy as thumbing through the index of a book.

The article is co-written by Tim Berners-Lee, who is credited with inventing the web in all its glory. Far from resting on his laurels at the W3C, he’s been working on the problem of finding what you’re looking for when you search online. The article (which can be found at www.sciam.com) outlines three areas that need to be developed – XML, RDF and the idea of ontologies.

Taking the easy one first – XML, which as you all know stands for extensible markup language, although I’m yet to find a dictionary that has the word extensible in it. While HTML defines how a word is displayed (font, character size, colour and so on), XML is used to define what each word or phrase is – be it a phone number, name, address, shoe size or whatever.

This, then, is the key to producing a smarter web. Currently if I search for “photograph” the search engines won’t return photographs, they’ll return websites with the word photograph in them. XML will allow a more useful search to take place, and that’s just the beginning.
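To make the difference concrete, here is a minimal sketch in Python. The tag names (`page`, `article`, `photograph`, `caption`) and the sample text are invented for illustration; they are not from the Scientific American article or any real schema. A keyword search finds the page that merely mentions the word, while a search on the tag finds the thing that actually is a photograph.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these tag names are made up for the example.
DOC = """
<page>
  <article>
    <title>How to frame a photograph</title>
  </article>
  <photograph>
    <caption>Sunset over Auckland harbour</caption>
  </photograph>
</page>
"""

root = ET.fromstring(DOC)


def keyword_search(tree, word):
    """Old-style search: tags of elements whose own text mentions the word."""
    word = word.lower()
    return [el.tag for el in tree.iter() if el.text and word in el.text.lower()]


def typed_search(tree, kind):
    """Smarter search: elements that *are* the given kind, whatever they say."""
    return list(tree.iter(kind))


# The keyword search only turns up the article title that mentions the word.
keyword_hits = keyword_search(root, "photograph")

# The typed search turns up the photograph itself, via its caption.
photo_captions = [p.findtext("caption") for p in typed_search(root, "photograph")]
```

Neither function is clever. All the extra smarts come from the markup saying what each piece of text is.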

RDF stands for resource description framework and if HTML is the ‘how’ and XML the ‘what’, then RDF is all about the relationships between these items, such as “is the author of” or “is the sister of”. So now we can not only search for Paul Brislen but we can limit it to searching for names rather than just the words, and we can search for the particular Paul Brislen who writes for Computerworld in New Zealand. If you need to find my mailing address you could search for “Paul Brislen” and “street address” and have a very good chance of coming up with the right answer.
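A rough way to picture those relationships is as a list of three-part statements: subject, relationship, object. The Python sketch below is an illustration only. Real RDF identifies each part with a URI rather than a plain string, and the relationship names and helper functions here are made up for the example.

```python
# Each fact is a (subject, relationship, object) triple.
# Assumption: plain strings stand in for the URIs real RDF would use.
TRIPLES = [
    ("Paul Brislen", "is a", "person"),
    ("Paul Brislen", "writes for", "Computerworld"),
    ("Computerworld", "is published in", "New Zealand"),
    ("Tim Berners-Lee", "is a", "person"),
    ("Tim Berners-Lee", "is the author of", "Scientific American article"),
]


def match(subject=None, relationship=None, obj=None):
    """Return every triple that agrees with the parts supplied (None = anything)."""
    return [
        (s, r, o)
        for (s, r, o) in TRIPLES
        if subject in (None, s) and relationship in (None, r) and obj in (None, o)
    ]


def people_who_write_for(publication):
    """Names that are both marked as a person and linked to the publication."""
    people = {s for (s, _, _) in match(relationship="is a", obj="person")}
    writers = {s for (s, _, _) in match(relationship="writes for", obj=publication)}
    return sorted(people & writers)


writers = people_who_write_for("Computerworld")
```

The query never looks for the words "Paul Brislen" on a page. It asks for a person with a particular relationship to a particular publication.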

The third part of the equation, ontology, is an extension of these two pieces and is yet to be fully developed.

“In philosophy, an ontology is a theory about the nature of existence,” says the article, which doesn’t really help much. Basically it’s a way of ensuring different terminology doesn’t stuff up your search – zip codes versus postal codes is the example used.

“The meaning of terms or XML codes used on a web page can be defined by pointers from the page to an ontology.” So an ontology scheme defining addresses will recognise that when you type “zip code” you mean a US address, and work accordingly.
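As a toy version of the idea, the Python sketch below treats an ontology as a lookup table that maps each site's own wording onto one shared concept. The table, the `postcode` entry and the function names are assumptions made for illustration. A real ontology would be a published document that pages point to, not a dictionary inside a program.

```python
# Hypothetical ontology: each term points at a shared concept, plus the
# country it implies, if any. The entries are invented for this sketch.
ONTOLOGY = {
    "zip code": {"concept": "postal_code", "country": "US"},
    "postal code": {"concept": "postal_code", "country": None},
    "postcode": {"concept": "postal_code", "country": None},
}


def look_up(term):
    """Return the ontology entry for a term, or None if the term is unknown."""
    return ONTOLOGY.get(term.strip().lower())


def same_concept(term_a, term_b):
    """True when both terms are known and point at the same shared concept."""
    a, b = look_up(term_a), look_up(term_b)
    return a is not None and b is not None and a["concept"] == b["concept"]
```

With this table, `same_concept("Zip Code", "postal code")` is true, and `look_up("zip code")["country"]` gives `"US"`, which is the hint a search agent would use to treat the address as American.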

The possibilities for this kind of web are endless. Agents can be developed to help you gather information that is relevant to you and your situation – I’ve hardly touched on that here but have a look at the original article for a broader view.

Brislen is a Computerworld journalist. Email Paul Brislen. Send letters for publication to Computerworld Letters.
