Rebooting HTML for the semantic web

Berners-Lee's latest project may be a tough one, says Neil McAllister

“Making standards is hard work,” writes Tim Berners-Lee in a recent blog post. And he should know. The creator of the world wide web, Berners-Lee is responsible for developing and popularising some of the most significant open standards in computing history.

His current project, the semantic web, is an attempt to carry web standards to a level beyond anything we’ve known so far. Its goal is to transform today’s web into a semi-intelligent network of information resources, where machines will be able to analyse and understand the meaning of information, similar to the way humans do today. If successful, it will absolutely revolutionise information retrieval. And the key to its success is the rigorous application of standards.

But there’s a catch: it’s hard enough to get people to comply with the standards we have already.

HTML, in particular, has a troubled past. The headaches began in the bad old days of the browser wars, when competing browser makers would implement the specifications in dubious ways and add non-standard features to their software. Confounded by conflicting results, developers got into the habit of writing code that worked, whatever sins against the standards they had to commit.

Some years ago, the engineers at the W3C (world wide web consortium) reasoned that the best way to get back on track would be to start with web developers. Get developers to write HTML that adheres to the published standards, rather than relying on the behaviour of any one browser, and end-users would naturally gravitate toward browsers that did a better job of implementing the standards. In turn, this would create incentives for browser vendors to make standards compliance a top priority.

It was a logical enough plan. The folly lay in its execution. Because the way the W3C chose to reach developers was with — you guessed it — another standard.

Enter XHTML. A successor to the original browser markup language, XHTML combined the vocabulary of HTML with the syntax of XML.

XHTML actually has a lot going for it. Because of its strict syntax, it encourages more rigorous coding. It is also easy to validate using automated tools, so that web developers can know when they’ve made errors, as programmers do. What’s more, it encourages the use of CSS (cascading style sheets), which helps to keep actual web content separate from the details of how it is presented on-screen.
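
To make the point about automated checking concrete, here is a minimal sketch in Python. It assumes that handing markup to a general-purpose XML parser is a fair stand-in for the first thing an XHTML checker does; the function name `is_well_formed` and the two sample strings are illustrative, not taken from any W3C tool. It tests only well-formedness (every tag closed and properly nested), not full validity against the XHTML DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise.

    Hypothetical helper for illustration: it checks well-formedness only,
    not whether the elements are legal XHTML.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XHTML style: the empty <br/> element is explicitly closed.
strict = "<p>Hello<br/>world</p>"

# Old-school HTML style: <br> is never closed, which browsers tolerate
# but an XML parser rejects.
loose = "<p>Hello<br>world</p>"

if __name__ == "__main__":
    print("strict:", is_well_formed(strict))
    print("loose:", is_well_formed(loose))
```

The second string is the kind of markup browsers have always rendered without complaint, which is why a parser that rejects it outright can feel unforgiving to the person who wrote it.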

The problem? “The attempt to get the world to switch to XML ... all at once,” writes Berners-Lee, “didn’t work”.

Berners-Lee blames the browsers for not requiring well-formed code, but his colleague Håkon Wium Lie, CTO of Opera Software and inventor of CSS, believes there’s more to it than that. Lie suspects that XHTML is unpopular because it tends to “punish the good guys”, by being too rigid and unforgiving in its syntax.

So it’s back to the drawing board. In his blog post, Berners-Lee announced a brand-new working group within the W3C that would once again try to address the challenges and shortcomings of HTML, while working on the XHTML standards in parallel.

It’s a good step. But it does make me wonder about the future of Berners-Lee’s vision of the semantic web. The lesson learned from XHTML is that when it comes to standards, just because you build it doesn’t mean they will come.

And yet, XHTML is only the beginning of the standards compliance that the semantic web would require. If the semantic web is to succeed it will have to find ways to accommodate human nature, and not just good engineering.
