Who says SALT's bad for you?

Many of us probably already do it. Sit, that is, muttering at our PC – but with little expectation of a response. Yet voice recognition technology has for years been on the horizon, with its promise of a more natural way of controlling computers. Is it destined to remain just out of reach, the perennial research subject with little application in the real world?

Perhaps not. I’m reluctant to be more definite than that, having already done my bit for building up expectations for voice recognition. A few years ago I wrote about a new release of a leading speech recognition application, and how it could “learn” voices in five minutes; I haven’t noticed it becoming a standard PC feature in the years since.

Things are happening, though, which should eventually move voice technology out of the lab and on to the desktop – in some form or other. And that’s the interesting question: exactly what form will voice systems take when they move into the mainstream? (To be honest, I’ve no doubt they will; the issue is when.) So far they look to be taking a couple of forms, but the signs are they’ll eventually merge into one.

The voice systems we’re already a little familiar with are the ones we encounter when using the phone. Here in Auckland the biggest taxi company has one. Quite frankly, I don’t trust it. When it’s 4.30 in the morning and I’m ordering a cab to catch a 6.30 flight, I like to be told by a human that the car’s on the way. The interactive voice response (IVR) system merely asks you to hang up when you’ve told it what you want, without confirmation. I’ve never given it the opportunity to let me down, insisting on talking to an operator.

In Australia, Pizza Hut has implemented something similar. When it was commissioned at the start of the year the company’s general manager, Tony Lowings, was open about his misgivings.

“Given voice recognition is in its infancy and depends on the acceptance of consumers, we are not sure how many customers will use it. But we are fairly comfortable it will grow and, in a couple of years, voice recognition will be a significant part of the consumer landscape.”

Last week I listened to a salesman from Avaya, the supplier of the Pizza Hut system, describe how it’s intended to make the fast food experience faster. Pizza Hut’s sales data told it that 14% of customers always placed the same order. A PBX that diverted their calls to an interactive voice response system would speed the process for them, and relieve the load on the call centre that was busily accepting the rest of the 15,000 orders an hour that the company copes with at peak.
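To put rough numbers on that, here is a minimal Python sketch of the routing idea. The `Caller` record and its `always_same_order` flag are hypothetical, invented for illustration and not taken from Avaya's product, and the load estimate assumes that the 14% of customers who always place the same order also account for about 14% of peak orders, which the sales figures above don't actually state.

```python
from dataclasses import dataclass

PEAK_ORDERS_PER_HOUR = 15_000  # peak figure quoted above
REPEAT_ORDER_SHARE = 0.14      # customers who always place the same order


@dataclass
class Caller:
    phone: str
    always_same_order: bool  # hypothetical flag derived from sales history


def route(caller: Caller) -> str:
    """Send known repeat-order callers to the IVR, everyone else to an agent."""
    return "ivr" if caller.always_same_order else "agent"


def calls_diverted_per_hour(orders: int = PEAK_ORDERS_PER_HOUR,
                            share: float = REPEAT_ORDER_SHARE) -> int:
    """Estimate hourly calls handled by the IVR (assumes order share = customer share)."""
    return round(orders * share)


if __name__ == "__main__":
    print(route(Caller("555-0100", always_same_order=True)))
    print(calls_diverted_per_hour())
```

On those assumptions, something like 2,100 of the 15,000 peak-hour orders would never reach a human operator.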

The Avaya salesman also gave examples of outbound voice response systems. One was for collecting traffic fines, to replace a manual process that cost $19 for each fine paid. The automated system managed to extract the money at $1.50 a pop.
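The arithmetic behind that comparison is simple enough to sketch in Python. Only the two per-fine costs come from the salesman's example; the volume of fines passed to `total_saving` is a made-up number for illustration.

```python
from decimal import Decimal

MANUAL_COST = Decimal("19.00")    # cost per fine collected manually
AUTOMATED_COST = Decimal("1.50")  # cost per fine collected by the voice system


def saving_per_fine(manual: Decimal = MANUAL_COST,
                    automated: Decimal = AUTOMATED_COST) -> Decimal:
    """Dollars saved on each fine collected by the automated system."""
    return manual - automated


def saving_percent(manual: Decimal = MANUAL_COST,
                   automated: Decimal = AUTOMATED_COST) -> Decimal:
    """Saving expressed as a percentage of the manual cost."""
    return (manual - automated) / manual * 100


def total_saving(fines: int) -> Decimal:
    """Total saving for a given (hypothetical) number of fines."""
    return saving_per_fine() * fines


if __name__ == "__main__":
    print(saving_per_fine())
    print(round(saving_percent(), 1))
    print(total_saving(1000))  # 1000 fines is an illustrative volume only
```

That works out to $17.50 saved on every fine, or roughly 92% of the manual cost.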

These are telephony applications, however, and a far cry from voice-driven desktop computer applications. The desktop would appear to be a much tougher nut to crack – when did you last interrupt a colleague in conversation with their PC (as opposed to hurling abuse at it)? What reason is there to believe that will change anytime soon? Simply, SALT.

SALT stands for Speech Application Language Tags and could be the key to the melding of desktop voice applications and telephony’s IVR systems. SALT’s backers – Microsoft, Cisco, Intel and scores of others who make up the SALT Forum – intend it as the basis for a speech interface to web information. It’s an extension to existing web programming models and markup languages such as HTML, XHTML and XML. It should let developers add voice to the web using familiar tools and techniques.
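To give a flavour of what “extension to existing markup” might mean in practice, here is a rough Python sketch that builds a small page mixing ordinary HTML with SALT-style tags. The element and attribute names (`prompt`, `listen`, `grammar`, `bind`, `targetelement`) and the namespace URI follow my reading of the SALT 1.0 specification and should be checked against it; the ids, the prompt wording and the `order.grxml` grammar file are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI as I understand it from the SALT 1.0 specification (an assumption).
SALT_NS = "http://www.saltforum.org/2002/SALT"
ET.register_namespace("salt", SALT_NS)


def salt(tag: str) -> str:
    """Return a namespace-qualified SALT tag name for ElementTree."""
    return f"{{{SALT_NS}}}{tag}"


def build_page() -> str:
    """Build a page with one HTML text box plus SALT-style speech tags."""
    html = ET.Element("html")
    body = ET.SubElement(html, "body")

    # An ordinary HTML text box, usable with keyboard and mouse as usual.
    ET.SubElement(body, "input", {"type": "text", "id": "txtOrder"})

    # Spoken question played to the user (hypothetical wording).
    prompt = ET.SubElement(body, salt("prompt"), {"id": "askOrder"})
    prompt.text = "What would you like to order?"

    # Recognition step: a grammar file plus a binding that copies the
    # recognised value into the text box above.
    listen = ET.SubElement(body, salt("listen"), {"id": "recoOrder"})
    ET.SubElement(listen, salt("grammar"), {"src": "order.grxml"})
    ET.SubElement(listen, salt("bind"),
                  {"targetelement": "txtOrder", "value": "//order"})

    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    print(build_page())
```

The point of the sketch is that the speech tags sit alongside the familiar form field rather than replacing it, so the same page could in principle serve a caller on the phone or a user at a desk.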

The SALT group wants this to be an open, royalty-free standard supporting speech access to web content through telephones, desktop and tablet PCs and PDAs. Version 1 has been submitted to the World Wide Web Consortium (W3C) for consideration by those developing standards for voice browsers and multimodal applications (that respond to voice, touch-screen or standard GUI commands).

Inevitably, though, there’s a parallel initiative, already involving the W3C. That’s the VoiceXML standard, for telephony applications such as directory access, call routing and call centres. Whether the two will compete or complement each other isn’t clear, as SALT can also be used to create telephony applications. It’s a problem for the W3C to sort out.

Hopefully, it’s not going to turn into one of those classic standards wrangles. At stake is a host of potential applications that cross over from call centres – as in the Pizza Hut example – to the web, in industries such as banking and travel. Pass the SALT, please.

Doesburg is Computerworld’s editor. Send letters for publication to Computerworld Letters.
