A PhD student in computer science at the University of Waikato is working on ways to automatically "wikify" documents, by detecting the topics in the document and creating links to the appropriate Wikipedia articles.
David Milne and his supervisor, Professor Ian Witten, recently won an award for their paper at the Conference on Information and Knowledge Management (CIKM), held in California's Napa Valley. Close to 800 papers were submitted to the conference, which is organised annually by the Association for Computing Machinery (ACM).
Milne has used existing Wikipedia articles to "train" his software to make the same decisions as humans regarding what is important in any document, he says.
"Every single Wikipedia article is an example of how to cross-reference a document with Wikipedia," he says. "That means we have millions of examples for how to do the job."
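To illustrate how existing articles can serve as training examples, here is a minimal sketch in Python. It relies on the fact that Wikipedia markup writes links as `[[Target]]` or `[[Target|anchor text]]`. The function names and the sample sentences are hypothetical and invented for illustration; this is not Milne's code, only one plausible way of harvesting such examples.

```python
import re
from collections import Counter, defaultdict

# Matches [[Target]] and [[Target|anchor text]] in wiki markup.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]|]+))?\]\]")


def extract_link_examples(wikitext):
    """Return (anchor_text, target_article) pairs found in one article."""
    examples = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        # With no explicit anchor text, the target title is the anchor.
        anchor = (match.group(2) or target).strip()
        examples.append((anchor.lower(), target))
    return examples


def count_senses(articles):
    """Count how often each anchor text links to each target article."""
    counts = defaultdict(Counter)
    for text in articles:
        for anchor, target in extract_link_examples(text):
            counts[anchor][target] += 1
    return counts


# Hypothetical sample sentences, written for this illustration only.
sample = [
    "The [[Kiwi (bird)|kiwi]] is a flightless bird native to [[New Zealand]].",
    "A ripe [[Kiwifruit|kiwi]] is rich in vitamin C.",
    "The [[Kiwi (bird)|kiwi]] lays a very large egg.",
]
senses = count_senses(sample)
```

Run over millions of articles rather than three sentences, counts like these would show which article a given phrase usually points to when human editors link it.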
Automatic systems face several hurdles, he says. They have to resolve ambiguity: to decide, for example, if the word "kiwi" refers to the bird, the fruit or to New Zealanders. They must also allow for synonymy, where different terms have similar meanings, such as "the internet", "the web" and "cyberspace". Finally, they have to decide which topics are important enough to link to (the ones that readers are likely to investigate further) and which should be ignored.
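The three hurdles can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the sense priors, context words, link probabilities and threshold are invented numbers, and the scoring rule (context-word overlap first, then how common a sense is) is a simplified stand-in rather than the method used in Milne's software.

```python
# Hypothetical statistics of the kind that could be counted from
# existing articles. All values are invented for illustration.
SENSES = {
    "kiwi": {"Kiwi (bird)": 0.5, "Kiwifruit": 0.3, "New Zealanders": 0.2},
    # Synonymy: different phrases can resolve to the same article.
    "the internet": {"Internet": 1.0},
    "the net": {"Internet": 1.0},
}
CONTEXT_WORDS = {
    "Kiwi (bird)": {"bird", "flightless", "egg", "nocturnal"},
    "Kiwifruit": {"fruit", "vitamin", "ripe", "orchard"},
    "New Zealanders": {"people", "nation", "team", "citizens"},
}
# Fraction of a phrase's occurrences that are links (hypothetical).
LINK_PROBABILITY = {"kiwi": 0.4, "the internet": 0.3, "the": 0.0001}


def disambiguate(term, context):
    """Pick the sense sharing the most words with the context;
    ties are broken by how common the sense is."""
    words = set(context.lower().split())

    def score(item):
        sense, prior = item
        overlap = len(words & CONTEXT_WORDS.get(sense, set()))
        return (overlap, prior)

    return max(SENSES[term].items(), key=score)[0]


def should_link(term, threshold=0.05):
    """Link only phrases that are linked often enough elsewhere."""
    return LINK_PROBABILITY.get(term, 0.0) >= threshold


fruit_sense = disambiguate("kiwi", "a ripe kiwi is a fruit rich in vitamin c")
default_sense = disambiguate("kiwi", "the kiwi won")
```

In this toy setup, "kiwi" next to "ripe" and "fruit" resolves to the fruit, while a context with no clues falls back to the most common sense. A word such as "the" is never linked because it falls below the threshold.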
A demo of the software is available online.
Milne hopes his software will simplify the organisation and retrieval of documents.
"Normally search engines and other systems operate using words; with this software we're dealing with concepts," he says. "It would be good to be able to throw any document into a digital library and know that it will organise itself."