The University of Waikato can claim double the kudos from master’s student Quan Sun’s win at the University of California San Diego student data mining competition.
Not only is Sun a student at the university, he also used open source software developed there to claim his win in the elite “hard” category for graduate students.
In fact, says Sun, at least half the competitors used the software, called Weka, which he describes as the “Microsoft Word of data mining”.
Sun, a former web developer, won ahead of more than 300 entries from universities in North America, Europe, Asia and Australasia.
The competition was in four sections – with “easy” and “hard” options for both undergraduate and graduate students. All competitors were set the task of predicting anomalies in e-commerce transaction data.
Sun says figuring out the answer took him about a month, working on the data for two to four hours a day and brainstorming ideas with his wife, who’s a PhD student in engineering at Waikato.
In 2005, Weka software won the data mining and knowledge discovery service award from the Association for Computing Machinery’s special interest group on knowledge discovery and data mining. The software has been downloaded by more than one and a half million users worldwide.
Sun also did his first degree at Waikato. He says his previous work as a web developer has given him good attention to detail, which is essential for data mining. He’s planning to continue on to doctoral study when he completes his master’s later this year.
In 2006, Waikato University’s commercialisation company, WaikatoLink, inked a deal with open source business intelligence software developer Pentaho, giving that company an exclusive licence to sell Weka’s analytical software as part of its business intelligence product. Contrary to some reports out of the US, the core developers remained employed by and based at Waikato University.
Pentaho also took control of Weka’s SourceForge software distribution site and bought rights to use the Weka brand under the deal.
“We will be working closely with them to integrate Weka with the Pentaho platform,” senior lecturer Mark Hall said at the time, adding that the university had reserved its rights to license Weka for applications other than business intelligence.
Hall said the deal would enhance development of Weka as Pentaho would take over much of the burden for servicing users and allow the Waikato team to concentrate on development.
Hall now works for Pentaho out of the Waikato, says Sun’s supervisor, Dr Eibe Frank. Most of the core development of Weka is still at the university, he says, but some contributions come from outside.
More recent developments include more machine learning algorithms, better documentation and an interface to the R language, a statistical language that began its life at the University of Auckland.
R was developed at Auckland by Ross Ihaka and Robert Gentleman. Ihaka won a Pickering Medal from the Royal Society of New Zealand in 2008 for his work. When free software leader Richard Stallman, the developer of the GPL licence, visited New Zealand last year, Ihaka was one of the people he planned a meeting with.
Weka has produced a large number of spin-off projects. One of these involves developing algorithms to mine and analyse very large data sets, possibly even infinite data sets, as they stream across a network such as the internet.
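To give a flavour of what mining a stream involves, the sketch below keeps a running mean and variance while seeing each value only once and storing none of them, so memory use stays constant however long the stream runs. It uses Welford's online algorithm, a standard textbook technique chosen here purely for illustration; it is not taken from Weka or its spin-off projects, and the names `RunningStats` and `number_stream` are hypothetical.

```python
from typing import Iterator


class RunningStats:
    """Welford's online algorithm: one pass over the data, constant memory."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; reported as 0.0 until two values have been seen."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


def number_stream(limit: int) -> Iterator[float]:
    """Hypothetical stand-in for a network stream: yields 1.0 up to limit."""
    for i in range(1, limit + 1):
        yield float(i)


stats = RunningStats()
for x in number_stream(5):
    stats.update(x)  # each value is seen once and then discarded

print(stats.count, stats.mean, stats.variance)
```

Because the generator hands over one value at a time, the same loop would work unchanged on a source that never ends; the statistics are simply available at whatever point the reader chooses to look.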
Development of Weka began in 1993. It was the university’s first government-funded computer science research project, at the time under Professor Ian Witten. The aim was to produce a software workbench that would combine techniques from machine learning (a subfield of artificial intelligence) and data mining into one framework.
At the time, data mining software for research was in all sorts of different languages and formats. After development of the workbench and algorithms, Waikato’s research efforts moved to concentrate on applying machine learning to New Zealand data, such as large agricultural data sets.
In 1997 the workbench was redeveloped in Java to run on any hardware platform.