The story of R: a statistical tale with a twist
- 22 July, 2010 22:00
Next month the creators of R will receive the inaugural Statistical Computing and Graphics Award from the American Statistical Association. It will be further recognition for Associate Professor Ross Ihaka, who won the Pickering Medal in 2008. So what is R, how was it created, and why does it matter to so many in academia, and in business, around the world?
Head of statistics Professor Chris Triggs believes no one has achieved the kind of international recognition from the University of Auckland that Ross Ihaka and Robert Gentleman have received for R. It is quite a claim, but what makes R so important is that a very large number of new methods and developments in scientific research internationally are first implemented in R. Its commercial application is also widespread: for example, R is used in programs that analyse speech patterns, supermarket purchases and human genome data. As Triggs explains:
“Let’s suppose I invent a new method of doing a particular calculation or analysing data from a particular problem. Nobody else is going to use it unless they are given the computer program that does the analysis. So as well as doing the theoretical development and validation, somebody making an innovation will write an R program or an R function to do the analysis.”
R’s development began in the early 1990s when Ross Ihaka returned to the University of Auckland after teaching at some prestigious universities in the US – Berkeley, Yale and MIT.
He and fellow lecturer Robert Gentleman began experimenting with a statistical application they could use to teach their first-year students, which they named R. The name was a play on the first letter of both their names. It also referenced S, a popular statistical language developed in the 1980s, and they liked the idea that it was impossible to copyright a single letter.
They began discussing their work with colleagues overseas and gradually R gathered momentum. “We developed a mailing list and people had themselves added to that and so up until about 1996 or 97 there were only 50 to 100 people on that list. So it was fairly small and that was probably appropriate because we were still working out bugs,” Ihaka says.
News of R spread by word of mouth. Realising they had a hit on their hands, Ihaka and Gentleman briefly contemplated applying for a patent before deciding that the best thing to do was to release it under a General Public Licence (GPL).
“We could have made it a commercial thing and five people would have used it. But given that we made it freely available, people added so much value to it. Unless we had actually made it free software, nobody would have contributed because they would be asking ‘well, where is my piece of the action’,” he says.
In 2000 they set up a non-profit foundation that companies utilising R in developing applications can contribute to. It helps fund the workshops and conferences that are held around the world to develop R. The list of donors named on the website includes Telecom New Zealand, AT&T Research and Google.
Professor Triggs says intrinsic to the success of R has been the open source model. “That is its real strength, that is why it is so useful. I make a development and a change, but I may only be looking at a restricted case or a small part of the problem. If I put my program out there, my function out there, then other people can pick it up and change it and extend it and make it more useful.”
Ihaka learned about the open source movement during his time at MIT. “That is really where free software came from, that is where Richard Stallman was and the Free Software Foundation is still based in Cambridge I think. Those ideas were sort of hanging around in the air.”
When he returned home he didn’t sign an employment contract with the University of Auckland for at least four years, because the University claims all its researchers’ intellectual property. However, he came to an agreement with the university that permits him to circulate free software. “I am able to disseminate my work which I think is a basic academic freedom.”
A couple of years ago Ihaka learned about Revolution Analytics. It is a US company that, according to its website, “is the leading commercial provider of software and support for the popular open source R statistics language. Revolution R products help make predictive analytics accessible to every type of user and budget.”
Ihaka claims the company has added enhancements to R, but is not revealing the source code as it is required to do under the GPL. “They have sort of GUI (Graphical User Interface) development tools. Kind of front end so you can write stuff and I think they are doing some parallel type stuff but they won’t show us so we don’t know, even though they are required to by the licence,” he claims.
He would like to take legal action, but unless he gets financial backing from a corporation – he has approached a couple of Revolution Analytics’ competitors – it’s unlikely he will proceed. “I am not the only developer who is unhappy about it, but on the other hand there is not a lot of incentive for us to pursue it. We are not losing money because we don’t make any money off it. And it’s a fairly big hurdle to get that large sum of money, you need to have some monster corporate lawyer in the US go after them,” he says.
A spokesperson for the University’s deputy vice chancellor for research says it has no concerns about R being used and distributed under the terms of its present (GPL2) licence, and expects no return under those circumstances, but “we are investigating our position under circumstances where software is being used for commercial gain.”
Revolution Analytics CEO Norman Nie says the company has sought legal advice and does not believe it is in contravention of the GPL, because the company’s business model is a version of open source called Open Core. In an email to Computerworld, the company pointed out that it offers full-featured software free to the academic community and it cited a blog post by general legal counsel of the Open Source Initiative Mark Radcliffe, which supports Open Core.
There appears to be a lively discussion internationally about Open Core, a model that bases software on a core of free open-source code but charges for additional features in a conventional commercial way. Computer industry veteran Simon Phipps suggests in a post on his Computerworld UK blog that it plays on the idea of open source because businesses that acquire Open Core packages find they must pay to upgrade to an ‘enterprise version’, which is not open source.
Radcliffe counters Phipps by claiming that if Open Core is demonised it will dissuade venture capitalists from investing in open source companies. “If the Open Core model is no longer considered open source, the biggest losers will be the end users. They will lose the opportunity to benefit from that investment and that is certainly not consistent with the goals of open source.”
Meanwhile, Ihaka says he hasn’t been in contact with R’s co-creator since Robert Gentleman joined the board of Revolution Analytics in January. Gentleman was unable to respond to Computerworld’s questions due to legal reasons, but he retains strong links with the University of Auckland since moving to the US. He is employed at the biotech company Genentech and has been influential in developing Bioconductor, which uses R in computational biology.
Ihaka is moving on as well. Together with a doctoral student he is working on a new language that could go 1000 times faster than R. “That is another reason I am not all that interested in pursuing it [legal action against Revolution Analytics] because I can see R as being the past and not the future and I would rather put my energy into something new.”
Is this speed jump possible? Computerworld asked Professor Triggs. “Oh yes. At the time R was developed there weren’t the huge data sets that there are today ... so if we can keep the framework and syntax of R, but it can handle large data sets more efficiently, then there will be a huge speed up.”