David DeWitt's journey to becoming one of the world's leading academic experts on databases started off almost by accident.
"I had taken one database class in graduate school," DeWitt recalled. "That was enough that when I showed up as a new faculty member at the University of Wisconsin-Madison (in the mid-1970s), the chairman said, 'You're the new database guy.'"
DeWitt took the ball and ran with it. After three decades in the field, his resume includes the co-invention of three parallel databases, including one that was sold to NCR, publication of more than 100 technical papers and numerous awards and honours from his database peers.
DeWitt retired from the University of Wisconsin last year. But he's already returned, this time as a Microsoft Technical Fellow and head of a new database research centre located on the Madison campus and funded primarily by his new employer.
DeWitt talked about the centre during a keynote speech at the recent Professional Association for SQL Server's annual conference.
Last month, Microsoft demonstrated a feature that will let DBAs manage pools of hundreds of SQL Server databases at a time.
For DeWitt, the lab is an opportunity to do the same sort of research he has done for the past 32 years, but also to see those results make their way into products, namely SQL Server, in a much shorter time frame.
It also gives him the financial backing that computer science academics, especially those in the database field, have lost in recent years.
"Researching query optimisation on parallel systems: this is not something you can go to NSF or DARPA and get money for anymore," DeWitt said. He added that cutting-edge database research was already shifting away from academia to industry.
"In the old days, you could take a small group of grad students and build a state-of-the-art prototype of a database system," he said. "Systems are so complex these days, it's hard to make headway with only five grad students."
Also, "the smartest students from abroad don't come for their [computer science] PhDs anymore, they go and join investment banks," DeWitt continued. "So industry has really taken over a leadership role. It's one reason I left academia."
DeWitt would also love to taste some of the "success after success" of a good friend of his, database industry legend Michael Stonebraker.
A professor at both UC Berkeley and MIT, Stonebraker is generally credited with helping invent two seminal databases, Ingres and Postgres. The former underlies popular products such as Microsoft's SQL Server, Sybase's Adaptive Server Enterprise, Ingres' eponymous product, IBM's Informix and others, while the latter is an emerging open-source database.
Just as important, Stonebraker started companies that helped bring to market those databases, along with other lesser-known ones, such as his current venture, column-based data warehousing vendor Vertica Systems.
"My goal is to short-circuit the process from research to product line," says DeWitt, who notes that he works directly for the Data and Storage Division at Microsoft that produces SQL Server, not Microsoft Research. "We absolutely want to be more market-responsive and nimble."
The lab at the University of Wisconsin-Madison will be named after Microsoft database researcher Jim Gray, who was lost at sea last year. Gray not only helped build products such as SQL Server but also collaborated with many in academia, including DeWitt, who considered Gray a close friend and mentor.
The Microsoft Jim Gray Systems Lab has three researchers today. "It will top out at between 10-15 people," DeWitt said. In general, research produced by the lab will be owned by the university, though Microsoft gets non-exclusive royalty-free access to the patents. However, research by grad students that doesn't draw upon Microsoft confidential materials will be owned by the students themselves.
DeWitt plans to initially focus on query optimisation.
"It's one area in which there's been very little progress" in the past three decades, he says. He plans to test an approach that "does a little optimisation, a little execution, and so forth."
If successful, this could make its way into upcoming Microsoft data warehousing appliances code-named Madison. Those appliances are due in the first half of 2010.
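To make the idea of "a little optimisation, a little execution" concrete, here is a minimal Python sketch of one way such interleaving could work. It is an illustration only, not DeWitt's method or anything in SQL Server: the function names (`hash_join`, `join_size`, `adaptive_join`) and the sample tables are all hypothetical. The sketch runs one join, looks at the rows that actually came out, and only then decides which table to join next, instead of fixing the whole join order up front. It assumes every input table is a non-empty list of dicts.

```python
from collections import Counter


def hash_join(left, right, key):
    """Equi-join two lists of dicts on the column `key`."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def shared_key(left, right):
    """Return a column name present in both inputs, or None."""
    if not left or not right:
        return None
    common = set(left[0]) & set(right[0])
    return min(common) if common else None


def join_size(left, right, key):
    """Count the rows a join would produce, using the real rows seen so far."""
    left_counts = Counter(row[key] for row in left)
    right_counts = Counter(row[key] for row in right)
    return sum(n * right_counts[value] for value, n in left_counts.items())


def adaptive_join(tables):
    """Join all tables, re-planning after every executed join."""
    remaining = dict(tables)
    start = min(remaining, key=lambda name: len(remaining[name]))
    current = remaining.pop(start)
    order = [start]
    while remaining and current:
        # "A little optimisation": cost each candidate against the
        # intermediate result that actually exists right now.
        candidates = []
        for name, rows in remaining.items():
            key = shared_key(current, rows)
            if key is not None:
                candidates.append((join_size(current, rows, key), name, key))
        if not candidates:
            raise ValueError("no joinable table left")
        _, name, key = min(candidates)
        # "A little execution": run only the cheapest next join.
        current = hash_join(current, remaining.pop(name), key)
        order.append(name)
    return current, order


# Hypothetical sample data.
tables = {
    "items": [{"item_id": 10, "title": "disk"}],
    "orders": [
        {"order_id": 1, "cust_id": 1, "item_id": 10},
        {"order_id": 2, "cust_id": 1, "item_id": 11},
        {"order_id": 3, "cust_id": 2, "item_id": 10},
    ],
    "addresses": [
        {"cust_id": 1, "city": "Madison"},
        {"cust_id": 1, "city": "Seattle"},
        {"cust_id": 2, "city": "Boston"},
    ],
    "shipments": [
        {"order_id": 2, "carrier": "A"},
        {"order_id": 2, "carrier": "B"},
        {"order_id": 2, "carrier": "C"},
        {"order_id": 3, "carrier": "A"},
    ],
}

rows, order = adaptive_join(tables)
print(order)
print(rows)
```

In this toy data, a plan fixed in advance on table sizes alone might join the three-row addresses table before the four-row shipments table. The interleaved version sees that, for the orders that survived the first join, shipments yields fewer rows, and so picks it first. A production optimiser would rely on sampled statistics rather than exact counts.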
DeWitt's other big interest is in very large database clusters. As such, he has strong opinions about MapReduce, the parallel data-processing framework used by Google to index the web.
In blog posts that DeWitt co-wrote with Stonebraker this spring, the two called MapReduce a "sub-optimal ... not novel" type of database that lacked the features modern DBAs and developers take for granted and was unworthy of the hype it had received.
The blog posts drew heavy criticism, with most critics arguing that MapReduce isn't comparable to a standard database because it is optimised for a single task (quickly sifting through huge amounts of messy, unstructured data) that even the largest databases today handle poorly.
As one commentator snarkily wrote: "I tried to have MapReduce babysit my kids, and I came back half an hour later to find that it was just sitting there crunching data, and wasn't watching them at all. This thing can't do anything at all... Also, compared to a standard hammer, this MapReduce thing is really crappy at pounding nails into things."
DeWitt is thick-skinned. He claims Oracle CEO Larry Ellison tried to have him fired from his university post in the 1980s after database performance benchmarks created by DeWitt showed Oracle lagging in key areas. "I don't think he quite understood the concept of tenure," he joked. In response to the MapReduce criticism, DeWitt and Stonebraker blogged: "Just to let you know, we don't hold a personal grudge against MapReduce. MapReduce didn't kill our dog, steal our car, or try and date our daughters."
DeWitt concedes today that MapReduce "does scale pretty well". He hails its ability to continue queries without interruption if a particular server fails, which most clustered databases cannot do.
But he stands by his argument, which is that true relational databases "give you a lot more leverage and good features." And DeWitt says he is preparing to release research soon to back that up.
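For readers unfamiliar with the model under debate, the following Python sketch shows the general shape of a MapReduce job (map, group by key, reduce) together with the property DeWitt praises: a task lost to a failed server is simply run again, so the job finishes without interruption. This is a single-process simulation under stated assumptions, not Google's implementation. The names `WorkerFailure`, `flaky_map` and `map_reduce` are hypothetical, and the "server failure" is faked by raising an exception on the first attempt for chosen chunks.

```python
from collections import defaultdict


class WorkerFailure(Exception):
    """Stands in for a server dying part-way through a task."""


def map_words(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_counts(word, counts):
    """Reduce step: add up all the counts emitted for one word."""
    return word, sum(counts)


def flaky_map(chunk_id, chunk, attempt, failing_chunks):
    """Run the map step, simulating a crash on the first attempt."""
    if chunk_id in failing_chunks and attempt == 1:
        raise WorkerFailure(f"worker for chunk {chunk_id} died")
    return map_words(chunk)


def map_reduce(chunks, failing_chunks=frozenset(), max_attempts=3):
    """Count words across chunks, re-running any map task that fails."""
    retries = 0
    grouped = defaultdict(list)  # shuffle: word -> list of counts
    for chunk_id, chunk in enumerate(chunks):
        for attempt in range(1, max_attempts + 1):
            try:
                pairs = flaky_map(chunk_id, chunk, attempt, failing_chunks)
                break
            except WorkerFailure:
                retries += 1  # reschedule only the failed task
        else:
            raise RuntimeError(f"chunk {chunk_id} failed {max_attempts} times")
        for word, count in pairs:
            grouped[word].append(count)
    result = dict(reduce_counts(w, c) for w, c in grouped.items())
    return result, retries


chunks = ["the quick brown fox", "the lazy dog", "The fox"]
counts, retries = map_reduce(chunks, failing_chunks={1})
print(counts, retries)
```

Because each map task reads only its own chunk and has no side effects, re-running it is safe, and the final counts match a run with no failures. The same simplicity is also what the critics of the DeWitt and Stonebraker posts pointed to: the model does one job, with none of the schemas, indexes or query language of a relational database.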