As a researcher at the University of California at Berkeley, Michael Stonebraker co-created Ingres in the 1970s and Postgres in the 1980s, technology that underlies many leading relational databases today, including Ingres itself, Microsoft’s SQL Server, Sybase’s Adaptive Server Enterprise and IBM’s Informix.
But Stonebraker now argues that relational databases are “long in the tooth” and “should be considered legacy technology”.
In a recent entry in a new blog called The Database Column, Stonebraker also argues that today’s relational databases lag badly in performance behind a new wave of databases that flip database tables 90 degrees.
Column-oriented databases, such as the one built by Stonebraker’s latest start-up, Vertica Systems, store data vertically in table columns rather than in successive rows.
By storing the values of each column together, column-oriented databases let a query read only the columns it needs, which cuts the time spent reading from disk. That saving adds up when executing large-scale calculations such as those typically done in a data warehouse.
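The difference between the two layouts can be shown with a toy example. The Python sketch below is an illustration only, not Vertica's design: the table, the column names (`id`, `region`, `amount`) and both helper functions are hypothetical, and in-memory lists stand in for disk pages. It totals one column under each layout and counts how many stored values have to be read.

```python
# Toy illustration: Python lists stand in for data laid out on disk.
rows = [
    (1, "north", 120.0),
    (2, "south", 80.0),
    (3, "north", 45.5),
    (4, "east", 200.0),
]

# Row store: the fields of each record sit next to each other.
row_store = [value for record in rows for value in record]

# Column store: the values of each column sit next to each other.
ids, regions, amounts = (list(col) for col in zip(*rows))
column_store = {"id": ids, "region": regions, "amount": amounts}


def sum_amount_row_store(flat, width=3, amount_index=2):
    """Read every record in full to pick out one field."""
    total, values_read = 0.0, 0
    for start in range(0, len(flat), width):
        record = flat[start:start + width]
        values_read += len(record)
        total += record[amount_index]
    return total, values_read


def sum_amount_column_store(store):
    """Read only the one column the query needs."""
    column = store["amount"]
    return sum(column), len(column)


row_total, row_reads = sum_amount_row_store(row_store)
col_total, col_reads = sum_amount_column_store(column_store)
print(row_total, row_reads)  # same total, 12 values read
print(col_total, col_reads)  # same total, 4 values read
```

Both layouts give the same answer, but the row layout reads three values per record while the column layout reads one. On a warehouse table with dozens of columns and millions of rows, that gap is the disk time the column-store vendors are pointing to.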
Column databases “will take over the warehouse market over time, completely displacing row stores,” Stonebraker writes in the blog. “Since many warehouse users are in considerable pain (can’t load in the available load window, can’t support ad-hoc queries, can’t get better performance without a ‘fork-lift’ upgrade), I expect this transition to column stores will occur fairly quickly,” he says.
Column-oriented database systems are not new. Sybase has successfully sold its column-based IQ database for years as a high-performance business intelligence solution.
BigTable, the database that Google built to handle a number of its applications, stores data in columns.
However, column-oriented databases remain a niche offering. In comparison, the leading players in the mainstream database market, which is estimated at US$15 billion (NZ$21 billion) annually worldwide, all rely on systems using row-based tables.
Organising data by rows does have its advantages. Writing a record to disk in row format is faster than splitting it across columns. That is key for high-transaction database applications, where data is constantly being read from and written to the database, though markedly less important for data warehouses, where data is typically written just once and accessed many times after that.
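The write-side cost can be sketched in the same spirit. The Python snippet below is a simplified model, with hypothetical function names and an invented record; it counts the separate storage locations touched when one new record arrives, and ignores the buffering and batching that real engines use.

```python
def append_row_store(pages, record):
    """Row store: the whole record is written to one place."""
    pages.append(record)
    return 1  # one write location touched


def append_column_store(columns, record):
    """Column store: each field is appended to its own column."""
    for name, value in record.items():
        columns.setdefault(name, []).append(value)
    return len(record)  # one write location per column


row_pages, column_files = [], {}
order = {"id": 5, "region": "west", "amount": 60.0}

row_writes = append_row_store(row_pages, order)
column_writes = append_column_store(column_files, order)
print(row_writes, column_writes)  # 1 versus 3
```

A transaction system inserting thousands of records a second pays that per-column cost on every insert, whereas a warehouse that loads data once and reads it many times barely notices it.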
Stonebraker, who is a co-founder and chief technology officer of Vertica, claims that the start-up has other performance-boosting features, such as very aggressive data compression and a query executor that “runs against compressed data”.
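Stonebraker does not say which compression scheme Vertica uses. As an assumed stand-in, the Python sketch below uses run-length encoding, a common technique for sorted columns, to show what answering a query against compressed data can look like. The function names and sample column are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]


def count_equal(runs, target):
    """Count matching rows from the runs alone, without decompressing."""
    return sum(length for value, length in runs if value == target)


# A sorted column compresses well because equal values are adjacent.
region_column = ["east", "north", "north", "north", "south", "south"]

runs = rle_encode(region_column)
north_rows = count_equal(runs, "north")
print(runs)        # [('east', 1), ('north', 3), ('south', 2)]
print(north_rows)  # 3
```

The count is answered by inspecting three runs rather than six values, and the column is never expanded back to its original form. Compression of this kind works best when similar values sit side by side, which is what a column layout provides.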
As a result, “Vertica beats all row stores on the planet — typically by a factor of 50,” he says. “The only engines that come closer are other column stores, which Vertica typically beats by around a factor of 10.”
Stonebraker says other firms similar to Vertica can do just as well.
“In every major application area I can think of, it is possible to build a SQL database management system engine with vertical market-specific internals that outperforms the ‘one size fits all’ engines by a factor of 50 or so,” he says.