As organisations move into web services, they’ll need to store all the XML documents that users are creating in web interactions with business partners and customers.
As luck would have it, XML databases are on the way.
It’s almost a given that any new technology goes through four phases: vapourware, bleeding-edge implementation (with all the associated bugs), mature acceptance and then invisibility — either because it’s completely taken for granted, or it’s died.
XML as a document format/self-describing mark-up language has been around for a while now and is heading into the mature acceptance stage, although sceptics might argue that it’s not quite there yet, as in many cases XML implementations are still more talked about than actually, well, implemented. XML is playing a big part in the developing web services market, whereby applications can become readily accessible services over the internet.
Set to play a major role in the spread of the standard are XML databases, of which possibly the best-known commercial products are Software AG’s Tamino and iPedo’s XML 3.0 product (though there are others, including open-source alternatives). They claim to store XML files natively, that is, without stuffing them into relational database formats.
These early forays aren’t going to be left in peace for long, however, as the big guns of IBM, Oracle and Microsoft are heading that way — at least as far as XML-enabled relational databases.
But whether they take the direct route of storing files natively or opt for a more hybrid approach — as happened with object-oriented databases — one thing is abundantly clear: customers want XML-capable databases. In 2000 the market grew worldwide from $US10 million to $US77 million, according to an IDC report. It is forecast to hit $700 million by 2004. Meta Group estimates that by 2003 about 65% of corporate data will be stored in an XML format. No wonder then that the trio of database giants want a slice of the XML database market. (Though let’s keep it in perspective: the database market as a whole last year generated $US8.8 billion in revenue, says Gartner.)
At the most basic level XML databases are simply databases capable of storing XML data or documents. Given that the amount of XML data produced is about to skyrocket, will they replace traditional relational databases and/or object-oriented databases? The short answer is no in both cases. They’re complementary technologies, says IDC analyst Anthony Picardi: XML databases are better suited for storing and processing XML documents, relational and object-oriented databases for numbers and text. And because XML databases are not a magical panacea for all data storage needs (there are still plenty of other types of data out there), software vendors have taken different approaches to incorporating the data format.
XML databases can be roughly divided into two categories: native XML databases, and relational or object-oriented databases with XML extensions. Neither is the final word in handling XML. In a native XML database the smallest unit of storage is usually a complete XML document, the equivalent of a row in a relational database. The second category approaches the issue differently, building XML capabilities on top of existing relational or object-oriented databases. This is the approach being taken by IBM, Oracle and Microsoft.
To date, relational databases with XML capabilities have usually relied on mapping different parts of an XML document to fields within a relational database, but this can be clumsy and inefficient. Mapping XML data to relational fields usually results in either a large number of columns with null values, which wastes space, or a large number of tables, which is inefficient to query. Plus, any failure to map the XML structure accurately can yield garbage documents.
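To make the null-column problem concrete, here is a minimal sketch in Python. It assumes two made-up order documents that carry different optional elements, and uses SQLite purely as a stand-in for a relational database. The table layout and element names are illustrative, not any vendor’s actual mapping scheme.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two hypothetical order documents with different optional elements.
DOCS = [
    "<order><id>1</id><customer>Acme</customer><gift_note>Thanks</gift_note></order>",
    "<order><id>2</id><customer>Bolt</customer><po_number>PO-77</po_number></order>",
]

# Every element that might appear in any document needs its own column.
COLUMNS = ["id", "customer", "gift_note", "po_number"]


def shred(doc):
    """Map one XML document to a row; missing elements become None (SQL NULL)."""
    root = ET.fromstring(doc)
    return tuple(root.findtext(col) for col in COLUMNS)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id TEXT, customer TEXT, gift_note TEXT, po_number TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [shred(d) for d in DOCS])

rows = conn.execute("SELECT * FROM orders ORDER BY id").fetchall()
null_count = sum(value is None for row in rows for value in row)
print(rows)
print("NULL cells:", null_count)
```

Each new optional element in the documents adds another column that is empty for most rows, which is the wasted space described above.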
Native XML databases, in theory, offer better support where XML documents are irregular or semi-regular. However, native XML databases do not currently support an equivalent of SQL’s “Update” or “Delete”, which makes maintenance a little cumbersome.
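As a rough illustration of the native idea, the sketch below keeps each document whole and searches by element path rather than by column. It is a toy in-memory store written in Python, not a model of Tamino or any other product, and the document names and contents are invented for the example.

```python
import xml.etree.ElementTree as ET

# A toy "native" store: each entry is one complete, parsed XML document.
# The documents deliberately have different shapes.
store = {
    "memo-1": ET.fromstring(
        "<memo><product>Widget</product><body>Credit approved</body></memo>"
    ),
    "spec-1": ET.fromstring(
        "<spec><product>Widget</product><section><title>Power</title></section></spec>"
    ),
    "memo-2": ET.fromstring(
        "<memo><product>Gadget</product><body>Refund issued</body></memo>"
    ),
}


def find_documents(path, text):
    """Return keys of documents with an element at `path` whose text equals `text`."""
    return sorted(
        key
        for key, root in store.items()
        if any(el.text == text for el in root.findall(path))
    )


matches = find_documents("product", "Widget")
print(matches)
```

No schema had to be agreed in advance: the memo and the spec share only a product element, yet both can be stored and found.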
The XQuery language is coming close to being a recommended XML standard without these functions — although discussion groups on the W3C website have included several recent impassioned pleas to hold off making XQuery a standard until this functionality has been included.
Relational databases with XML support are better suited where XML data is fine-grained, regular and consistent in structure and content. Bear in mind that regardless of whether the requirement is for data-centric or document-centric XML storage, the big names are not the only place to evaluate potential solutions: there are also a number of open-source solutions available.
However, the next generation of products promises the best of both worlds, with direct support for storing complete XML documents, and/or breaking the XML data down to individual components mapped to fields. This approach hopefully combines the strengths of relational databases (indices, SQL support and so on) with better support for irregular XML document types.
Oracle and IBM, as might be expected, have been making a fair bit of noise about their respective product offerings, along with a fair bit of high-brow muck-slinging about the relative technical capabilities of their products.
Both companies are set to issue new versions of their relational databases in the near future, with Oracle planning a late May release and IBM slating the next iteration of DB2 for sometime in the middle of the year.
Currently, Oracle’s database has full XML parsers, an XML schema processor and a SQL utility for managing XML data. But unified SQL queries are not possible with current iterations of the database, says Robert Shimp, a database product marketing executive at Oracle in the US.
Oracle has had basic support for XML since early 1999 but Oracle9i release 2 will be a “fully unified XML and relational database”, Shimp says. This means users will be able to store all the traditional transactional processing data as well as full XML documents.
“What you [will be able to] do is with a single SQL query access both the XML and relational data,” Shimp says. For example, a technical support person might field a call about a problem with a specific product. The support person might want to access information about the product as well as credit memos and internal product documents.
“You can look up that information simultaneously with a single query, whereas in the past you would have had to search different databases to find this information,” Shimp says.
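The kind of query Shimp describes can be sketched in a few lines of Python. This is not Oracle syntax: it uses SQLite as a stand-in, and `xml_text` is a hypothetical helper function registered with the database for the purpose of the example. The product and document tables, and their contents, are likewise invented.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_text(doc, path):
    """Hypothetical SQL helper: text of the first element matching `path`, or None."""
    return ET.fromstring(doc).findtext(path)


conn = sqlite3.connect(":memory:")
conn.create_function("xml_text", 2, xml_text)

# Relational data in one table, whole XML documents in another.
conn.execute("CREATE TABLE products (sku TEXT, name TEXT)")
conn.execute("CREATE TABLE documents (sku TEXT, doc TEXT)")
conn.execute("INSERT INTO products VALUES ('W1', 'Widget')")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        ("W1", "<memo><type>credit</type><amount>40</amount></memo>"),
        ("W1", "<note><type>internal</type><summary>Known fault</summary></note>"),
    ],
)

# One query reaches both the relational columns and the XML content.
rows = conn.execute(
    """
    SELECT p.name, xml_text(d.doc, 'type')
    FROM products AS p JOIN documents AS d ON d.sku = p.sku
    WHERE p.sku = ?
    ORDER BY xml_text(d.doc, 'type')
    """,
    ("W1",),
).fetchall()
print(rows)
```

The support person in the example would get the product name, the credit memo and the internal note back from a single statement, instead of searching separate systems.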
Oracle’s XML work is based on the W3C XML schema data model, to give its database customers a standard way to work with applications, Shimp says.
A three-prong tack
As a part of its strategy for entering what it calls the next wave of data management, IBM is taking a three-faced approach and working to offer a database system that is capable of managing objects, relational data and XML documents.
Big Blue plans to extend the core database engine currently in DB2 to include support for XML, with technologies such as new index structures that relate to XML, according to Nelson Mattos, a US-based director of IBM’s information integration group.
Although IBM has supported both objects and relational data in DB2 for some time now, the addition of XML will enhance that support. “XML gives you a very flexible model to manage all the meta data around objects,” Mattos says.
Mattos says the idea is to make the core DB2 look like a relational database engine with XML capabilities from the perspective of applications looking for relational data, while making it look like an XML database with relational capabilities or an object database with relational capabilities from the perspectives of applications looking for those data types.
By supporting XML, relational and object data, IBM’s database will be able to interact with XML documents; structured information, such as rows and columns; and data written in object-oriented programming languages, namely Java and C++.
To that end, support for the W3C’s XML Query standard means that an XML application only needs to know XML Query to get at data residing in DB2.
Arming DB2 with these three faces will increase scalability and performance, while making DB2 better equipped as the anchor of IBM’s web services stack, including the capability to not only deliver data to web services, but also to consume web services.
Mattos says that although this technology won’t emerge in the forthcoming version slated for mid-year, IBM will make an early version available later this year.
Brett MacIntyre, vice-president of the content and information integration software group at IBM US, says that content management, including the combination of structured and unstructured data, is at the core of the next wave of data management.
“For us, it’s about how we can put more room between us and Oracle and Microsoft,” MacIntyre says. While Oracle is working to store everything in the database, MacIntyre says, IBM is taking a more distributed approach. “Not everything can fit inside the database,” he says.
Microsoft, for its part, has been adding support for XML standards to its SQL Server 2000 database as they emerge, rather than issuing new versions of the database. The software giant’s plans for data management also expand beyond the relational database and include a whole host of Microsoft products.
“We have a vision of access to information no matter where it lives,” says Tom Rizzo, group product manager for SQL Server at Microsoft US. Rizzo says the evolutionary approach is what customers are asking for. “People don’t care whether data is structured, unstructured or semi-structured. People just want to be able to access and manipulate their data.”
Casement is a business manager for IDG Communications and a longtime database dabbler.