When the Minnesota Department of Agriculture needed to marry data from the state’s financial systems with its own laboratory management system for billing purposes, Kurt Wood considered using a data warehouse but baulked at the manpower required to create and maintain multiple copies of the data sets. He found an alternative in data federation.
Data federation, or enterprise information integration, is the ability to access multiple data sources with a single query. It works like a virtual data warehouse: the data being queried remains in its place, rather than being copied to a central repository, so companies don’t have to keep duplicate versions in sync. Under the covers, the software takes on tasks such as reconciling data formats, maintaining data integrity and building an aggregate view of information.
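To make the idea concrete, here is a minimal sketch in Python of a federated view, not a depiction of any vendor's product. Two separate in-memory SQLite databases stand in for independent source systems; the table names, columns, sample rows and the `federated_billing_view` function are all assumptions invented for illustration. The point is that both sources are queried at call time, their formats are reconciled on the fly, and nothing is copied into a central store.

```python
import sqlite3

# Two independent sources stand in for a financial system and a laboratory
# system. Schemas and data are hypothetical, chosen only for illustration.
finance = sqlite3.connect(":memory:")
finance.execute("CREATE TABLE accounts (customer_id TEXT, balance_cents INTEGER)")
finance.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("0001", 125000), ("0002", 4000)],
)

lab = sqlite3.connect(":memory:")
lab.execute(
    "CREATE TABLE tests (customer_id INTEGER, test_name TEXT, fee_dollars REAL)"
)
lab.executemany(
    "INSERT INTO tests VALUES (?, ?, ?)",
    [(1, "soil", 40.0), (1, "seed", 25.5), (2, "feed", 60.0)],
)


def federated_billing_view():
    """Query both sources at call time and join the rows in memory."""
    # Reconcile formats: zero-padded string ids become ints, cents become dollars.
    balances = {
        int(customer_id): cents / 100
        for customer_id, cents in finance.execute(
            "SELECT customer_id, balance_cents FROM accounts"
        )
    }
    fees = lab.execute(
        "SELECT customer_id, SUM(fee_dollars) FROM tests "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    # Build the aggregate view; no row is written back to either source.
    return [
        {"customer_id": cid, "fees_due": total, "balance": balances.get(cid)}
        for cid, total in fees
    ]


view = federated_billing_view()
```

Because the function reads the sources each time it runs, a change made in either database shows up in the next call without any reload step, which is the property that distinguishes this approach from a batch-fed warehouse.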
“We’re able to get real-time financial data from the state over the WAN, and we can manipulate it here,” says Wood, data and application manager at the Department of Agriculture. The agency doesn’t have to worry about incompatible formats, because IBM’s data federation software handles the translations required to link data from DB2 and Oracle databases; nor does it have to fuss with extraction, transformation and load (ETL) tools.
“We could have gone with the ETL method, or passing text files, but we’ve got just a small staff of programmers here,” Wood says. “When you do something like that, the data being transferred over can be old or someone forgets to do it, or the transfer doesn’t go right and needs to be redone. [So] we figured, why not just tap directly into the system?”
While it’s not a new concept, data federation is gaining attention as enterprises look to deploy service-oriented architecture (SOA) environments.
The demands of SOA are a good fit for data federation, says Ted Friedman, a research vice president at Gartner. “Just as you have application services and business services in an SOA, we believe you’ll also have data services that provide for common and consistent access to data, movement of data, transformation of data and so forth,” Friedman says.
The desire for more real-time reporting is also driving interest. With data federation technology, companies can tap into the most up-to-date information sources instead of querying a data warehouse fed by batch processes. A law enforcement agency could consolidate intelligence data from federal, state and local sources in a single view, for example, or a financial services firm could query a handful of operational systems to determine a customer’s current account status and pair it with historical information from a data warehouse.
A federated view of data is quicker to build and easier to modify than a data warehouse, advocates say. Plus, data federation allows for the integration of data from non-relational data sources, such as emails and text files.
A host of vendors offer data federation products, including integration veterans such as BEA Systems, IBM, Oracle and Sybase, as well as smaller specialists such as Attunity, Certive, Composite Software, MetaMatrix and Sypherlink. The products emerged as companies struggled to address a common problem in today’s enterprise environments: a proliferation of independent data sources, including application-specific repositories.
“Data is stored in databases only because some organisational manager got some money and built a database to achieve his mission,” says Moses Kamai, knowledge management practice leader in the national security division at Battelle. “These silos of information, specific to certain kinds of missions, over the years, have cropped up everywhere.”
Battelle, an R&D organisation based in Columbus, Ohio, develops new technologies and manages laboratories for industry and government clients. It’s a research-centric business, so Kamai knows first-hand the challenges of data integration and ownership.
“In the past we always had to build centralised databases. I could copy the data that someone else already owned, but that introduced the problem of desynchronisation. The other choice was to take total control of the data,” he says. In either case, melding disparate data sources to build a centralised repository was typically a multi-year effort.
These days Kamai finds a federated approach to data integration is much faster. In addition, because data stays in its place, there are fewer cultural and political hurdles to cross. A data owner can decide to make only selected parts of a data source open to a distributed query and keep other portions private. “That makes it palatable,” Kamai says.
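The selective sharing Kamai describes can be sketched as a thin wrapper around a data source. This is an illustrative assumption, not how Sypherlink or any other product works: the `SharedSource` class, the `cases` table and its columns are hypothetical. The owner declares which columns may be read by a distributed query, and requests for anything else are refused.

```python
import sqlite3


class SharedSource:
    """Hypothetical wrapper: the owner lists which columns outsiders may read."""

    def __init__(self, conn, table, exposed_columns):
        self._conn = conn
        self._table = table
        self._exposed = tuple(exposed_columns)

    def fetch(self, columns):
        # Refuse the whole request if it names any column the owner kept private.
        blocked = [c for c in columns if c not in self._exposed]
        if blocked:
            raise PermissionError(f"columns not shared by owner: {blocked}")
        # Every requested name has passed the allowlist check above.
        sql = f"SELECT {', '.join(columns)} FROM {self._table}"
        return self._conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (case_id INTEGER, region TEXT, informant TEXT)")
conn.execute("INSERT INTO cases VALUES (1, 'north', 'confidential name')")

# The owner shares case_id and region but keeps the informant column private.
source = SharedSource(conn, "cases", exposed_columns=["case_id", "region"])
shared_rows = source.fetch(["case_id", "region"])
```

In a production system the same effect would more likely come from database views and access controls managed by the data owner; the wrapper simply shows the principle that the data stays put and only the agreed portion is visible.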
Battelle uses software from Sypherlink to automate the critical data discovery and mapping process, determining the relationships and dependencies that exist among multiple data sources. Sypherlink’s technology “does nothing to change the principles of data management,” Kamai says. “What it does change is the timeline, and it overcomes the social dynamics around data management and data ownership.”
Everything in its place
Data federation isn’t a cure-all, nor is the approach without challenges.
While it’s a fit for some reporting needs, most enterprises won’t depend solely on data federation. If the data being searched doesn’t change frequently, for example, it may be more appropriate to run the query offline, using ETL tools and a data warehouse. Data replication tools might be a better fit if a company needs to copy transactional data for a new application roll-out. “Federation alone cannot solve the full breadth of data integration needs,” Friedman says.
Data federation technologies will co-exist with data warehouses, says Richard Hedges, programme director of IBM’s information integration products. “Companies are still going to consolidate data, build warehouses and have marts. But they’re also going to want more flexibility, real-time data access and the ability to get at data sources that are not relational,” Hedges says.
Deployed selectively, data federation technologies can round out a portfolio of data management tools and complement such physical data integration options as the data warehouse. Although the means differ, the end-goal of all the tools is the same: better business intelligence.