Netezza app speeds bioinformatics data searches, queries

FRAMINGHAM (10/03/2003) - Data storage appliance vendor Netezza Corp. has introduced the Netezza Performance Server (NPS) data warehouse for bioinformatics.

Essentially, the NPS system lets a life science company build a "biologically aware" data warehouse that runs sequence searches and comparisons in the same analytic database used for discovery tasks, so there is no need to work with multiple copies of the data.

Specifically, the company has integrated BLAST and defined genomic data types (i.e., large nucleotide and protein text types) that can be directly searched by a type of SQL JOIN query that supports NCBI BLAST. The result is a system that has the capacity to store terabyte-sized genomic databases with dedicated hardware and software to process sequence analysis SQL queries of such databases.
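The idea of expressing a sequence search as a SQL join can be sketched in a few lines of Python. The example below is only an illustration and does not use Netezza's syntax or BLAST: it assumes SQLite as a stand-in database, and the table names, the `find_hits` helper, and the toy `kmer_overlap` scoring function are all hypothetical.

```python
import sqlite3


def kmer_overlap(a: str, b: str, k: int = 3) -> float:
    """Toy similarity score (NOT BLAST): share of a's k-mers also found in b."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    if not kmers_a:
        return 0.0
    return len(kmers_a & kmers_b) / len(kmers_a)


def find_hits(queries, subjects, threshold=0.5):
    """Join query sequences to subject sequences on a similarity condition."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL statements can call it.
    conn.create_function("kmer_overlap", 2, kmer_overlap)
    conn.execute("CREATE TABLE query_seqs (id TEXT, seq TEXT)")
    conn.execute("CREATE TABLE subject_seqs (id TEXT, seq TEXT)")
    conn.executemany("INSERT INTO query_seqs VALUES (?, ?)", queries)
    conn.executemany("INSERT INTO subject_seqs VALUES (?, ?)", subjects)
    rows = conn.execute(
        """
        SELECT q.id, s.id, kmer_overlap(q.seq, s.seq) AS score
        FROM query_seqs AS q
        JOIN subject_seqs AS s
          ON kmer_overlap(q.seq, s.seq) >= ?
        ORDER BY q.id, score DESC, s.id
        """,
        (threshold,),
    ).fetchall()
    conn.close()
    return rows
```

Calling `find_hits([("q1", "ACGTACGT")], [("s1", "ACGTACGT"), ("s2", "TTTTTTTT")])` returns the query/subject pairs whose score meets the threshold. The point of the sketch is that the sequence data and the search both live in the same database, so results can be joined to other tables without exporting anything.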

The company claims that its appliance's unique architecture eliminates some of the performance bottlenecks experienced today in bioinformatics searches of large databases. For instance, Netezza claims the NPS can do sequence similarity checking in times that are comparable to those achieved when using supercomputing clusters - all while offering the benefits of using an appliance (i.e., it's easy to manage and offers a low cost of ownership).

Netezza claims that its approach addresses the limiting factors experienced in many bioinformatics data warehouse applications today. For example, much of the effort in research today is placed on increasing the computing power available for bioinformatics work. But often, "the real problem is the data, not computer [capacity]," says Bill Blake, Netezza's senior vice president of product development.

How it works

At the heart of Netezza's approach is an architecture aimed at the limiting factors in many bioinformatics queries. In many searches, retrieving data from storage systems and partitioning it so that a query can be processed slow down the entire process. Query performance also degrades as more simultaneous queries and more complex queries are made against a database.

The NPS addresses all of these issues using what Netezza calls an "asymmetric massively parallel processing" architecture. This architecture uses features of two other common architectures -- symmetric multiprocessing (SMP) and massively parallel processing (MPP).

For example, one bottleneck in many database systems is the challenge of handling large numbers of simultaneous queries or very complex queries. To deal with this performance-limiting issue, Netezza uses an SMP-based host to compile queries in parallel while supplying the processing power to sort and aggregate large sets of query results.

Another factor that limits performance is the time it takes to simply move data. To deal with this, the NPS uses an MPP architecture to move data to and from the multiple nodes (within the NPS) across which a large database is distributed. This addresses the input/output performance issues commonly encountered when working with terabyte-sized databases.
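The division of labor described above, with a host that coordinates and merges while many nodes scan their own slice of the data, can be sketched roughly as follows. This is a simplified model and not Netezza's implementation: threads stand in for hardware nodes, and the `distribute`, `node_scan`, and `host_query` names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from zlib import crc32


def distribute(rows, num_nodes):
    """Spread (key, value) rows across nodes by hashing each row's key."""
    partitions = [[] for _ in range(num_nodes)]
    for key, value in rows:
        partitions[crc32(key.encode()) % num_nodes].append((key, value))
    return partitions


def node_scan(partition, predicate):
    """Work done on one node: scan local rows and keep only the matches."""
    return [row for row in partition if predicate(row)]


def host_query(partitions, predicate):
    """Host side: fan the scan out to every node, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda p: node_scan(p, predicate), partitions))
    merged = [row for part in partials for row in part]
    return sorted(merged)
```

Because each node filters its own partition, only matching rows travel back to the host, which is the kind of reduction in data movement the architecture is meant to achieve.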

Being an appliance, the NPS is designed to fit into existing life science infrastructures. The idea is to load the NPS up with data and keep on using all existing front-end database and analytical applications without having to modify the applications themselves.

Netezza accomplishes this by using common, standards-based application programming interfaces (APIs) that allow applications to submit queries against the data stored on the NPS in the same way they would with any other database system. Supported interfaces include SQL, ODBC, and JDBC.

The NPS is available now. Versions of the NPS line include models that support from 4.5 TB to 81 TB of total storage capacity. Pricing starts at US$622,000.
