The National Archives and Records Administration (NARA) will end development of its electronic records archive (ERA) by the end of this year, in part due to a recent government report showing massive cost overruns and mismanagement on the project.
The purpose of the ERA, planning for which began in 2001, is to preserve large volumes of electronic records independent of their original hardware and software. The system, part of which is accessible to the public, includes federal records and databases, as well as a separate repository for presidential records, known as the EOP (Executive Office of the President) system.
In a report last month, the Government Accountability Office blamed schedule delays and current and projected cost overruns in the hundreds of millions of dollars on poor oversight and planning by the NARA. In 2005, the NARA awarded Lockheed Martin $317 million to build out the records archive.
Additionally, the GAO stated that the NARA has not established a sound baseline for measuring contractor performance, and the performance data measured against that flawed baseline is unreliable. "This hampers NARA's ability to produce reliable estimates of cost and completion," the report stated.
The cost of building a digital system to gather, preserve and give the public access to the records of the federal government has ballooned as high as $1.4 billion, and the project could go as much as 41% over budget, according to government auditors. "In contrast, the contractor's estimated cost overrun is $2.7 million," the GAO report stated. "Without more useful earned value data, NARA will remain unprepared to effectively oversee contractor performance and make realistic projections of program costs."
The GAO report showed that over the past decade, the NARA has repeatedly revised the ERA's program schedule and increased the estimated costs for completion from $317 million to $567 million.
"Until NARA addresses these underlying issues ... the ERA system, at full operational capability, will likely be deployed at least 67 months behind schedule (in March 2017)," the GAO said.
While the volume of data the electronic archive stores will continue to grow, the development of new archival systems to store that data will end this year.
On its Web site, the NARA claims that there are no "out-of-the-box" or "off-the-shelf" products available for the volume and complexity of records the agency must handle and that "therefore, the ERA system development is one of custom engineering and integration."
David Lake, ERA communications manager, said his agency has taken the GAO's report to heart but added that the NARA's cost-overrun estimates are based in part on project development stretching out to 2017.
"By 2017, what we would have spent is pretty unclear," Lake said.
According to Lake, by the end of this year, the NARA will have spent an estimated $463 million on the ERA for development and program management costs.
"Part of the report dealt with EDM [electronic document management] practices," Lake said. "But we certainly agree with and have made corrective actions with methodologies with how to track and manage value management."
The ERA's data center is located at the Allegany Ballistics Laboratory in Rocket Center, W.Va.
To date, the ERA has ingested 90.5TB of data, the vast majority of which are records from former President George W. Bush's administration. That data includes paper and electronic documents, such as e-mail traffic and interoffice memos. By the beginning of 2012, the records system is expected to balloon to 650TB of data, most of which (488TB) will come from the 2010 U.S. Census.
Lake said there is "talk" of having the ERA store high-definition video taken of congressional sessions and other government events. "At some point, when those things come in, then you're talking exponential data growth," Lake said.
Over the next year, about 50TB of classified data related to the wars in Iraq and Afghanistan will also be ingested into the system, Lake said. Most of the rest will be taken up by congressional records. Neither the U.S. Census data nor the Iraq and Afghanistan war data will be accessible to the public, Lake said. The war data is classified, and detailed personal Census data remains sealed for 72 years.
The public also won't be getting access to a great deal of the photographic or video content through the ERA system. The public portal opened in December. "We did have some delays and difficulties along the way. So the public access piece didn't start until later," Lake said.
For example, much of the data in the ERA system has been available for a number of years and is in flat-file database form, such as agency financial records and casualty records from the Vietnam War, Lake said. Still, being able to access public county records online or historical documents such as the 1783 Treaty of Paris in its original form may intrigue some researchers.
Though the ERA has been under development since 2005, its search engine, called Vivisimo, is still in prototype mode and won't be live until the end of the year.
By the end of this year, about 30 of the largest federal government agencies, out of about 300 total, will be using the ERA. By the end of 2012, the remainder of the agencies will be required to use the ERA system to store their records. However, there's still much that needs to be done by then, such as the deployment of a system to handle classified records and a record scheduling system that would automate the transfer of both paper and electronic records over to the system.
"Though we've been able to manage up to this point to handle the relative trickle of records coming in, without ERA we'd drown in the volume of records we're seeing come in the door," Lake said. "So we're making sure we have all those nuts and bolts in place, as well as the ability to ingest those records and safely store them."
Lucas Mearian covers storage, disaster recovery and business continuity, financial services infrastructure and health care IT for Computerworld. Follow Lucas on Twitter at @lucasmearian or subscribe to Lucas's RSS feed. His e-mail address is email@example.com.