UK National Archives seeks to unlock file formats

The U.K. has embarked on a plan to make terabytes of government data, much of it locked up in proprietary Microsoft Corp. file formats, viewable to the public in its original form.

The National Archives, the repository for government records, holds digital records in literally hundreds of esoteric file formats, but the vast majority of the data is stored in legacy Microsoft Office formats, said Natalie Ceeney, its chief executive.

Changes in software and operating systems have made it impossible to view those documents in their original format, said David Thomas, director of technology and chief information officer for the National Archives.

"We're not building a museum of old computers here," Thomas said. "We want to make it [National Archives content] readable on current desktop technology."

Microsoft has offered to help the National Archives use Virtual PC 2007, a virtualization product that allows multiple operating systems, including legacy ones such as Windows 95, to run on a single piece of hardware. Microsoft is also providing older versions of Windows and Office for the project.

The technology would let the public view documents in the form in which they were created, which can add context and depth, Ceeney said.

The National Archives receives much of its government information through a secure intranet, and that data is backed up to tape, Thomas said. Tape storage is the cheapest and most robust way to keep data, so the archive has no old floppy disks lying around, he said. So far, the National Archives holds about 580 terabytes of digital data.

Eventually, the National Archives envisions a system in which a citizen could use a computer at the National Archives running Virtual PC 2007 to view, for example, an older Microsoft Word document in its original form. A further step would be to let people do the same over the Internet, he said.

At the event, held Tuesday at the National Archives southwest of London, Microsoft made a hard sell for the default Office 2007 file format, Open XML (XML stands for extensible markup language).

The format was approved as a standard in December 2006 by Ecma International, a European standards body. Microsoft has been trying to drum up support for it over the rival OpenDocument Format (ODF), an XML-based format used in free office suites such as OpenOffice.org.

For a long time, document file formats were mostly proprietary, and "we certainly weren't the only ones who were doing it," said Gordon Frazer, managing director of Microsoft U.K.

Frazer stressed that Open XML is no longer a proprietary or binary format controlled by Microsoft, but a standard overseen by Ecma. "We've worked very hard to embrace open standards," Frazer said.

Microsoft's attitude toward compatibility has been "a huge sea change" in recent years, said Adam Farquhar, head of e-architecture at the British Library, which is also working with Microsoft on digitizing books in its collection.

"Microsoft has taken tremendous strides forward in solving the problem [of compatible file formats]," Farquhar said.

(Additional reporting by Leo King of Computerworld UK, in London.)
