While embarking on a major project to digitise its collections, the National Library is also looking to the future with the development of a system for curating web pages.
“So much of our recent social history is on the web,” says developer Gordon Paynter, but curators wanting to preserve it have to date been using desktop tools that he says are “not reliable or scalable; they have no management control and no control over the bandwidth used.”
The last is a significant problem; curators sometimes find that a website they’ve requested is much larger than originally thought and slows down other applications trying to use the same internet connection.
Copyright restrictions also apply to some of the archived material; these must be carefully checked and the permissions recorded.
The library decided to build an “enterprise class” system that will provide these controls, and approached it as a joint development with the British Library.
Development has been going on in earnest since early last year; the first version of the product was released as open source in September.
Known simply as Web Curator, it “crawls” a defined region of the web based on a theme or a particular event or according to a regular schedule. Important events can influence scheduled acquisitions; some politically-themed sites, for example, might only be worth sampling twice a year, but during a general election they could be sampled daily.
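The idea of an event changing a site's sampling schedule can be illustrated with a short sketch. This is not Web Curator's actual scheduling code; the function name, the interval constants and the notion of an "event window" are all hypothetical, chosen only to mirror the election example above (roughly twice a year normally, daily while an event is running).

```python
from datetime import date, timedelta

# Hypothetical intervals mirroring the article's example:
# roughly twice a year by default, daily during an event.
DEFAULT_INTERVAL = timedelta(days=182)
EVENT_INTERVAL = timedelta(days=1)


def next_harvest(last_harvest, event_windows):
    """Return the date of the next harvest after last_harvest.

    event_windows is a list of (start, end) date pairs; both the
    start day and the end day count as part of the event.
    """
    # Without any event, the next harvest is one default interval away.
    candidate = last_harvest + DEFAULT_INTERVAL
    for start, end in event_windows:
        # An event is under way and has days left: harvest again tomorrow.
        if start <= last_harvest < end:
            return last_harvest + EVENT_INTERVAL
        # An event opens before the current candidate date:
        # bring the harvest forward to the event's first day.
        if last_harvest < start <= candidate:
            candidate = start
    return candidate


if __name__ == "__main__":
    # Made-up event window for illustration only.
    election = (date(2008, 11, 1), date(2008, 11, 8))
    print(next_harvest(date(2008, 1, 1), [election]))
    print(next_harvest(date(2008, 11, 3), [election]))
```

In this sketch a curator would only need to record the event's dates; the daily sampling starts and stops on its own, and the site falls back to its regular schedule once the event is over.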
The library is currently working on an upgrade to the tool, Version 1.2. “We’re setting up agencies to link in national and university libraries [in several countries] so they can test the tool with us before they consider installing it.”
Both the British Library and the National Library of Australia are currently using a more basic and, Paynter says, “unstable” web-page curation system called Pandas 2. Australia is working on an upgrade, Pandas 3. Paynter’s team are “optimistic” that the British Library will pick up on the joint British-NZ development long-term rather than going along the Pandas route preferred by Australia.