Oct 31 2006
Data Center

Record Keepers

Washington state builds the nation's first centralized, automated digital archive -- from the ground up.

IF ABRAHAM LINCOLN had written the Gettysburg Address on a notebook PC, would it still be around today?

That question, while rhetorical, helped guide the state of Washington as it built the nation’s first statewide, automated digital archive from the ground up. The collection, built with Microsoft tools and stored in a customized central facility located in Cheney, Wash., not only stores documents electronically and preserves them for the long term, but also provides scanned images of documents online for users to access worldwide (www.digitalarchives.wa.gov).

“It’s not just accumulating records,” explains Adam Jansen, digital archivist for the state of Washington. “That’s not magical.”

What is magical, he says, is that millions of records documenting the state’s history can be readily available 24 x 7 to Washington’s 6 million residents from their homes, offices and classrooms.

Those records include documents such as an 1854 Army Corps of Engineers transcontinental railroad survey. Students learning about the history of the railroad can look at full-scale scans of the map that proposed extending the railroad to the Pacific Ocean.

“Towns and cities lived or died [depending] on where that railroad went,” says Jansen. “These records have been stored away in the state archives with very little access.”

The records also include marriage certificates, which soldiers deployed in the Middle East can access to take care of personal business back home.

Jansen recalls one couple looking for proof of the existence of an alleged ancestor, a train conductor who was the namesake for the town of Connell, Wash. They had spent 15 years searching through county records, to no avail. After a quick query at the new state digital archive research room, they found a census entry showing he had lived in the next county over.

“They were ecstatic,” Jansen recalls. “They had been looking in the wrong place and had pretty much given up hope.”

Jansen and his team don’t get to hear many success stories like the one about Connell, since most people search the online archives with their own computers, rather than with the machines in the public research room. But he and his team know they’re making a difference by preserving important documents that record citizens’ rights with regard to land, marital benefits and ancestry.

“At the end of the day, my staff goes home knowing they made the world a better place,” Jansen says. “They preserved the rights of their children. That’s powerful stuff.”

Bringing History Into the Future

When he was the elected county auditor for Thurston County, Wash., it was Sam Reed’s job to collect and preserve property records, maps, deeds and other important documents. In 1995, his team started scanning county records so the original documents could go back to the owners, while the copies could be stored electronically and e-mailed when necessary.

“We’d been a county since 1852, and we had a huge amount of paper,” says Reed. “We were using up a lot of expensive office space just on paper.”

The state archives, however, would accept county records only on paper or film, so his office had to print copies of the e-documents and snail-mail them, just like in the old days.

Other state agencies were also catching on to the idea of electronic documents. But that actually exacerbated the problem because there was no plan for long-term preservation. State archivists estimate that about half of Washington’s records had been lost before the digital archives project began because records were stored on hardware or software that had been lost or corrupted.

“I realized we were losing a lot of history,” Reed says. “I thought, my goodness, our forefathers did such a wonderful job of keeping the history and storing it, and here we are losing it.”

After being sworn in as Washington’s secretary of state in January 2001, Reed set a priority to create a statewide digital archive. After getting legislative approval for the project, he hired Jansen to help make his vision a reality.

Jansen, who has a degree in archives and records administration but learned about technology as an audiovisual archivist at Microsoft, is the only information science specialist on staff at the digital archives building. The other four members of his team are software developers and network specialists. So he’s been teaching them about historical archiving.

“Part of the project was reinventing wheels, part of it was creating wheels that had never been created before and a big part of it for me was learning how to speak geek,” Jansen jokes. “My profession is going to be changed radically by the work we’re doing here.”

Making Progress

For three years, without fail, the digital archives team met every Friday morning for up to four hours to review the project’s progress, says Steve Excell, Washington assistant secretary of state.

“This has been a big part of our lives,” he explains. “The meeting kept everyone on top of the project. It kept it on budget and on schedule.”

Originally, the project team considered using content-management software to serve as the backbone for the archives. While that approach works well for organizations with a single technology platform, it didn’t work for a state that was trying to collect records from a half-dozen different county legacy systems with varying fields and imaging requirements, Excell says.

That was a big epiphany for the project team. The members realized they needed a system that would automatically convert records into the central archives format. “We couldn’t hire enough archivists to eyeball all the county records manually,” Excell recalls. “We also didn’t have the power to tell the local agencies that they had to drop their legacy systems and go with our system.”

The project team looked at federal, state and international archives, but no one was converting various systems to one format, Excell says. So the team turned its attention to the banking industry, which had been in the midst of heavy merger-and-consolidation activity. The big players in that field were employing Web services to enable the legacy systems from acquired banks to communicate in a unified environment.

Changing Direction

What the team learned from the banking industry changed the direction of the project. Instead of content-management applications, the team looked at Microsoft’s Web services software. SQL Server seemed to be robust enough to handle the state’s mass storage requirements, and Microsoft’s BizTalk Server could serve as a sort of “Rosetta stone” by pulling in and translating data from the counties, Excell says.

The only human intervention required would be the setup: Someone had to gather the specs on each county system, then put together business rules so that BizTalk Server could convert each county’s records into the central format. Once that was done, the records from the counties would be automatically sent to the central archive.
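The idea of per-county business rules can be illustrated with a minimal sketch. It is written in Python rather than BizTalk, and the county field names, the central field names and the `to_central` helper are all hypothetical; the article does not describe the actual rules or schemas.

```python
from datetime import datetime
from typing import Callable, Dict


def us_date_to_iso(value: str) -> str:
    """Convert an MM/DD/YYYY string to ISO YYYY-MM-DD."""
    return datetime.strptime(value, "%m/%d/%Y").date().isoformat()


# Hypothetical rules: for each county, map a central field name to a
# function that extracts that field from the county's native record.
Rule = Callable[[dict], str]
COUNTY_RULES: Dict[str, Dict[str, Rule]] = {
    "Chelan": {
        "groom_name": lambda r: r["GROOM"].title(),
        "bride_name": lambda r: r["BRIDE"].title(),
        "marriage_date": lambda r: r["DATE"],  # assumed to be ISO already
    },
    "Spokane": {
        "groom_name": lambda r: f'{r["g_first"]} {r["g_last"]}',
        "bride_name": lambda r: f'{r["b_first"]} {r["b_last"]}',
        "marriage_date": lambda r: us_date_to_iso(r["married"]),
    },
}


def to_central(county: str, record: dict) -> Dict[str, str]:
    """Apply one county's rules to produce a record in the central format."""
    rules = COUNTY_RULES[county]
    central = {field: rule(record) for field, rule in rules.items()}
    central["source_county"] = county
    return central


if __name__ == "__main__":
    spokane_row = {
        "g_first": "John", "g_last": "Doe",
        "b_first": "Jane", "b_last": "Roe",
        "married": "10/04/1904",
    }
    print(to_central("Spokane", spokane_row))
```

Once a county’s rules are written, every record from that county goes through the same conversion with no one reviewing it by hand, which is the one-time setup cost the team was describing.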

On paper, the technology looked great, but the project team wanted to see it in action. In early 2004, it had Microsoft conduct a pilot in the vendor’s storage area network (SAN) laboratory, Excell reports.

Microsoft had to prove the technology could address three issues: that SQL Server was robust enough to handle the volume of data required for the archives; that the system could adequately present large online images such as maps, census sheets and deed books, which can be 20 to 24 inches tall and 14 to 16 inches wide; and that it could absorb different types of county records into the central database.

During the pilot, the project team pulled marriage records from three counties that were considered “beta” or test counties: Chelan, Snohomish and Spokane. To view large records online, they incorporated scanning software that downloads sections of large documents as users scroll through them.
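Downloading a large document in sections is commonly done by cutting the image into fixed-size tiles and fetching only the tiles that overlap the user’s current view. The sketch below shows that calculation under assumed numbers: the 300 dpi scan resolution, the 512-pixel tile size and the `tiles_for_viewport` function are illustrative and are not taken from the archive’s actual software.

```python
from typing import List, Tuple


def tiles_for_viewport(
    img_w: int, img_h: int, tile: int,
    view_x: int, view_y: int, view_w: int, view_h: int,
) -> List[Tuple[int, int]]:
    """Return (row, col) indexes of the tiles that overlap the viewport."""
    # Clamp the viewport to the image bounds.
    x0, y0 = max(view_x, 0), max(view_y, 0)
    x1 = min(view_x + view_w, img_w)
    y1 = min(view_y + view_h, img_h)
    if x0 >= x1 or y0 >= y1:
        return []  # the viewport lies entirely outside the image

    first_col, last_col = x0 // tile, (x1 - 1) // tile
    first_row, last_row = y0 // tile, (y1 - 1) // tile
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


if __name__ == "__main__":
    # A 16 x 24 inch sheet scanned at an assumed 300 dpi: 4800 x 7200 pixels.
    width, height, tile = 16 * 300, 24 * 300, 512
    # A 1024 x 768 window at the top-left corner needs only four tiles.
    print(tiles_for_viewport(width, height, tile, 0, 0, 1024, 768))
```

With these assumed numbers the full sheet is a grid of 10 by 15 tiles, 150 in all, so a viewer that opens at the top-left corner fetches 4 of them and requests more only as the user scrolls.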

The pilot went off without a hitch, and the project team was able to see the process in action. First, the system duplicated the original file, whether an Excel spreadsheet, a Word document or any of about 100 other file types.

That copy was made tamper-evident with a cryptographic hash, so that any later change to the file could be detected, and was preserved in the digital archives. The system then created a Web version, with the metadata in XML, and loaded the document image using scanning software.
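How a hash protects a preserved file can be shown in a few lines. This is a sketch under stated assumptions: the article does not name the hash algorithm or the metadata schema, so SHA-256, the XML element names and the helper functions here are all illustrative.

```python
import hashlib
import xml.etree.ElementTree as ET


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the file contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_metadata(record_id: str, filename: str, data: bytes) -> str:
    """Build a small XML metadata record that stores the file's digest."""
    root = ET.Element("record", id=record_id)  # hypothetical schema
    ET.SubElement(root, "filename").text = filename
    ET.SubElement(root, "sha256").text = fingerprint(data)
    return ET.tostring(root, encoding="unicode")


def is_unchanged(data: bytes, metadata_xml: str) -> bool:
    """Recompute the digest and compare it with the one stored at ingest."""
    stored = ET.fromstring(metadata_xml).findtext("sha256")
    return stored == fingerprint(data)


if __name__ == "__main__":
    original = b"Marriage certificate, Spokane County"
    metadata = build_metadata("rec-0001", "certificate.doc", original)
    print(metadata)
    print(is_unchanged(original, metadata))            # same bytes
    print(is_unchanged(original + b" (edited)", metadata))  # altered bytes
```

A hash does not stop anyone from altering a file. It makes an alteration detectable, because even a one-byte change produces a different digest than the one recorded when the file entered the archive.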

“The pilot test was done very fast,” says Excell. “We did it in a couple of months, and we said, ‘Hey, this stuff works.’”

Fast Forward to the Future

Some of the work done in the pilot, such as pulling records from the beta counties, cut out steps in the final build and sped the project along, according to Excell. “We came in on time and on budget, but our programmers were sleeping on the floor of the archives [offices] until about two days before the opening,” he adds.

The physical and virtual grand opening of the digital archives was on Oct. 4, 2004, and it proved its value almost instantly. After two terms, Washington Gov. Gary Locke was getting ready to step down from office, and the archives team was able to preserve his gubernatorial records in the new system.

“It was a vindication for the whole project that we were able to capture that important piece of our state’s history,” Excell says.

The first floor of the Cheney facility stores the traditional paper archives, while the digital archives are upstairs, along with a public research room, training space, offices and a state-of-the-art computing center.

The digital archives were built using Microsoft tools, including Windows Server 2003, the SQL Server 2000 database, BizTalk Server 2004 and the Visual Studio .NET 2003 integrated development environment. The platform also includes an EMC CLARiiON CX700 SAN for mass storage, with 4 terabytes (TB) of storage for record images and 1TB for the metadata. The team expects to expand to 20TB within months. “We’re accumulating a lot of information rapidly,” Jansen explains.

The project has been in the spotlight since its inception. The grand opening was attended by state, national and international officials from as far away as China, who were looking to emulate some components of Washington’s digital archives.

“It was a pleasure and an honor for me to attend the dedication of this great facility last year,” says Lewis Bellardo, deputy archivist of the United States and chief of staff, U.S. National Archives and Records Administration. “Washington state is doing pioneering work in the preservation of digital information.”

Recording Records

Washington’s digital archives collect a vast range of records, including the following:

• Marriage records

• Court records, including dockets and case files

• Land records

• Legislative documents, including state laws

• County commissioners’ proceedings, ordinances and resolutions

• Real and personal property tax records

• School district records

• Gubernatorial documents

• The state constitution

Users Fund Digital Archive

While the importance of Washington’s statewide digital archive is not lost on officials like Secretary of State Sam Reed, he readily admits it would never have stood a chance against a long list of other state priorities. “Just competing for bond money with universities and K-12 and the corrections system, we never would have gotten it,” Reed says.

So rather than asking the state to foot the bill, Reed’s team went to the users. Each county added a $1 surcharge on recorded documents, so that every marriage license, death certificate, land deed or prenuptial agreement chips away, dollar by dollar, at the cost of building the physical and virtual digital archives infrastructure, he explains.

With funding settled and the archives up and running, the benefits are becoming increasingly clear. For starters, there are the space savings. Storing millions of records in county offices can be cost-prohibitive.

Then there are the labor savings. An average of 800 people a day search through the 5 million-and-counting state digital archive records online. If that weren’t a self-serve function, it would either drain the staff’s time or prohibit citizens from accessing the records, says Washington Digital Archivist Adam Jansen.

Another huge benefit is the preservation of the original documents. Web versions of documents are available for duplicating, while the original records are stored in a dark, 60-degree Fahrenheit, low-humidity preservation environment in the state archives building, explains Washington Assistant Secretary of State Steve Excell. Storing electronic versions also ensures that the documents won’t be lost in a flood or fire.

The digital archives are also beneficial from a liability risk-management standpoint, Excell adds. In the early days of electronic documents, when agencies were sued, courts often cut them some slack if files were missing.

However, as electronic storage technologies become both more sophisticated and less costly, laws in both the public and private sectors regarding the preservation of records are getting more stringent.

“The courts are saying ‘No more excuses,’” Excell notes. “You have a responsibility to manage these records.”

The decreasing prices of storage also add to the overall cost-effectiveness of the digital archives project. “It’s a great time in history to preserve history because the technologies are available,” Excell adds.

The benefits of the digital archives are varied, but for Jansen, it all boils down to one document — the 1889 state constitution — which mandates that state records establishing citizens’ rights must be preserved. “We have a legal and moral obligation to save those records,” Jansen says.

Records from yesteryear also capture the changing times. For example, census records from the 1800s show that residents were asked whether they were deaf, dumb or blind, mulatto or Indian half-breed. “It’s the cold, hard slap of reality,” says Jansen. “You can read about it, but actually seeing it is a different story.”

It’s history, for better or for worse. “The digital archives are like a time machine,” Excell says.

Doing the Math

By the end of 2005, Washington state had 5 million records in its digital archives and an average of 800 citizens searching those records each day. Yet, the digital archives are staffed by just five employees.
