Feb. 15, 2017

How State Archivists are Taking on Avalanches of Email Data

With state agencies generating terabytes of email data, state archivists must find ways to cull, store and manage millions of electronic messages.

When Vermont ushered in a new governor last month, the State Archives and Records Administration got something else: email messages from the previous administration that amounted to a terabyte of data.

The trove included about 80,000 instances of constituent correspondence alone, but the haul could have been even larger. During the transition, the records office worked with the government to eliminate as many “transitory” documents as possible — messages about lunch dates and carpools that fall outside the scope of an agency whose mission is to maintain records that “have continuing value” to the state and its citizens.

Not only can many of these transitory documents be deleted without violating record-keeping laws, but getting rid of them is also an important step in making sure that the public has ready access to more significant documents.

States Struggle to Find Effective Storage

“People always say storage is cheap, but storage isn’t cheap when you have to spend time trying to find something in it,” says Vermont State Archivist Tanya Marshall. “It’s cheap to store [emails], but it’s not cheap when you actually have to go through a management program to make them accessible. And there’s no point in storing them if people aren’t going to access them.”

While headlines about federal email storage swirled around last year’s presidential campaign, the issue also pops up at the state level. Email has been widely used for about two decades, but states are still struggling to adopt solutions that let them store what’s important, delete what’s not and make messages easily accessible to the public.

In some cases, problems are obvious and egregious. Montana made headlines earlier this year, for example, when it was discovered that the state hadn’t stored a single email from any state agency in its archives. The revelation led to finger-pointing among Montana officials as it became clear that the state had not only violated the law but also lost many years’ worth of public records.

Even states that make a good-faith effort to comply with open records laws and maintain comprehensive archives sometimes struggle to handle millions of email messages. The potential problems stemming from email mismanagement include lawsuits, the release of personal information about citizens and employees, and financial inefficiencies.

“There’s so much email,” says Barbara Teague, a program consultant with the Council of State Archivists and a former Kentucky state archivist. “You either have to set a policy that says, ‘we only want the email from the most important people,’ or you have to [ask agencies to] cull the email before they send it. Probably, in practice, when records get transferred at all, we’re getting too much. People aren’t necessarily sending things that are policy-related. They’re sending everything. And that’s better than not getting anything. But when you put it in your digital preservation system, somebody’s got to process it.”

Adapting (to) an Archival Solution

Mary Beth Herkert, director of Oregon’s State Archives Division, says that officials in some states feel “paralyzed,” looking for a perfect solution that will solve all their email problems. “There is no perfect solution,” she says. “You have to pick the best solution, and then be adaptable.”

Oregon implemented its electronic records system, from Hewlett-Packard, in 2007. The tool now houses about 9 million electronic records (including Word documents, Excel spreadsheets, photographs and emails) from more than 50 state agencies. Workers at state agencies file their email into different folders, which classifies the messages and tags them with a retention period before they are automatically transferred to the state’s records system.

The solution has allowed Herkert’s office to be responsive to records requests. In particular, she recalls when a previous secretary of state was running for re-election, and the state received a request for all of the candidate’s email from the previous few years. “You get a request like that, normally it’s going to take weeks,” Herkert says. “We captured all of her emails as soon as she touched it. I was able to pull back 83,000 emails in 90 seconds.”

There’s No Avoiding the Junk Mail

User education, good policies and systems that ease email classification are important to making email archiving more manageable. Still, many states find it nearly impossible to avoid storing some unimportant messages. In going through emails from three previous governors’ administrations, Herkert says, her small staff of archivists found themselves wading through thousands of mass emails about things like food drives, along with “every bad water cooler joke” ever told.

“We’re trying to go through and delete what we don’t need,” Herkert says. “It’s a manual process, and we can only do it when we have time, so you end up with a lot of junk.”

“Is it perfect?” Herkert asks of her state’s system. “No. But it’s a lot better than doing nothing.”