Emerging Tech

Backing up data isn't enough anymore. What's needed are clear audit trails and a policy-based strategy for managing data throughout its life.

Alan Joch

The state of Michigan had a surefire way to safely store and manage its entire storehouse of data: Save every byte on high-speed storage arrays packed with the fastest Small Computer System Interface (SCSI) hard drives. “It was the Cadillac of storage,” recalls Storage Manager Rick Hoffman.

But that was four years ago. Today, Hoffman has doubts about the upkeep costs on this luxury model. “We quickly realized that along with our Cadillac, we needed some Chevys, too,” he says.

Hoffman isn’t the only one with frugality on his mind. A growing number of public sector IT managers who want to lower their storage costs are investigating new methods of data management.

One option is information lifecycle management (ILM), an approach that uses automated policies to move data among different types of storage media depending on the data’s age and importance. With a policy-based strategy, mission-critical information rides the Caddys, while archival data moves along more slowly in economy cars. “After we started implementing these principles, it seemed like such a basic strategy that I’m surprised it took so long for people to catch on,” Hoffman says.

So why isn’t everyone taking this approach? One reason is lingering confusion about how to implement it. Hoffman and other storage experts acknowledge that you can’t just flip a switch and instantly achieve cradle-to-grave data management. Instead, IT managers must navigate through a multistep process that intertwines data policies, software and hardware, each of which involves unique challenges.

Tall Hurdles

This isn’t an entirely new concept: CIOs have long sought to maximize their storage dollars in their quest for more efficiency. However, the current buzz springs from the concern within IT centers about how to keep costs down while the demand for data storage continues to grow. Michigan, for example, now stores 250 terabytes of information, an amount that’s increasing by almost 25 percent a year.
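To see what that growth rate implies, a rough compound-growth projection can be sketched as follows. It assumes a flat 25 percent annual rate (the figure cited is "almost 25 percent") and a 250-terabyte starting point; the function name and the five-year horizon are illustrative choices, not numbers from Michigan.

```python
def project_capacity(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return projected capacity in TB for year 0 through `years`, compounding annually."""
    return [start_tb * (1 + annual_growth) ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Assumed inputs: 250 TB today, growing a flat 25 percent a year.
    for year, tb in enumerate(project_capacity(250.0, 0.25, 5)):
        print(f"year {year}: {tb:,.0f} TB")
```

Under these assumptions the stored volume roughly triples in five years, which is why the cost per terabyte of each storage tier matters so much.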

Added to this challenge are new data-retention regulations that are pushing states to store and protect some records for seven years or more. The regulations also require them to quickly locate a specific file, no matter its age, in response to a legal inquiry.

“The regulation that’s had the greatest impact on us has been HIPAA [Health Insurance Portability and Accountability Act],” says Dan Paolini, director of data management services in New Jersey’s Office of Information Technology. “The law makes no distinction between public and private entities for handling personal healthcare information, and we have obviously had to come into compliance.”

The Sarbanes-Oxley Act of 2002 boosts financial accountability for publicly owned corporations, and, though its provisions don’t directly apply to states, “they certainly provide guidance as to expectations,” Paolini observes. “We try to take them into account wherever possible.”

Unfortunately, ILM isn’t a quick fix for all of these challenges. In fact, some ILM applications are too immature, according to a number of CIOs. So far, the ILM software that Michigan’s Hoffman has test-driven for automating the migration of data from high- to low-cost storage devices falls short.

“The only solution that had all the right functionality would never handle an enterprise as large as ours,” he says. “All of the disk I/O [input and output traffic] went through a single point. In an enterprise, that would be a bottleneck and a single point of failure.”

Budget constraints are an additional roadblock for many. New Jersey has yet to implement a formal policy-based system even though its IT managers recognize the value. “It’s not a lack of understanding about the technology that keeps us from doing it, it’s the funding,” Paolini says.

Despite these hurdles, both Michigan and New Jersey are moving ahead and adopting ILM practices wherever possible.

New Jersey took a major step toward data control in 2000 when it formed the Data Management Council, an initiative of Steve Dawson, chief technology officer and CIO, that brought together executive branch representatives to develop an enterprisewide common data architecture.

“We now treat data as a resource, in the same way we treat people, money and buildings,” Paolini says. In addition, New Jersey built a data warehouse, a central storehouse of information that is available across state departments.

The value of these initiatives is what Paolini calls a “single version of the truth.” The centrally managed data warehouse helps ensure that departments have up-to-date information and that various departments aren’t maintaining redundant records, which waste storage space and create unnecessary costs.

Prior to the innovations in 2000, New Jersey didn’t have a formal way of prioritizing data by age. “Departments pretty much shot from the hip,” Paolini recalls. “They kept data as long as they had space for it, and when they ran out of space, they dumped the older data.” The state’s data-warehouse environment provides more flexibility in handling aging data, thanks to the ability to create summary data. “We create summary records that stay in the database, and we purge the details by writing them off to tape,” he explains.
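The summarize-then-purge pattern Paolini describes might look roughly like the sketch below. The record fields (`department`, `month`, `amount`), the cutoff value and the in-memory lists standing in for the database and the tape are all assumptions made for illustration; New Jersey's actual schema and tools aren't described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detail:
    department: str   # hypothetical field
    month: str        # "YYYY-MM", hypothetical field
    amount: float     # hypothetical field


def summarize_and_purge(details, cutoff_month):
    """Split detail rows at cutoff_month.

    Rows older than the cutoff are rolled up into (department, month) totals
    and returned separately so they can be written to archive media.
    Returns (kept_details, summaries, archived_details).
    """
    kept, archived = [], []
    totals = defaultdict(float)
    for row in details:
        if row.month < cutoff_month:      # "YYYY-MM" strings sort chronologically
            totals[(row.department, row.month)] += row.amount
            archived.append(row)
        else:
            kept.append(row)
    return kept, dict(totals), archived
```

The summaries stay online for reporting, while the archived detail rows are the ones that would be written off to tape.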

Paolini believes the data-management foundation now in place will ease New Jersey’s transition to automated policies if the money starts flowing in.

Implementing ILM

Some data management experts say that enterprises ready to travel the ILM road should consider five key steps before they begin.

First, classify data. “The ILM process starts with IT, line of business, legal and risk-management people getting together to determine what information is important based on applications, security or compliance,” explains Michael Peterson, program director of the Data Management Forum of the San Francisco-based Storage Networking Industry Association.

Michigan sorted its information according to business requirements and how it impacted daily operations. “If there’s a high probability for recall and the information needs to be available 24 x 7, then it stays in our level-one storage systems,” Hoffman says. Two other levels exist for older data that’s less likely to require retrieval.

Frontline data includes information from payroll, purchasing, driver’s license and taxation systems, as well as e-mails, which provide important running records of communications inside agencies and with the public. Inherent in the classifications is recognition that value is a relative term: Today’s critical data may lose importance as it ages.

The second step builds on the ILM classifications to establish criteria for data aging and retention. Here, data managers decide the rules for how data moves from the fastest to the slowest storage resources based on the relative importance of that information.
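As a minimal sketch of what such aging rules could look like once written down, the example below maps data age to one of three storage levels. The 30-day and 365-day thresholds, the `AgingRule` type and the `always_online` flag are hypothetical; real thresholds would come out of the classification work in step one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgingRule:
    max_age_days: int   # data at or below this age may stay on the tier
    tier: int           # 1 = fastest, most expensive storage


# Hypothetical thresholds, ordered from youngest to oldest data.
RULES = [
    AgingRule(max_age_days=30, tier=1),
    AgingRule(max_age_days=365, tier=2),
]
ARCHIVE_TIER = 3


def target_tier(age_days: int, always_online: bool = False) -> int:
    """Pick a storage tier from data age; always_online pins data to tier 1."""
    if always_online:
        return 1
    for rule in RULES:
        if age_days <= rule.max_age_days:
            return rule.tier
    return ARCHIVE_TIER
```

Keeping the rules in a plain data table rather than in code branches means business and legal staff can review the thresholds without reading the logic.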

Data aging points to a fundamental reason why archiving is so hard to automate. Determining value is still best done by people—often people from several disciplines—not by software. “Our [end users] look at data from a business value to determine when something is no longer level one and we should move it,” Hoffman reports. “Based on their review, we move the data manually to a lower level.”

What’s the incentive to keep a close watch on what data remains in expensive high-level systems? Budget concerns. Departments have service-level agreements with the IT department and are charged more for higher-end data storage.
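A back-of-the-envelope chargeback calculation shows how that incentive works. The per-terabyte rates and the department's data volumes below are invented for illustration; the rates Michigan actually charges aren't given.

```python
# Hypothetical monthly rates in dollars per terabyte; not actual state figures.
RATE_PER_TB = {1: 900.0, 2: 300.0, 3: 50.0}


def monthly_charge(tb_by_tier: dict[int, float]) -> float:
    """Total a department's monthly storage bill from TB held on each tier."""
    return sum(RATE_PER_TB[tier] * tb for tier, tb in tb_by_tier.items())


before = monthly_charge({1: 10.0})                 # everything on tier one
after = monthly_charge({1: 4.0, 2: 4.0, 3: 2.0})   # older data moved down
savings = before - after
```

With these assumed rates, moving six of ten terabytes off level-one storage cuts the monthly bill from $9,000 to $4,900.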

Third, evaluate software that will help automate the task of moving data down the storage hierarchy. Two types of applications are needed: data-management software, which essentially looks at the bits and bytes that are being stored with no regard to what they represent; and information- or document-management software, which understands the content represented by the data bits based on the applications, business practices and business processes with which it’s associated.

In the past, these two types of programs represented distinct segments of the software market. Today, however, organizations can choose a combined solution—an integrated data- and document-management product—thereby avoiding the integration issues of a best-of-breed approach.

Just as step one creates tiers of data, the fourth step creates tiers of storage hardware. The IT staff chooses the storage technology that is most appropriate for each data tier. At the highest tier, the fastest performing and most expensive hard drives hold the data most frequently accessed, while at the lowest tier, slow but economical tape cartridges or hard drives hold archival data. Large organizations typically create three or four stratifications, says Jack Scott, senior partner at the Evaluator Group, a storage industry research firm in Greenwood Village, Colo.

Michigan’s three levels of storage are built around Fibre Channel storage area networks that hold different classes of drives, ranging from level-one SCSI hard drives to level-two Serial Advanced Technology Attachment drives. For archiving, the state government writes data to a Fibre Channel tape library at an offsite location.

The final step is to begin information lifecycle management—initially only in a small way. Bill North, director of storage-software research for IDC, a technology researcher in Framingham, Mass., cautions against unleashing your ILM strategy on the entire organization all at once.

Instead, hone your data-management skills in one key area that has potential for a quick payback, such as reducing costs by moving low-priority data from expensive storage systems to lower priced hardware. One good candidate is e-mail as it’s a ubiquitous management problem. “Use that as a pilot program,” North recommends, “then take what you learn there and use it as a foundation for additional ILM efforts.”

Data Storage Tiers

Tier One: Primary Storage

Data: Recent information from payroll, purchasing, driver’s license and taxation systems, as well as e-mails
Hardware: Production networked storage, including storage area networks and network-attached storage with high-speed Small Computer System Interface (SCSI) hard drives

Tier Two: Secondary Storage

Data: Files usually a month or older from primary government applications and e-mail servers that are no longer needed for daily operations
Hardware: Lower-cost online disk arrays built with Serial Advanced Technology Attachment or new serial SCSI drives

Tier Three: Near Offline Storage

Data: Less active records
Hardware: Offline tape cartridges in robotic libraries or “virtual tape” arrays of low-cost hard drives

Tier Four: Archival Storage

Data: Records required by law to be retained for a prescribed number of years in a form that prohibits revisions and editing
Hardware: Write-Once Read-Many tape or optical drives
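One way to make a tier list like this usable by software is to record it as a small catalog. The sketch below paraphrases the four tiers above; the `Tier` type, the `write_once` flag and the `allowed_tiers` helper are assumptions for illustration, not part of any particular ILM product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    level: int
    name: str
    media: str
    write_once: bool   # True when the media prevents revisions


# Names and media paraphrase the tier list; the field layout is an assumption.
TIERS = {
    1: Tier(1, "Primary", "SAN/NAS with high-speed SCSI drives", False),
    2: Tier(2, "Secondary", "lower-cost SATA or serial SCSI disk arrays", False),
    3: Tier(3, "Near offline", "tape libraries or virtual tape on low-cost disk", False),
    4: Tier(4, "Archival", "write-once, read-many tape or optical", True),
}


def allowed_tiers(legally_retained: bool) -> list[int]:
    """Records under a legal retention rule may only live on write-once tiers."""
    return [t.level for t in TIERS.values() if t.write_once or not legally_retained]
```

Here `allowed_tiers(True)` yields only tier four, while ordinary records may sit on any tier.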

Five Steps to Developing an ILM Policy

Many organizations adhere to the following steps when implementing an information lifecycle management policy:

1. Classify: Based on discussions involving IT, management, legal and other relevant departments, establish guidelines for ranking data importance by age and value to internal operations, constituents and compliance requirements.

2. Regulate: Using these classifications, set policies for how different data types will be stored, migrated from frontline to archival storehouses and eventually deleted.

3. Evaluate: Assess policy-based software and appliances that tier data according to rules the organization develops to define what data goes into which tier and when.

4. Establish: Create tiers of storage hardware, with the most expensive, high-performance equipment holding the frequently accessed data and the more economical, slower equipment holding the archival data.

5. Implement: Pilot an ILM policy for a single vertical application, such as e-mail, and use lessons gained to speed larger rollouts.

Oct 31 2006