Jul 03 2007
Data Center

Bracing for the Unknown

State and local governments approach disaster recovery planning as a work in progress.

State and local government officials are free to choose when to develop their disaster recovery plans, but Mother Nature decides when they’ll need to carry them out. Paul Tumminello, information security officer in the Judicial Administrator’s Office at the Louisiana Supreme Court, experienced Mother Nature’s wrath firsthand.

“[Hurricane] Katrina hit about one year too early for us,” he says. Emergency preparedness had been a court priority, and a disaster recovery plan was in the works, but it was not yet in place to handle the unprecedented disaster that struck the Gulf Coast in August 2005.

All things considered, Tumminello’s office fared pretty well compared with other information technology shops in the area. It was back online within a week. But it’s not a week that Tumminello would ever want to relive.

Three days after the storm, he made his way to the Baton Rouge satellite office, where management had begun making recovery plans. One big problem was communications: telephone landlines were down and cell phone service was spotty. With communications practically nonexistent, “all kinds of rumors were flying around, so we really didn’t know what we were in for,” Tumminello says.

Finally, he was able to get in touch with a key team member, Robert Leali, who had experience with disaster planning from a previous position in the Florida Keys. Leali contacted the organization’s Internet service provider in Baton Rouge and arranged for colocation facilities to house the court’s physical hardware. The ISP also provided Internet connectivity to other recovery locations.

In the meantime, Tumminello worked with a team that traveled to New Orleans to see what they could salvage from the court. With that equipment in hand, Tumminello’s team, which included IT staff from the Office of the Clerk of Court, took two days to reconfigure servers and set up the network. The organization established temporary office space with wireless connections and let staff members who had evacuated to other states connect remotely through virtual private network (VPN) and remote desktop connections.

By the time the majority of employees found their way to the temporary base of operations in Baton Rouge, they were able to log in to the network as if they were back in New Orleans, according to Tumminello. Gradually, operations returned to normal at the Louisiana Supreme Court.

After that experience, the Judicial Administrator’s Office put its disaster recovery plan on a fast track. The office obtained space for a disaster recovery center in northern Louisiana and began coordinating with the Clerk of Court’s IT department to colocate there. Before installing the disaster recovery center, the two IT teams completed a small amount of renovation work, primarily upgrading cooling and power. The disaster recovery site is now complete, with both a failover disaster recovery plan and a continuity of operations plan in place.

Gradual Improvement

Many state agencies and departments find that they have to live with compromises imposed by the fiscal realities of state budgets. A less-than-perfect plan is better than none, and most administrators hope that once they’ve established a good plan, they will be able to improve it gradually over the years.

The city of Plano, Texas, has both a short-term and a long-term strategy. Currently, the city’s technology services department runs a replicated system with the backup in the same room as the main system. The short-term goal is to move the backup to a second location, which is separated from the main center by two buildings but is on the same power grid. “Just moving the backup system out of the room will give us a good deal of protection against many possible disasters,” says Dave Stephens, director of technology services. This first step should be completed by the end of the summer. The long-term goal is to move the disaster recovery center into a new emergency operations building.

To reduce costs and implementation time, Stephens says, the city will move hardware and applications to the new center as they come due for upgrades. The city also developed a prioritized list of services that sets the order in which department systems will be replicated in the new building. To create that list, Stephens and his staff met with city department heads. “We asked each of them what they would do in case of a disaster. If they had pretty good manual processes, they’d generally be noted as a lower priority than if they could not operate without IT,” Stephens says.

Stephens sees the projects as a means of ensuring business continuity, not of saving money. But he recently had a chance to gauge how quickly the investment could pay for itself in a disaster. A problem with a sprinkler system forced the courts to stop collecting revenue for two days. Extrapolating from data he gathered during that event, he figures that a one-week outage of the court system would cost approximately $260,000 in lost revenue.
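Stephens’ estimate is an extrapolation from a short outage to a longer one. A minimal sketch of the simplest version, a linear scale-up, is below. The article gives only the one-week result, so the code works backward from the $260,000 figure under two labeled assumptions: that the model is purely linear and that a week means five business days. The function name `extrapolate_loss` is hypothetical, and the implied daily and two-day figures are derived, not reported.

```python
def extrapolate_loss(observed_loss: float, observed_days: float, outage_days: float) -> float:
    """Scale a revenue loss observed over a short outage linearly to a longer one."""
    daily_rate = observed_loss / observed_days
    return daily_rate * outage_days


# Working backward from the $260,000 one-week estimate.
# ASSUMPTIONS (not stated in the article): a purely linear model and a
# five-day business week during which the courts would collect revenue.
WEEKLY_ESTIMATE = 260_000
BUSINESS_DAYS = 5

implied_daily = WEEKLY_ESTIMATE / BUSINESS_DAYS  # implied loss per business day
implied_two_day = implied_daily * 2              # implied loss for a two-day stoppage

print(implied_daily, implied_two_day)
print(extrapolate_loss(implied_two_day, 2, BUSINESS_DAYS))
```

Under these assumptions the figures imply roughly $52,000 per business day; if the courts’ revenue actually varies by day of the week, a linear model will misstate the weekly loss.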

Testing Is Critical

IT leaders involved in disaster recovery strategies agree that testing is the key to success. It’s one thing to believe in theory that a recovery center will be viable, but “testing assures us that it will work,” says Bryan M. Herbert, IT director at the Fifth Circuit Court of Appeal in Gretna, La. The court developed its disaster recovery plan after Hurricane Katrina. The plan includes real-time replication of data for critical applications, including e-mail, to redundant systems located in a court building in Shreveport.

Last summer Herbert tested the systems at the Shreveport location to make sure that they worked. This summer he plans a bigger test: “We’re going to shut down services here and cut over to the Shreveport location. If it works, users will experience only short-term outages,” he says.

An organization’s disaster recovery plan must be a constant work in progress. New technologies, services and regulations require regular updates. And you can never be too prepared.

How to Draft a Disaster Recovery Plan

1. Mitigate risk. Because of their critical nature, some functions, such as emergency services, must have automatic failover rather than disaster recovery. “The first step in designing a disaster recovery plan is to determine what areas need to be up all the time, what have to be back online quickly, and what can wait for a few days,” says William DiMartini, senior vice president at Atlanta, Ga.-based SunGard, which operates dedicated disaster recovery centers.

2. Consider the risks of data loss and downtime separately. The distinction matters because some offices need to be available to citizens quickly, even if they switch over to manual processes for a day or so. Other organizations can safely remain offline for a day or more without severely affecting customer service, but if their last backup before the disaster ran the previous night, the lost data could put them in serious trouble.

3. Factor in dependencies. “Many organizations create elaborate and effective recovery plans for applications, but forget that without also recovering the dependencies, the function won’t work in its entirety,” says DiMartini. For example, police work relies on local IT systems, state systems and the National Crime Information Center database, so restoring only the local systems would leave officers without the full set of tools they need.

4. Include a plan for moving people. If employees are to be moved to a hot site, determine how to get them there. If necessary, include a plan for airlifting essential personnel.

5. Test vigorously. Many administrators who have designed well-thought-out plans for their organizations give testing short shrift, at their own peril. New applications or even relatively minor upgrades can significantly degrade a network. DiMartini recommends testing IT recovery plans at least annually, starting with restoration of the network and operating systems in the first test and working up to a full data center restoration.

6. Sustain the plan. Review the disaster recovery plan at least annually. New systems, new citizen services and new regulations may change the priorities of each function of the plan.
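Steps 1 and 2 amount to recording two tolerances for every function: how long it can be down and how much recent data it can afford to lose. A minimal sketch of that bookkeeping follows. The `BusinessFunction` record, the 24-hour cutoff for “back online quickly,” and the three sample functions and their numbers are all hypothetical; the three tier labels follow DiMartini’s wording, and the backup check assumes worst-case data loss equals one full backup interval.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    max_downtime_hours: float     # how long the function can be offline (0 = never)
    max_data_loss_hours: float    # how much recent data it can afford to lose
    backup_interval_hours: float  # how often its data is copied off-site (0 = continuous replication)


def recovery_tier(fn: BusinessFunction) -> str:
    """Step 1: sort a function into one of the three buckets DiMartini describes."""
    if fn.max_downtime_hours == 0:
        return "always up (automatic failover)"
    if fn.max_downtime_hours <= 24:  # hypothetical cutoff for "back online quickly"
        return "back online quickly"
    return "can wait a few days"


def backup_gap(fn: BusinessFunction) -> bool:
    """Step 2: flag functions whose worst-case loss (one backup interval) exceeds tolerance."""
    return fn.backup_interval_hours > fn.max_data_loss_hours


# Hypothetical inventory; names and numbers are illustrative only.
functions = [
    BusinessFunction("emergency dispatch", 0, 0, 0),
    BusinessFunction("case records", 48, 1, 24),        # can be offline, but a nightly backup is not enough
    BusinessFunction("permit applications", 8, 24, 24),
]

for fn in functions:
    warning = " <- backup interval exceeds tolerable data loss" if backup_gap(fn) else ""
    print(f"{fn.name}: {recovery_tier(fn)}{warning}")
```

The hypothetical “case records” entry mirrors the situation in step 2: it can wait days to come back, yet relying on the previous night’s backup would still lose more data than it can tolerate.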

Dedicated Disaster Recovery Centers

While the Louisiana Supreme Court and the city of Plano have chosen to build their own recovery centers, smaller organizations often opt for dedicated disaster recovery centers run by vendors. For example, the Teachers Retirement System of Georgia (TRS) chose a dedicated center, operated by SunGard, located about 30 miles from the TRS office. Although the vendor provides some computing resources, such as PCs, TRS had to buy a Fibre Channel-based storage area network and a 100-megabit connection to the data center. It also purchased some rack-mounted servers.

But Greg McQueen, director of information technology, points out that a dedicated recovery center, hardware and network services, while essential, do not in themselves make a disaster recovery plan. You also have to consider business processes. “A very important aspect of any plan is the delineation of tasks we’ll have to perform and the sequence of applications that we will bring back online,” he says.

For example, McQueen’s first job will be to bring up the local area network and attach the PCs to it. Next comes a sequence of applications: first a back-end system that tells the LAN how to operate, then e-mail, followed quickly by the pension administration system and then the accounting system. The sequence was determined through meetings with business managers. “Communications is paramount. That’s why the network and e-mail go up first. After that we have to bring our pension administration system online so we can generate benefit payments to our customers,” McQueen says. When the organization tested the system last spring, it took four hours to reestablish the network and business applications.
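McQueen’s sequence combines two kinds of ordering: technical dependencies (nothing runs without the LAN and the back-end system) and business priority (communications first, then benefit payments). A minimal sketch of that idea is a topological sort that breaks ties by priority, shown here with Python’s standard `graphlib` module (Python 3.9+). The dependency edges are an assumption, since the article gives only the final order, and the priority numbers stand in for the meetings with business managers.

```python
import heapq
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each system maps to the systems that must already be running before it can start.
# ASSUMPTION: these edges are illustrative; the article lists only the final sequence.
DEPENDS_ON = {
    "local area network": set(),
    "back-end system": {"local area network"},
    "e-mail": {"back-end system"},
    "pension administration": {"back-end system"},
    "accounting": {"back-end system"},
}

# Business priority agreed with managers: lower number = bring back sooner.
PRIORITY = {
    "local area network": 0,
    "back-end system": 1,
    "e-mail": 2,
    "pension administration": 3,
    "accounting": 4,
}


def recovery_order(depends_on, priority):
    """Return a restore sequence that respects dependencies, breaking ties by priority."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    ready = []        # min-heap of (priority, system) whose prerequisites are all up
    order = []
    while sorter.is_active():
        for system in sorter.get_ready():
            heapq.heappush(ready, (priority[system], system))
        _, system = heapq.heappop(ready)  # restore the most urgent available system
        order.append(system)
        sorter.done(system)               # may unlock systems that depend on it
    return order


print(recovery_order(DEPENDS_ON, PRIORITY))
```

Because e-mail, pension administration and accounting all depend only on the back-end system in this sketch, the dependency graph alone does not fix their order; the priority map does. Changing a priority reorders those three without ever starting one before its prerequisites.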

The project cost $300,000, but McQueen is not counting on offsetting that through a return on investment. “How can you put a price on the ability to stay in business and provide the service we’re mandated to provide?” McQueen asks. He points out that TRS generates $180 million a month in pension payments. “That’s people’s livelihood,” he says.
