Oct 25 2021

Incident Response: The Steps to a Root Cause Analysis for State Government

Root cause analysis helps government agencies determine the underlying issues that led to an incident, cybersecurity or otherwise, and aids in preventing repeats.

The morning of Aug. 16, 2019, began like any other for Amanda Crawford and her team at the Texas Department of Information Resources.

Then, they got a call that a local government entity had been hit by a ransomware attack. Then another and another.

“We ultimately ended up at 23 different entities that were impacted by this same ransomware event that came through the same attack vector,” says Crawford, who is the executive director of the Texas DIR and the state’s CIO. “We were all over the Texas map.”

As the governor declared a state of emergency, Crawford’s team took action, identifying the scope of the attack and employing a strategy that would help to eventually identify the root cause of the problem.

What Is Root Cause Analysis?

Root cause analysis, sometimes called RCA, is a formal effort to determine and document the root cause of an incident, then take preventative steps to ensure the same issue doesn’t happen again, says Matt Mellen, director of the security operations center at Palo Alto Networks.

“It’s arguably the most important phase of the incident lifecycle, aside from containing and eradicating the threat,” he says. “No security organization wants to continually respond to the same threat over and over again.”

Chris Gerritz, co-founder and vice president of threat intelligence and response at Infocyte, who has worked with state government entities including Texas’s, says when a cybersecurity attack happens, the first step is triage: identifying what has been impacted and the scope of the problem. Once the threat is contained, root cause analysis can be conducted to identify the “beachhead,” the vulnerable server or computer in the network that was first hit.

“If I’m attacking a state agency that has 1,000 employees, I want to attack someone’s computer that’s really easy to attack,” Gerritz says. That might be an accountant who uses the same password for everything, for example, rather than the organization’s cybersecurity expert.

“Root cause is really finding who that origination point was, where the entry point was into the network and where they jumped off from there, and kind of tracing that back so that we can say, ‘Ah, the attack came from the accountant’s computer. Oh, the accountant’s computer was hacked from a guy in Russia,’” Gerritz says.
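
The tracing Gerritz describes can be illustrated with a minimal Python sketch. It assumes investigators have already reconstructed a list of host-to-host connections from logs; the host names, timestamps and the trace_to_entry_point function are all hypothetical and are not drawn from any real incident or tool.

```python
from datetime import datetime

# Hypothetical lateral-movement records: (timestamp, source_host, destination_host).
# In a real investigation these would be reconstructed from authentication
# and network logs; the values here are invented for illustration.
events = [
    (datetime(2024, 1, 1, 2, 5), "external", "acct-pc"),
    (datetime(2024, 1, 1, 2, 40), "acct-pc", "file-srv"),
    (datetime(2024, 1, 1, 3, 15), "file-srv", "db-srv"),
]


def trace_to_entry_point(events, compromised_host, internal_hosts):
    """Walk backward from a known-compromised host to the first internal host hit."""
    # For each destination, remember only the earliest inbound connection.
    first_inbound = {}
    for ts, src, dst in sorted(events):
        first_inbound.setdefault(dst, (ts, src))

    path = [compromised_host]
    current = compromised_host
    while current in first_inbound:
        _, src = first_inbound[current]
        if src not in internal_hosts or src in path:
            # The connection came from outside the network (or we hit a loop),
            # so the current host is treated as the entry point.
            break
        path.append(src)
        current = src
    return current, list(reversed(path))


internal = {"acct-pc", "file-srv", "db-srv"}
entry_point, path = trace_to_entry_point(events, "db-srv", internal)
print("Entry point:", entry_point)
print("Path:", " -> ".join(path))
```

Starting from the last machine known to be affected, the sketch follows the earliest inbound connection at each hop until the trail leaves the network. Real attacks are rarely this linear, so this is a simplification of the idea rather than a forensic method.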

In Texas, all of the local entities targeted by the ransomware attack in 2019 were found to have used the same managed service provider.

The root cause — or, at least, the big picture of it — came to light quickly, says Crawford, though she was unable to discuss details because the case is still under FBI investigation.

“One of the things that we discovered is that there were some basic cyber hygiene principles that were maybe not being followed that certainly led to a greater vulnerability,” she says.

The Most Common Methods and Tools for Root Cause Analysis

There are a variety of tools that cybersecurity forensics professionals may use for root cause analysis to pull things like browser history and analyze a network’s logs, Gerritz says.

It’s possible for government agencies to find open-source tools, such as log2timeline. However, in state-level incidents it’s common to bring in professionals who use more sophisticated forensics kits that can extract evidence that could be admissible in court.

In Texas, the specific tools used to perform a root cause analysis after a cybersecurity incident would depend on the type of incident and vary by organization, Crawford says. Generally, though, the investigation would start with gathering and analyzing the appropriate logs and feeding those logs into security information and event management (SIEM) systems.
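
As a rough illustration of the log-gathering step, the Python sketch below merges log lines from several sources into a single chronological timeline. It is a simplified stand-in, not how log2timeline or any SIEM product works internally. The log format (an ISO timestamp followed by a message), the source names and the sample entries are assumptions made up for the example.

```python
from datetime import datetime


def parse_line(line, source):
    """Parse a line of the assumed form '<ISO timestamp> <message>'."""
    stamp, _, message = line.partition(" ")
    return datetime.fromisoformat(stamp), source, message


def build_timeline(logs_by_source):
    """Combine log lines from every source into one chronologically sorted list."""
    entries = [
        parse_line(line, source)
        for source, lines in logs_by_source.items()
        for line in lines
    ]
    # Tuples sort by timestamp first, then by source name, then by message.
    return sorted(entries)


# Invented sample data; real logs would be read from files or a log collector.
logs = {
    "firewall": ["2024-01-01T02:05:00 inbound remote desktop allowed to acct-pc"],
    "auth": [
        "2024-01-01T02:40:00 acct-pc login to file-srv",
        "2024-01-01T02:04:30 failed login on acct-pc",
    ],
}

timeline = build_timeline(logs)
for ts, source, message in timeline:
    print(ts.isoformat(), f"[{source}]", message)
```

Putting events from different systems on one clock is what lets an analyst see that a failed login preceded an allowed connection, for instance. Production tools also handle time zones, clock drift and many log formats, which this sketch ignores.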

Outside of the cybersecurity sphere, root cause analysis has broader applications in the state.

“It can be anything,” says Crawford. “You could think about it in any kind of contract where you’re expecting a delivery of services. Because at the end of the day, if you have a gap in service delivery, you want to know what the root cause of it was.”

Crawford says root cause analysis is often found as a part of contracts for all types of products and services. Her agency uses a shared technology services program, and in the event of a problem, the offending party — a vendor, for example — would be required to submit a root cause analysis and to show how it will improve for the future.

Arizona also uses root cause analysis across a wide range of state operations via the Arizona Management System, a tool to help agencies standardize processes, track data and, ultimately, improve, says Josh Wagner, director of the state’s government transformation office.

With a problem-solving motto of “Plan, do, check, act,” his office employs root cause analysis as part of the first “plan” phase. The state has numerous success stories using this process, which it details on its website.

“No matter what you’re looking at — whether someone got hacked or you want to automate a process or you want to move a service online — way too often, people jump directly to the solutions space. ‘Oh, this is the problem, I know what the problem is, so this is what we’re going to do about it,’” Wagner says. “It’s very, very important that regardless of where you’re trying to go, that you spend the time to understand the data to understand the gap that you’re trying to fill. The problem-solving methodology is something that we teach that gets applied, regardless.”

The Steps Required to Complete Root Cause Analysis 

Root cause analysis is ultimately a documentation and communication process, Crawford says.

While her office uses a particular software product, she says organizations should be able to use any tool that allows them to follow the same steps: “detailing the steps followed to resolve the incident; documenting the cause of the incident; categorizing and documenting the methods used to resolve the incident; documenting communications sent to the customer regarding resolution; categorizing the actions to take to resolve the incident; and confirming the categorization of the incident is correct.”
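
Because the steps Crawford lists are tool-agnostic, they can be captured in something as simple as a structured record. The Python sketch below is one possible way to do that, assuming one field per step; the RootCauseRecord class, its field names and the sample values are hypothetical and do not reflect the software her office uses.

```python
from dataclasses import dataclass, field, fields


@dataclass
class RootCauseRecord:
    """One field per documentation step; all names are illustrative."""
    resolution_steps: list = field(default_factory=list)         # steps followed to resolve
    cause: str = ""                                              # documented cause
    resolution_methods: list = field(default_factory=list)       # categorized methods used
    customer_communications: list = field(default_factory=list)  # messages sent to the customer
    resolution_actions: list = field(default_factory=list)       # categorized actions to take
    category_confirmed: bool = False                             # incident categorization verified

    def missing_sections(self):
        """Return the names of fields that are still empty or unconfirmed."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Invented example values.
record = RootCauseRecord(
    resolution_steps=["Isolated the affected host", "Restored from backup"],
    cause="Reused password on a remote access account",
)
print("Still to document:", record.missing_sections())
```

A completeness check like missing_sections gives a quick way to see which parts of the analysis remain undocumented before an incident is closed out.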

“You could use things, for example, like audit tools — anything that’s going to document your formulaic process to get at the root cause of whatever happened,” she says.

It’s also a collaborative process among several key stakeholders, says Mellen. In the case of cybersecurity incidents, that’s usually the system owner and a security incident response team, which come together to answer key questions, such as what happened and why.

“It’s unreasonable to expect an organization to prevent a cybersecurity incident from happening again if it can’t agree on what enabled the threat to occur. Once the key enablers are documented, the preventive steps often naturally surface, allowing the organization to prioritize them,” Mellen says.

In this way, root cause analyses can be drivers of real change.

In Gerritz’s work with state agencies, he finds many don’t do root cause analysis because they don’t have cybersecurity professionals who know how to do it, and certain tools can be expensive to deploy.

“It’s something that we are preaching at Infocyte and whenever I talk to a state agency,” he says. “I promote the fact that you can’t just take every attack and replace the computer. You have to figure out where the holes in your network are and actually fill them so that you are reducing the attacks from happening again. And that does require them to go and have a process for incident response that includes root cause analysis.”
