May 10 2023
Data Analytics

State and Local Agencies Integrate Disparate Data Sources with Data Fabric

The emerging data management design concept holds tremendous promise for breaking down silos.

Public sector organizations have spent the past few decades cramming data into every drawer, corner, closet, silo, warehouse and lake. Now, that information is drifting in the cloud or lurking in a windowless data center in a rural region somewhere.

The grand challenge has become how to make use of this material in a timely manner. Enter a new concept from Forrester Research: data fabric. It’s a way to link scattered information repositories, making them accessible to workers to solve everyday business problems. The idea has caught on quickly over the past 18 months.

What Is Data Fabric?

According to Gartner, “Data fabric is an emerging data management design that enables augmented data integration and sharing across heterogeneous data sources.” In a separate report, Gartner boils it down to helping organizations “see across silos that currently prevent us from sharing and finding information.”

Up to now, public organizations have cobbled together disparate data sources as best they can. The work is complex, code-intensive, time-consuming and often messy. Data fabric offers a new way to think about harnessing the mess through an architecture that can identify, catalog and translate between sources.

“State and government agencies are grappling with huge amounts of data,” says Michael Anderson, chief strategist for Informatica’s public sector business. “Most recognize that the data can help them make better decisions, and they’re also looking for cost reduction.”

Most important, the job of the fabric is to function invisibly. “We do the plumbing behind the walls to get you the data,” Anderson says. The required software doesn’t live onsite. It sits in the cloud churning away while workers access the information they need without headaches. 

Informatica’s software focuses on data integration, be it in warehouses or lakes, which are different. “A lake is like that kitchen drawer you put all your junk in,” Anderson says. A warehouse is far more organized.

Next, the company automatically cleans up and identifies the data for later retrieval. Once the groundwork is complete, the information is available for use.
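That automated clean-up-and-identify step can be pictured as a simple profiling pass over incoming records. The sketch below is illustrative only, not Informatica's actual rules; the patterns and field labels are invented examples.

```python
import re

# Illustrative sketch of automated data identification: scan each
# field of a record and tag values that look like known sensitive
# types, so they can be cataloged for later retrieval.
PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "zip_code": re.compile(r"^\d{5}$"),
}

def identify(record):
    """Return a field -> detected-type map for one record."""
    tags = {}
    for field, value in record.items():
        for label, pattern in PATTERNS.items():
            if isinstance(value, str) and pattern.match(value):
                tags[field] = label
    return tags
```

In a real pipeline, rules like these would run at ingest time and feed the resulting tags into the data catalog.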

Ideally, in the near future, a state worker at the Department of Motor Vehicles will be able to pull up information about a driver from a dozen different sources without ever knowing it was knitted together. The objective is to make information retrieval easy.

“It’s a single point of entry into what are many complex interactions,” says Jason Adolf, industry vice president of global public sector for Appian.

What Is Landed Data Fabric vs. Logical Data Fabric?

There are two approaches to data fabric architectures. Many vendors, including Informatica and Talend, store copies of retrieved information in centralized locations, where it gets cleaned up and then accessed. This “landed data fabric” approach differs from “logical data fabric,” where information stays in its original place.
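The architectural distinction can be sketched in a few lines of code. This is a minimal, hypothetical illustration; the class and source names are invented and do not represent any vendor's API.

```python
class LandedFabric:
    """Copies records from each source into one central store."""
    def __init__(self):
        self.central_store = []

    def ingest(self, source_name, records):
        # Data is physically moved (and can be cleaned) at ingest time.
        for record in records:
            self.central_store.append({"source": source_name, **record})

    def query(self, key, value):
        return [r for r in self.central_store if r.get(key) == value]


class LogicalFabric:
    """Leaves data in place; registered sources are queried on demand."""
    def __init__(self):
        self.sources = {}

    def register(self, source_name, fetch_fn):
        # fetch_fn pulls live records from the original system.
        self.sources[source_name] = fetch_fn

    def query(self, key, value):
        results = []
        for name, fetch in self.sources.items():
            for record in fetch():
                if record.get(key) == value:
                    results.append({"source": name, **record})
        return results
```

The trade-off is visible even at this scale: the landed approach pays the cost of moving and storing copies up front, while the logical approach pays it at query time.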

Data management firm Denodo embraces the logical data fabric approach.

“Denodo brings computing to the data,” says Bill Sullivan, vice president and general manager of U.S. federal for Denodo. “Government is waking up to the cost of moving data in and out of the cloud.”

Denodo uses a data virtualization scheme, which leaves information at its source. According to the company, virtualized views allow workers to find data easily and understand its content, currency, quality and validity before using it. “We can cache data for specific inquiries without impacting an agency’s production system,” Sullivan says. The bottom line is that it’s cheaper and less disruptive.
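Caching specific inquiries so production systems aren't hammered can be approximated with a small wrapper like the one below. This is a generic sketch of query-level caching over a virtualized view, not Denodo's implementation; the names are hypothetical.

```python
import time

class CachedView:
    """Serves repeated queries from a cache instead of the source system."""
    def __init__(self, fetch_fn, ttl_seconds=300):
        self.fetch_fn = fetch_fn   # the function that hits production
        self.ttl = ttl_seconds     # how long a cached answer stays fresh
        self._cache = {}           # query -> (timestamp, rows)

    def query(self, q):
        now = time.time()
        hit = self._cache.get(q)
        if hit and now - hit[0] < self.ttl:
            return hit[1]          # served from cache; no load on source
        rows = self.fetch_fn(q)    # only cache misses touch production
        self._cache[q] = (now, rows)
        return rows
```

With a sensible time-to-live, repeated lookups for the same inquiry never reach the agency's production system.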

Data.World also uses a logical fabric architecture. The company helps customers build enterprise data catalogs so information is searchable and findable, says Juan Sequeda, principal scientist and head of Data.World’s artificial intelligence lab. It’s all about establishing and keeping track of an agency’s metadata.

“If you have a big bucket of data, you create a map of how things are connected,” Sequeda says. “You want to create a prioritized list” of information sources, which is fed into a knowledge graph.
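A knowledge graph of that kind is, at its simplest, a set of subject-relation-object edges. The sketch below shows one way to represent such a map; the entities and relations are invented for illustration and don't reflect any real agency's data.

```python
# Each edge records how one kind of data connects to another.
edges = [
    ("driver", "holds", "license"),
    ("driver", "owns", "vehicle"),
    ("vehicle", "cited_in", "citation"),
]

# Build an adjacency map: entity -> list of (relation, connected entity).
graph = {}
for subject, relation, obj in edges:
    graph.setdefault(subject, []).append((relation, obj))

def neighbors(entity):
    """Everything directly connected to an entity in the graph."""
    return graph.get(entity, [])
```

Once the map exists, a question like "what is linked to a driver?" becomes a simple lookup rather than a manual hunt across source systems.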

How Does Data Fabric Work with AI?

Knowledge graphs powered by machine learning use natural-language processing to construct a comprehensive view of different kinds of data through a process called semantic enrichment, according to IBM.

Once a knowledge graph is complete, it allows questions to be asked and answered. According to IBM, “While consumer-facing products demonstrate its ability to save time, the same systems can also be applied in a business setting, eliminating manual data collection and integration work to support business decision-making.”

In February, Data.World launched its AI lab to develop new automated search capabilities to aid in data discovery and governance. It’s using such AI tools as GPT-3, OpenAI’s controversial large language model.

“The integration with GPT-3 is just the first example of the many possibilities that lie ahead, and we look forward to continuing to work with our customers to explore new and exciting use cases for knowledge graphs,” said Data.World CEO and co-founder Brett Hurt in a press release.

How Can Data Fabric Help State and Local Governments?

Rochester, N.Y., is in the early days of building a data catalog.

“We are motivated mainly by compliance factors, like cybersecurity insurance requirements, and need a comprehensive data inventory,” says Kate May, director of IT operations for the city. She also notes that a recent financial audit pointed out some weaknesses: “We’ve been dinged in the past for not having as big a picture of our data as we should.”

May is now busy mapping the city’s data assets, all 300 of them, and figuring out who owns each. “We’re taking little bites at a time. We’ve got 35 kinds of data risk that we worry about,” she explains. The next step will be building a risk classification.
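A risk classification of the sort May describes amounts to tagging each inventoried asset with the risk categories that apply to it. The sketch below is a hypothetical illustration; the asset names, attributes and risk labels are invented, not Rochester's actual categories.

```python
# Each rule decides whether one risk category applies to an asset.
risk_rules = {
    "pii": lambda a: a.get("contains_pii", False),
    "public_record": lambda a: a.get("is_public", False),
    "vendor_hosted": lambda a: a.get("hosted_by") not in (None, "city"),
}

def classify(asset):
    """Return the sorted list of risk tags that apply to an asset."""
    return sorted(tag for tag, rule in risk_rules.items() if rule(asset))

inventory = [
    {"name": "parking_tickets", "is_public": True, "hosted_by": "city"},
    {"name": "payroll", "contains_pii": True, "hosted_by": "vendor_x"},
]

classified = {a["name"]: classify(a) for a in inventory}
```

Working through an inventory rule by rule like this is exactly the slow, unglamorous "little bites" process the city describes.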

She’s also building a metadata profile with an assist from Data.World, which is providing implementation support. “They give us insight,” May says. Formerly, she says, the city kept all of its data management in-house: “It was an awful approach.”

May’s goal is to have 80 percent of Rochester’s data assets comprehensively documented with full metadata within two years. “This is work that is very slow and isn’t sexy,” she says.
