What Is Data Quality?
“Data quality provides a measure of data integrity,” Sweden says. “Data quality assesses the level of data integrity by evaluating accuracy, completeness, reliability, validity and timeliness.” In other words, data quality is a subset of the attributes that make data integrity possible.
The Data Management Association defines data quality as “the degree to which data is accurate, complete, timely, consistent with all requirements and business rules, and relevant for a given use.”
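To make two of those dimensions concrete, here is a minimal Python sketch of how completeness and timeliness might be scored for a small set of records. The function names, field names, sample records and the 90-day freshness threshold are all illustrative assumptions, not part of any definition quoted above.

```python
from datetime import date


def completeness(records, required_fields):
    """Share of records with a non-empty value for every required field."""
    if not records:
        return 1.0
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields)
        for r in records
    )
    return complete / len(records)


def timeliness(records, date_field, as_of, max_age_days):
    """Share of records last updated within max_age_days of as_of."""
    if not records:
        return 1.0
    fresh = sum(
        r.get(date_field) is not None
        and (as_of - r[date_field]).days <= max_age_days
        for r in records
    )
    return fresh / len(records)


# Hypothetical sample records; real data would come from a source system.
records = [
    {"id": 1, "address": "100 Main St", "updated": date(2023, 7, 1)},
    {"id": 2, "address": "", "updated": date(2023, 6, 15)},
    {"id": 3, "address": "42 Oak Ave", "updated": date(2022, 1, 10)},
    {"id": 4, "address": "7 Elm Rd", "updated": None},
]

completeness_score = completeness(records, ["id", "address", "updated"])
timeliness_score = timeliness(records, "updated", date(2023, 7, 31), 90)
```

Scores like these are only a starting point. Accuracy and validity generally require comparing values against a trusted reference or a set of business rules, which a simple field check cannot do.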
“High-quality data eliminates incongruency across systems and departments and ensures consistent data across processes and procedures,” IBM notes. “Collaboration and decision-making among stakeholders are improved because they all rely on the same data.”
The city of Austin, Texas, recently discovered the challenges that come with a lack of data quality. A July 2023 internal audit found that “the data on the portal did not consistently match departments’ data sources. The discrepancies between data on the portal and in department sources varied from as few as two missing records to hundreds of thousands of missing records. This means community members and City decision-makers who use data from the portal may be getting information that is incomplete, inaccurate, or otherwise different from the data departments may use when making decisions.”
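A discrepancy of the kind the audit describes can often be surfaced with a simple reconciliation check. The sketch below assumes each record carries a unique identifier that appears in both the department source and the portal. The function name and sample IDs are hypothetical and do not reflect how Austin's audit was actually performed.

```python
def reconcile(source_ids, portal_ids):
    """Compare record IDs in a department source with those on a public portal."""
    source = set(source_ids)
    portal = set(portal_ids)
    return {
        # Records the department holds that never reached the portal.
        "missing_from_portal": sorted(source - portal),
        # Records on the portal that the department source does not contain.
        "unexpected_on_portal": sorted(portal - source),
        # Share of source records that also appear on the portal.
        "match_rate": len(source & portal) / len(source) if source else 1.0,
    }


# Hypothetical record IDs for illustration only.
report = reconcile(["A1", "A2", "A3", "A4"], ["A1", "A3", "A9"])
```

Running a check like this on a schedule, rather than once during an audit, would flag missing records before community members or decision-makers rely on them.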
As it turns out, organizations of all kinds believe their data integrity isn’t where it needs to be. In a survey conducted by Drexel University, only 34 percent of organizations rated their data quality as “high” or “very high.” Half of the respondents said poor data quality is the leading challenge to data integrity.
How Do You Boost Data Integrity?
Improving data integrity is a gradual process. You first need to understand what you have, says Timothy Humphrey, chief analytics officer at IBM.
“In most institutions, your metadata is poor,” Humphrey says, because organizations inherit information over time. “You need to understand the map of your data universe.”
For example, what systems does the data pass through? What is its lifecycle, all the way to insight? Humphrey also advises taking an iterative approach to data mapping. “Go after the most challenging problems first.”