Data quality may sound simple—it’s either right or it’s wrong, right? The fact of the matter is, data (and especially big data) can have a variety of quality issues that extend beyond true or false. When it comes to assessing data quality, there are seven dimensions to consider.
1. Accuracy: The values contained in each field of the database record should be correct and accurately represent “real world” values.
Example: A recorded address should be a real address. Names should be spelled correctly.
2. Completeness: The data should contain all the necessary and expected information, and the scope of the data element should be understood by the user. No required elements should be missing or in an unusable state.
Example: If first and last name are required in a form, but middle name is optional, the form can still be considered complete if no middle name is entered.
3. Consistency: Recorded data should be the same throughout the organization and across all systems. Watch out for conflicting information between data sets, records, and systems.
Example: Data for a sale recorded in the company’s CRM should match data recorded in the financial software.
4. Conformity: Data should conform to certain standards of type, size, format, etc.
Example: All dates should be in mm/dd/yyyy format. Names should use only letters, not numbers or symbols.
5. Uniqueness: One real-world entity should correspond to only one record in your data. Duplicate entries should be eliminated.
Example: If you have a company record with the name “Salesforce” and another with the name “SalesForce,” one record should be deleted (ideally the one that doesn’t reflect Salesforce’s preferred capitalization).
6. Integrity: The data should be valid across relationships, meaning that there are recorded relationships that connect all the data together. Note that unlinked records may introduce duplicate entries in your system.
Example: If you have an address recorded in your database, but there is no person, company, or other relationship associated with the address, the data is invalid. It is an orphaned record.
7. Timeliness: The data should be available when it’s expected and needed by the user. Whether data is timely depends on user expectations.
Example: A hotel booking site should update availability records in real time, while a billing system may only need to update once per day.
Using these seven dimensions, you can assess whether your data suitably meets your needs. If it falls short of these standards, you have reason to worry about the negative effects this could have on your company.
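As a minimal sketch, here is how a few of these dimensions (completeness, conformity, and integrity) could be checked in Python. The record layouts, field names, and helper functions are hypothetical and chosen only to mirror the examples above; a real audit would run against your own schema.

```python
import re
from datetime import datetime

# Hypothetical sample records; all field names are assumptions for illustration.
contacts = [
    {"id": 1, "first_name": "Ada", "middle_name": "", "last_name": "Lovelace",
     "signup_date": "12/10/2023"},
    {"id": 2, "first_name": "Alan", "middle_name": "M", "last_name": "",
     "signup_date": "2023-06-23"},
]
addresses = [
    {"id": 10, "contact_id": 1, "street": "1 Main St"},
    {"id": 11, "contact_id": 99, "street": "5 Elm Ave"},  # points at no contact
]

# Middle name is optional, so it is deliberately left out of this list.
REQUIRED_FIELDS = ("first_name", "last_name")


def is_complete(record):
    """Completeness: every required field is present and non-blank."""
    return all(str(record.get(field, "")).strip() for field in REQUIRED_FIELDS)


def conforms_to_date_format(value):
    """Conformity: the value is a real calendar date written as mm/dd/yyyy."""
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False
    try:
        datetime.strptime(value, "%m/%d/%Y")
    except ValueError:
        return False
    return True


def orphaned(children, parents, key):
    """Integrity: child records whose foreign key matches no parent id."""
    parent_ids = {parent["id"] for parent in parents}
    return [child for child in children if child.get(key) not in parent_ids]


for contact in contacts:
    print(contact["id"], is_complete(contact),
          conforms_to_date_format(contact["signup_date"]))
print([address["id"] for address in orphaned(addresses, contacts, "contact_id")])
```

Accuracy and timeliness are harder to check from the data alone, since they depend on an outside reference (the real world, or what the user expects), so they are left out of this sketch.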
There are many ways that bad data can enter your system, but they all fall into two major categories: human error and systems challenges.
Simple human error is to blame for the majority of data quality issues. During the course of the workday, employees may mistakenly enter typos into the system, whether due to distraction, misunderstanding, or simple mistyping, and introduce inaccuracy into the system.
Similarly, employees may fail to follow company guidelines on data entry, which leads to conformity or completeness problems. Or, multiple employees may record data for the same person or company but enter that information differently, creating duplicates (in other words, a uniqueness issue).
Employees aren’t the only ones who can enter data incorrectly. If you have any data entered by customers, prospects, or others outside of your organization, they too can create problems. For example, if you use lead forms, a potential customer could misinterpret the intent of a field, or enter their information slightly differently than they did the first time they filled out a lead form.
The technologies we use on a daily basis can also cause problems with data quality. Most companies rely on multiple systems and software platforms to run their business, and if those systems don’t integrate properly, you’ll be dealing with multiple versions of the “truth” and creating consistency errors.
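One way to surface these consistency errors is to reconcile the same records across two systems. The sketch below assumes each system can export its sales as a dictionary keyed by a shared sale ID; the IDs, field names, and the `find_mismatches` helper are hypothetical.

```python
def find_mismatches(crm_sales, finance_sales, fields=("amount", "currency")):
    """Return {sale_id: details} for every sale the two systems disagree on."""
    mismatches = {}
    # The union of keys lets us spot sales that exist in only one system.
    for sale_id in crm_sales.keys() | finance_sales.keys():
        crm = crm_sales.get(sale_id)
        fin = finance_sales.get(sale_id)
        if crm is None or fin is None:
            mismatches[sale_id] = {"_missing_in": "crm" if crm is None else "finance"}
            continue
        # Record (crm_value, finance_value) for each field that differs.
        diffs = {f: (crm.get(f), fin.get(f)) for f in fields if crm.get(f) != fin.get(f)}
        if diffs:
            mismatches[sale_id] = diffs
    return mismatches


# Hypothetical exports; amounts are whole currency units for simplicity.
crm_sales = {
    "S-1": {"amount": 1200, "currency": "USD"},
    "S-2": {"amount": 450, "currency": "USD"},
    "S-3": {"amount": 980, "currency": "USD"},
}
finance_sales = {
    "S-1": {"amount": 1200, "currency": "USD"},
    "S-2": {"amount": 540, "currency": "USD"},  # digits transposed on entry
}

print(find_mismatches(crm_sales, finance_sales))
```

A report like this does not say which system is right; it only narrows down where someone needs to look.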
Any time you change systems—such as your CRM, marketing automation platform, or billing software—and have to migrate your existing data to the new platform, there is an inherent risk to data quality. Data can get lost or mixed up in the transfer. And since most system migrations rely on humans to do the final review, it’s possible for additional human error to creep in.
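A simple automated comparison can reduce how much of that final review depends on people. The following is a rough sketch, assuming both the old and new platforms can export records as dictionaries with a shared `id` field; the `fingerprint` and `compare_migration` helpers and the sample data are hypothetical.

```python
import hashlib
import json


def fingerprint(record):
    """Hash a record's contents so two copies can be compared cheaply."""
    canonical = json.dumps(record, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_migration(source, target, key="id"):
    """Report records that were lost, added, or altered during a migration."""
    src = {record[key]: fingerprint(record) for record in source}
    tgt = {record[key]: fingerprint(record) for record in target}
    return {
        "missing": sorted(src.keys() - tgt.keys()),
        "unexpected": sorted(tgt.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k]),
    }


# Hypothetical exports from the old and new systems.
old_system = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Alan"},
    {"id": 3, "name": "Grace"},
]
new_system = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Allan"},  # value altered in transfer
    {"id": 4, "name": "Linus"},  # record with no source counterpart
]

print(compare_migration(old_system, new_system))
```

This only works when fields keep the same names and formats across the move; if the migration intentionally transforms values, the comparison would need to apply the same transformation first.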
Another data quality problem that originates with the systems you use comes from platform updates. While the original creators of the platform typically understand why and how user data is managed, new or outside developers are often more focused on functions, not on the impact to the data. As changes are made, data integrity can be affected.
Lastly, the fact is that software isn’t perfect, especially in today’s world of highly interconnected systems. With such high complexity, there are increased opportunities for system errors to introduce data quality problems.
There is one other way that data quality can be impacted, without the fault of humans or systems: it decays into inaccuracy. A customer may move to a new address, your contact at a company might change jobs, etc. When each of these changes occurs, your once-good-quality data becomes outdated, poor quality data.
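Decay cannot be detected by inspecting a value on its own, but you can flag records that have gone too long without being re-verified. This sketch assumes each record carries a `last_verified` date, which is a hypothetical field, and uses an arbitrary one-year threshold.

```python
from datetime import date, timedelta


def stale_records(records, today, max_age_days=365):
    """Return records whose last verification is older than the allowed age."""
    cutoff = today - timedelta(days=max_age_days)
    return [record for record in records if record["last_verified"] < cutoff]


# Hypothetical contact records.
contacts = [
    {"id": 1, "email": "ada@example.com", "last_verified": date(2022, 1, 15)},
    {"id": 2, "email": "alan@example.com", "last_verified": date(2024, 3, 1)},
]

for record in stale_records(contacts, today=date(2024, 6, 1)):
    print("Needs re-verification:", record["id"])
```

The right threshold depends on how quickly the underlying facts change: job titles and phone numbers may need a shorter window than company names.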
With so many ways that data quality can be impacted, it’s likely that your company suffers from some degree of bad data. A few small issues may not affect your business drastically, but if problems are widespread, or if your few errors are critical, the consequences can be significant.
Your decisions are only as good as the data they’re based on. As a business leader, you need good quality data so that you can make the right decisions for your business. Looking at the long-term consequences, your overall strategy will suffer, both when you create it and as you execute it.
One particularly dangerous aspect of poor data quality is the false sense of security it can impart. Extensive or serious data errors could blind you to problems in your business. Left unattended, those errors could lead to much bigger problems down the road.
Poor data quality can significantly reduce productivity, create inefficiencies, and increase operational costs.
On a day-to-day basis, employees have to accommodate known issues. For example, your sales manager may struggle to work through forecasts because they know the data in the CRM is incomplete. With good quality data, this task would be fairly simple; but with bad data to work from, the sales manager is forced to track down numbers that should be in the system, or produce a weak estimate for their reports.
When your employees have to deal with data dilemmas, it’s sometimes easier for them to make quick corrections than to solve the root of the problem, especially if they’re facing a deadline. This increases the likelihood that human error will introduce further errors into your system, and cause more problems down the line.
For your analysts, data scientists, and other knowledge workers, finding and resolving problems in data can take up a majority of their time—time that could be spent on higher-value tasks. A Forrester report indicates that “nearly one third of analysts spend more than 40% of their time vetting and validating their analytics data before it can be used for strategic decision-making.” The cost of that productivity loss can add up quickly.
Managing, and even simply coping with, bad data can have a significant impact on employee morale. Employees who were hired for high-skill work are unlikely to find satisfaction in manual data cleanup. Meanwhile, the frustration of dealing with inaccurate, incomplete, or inconsistent data makes work more difficult and less satisfying.
Further, when your data is inconsistent between systems, your company is dealing with multiple sources of “truth.” As a result, teams are likely to disagree on which system is correct and reliable, and it will be difficult to align employees to shared objectives.
The damage done by bad data isn’t limited to your organization. Quality issues with customer data impact your clients as well. Lost reservations, billing errors, botched deliveries, and more can be frustrating for customers. And while the impact of fixing these errors will be felt in your customer service department, the real challenge will come when reviews from those frustrated customers begin to roll in.
Managing customer satisfaction is crucial because even one bad review can be damning: Research shows it takes 40 positive customer experiences to undo the damage of a single negative review. Reviews often play a huge role in potential customers’ decision-making process, so it’s critical to avoid the data quality issues that lead to poor customer experiences.
It’s clear that an organization plagued by data quality problems will suffer. But what is the impact to the bottom line? How do all of these issues add up?
According to Gartner, recent research has shown that organizations believe that poor data quality is responsible for an average of $15 million per year in losses. That is a drastic hit to business value.
Moving into the future, it’s unlikely that data quality concerns will decrease. Even with the help of advanced data validation capabilities and machine learning, big data is going to continue getting bigger, and the systems that support it will grow more and more complex. If businesses fail to get ahead of their data problems, they’ll soon be overrun by them.
You can’t improve data quality passively. The only way to avoid the potential damage bad data can cause is to proactively resolve existing errors in your data and systems, prevent future errors from being introduced, and change your organization’s attitude and culture surrounding data.
To avoid the pitfalls mentioned above and minimize the costs, you have to make an investment in data quality management. Depending on the extent of your problems, you may be able to find a data quality software solution that meets your needs, or it may be necessary to outsource the work to the experts.
Data quality software can audit your database to perform a variety of tasks, such as address validation, deduplication, profiling, match and merge, and more.
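Deduplication with match and merge can be sketched roughly as follows. This is an illustrative simplification, not how any particular data quality product works: the matching rule (case-insensitive, whitespace-normalized name), the field names, and the sample values are all assumptions.

```python
from collections import defaultdict


def match_key(name):
    """Normalize a company name so trivially different spellings match."""
    return " ".join(name.casefold().split())


def match_and_merge(records):
    """Group records by normalized name and merge each group into one record.

    The first record in a group survives; blank fields on the survivor are
    filled in from later duplicates.
    """
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record["name"])].append(record)

    merged = []
    for group in groups.values():
        survivor = dict(group[0])
        for duplicate in group[1:]:
            for field, value in duplicate.items():
                if not survivor.get(field) and value:
                    survivor[field] = value
        merged.append(survivor)
    return merged


# Hypothetical company records with made-up phone numbers and owners.
companies = [
    {"name": "Salesforce", "phone": "", "owner": "Dana"},
    {"name": "SalesForce ", "phone": "555-0100", "owner": ""},
    {"name": "Acme", "phone": "555-0199", "owner": "Lee"},
]

for company in match_and_merge(companies):
    print(company)
```

Exact matching on a normalized key will miss variants such as abbreviations or misspellings, so treat this as a starting point rather than a complete matching strategy.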
It’s not just about improving data quality and fixing existing errors; it’s also about preventing poor data quality in the first place. Data quality software can help catch new errors as they’re entered into your system, but it’s better to prevent them altogether.
If your organization hasn’t set guidelines around data entry processes, or if your rules have become outdated as your company has grown, take some time to review the existing issues and see if there are guidelines you can create, or rules within the system that you can set up.
For example, if there are new fields you need your sales team to enter in your CRM, set a parameter to require those fields before a record can be saved. This will prevent incomplete records from being entered into your database.
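Most CRMs let you configure required fields without writing code, but the underlying idea can be sketched in a few lines. The field names, the `IncompleteRecordError` exception, and the in-memory list standing in for the database are all hypothetical.

```python
class IncompleteRecordError(ValueError):
    """Raised when a record is missing one or more required fields."""


# Hypothetical fields the sales team must fill in before saving.
REQUIRED_FIELDS = ("account_name", "deal_stage", "expected_close_date")


def missing_fields(record, required=REQUIRED_FIELDS):
    """List required fields that are absent or blank in the record."""
    return [
        field for field in required
        if record.get(field) is None or str(record[field]).strip() == ""
    ]


def save_record(record, store):
    """Append the record to the store only if it passes the required-field check."""
    missing = missing_fields(record)
    if missing:
        raise IncompleteRecordError("Missing required fields: " + ", ".join(missing))
    store.append(record)
    return record


crm_store = []  # stands in for the CRM database
save_record(
    {"account_name": "Acme", "deal_stage": "Proposal", "expected_close_date": "03/31/2025"},
    crm_store,
)
try:
    save_record({"account_name": "Acme", "deal_stage": " "}, crm_store)
except IncompleteRecordError as error:
    print(error)
```

Rejecting the record at save time puts the correction in the hands of the person who has the missing information, rather than whoever finds the gap later.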
For many organizations, improving data quality isn’t just about fixing the errors in the system. It’s about changing the data culture of your business.
The reality is that a commitment to data quality starts at the top. If your leadership team is not bought in on improving data quality, you’re going to continue having problems. (Further, you’ll need their assistance as you reengineer processes and guidelines to prevent future data errors.) To get buy-in from your leadership team (or other stakeholders), follow these five steps from Gartner to create a business case for data quality improvement.
Once your leadership team and stakeholders are committed to improving data quality, you’ll begin to see it trickle down through the ranks. To help it along, there are several ways you can support your data culture:
It’s clear that poor data quality can have a serious and far-reaching negative impact on your company. But what do you stand to gain once your data is in tip-top shape?
First off, you can be confident in your data—as well as in the decisions you make and the strategy you build around it. And it isn’t just your leadership team. As your employees are able to trust the data they’re using, they’ll become more efficient and more able to innovate because they know the data isn’t just a house of cards.
Fewer data errors mean you lose fewer resources fixing them. Team members who would normally take significant time out of their regular jobs to manage data quality will be able to keep their focus on the work that really matters. Putting aside the dull clean-up work gives them more opportunity to do what is truly fulfilling and beneficial for the company’s growth.
One of the biggest benefits of good-quality data is that it becomes a single source of truth that everyone agrees on and relies on. No more confusion, or picking sides, or pointing fingers—at least, not about which platform has the “right” number. Instead, your team can unite around the data and align to company-wide objectives, knowing that the numbers won’t lie.
Bad data can maim a business. But with good, clean data, your team can do better work. As you experience each of these benefits, your company will have the chance to truly flourish.