The Data Quality Chronicle

An event based log about a service offering

Category Archives: data quality metrics

Data Quality Polls: Troubled domains and what to fix

With which data domain do you have the most quality issues?

As expected, customer data quality remains at the top of the list with regard to having the most issues. Ironically, this domain has been at the forefront of the data quality industry since its inception.
One reason for the proliferation of concerns about customer data quality could be its direct link to revenue generation.
Whatever the reason, this poll seems to indicate that services built around the improvement of customer data quality will be well founded.

What would you improve about your data?

Once again, there are no surprises when looking at what data improvements are desired. Data owners seem to be interested in a centralized, synchronized, single view of their data, most notably customer data.

The good news that can be gathered from these polls is that, as an industry, data quality is focused on the right data and the right functionality. Most data quality solutions are built around the various aspects of customer data quality and ways to improve it so that there is a master-managed, single version of a customer. The bad news is we’ve had that focus for quite some time and data owners are still concerned.

In my opinion, this is due to the nature of customer data.  Customer data is at the core of every business.  It is constantly changing both in definition and scope, it is continuously used in new and complex ways, and it is the most valuable asset that an organization manages.

One thing not openly reflected in these polls is that the same issues and concerns that are present in the customer domain are likely also present in the employee and contact domains. However, they tend not to “bubble up” to the top of the list because they lack a link to revenue and profit.

I’d encourage comments and feedback on this post.  If we all weigh in on topics like this, we can all learn something valuable.  Please let me know your thoughts on the poll results, my interpretation of the results and opinions.


Data Quality Basic Training

Recently a reader asked me if I had any posts on “data quality basics”. Turns out, I didn’t. So I’ve decided to put together a series of posts that covers what I feel are the basic essentials of a data quality program.

The many sides of data quality

It is generally accepted in the data quality world that there are seven categories by which data quality can be analyzed. These include the following:

  1. Conformity
  2. Consistency
  3. Completeness
  4. Accuracy
  5. Integrity
  6. Duplication
  7. Timeliness
  • Conformity – Analyzing data for conformity measures adherence to the data definition standards. This can include determining whether data is of the correct type and length.
  • Consistency – Analyzing data for consistency measures whether data is uniformly represented across systems. This can involve comparing the same attribute across the various systems in the enterprise.
  • Completeness – Analyzing data for completeness measures whether or not required data is populated. This can involve one or more elements and is usually tightly coupled with required field validation rules.
  • Accuracy – Analyzing data for accuracy measures whether data is nonstandard or out of date. This can involve comparing data against standards like USPS deliverability and ASCII code references.
  • Integrity – Analyzing data for integrity examines the references that link related information, such as customers and their addresses. Using this example, the analysis would determine which addresses are not associated with customers.
  • Duplication – Analyzing data for duplication measures the pervasiveness of redundant records. This involves determining the pieces of information that uniquely define a record and identifying how often records share them.
  • Timeliness – Analyzing data for timeliness measures the availability of data. This involves analyzing the creation, update or deletion of data and its dependent business processes.
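
To make a few of these categories concrete, here is a minimal sketch of how completeness, conformity and duplication might be scored as simple ratios. Everything in it is an assumption for illustration: the sample records, the field names, the ZIP format rule, and the choice of name plus ZIP as the identifying fields. A real profiling tool would apply rules defined by the business.

```python
import re

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"id": 1, "name": "Ann Lee", "zip": "30301", "email": "ann@example.com"},
    {"id": 2, "name": "Bob Ray", "zip": "3030", "email": ""},
    {"id": 3, "name": "ann lee", "zip": "30301", "email": "ann@example.com"},
    {"id": 4, "name": "Cy Dunn", "zip": "10001", "email": None},
]

# Assumed format standard: a 5-digit ZIP code or ZIP+4.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_populated(value):
    """Treat None and empty strings as missing."""
    return value not in (None, "")


def completeness(rows, field):
    """Share of rows in which the field is populated."""
    return sum(1 for r in rows if is_populated(r.get(field))) / len(rows)


def conformity(rows, field, pattern):
    """Share of populated values that match the defined format."""
    values = [r[field] for r in rows if is_populated(r.get(field))]
    if not values:
        return 1.0
    return sum(1 for v in values if pattern.fullmatch(v)) / len(values)


def duplication(rows, key_fields):
    """Share of rows that repeat an earlier row on the identifying fields."""
    seen = set()
    repeats = 0
    for r in rows:
        # Normalize case and whitespace so trivially different values match.
        key = tuple(str(r.get(f) or "").strip().lower() for f in key_fields)
        if key in seen:
            repeats += 1
        else:
            seen.add(key)
    return repeats / len(rows)


profile = {
    "email_completeness": completeness(records, "email"),
    "zip_conformity": conformity(records, "zip", ZIP_PATTERN),
    "name_zip_duplication": duplication(records, ["name", "zip"]),
}

for metric, score in profile.items():
    print(f"{metric}: {score:.0%}")
```

Scoring each category as a ratio keeps the results comparable across fields and lets them be tracked over time, which is one reasonable way to turn these categories into metrics.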

Begin at the end … ensuring data quality success!

Because the data is there before a data quality project and is still there afterward, the impact of a data quality project on the business is not as clear as that of a traditional application development project. This is particularly true of customer data management projects where the primary objective is to “de-dup” or consolidate the data. After all, in the end there is just less data.

Viewed purely from a software perspective, not much changes. Sure, there are cost savings associated with the reduction in storage requirements. There might even be some increased performance in dependent applications due to the reduced volume. However, this is hardly a justification for the investment that a typical data quality initiative requires. This is particularly inconvenient considering most of the investment is in software and other technology-related resources.

However, consider a data quality project that consolidates customer data from a business perspective, and a different side of things appears. Consider the benefits of having less unnecessary, possibly inaccurate customer data.

  • fewer mailings to reach the same customer providing a direct cost savings
  • fewer mailings to reach the same household providing a direct cost savings
  • fewer mailings required overall providing a direct cost savings
  • fewer failed mailing attempts due to address validation providing a direct cost savings
  • fewer customer service requirements due to single view of the customer providing a direct cost savings
  • more accurate perspective of customer product portfolio providing a direct increase in marketing opportunities

Now (re)consider the substantial impact that can be realized from a consolidation effort. Furthermore, as long as data quality initiatives are built into ongoing operational data services, these cost reductions extend into the future, producing benefits in the long term. This further justifies the cost of implementing data quality services in an organization as a long-term solution.

This is why it is critical to the success of a data quality project to have clear goals that are aligned with a business initiative. 

However, this is not the end of the line when it comes to ensuring success. To ensure it, you have to start with a goal like the ones listed above and define ways in which it can be measured.

For example, the first bullet point is a data quality goal tied to the business initiative of reducing duplicate customer data. To support this, a data quality matching process can be defined that uses criteria to identify redundant customer records and consolidate them into a survivor record. The effect the data quality initiative has on this business process can be measured in terms of the reduction in total mailings required to complete a marketing campaign. More importantly, it can be measured in terms of a reduction in total dollars required to fund the new and more concise direct mailing campaign. Now the data quality process and its results can be linked directly to a reduction in budget. Metrics like these make it obvious that a data quality initiative that merely reduces data has a tremendous amount of value.
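
As a rough illustration of that measurement, the sketch below matches customer records on a simple key, keeps one survivor per group, and converts the reduction in mailings into dollars. The sample records, the exact-match rule on normalized name and address, the "most recently updated" survivor rule, and the cost per mailing are all assumptions made up for this example; real matching processes typically use fuzzier criteria agreed on with the business.

```python
from collections import defaultdict

# Assumed cost of one mail piece in dollars; illustrative only.
COST_PER_MAILING = 0.75

# Hypothetical customer records; field names and values are illustrative only.
customers = [
    {"id": 1, "name": "Ann Lee", "address": "12 Oak St", "updated": "2010-03-01"},
    {"id": 2, "name": "ANN LEE ", "address": "12 oak st", "updated": "2010-06-15"},
    {"id": 3, "name": "Bob Ray", "address": "9 Elm Ave", "updated": "2009-11-20"},
    {"id": 4, "name": "Bob Ray", "address": "9 Elm Ave", "updated": "2010-01-05"},
    {"id": 5, "name": "Cy Dunn", "address": "4 Pine Rd", "updated": "2010-02-02"},
]


def match_key(record):
    """Matching criteria: same name and address after trimming and lowercasing."""
    return (record["name"].strip().lower(), record["address"].strip().lower())


def consolidate(records):
    """Group matching records and keep one survivor per group."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    # Assumed survivor rule: keep the most recently updated record.
    # ISO-formatted dates sort correctly as plain strings.
    return [max(group, key=lambda r: r["updated"]) for group in groups.values()]


survivors = consolidate(customers)

mailings_before = len(customers)
mailings_after = len(survivors)
savings = (mailings_before - mailings_after) * COST_PER_MAILING

print(f"Mailings before consolidation: {mailings_before}")
print(f"Mailings after consolidation:  {mailings_after}")
print(f"Savings per campaign: ${savings:.2f}")
```

The useful part is the last few lines: the mailing counts before and after consolidation are the metric, and multiplying the difference by a cost figure supplied by the business is what ties the data quality process to the campaign budget.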

If you define a list like this with business stakeholders driving the process, before the data quality project is implemented, there will be a clear path to success as well as an easy way to quantify it once the solution is deployed!