The Data Quality Chronicle

An event based log about a service offering

Category Archives: data quality polls

Data Quality Polls: Troubled domains and what to fix

With which data domain do you have the most quality issues?

As expected, customer data quality remains at the top of the list of domains with the most issues. Ironically, this domain has been at the forefront of the data quality industry since its inception.
One reason for the proliferation of concerns about customer data quality could be its direct link to revenue generation.
Whatever the reason, this poll seems to indicate that services built around the improvement of customer data quality will be well founded.

What would you improve about your data?

Once again, there are no surprises in the data improvements that are desired. Data owners seem to be interested in a centralized, synchronized, single view of their data, most notably customer data.

The good news from these polls is that, as an industry, data quality is focused on the right data and the right functionality. Most data quality solutions are built around the various aspects of customer data quality and ways to improve it, so that there is a master-managed, single version of a customer. The bad news is that we’ve had that focus for quite some time and data owners are still concerned.

In my opinion, this is due to the nature of customer data.  Customer data is at the core of every business.  It is constantly changing both in definition and scope, it is continuously used in new and complex ways, and it is the most valuable asset that an organization manages.

One thing not openly reflected in these polls is that the same issues and concerns present in the customer domain are likely also present in the employee and contact domains. However, they tend not to “bubble up” to the top of the list because they lack a direct link to revenue and profit.

I’d encourage comments and feedback on this post. If we all weigh in on topics like this, we can all learn something valuable. Please let me know your thoughts on the poll results, my interpretation of them, and my opinions.


Data Quality Poll Results: Edition #1


Whenever people participate in a polling exercise they provide a valuable perspective to the poller. With that in mind, I’d like to thank everyone who voted in the first set of polls at The Data Quality Chronicle. It’s been an exciting exercise that I’m looking forward to expanding on in the future.

Basic Poll Design

A polling section of the blog was set up in order to solicit information from readers on various topics. In an attempt to gain a view of the larger data quality picture, the first edition of the polling contained very basic questions. The questions were as follows:

  1. What is the biggest challenge in data quality?
  2. What is the biggest opportunity in data quality?
  3. What is your data quality tool of choice?

Data Quality Challenges

With regard to the biggest challenge in data quality, three answers were pre-defined along with an “other” option for poll participants to fill in their own answer. The three pre-defined answers were as follows:

  1. Access to the data
  2. Comprehensive domain expertise of all the data involved
  3. Duplication of critical data

Poll Results

The results of the polling regarding data quality challenges are depicted in Figure 1 below.

Figure 1 Biggest Data Quality Challenge Results

Data Quality Opportunities

With regard to the biggest opportunities in data quality, three answers were pre-defined along with an “other” option for poll participants to fill in their own answer. The three pre-defined answers were as follows:

  1. Increased accuracy and confidence in critical data
  2. Compliance improvements
  3. De-duplication of master data

Poll Results

The results of the polling regarding data quality opportunities are depicted in Figure 2 below.

Figure 2 Biggest Data Quality Opportunities Results

Data Quality Tool of Choice

With regard to the data quality tool of choice, three answers were pre-defined along with an “other” option for poll participants to fill in their own answer. The three pre-defined answers were as follows:

  1. Trillium
  2. Informatica Data Quality
  3. DataFlux

Poll Results

The results of the polling regarding data quality tool of choice are depicted in Figure 3 below.

Figure 3 Data Quality Tool of Choice Results


Even though the sample size was relatively small, I feel that some strong conclusions can be drawn from this simple polling exercise.

The two strongest themes regarding data quality challenges seem to be that data quality is an important aspect of enterprise data management and that data critical to the enterprise is often duplicated. These themes also seem to be common in the writings I have read from many of the leading data quality professionals.

When it comes to summing up the biggest opportunity in implementing a data quality initiative, the strongest theme was increased accuracy and confidence in critical data. This result seems to correlate strongly with the results regarding data quality challenges, where data quality is defined as an important aspect of enterprise data management.

In reviewing the results of the data quality tool of choice, my observations of the data quality software market were reinforced. With so many vendors offering data quality solutions, no one tool dominates the market. While Trillium received the most votes, IBM, Omikron, and Datanomic were also popular choices. For that matter, the decision to “build over buy” was just as popular.

Again, I’d like to thank those that participated in this first edition of data quality polling at The Data Quality Chronicle! 

Keep checking back to participate in our second edition coming soon …

First Edition – An Introduction

As the name implies, I want this blog to be a chronicle of data quality: the process of data quality, not just the concept. That means the process from project kick-off to implementation and each step in between.

I intend to record my experiences on data quality initiatives in order to present a body of evidence regarding the opportunities and challenges that exist as part of those initiatives. I’ll also be polling the community in an attempt to gather a comprehensive list of these opportunities and challenges. With each post I’ll try to address, directly or indirectly, one of the items on the list. The intent is to gather enough information to provide a guideline or map to help those interested in data quality improvements.
Before we dive headlong into this effort, let me take a few minutes to explain how I got here. I am a technology follower who has been fortunate enough to make technology my career. I have been a consultant to Fortune 500 clients for over 12 years, spanning all aspects of software development from quality assurance testing to business intelligence. My latest technology fascination involves master data management and data quality. I’m also a devoted follower of internet technologies, specifically the internet as a platform for learning/intelligence and data management. The organization of unstructured data and its use in corporate strategy will definitely make the list! My primary role on initiatives is as an analyst. At times I focus on business problems and facilitate solutions. Other times I am a data miner, searching for answers to critical questions.
Stay tuned for my next post! In the meantime, stop by the polling page and participate!