The Data Quality Chronicle

An event based log about a service offering

Category Archives: customer data integration

Data Quality Polls: Troubled domains and what to fix

With which data domain do you have the most quality issues?

As expected, customer data quality remains at the top of the list of domains with the most issues. Ironically, this domain has been at the forefront of the data quality industry since its inception.
One reason for the proliferation of concerns about customer data quality could be its direct link to revenue generation.
Whatever the reason, this poll seems to indicate that services built around the improvement of customer data quality will be well founded.

What would you improve about your data?

Once again there are no surprises when looking at what data improvements are desired. Data owners seem to be interested in a centralized, synchronized, single view of their data, most notably their customer data.

The good news that can be gathered from these polls is that, as an industry, data quality is focused on the right data and the right functionality.  Most data quality solutions are built around the various aspects of customer data quality and ways to improve it so that there is a master-managed, single version of a customer.  The bad news is that we’ve had that focus for quite some time and data owners are still concerned.

In my opinion, this is due to the nature of customer data.  Customer data is at the core of every business.  It is constantly changing both in definition and scope, it is continuously used in new and complex ways, and it is the most valuable asset that an organization manages.

One thing not openly reflected in these polls is that the same issues and concerns present in the customer domain are likely also present in the employee and contact domains.  However, they tend not to “bubble up” to the top of the list because they lack a direct link to revenue and profit.

I’d encourage comments and feedback on this post.  If we all weigh in on topics like this, we can all learn something valuable.  Please let me know your thoughts on the poll results, my interpretation of the results and opinions.


Data Quality Basic Training

Recently a reader asked me if I had any posts on “data quality basics”.  Turns out, I didn’t.  So I’ve decided to put together a series of posts that covers what I feel are the basic essentials to a data quality program.


The many sides of data quality

It is generally accepted in the data quality world that there are seven categories by which data quality can be analyzed.  These include the following:

  1. Conformity
  2. Consistency
  3. Completeness
  4. Accuracy
  5. Integrity
  6. Duplication
  7. Timeliness
  • Conformity – Analyzing data for conformity measures adherence to the data definition standards.  This can include determining if data is of the correct type and length
  • Consistency – Analyzing data for consistency measures that data is uniformly represented across systems.  This can involve comparing the same attribute across the various systems in the enterprise
  • Completeness – Analyzing data for completeness measures whether or not required data is populated.  This can involve one or more elements and is usually tightly coupled with required field validation rules
  • Accuracy – Analyzing data for accuracy measures if data is nonstandard or out-of-date.  This can involve comparing data against standards like USPS deliverability and ASCII code references
  • Integrity – Analyzing data for integrity measures the references that link related information, such as customers and their addresses.  For example, this analysis would determine which addresses are not associated with any customer
  • Duplication – Analyzing data for duplication measures the pervasiveness of redundant records.  This involves determining which pieces of information uniquely define a record and identifying how many records share the same values for them
  • Timeliness – Analyzing data for timeliness measures the availability of data.  This involves analyzing the creation, update or deletion of data and its dependent business processes
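
To make a few of these categories concrete, here is a minimal Python sketch of how completeness, conformity, duplication and integrity might be measured on a handful of records. Everything in it is an assumption for illustration: the sample records, the field names, the five-digit ZIP pattern and the choice of name plus ZIP as the duplicate key are hypothetical, and none of it reflects how any particular data quality tool works.

```python
import re
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
customers = [
    {"id": 1, "name": "Ann Lee", "zip": "96701", "email": "ann@example.com"},
    {"id": 2, "name": "Bob Ray", "zip": "9670", "email": ""},
    {"id": 3, "name": "ann lee", "zip": "96701", "email": "ann@example.com"},
]
addresses = [
    {"address_id": 10, "customer_id": 1},
    {"address_id": 11, "customer_id": 4},  # points at a customer that does not exist
]

# Assumed conformity rule: a ZIP code is exactly five digits.
ZIP_PATTERN = re.compile(r"\d{5}")


def completeness(records, field):
    """Share of records in which the field is populated (non-blank)."""
    filled = sum(1 for r in records if str(r.get(field, "")).strip())
    return filled / len(records)


def conformity(records, field, pattern):
    """Share of records whose field fully matches the expected pattern."""
    ok = sum(1 for r in records if pattern.fullmatch(str(r.get(field, ""))))
    return ok / len(records)


def duplicate_count(records, key_fields):
    """Count records beyond the first in each group sharing a normalized key."""
    keys = Counter(
        tuple(str(r[f]).strip().lower() for f in key_fields) for r in records
    )
    return sum(n - 1 for n in keys.values() if n > 1)


def orphans(children, parents, foreign_key, primary_key):
    """Child records whose foreign key matches no parent record."""
    parent_ids = {p[primary_key] for p in parents}
    return [c for c in children if c[foreign_key] not in parent_ids]


if __name__ == "__main__":
    print("email completeness:", completeness(customers, "email"))
    print("zip conformity:", conformity(customers, "zip", ZIP_PATTERN))
    print("duplicates on name+zip:", duplicate_count(customers, ["name", "zip"]))
    print("orphan addresses:", orphans(addresses, customers, "customer_id", "id"))
```

In practice the duplicate key and the conformity rules come from the business definition of the data, and real matching is usually fuzzier than the exact, case-insensitive comparison used here.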


On Cloud 9!

Apologies …

I’ve been in the clouds lately, in more ways than one.  I’ve been on the road performing another data quality assessment on an island in the Pacific.  This translates into the fact that I’m gaining status on multiple airlines and becoming increasingly appreciative of noise canceling technologies.


I’m also gaining an appreciation for another technology, cloud based data quality solutions!  I am leveraging Informatica’s latest data quality platform, IDQ v9.  IDQ v9 brings to mind a favorite 80’s commercial of mine where peanut butter and chocolate are combined into one tasty treat!  For sure there is a little PowerCenter in your Data Quality and a little Data Quality in your PowerCenter …

50,000 foot level view

I won’t try to cover the upgrade in one post, but rather just whet your appetite for what’s inside the wrapper.  We can binge on the details in the coming months.  For now let me just highlight what I feel are the most significant enhancements of v9.

  1. Quick start solution: the cloud-based solution in v9 nearly eliminates the installation requirements of IDQ 8.x
  2. Data Explorer and Data Quality are now one product: This cuts installation and repository management by 50% (at least)
  3. PowerCenter integration means ETL and DQ have tied the knot!: Now you can stop using TCL & SQL scripting and leverage PowerCenter’s integration components.  This includes the consolidation component, a particularly important component to master data management and customer data integration initiatives
  4. Inline data viewing: now you can unit test your transforms without a full run on your mappings, saving time and increasing productivity
  5. Inline data profiling: now you can report on data quality processing without leaving your development environment and share it with the client via a URL!

As I mentioned, I’ll dive deeper when I’m not in the middle seat of row 42 somewhere over the Pacific.  For now my big take-aways are the time-saving features that are almost everywhere and the integration with PowerCenter, which takes data quality integration to the next level.

No more afternoons installing clients and scripting repositories.  No more SQL development to validate and analyze results.  No more TCL scripts (I love that!).  No more SQL consolidation nightmares!

Up to speed

As for a learning curve on the new look and feel?  It took me a few hours, of which most were productive hours I was able to leverage into real work.  Hopefully I can translate what I’ve learned and cut your learning curve down with posts in the coming months. For now, the drink cart is coming my way and I’ve got cash handy!

Check back next month when I go over how I was able to deliver more analysis in a shorter time frame and look good doing it!