Normally I use this blog to tell stories of a very technical nature. However, a recent experience has led me in a very different direction. I want to talk informally about what it takes to prepare for a data quality project.
First and foremost, there needs to be a catalyst for starting a data quality project. Typically this takes the form of anecdotal tales of woe from data consumers trying to use enterprise data. From these anecdotes you can extract a series of problem areas that need to be addressed. For instance, if the marketing director complains that direct marketing campaign costs are rising because multiple mailers are sent to the same address, then that complaint identifies the need to perform household analysis and consolidation. In a large enterprise, interviews need to be conducted to gather these stories and reduce them to a set of data quality operations.
Contrast this with kicking off a data quality project without capturing these requirements. Data quality means many things to many organizations, and it requires focus to be most effective. Failing to log the details of these anecdotes means you are not prepared to start a data quality project.
Another essential aspect of preparing for a data quality project is defining what data is in scope. This requirement is closely tied to the reasons for undertaking the effort. In other words, if the marketing department is the business unit questioning the usability of the data, then it is logical to focus on the data that deals with customers and customer contact details.
Contrast this with defining an entire source system as the scope of the data quality project. I've rarely come across an enterprise source system that consisted of fewer than a thousand tables with countless data elements. Failing to detail the data elements in question means you are not prepared to start a data quality project.
A final essential requirement for any data quality project is a data quality environment, which consists of an application server with the required software and data connections. I can't count how many times this requirement is overlooked. Because of typical enterprise requisition processes, this step needs to be initiated before engaging a team to conduct the data quality effort. I've spent weeks waiting for the hardware on which to do the analysis, and even longer waiting for the data. In some cases, this hardware can reside outside the organization. If that is the case, it is strongly advisable to obtain the required permission to move the data outside the corporate firewall.
In short, before you decide to start a data quality project, know why, what, and where you are going to do the analysis. This may sound obvious, but believe me, it is not. I've spent so much time tracking down these three criteria that I have started developing a data quality project checklist to include in project initiation documents such as proposals and statements of work.
Let this post be a warning to corporate sponsors and data quality practitioners alike: when you decide to start a data quality project, get ready, ’cause here I come!