Normally I use this blog to tell stories of a very technical nature. However, a recent experience has led me in a very different direction. I want to talk informally about what it takes to prepare for a data quality project.
First and foremost, there needs to be a catalyst for starting a data quality project. Typically this takes the form of anecdotal tales of woe from data consumers trying to use enterprise data. From these anecdotes, a set of problem areas can be extracted that need to be addressed. For instance, if a marketing director complains that direct marketing campaign costs are rising because multiple mailers are being sent to one address, that identifies a need for household analysis and consolidation. In a large enterprise, interviews need to be conducted to gather these stories and reduce them to a set of data quality operations.
Implementing an enterprise data quality program is a challenging endeavor that can seem overwhelming at times. It requires coordination and cooperation across the technology and business domains, along with a clear understanding of the desired outcome. A data quality program is fundamental to numerous other enterprise information management initiatives, not the least of which are master data management and data governance. In fact, you’ll recognize some of the same best practices from those disciplines.
Gaining business level sponsorship for the data quality program is essential to its success for many reasons, not the least of which is the fact that poor data quality is a business problem that negatively impacts business processes. Sponsorship from the business provides a means for communicating these problems and impacts. Business level sponsorship should be built upon data quality ROI and should provide direction on what data falls within the scope of the data quality program.
Data stewards are business level resources that represent individual data domains and provide relevant business rules to form remediation steps. They also develop business relevant metrics and targets that enable the measurement of quality and trend reporting that establishes the rate of return on existing remediation techniques.
Data quality is not a one-and-done project. It is a cycle of activities that need to be continuously carried out over time. Enterprise data tends to be an evolving and growing asset. Assigning a leader, or set of leaders, to a data quality program ensures the data quality cycle of activities maintains consistency as the data landscape undergoes this evolutionary growth.
Defining service level agreements with the business data stewards provides a basis for operational prioritization within the data quality program. As new data defects are discovered, it will be critical to determine which fixes to schedule for implementation. Service level agreements provide direction from each business unit and enable appropriate change management scheduling.
Common data management practices such as data migration and data archive scheduling need to be taken into account when determining when and what data to assess and remediate. Profiling and assessing data that is scheduled for archival would be an egregious misappropriation of resources. Aligning the data quality program with these data management activities avoids this kind of mistake.
A proactive approach to data quality increases data consumer and business confidence, reduces the costs associated with unplanned data correction activities, and reduces fines associated with failure to meet regulatory compliance. A proactive approach also establishes, with the business, a level of domain expertise that fosters the necessary buy-in. Fundamental to this concept are data quality assessments and data quality trend reporting (score-carding).
With the maturation of the data quality vendor market, it is now possible to implement enterprise-capable data quality software that offers a full range of features to manage data defect identification, remediation and reporting. Automated tools are more comprehensive, consistent, and portable, and they include built-in modules, such as address validation services, that reduce code development.
Metrics, and their associated targets, are the cornerstones of developing an assessment and remediation process that fosters effective change. The definition of metrics and their targets needs to be centered on data elements that are essential to the core set of business processes. Targets should be divided into three groupings that reflect the data's ability to support these processes. At a base level these groupings should be “does not support business function”, “minimally supports business function”, and “supports business function at a high level”.
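As a minimal sketch, the three groupings above can be expressed as a simple banding function over a metric score. The 0.80 and 0.95 thresholds here are hypothetical; in practice the cut-offs for each metric would come from the business data stewards.

```python
# Hypothetical thresholds; real cut-offs are set by the data stewards.
def target_band(score, minimal=0.80, high=0.95):
    """Map a data quality score in [0.0, 1.0] to a support grouping."""
    if score < minimal:
        return "does not support business function"
    if score < high:
        return "minimally supports business function"
    return "supports business function at a high level"

print(target_band(0.72))  # does not support business function
print(target_band(0.90))  # minimally supports business function
print(target_band(0.98))  # supports business function at a high level
```

Banding scores this way also makes trend reporting straightforward: a scorecard only needs to track how many critical elements move between groupings from one assessment to the next.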
With numerous applications consuming and delivering data across the enterprise, it can be a daunting task to decide where to start correcting quality deviations. In an effort to reduce this complexity, it is a best practice to institute data quality activities where the data originates. The origin point of data is commonly referred to as the system of record.
This practice not only answers the question of where to begin implementing data quality practices, it also proactively addresses the issue of defect proliferation. It ensures that these activities are not duplicated numerous times and are implemented in a consistent manner. As a consequence, data needs to be measured for quality upon creation and/or migration into the data landscape.
Focusing data remediation efforts on mission-critical data is the control that ensures a return on the investment in the program. Identifying the data that supports core business functions requires careful examination of the process and participation from the business data stewards. Often, this process also requires prioritizing critical elements in order to schedule remediation efforts. The identification of this data is vital to the success of the data quality program.
While there are many more best practices in the data quality domain, these ten form a solid foundation for the implementation of a data quality program. These practices, as you may notice, are focused more on establishing a data quality program than on the remediation efforts within it. In a future post, I’ll examine some common remediation techniques that are universal to data quality programs.
On a recent engagement I was tasked with performing extensive data discovery on a large amount of data in various systems. While the normal practice of a data quality initiative is to work toward an established business goal, in this case we were not immediately sure what that goal would include. The impact of that condition is that we were effectively “fishing” to determine where the data stood. In essence, we were building a “current state” definition of the data which could then be used to determine what types of data quality goals needed to be established.
Data Discovery is useful in determining a current state of the data environment
As a result of the discovery process, we were able to build profiles of tables that included various aspects of the attributes like data types, field lengths and patterns, and uniqueness. With standard patterns, lengths and types established, outlier reports were created identifying which attributes required data cleansing and more stringent data governance. From this analysis, the framework of a data quality program began taking shape.
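To make the profiling step above concrete, here is a small sketch of what profiling one attribute might look like. The function name, the pattern scheme (letters as `A`, digits as `9`), and the sample values are all illustrative assumptions, not the actual engagement's tooling.

```python
from collections import Counter
import re

def profile_column(values):
    """Profile one attribute: lengths, character patterns, uniqueness."""
    patterns = Counter()
    lengths = []
    for v in values:
        s = str(v)
        lengths.append(len(s))
        # Reduce each value to a shape: digits -> 9, letters -> A.
        patterns[re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", s))] += 1
    return {
        "count": len(values),
        "distinct": len(set(values)),
        "uniqueness": len(set(values)) / len(values) if values else 0.0,
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "top_patterns": patterns.most_common(3),
    }

# Illustrative sample: a postal-code-like column with two outliers.
profile = profile_column(["90210", "10001", "1000", "ABC12"])
```

Once a dominant pattern is established (here `99999`), anything that deviates from it, such as `9999` or `AAA99`, becomes a candidate row for the outlier report that drives cleansing and governance decisions.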
Data Discovery is useful in developing a data quality framework
Even though data discovery played an essential role in developing the current state assessment and the framework of the data quality program, it’s important to realize that it did not provide everything required to develop these. While data discovery can describe numerous aspects of the current state of the data, it cannot determine what the optimal state should be. Discovery does not have a vision of what should be; it can only describe what is. Data discovery should not be viewed as a way to replace engaging the business about how they want the data to look. It is a data tool that can help data quality practitioners get up to speed quickly on the current state of the data landscape. At best, it helps the data quality practitioner make suggestions about what types of data cleansing might be required.
Data Discovery is not able to determine the optimal state of the data environment
Data Discovery is not a replacement for business knowledge and vision
While this is not a particularly detailed post, I feel it is an important topic to cover and discuss. If you listen to the sales hype, it is easy to get the impression that data discovery is a substitute for meeting with the business to build data quality goals. This is such a dangerous prospect that I have begun to say so at the beginning of all my data discovery conversations. Don’t get me wrong, I value data discovery tools. I recognize their importance. However, I’m a data quality guy and not a business owner.
Does a business owner care about data patterns and lengths? I doubt it. It’s critical to be able to present this type of information in a business context that means something to a business owner or business user. Ultimately this conversation starts with some type of business goal. For instance, when performing data discovery on email addresses, it is far more effective to explain that direct marketing will suffer because 10% of the data cannot be used to send electronic marketing materials than to report that 10,000 values do not contain an “@” symbol.
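The email example above can be sketched as a tiny translation layer from a raw check to a business-facing number. The missing-"@" test is a deliberately crude stand-in for illustration; real validation would use a proper library or verification service, and the sample addresses are made up.

```python
def email_impact(emails):
    """Express a raw validity check as a business-facing percentage."""
    # Crude structural check, purely for illustration.
    invalid = [e for e in emails if "@" not in e]
    pct = 100.0 * len(invalid) / len(emails) if emails else 0.0
    return pct, invalid

pct, bad = email_impact(["a@x.com", "b@y.org", "no-at-sign", "c@z.net"])
print(f"{pct:.0f}% of contacts cannot receive electronic marketing")
```

The point is the framing, not the check: the business owner hears "a quarter of our contacts are unreachable," not "some values lack an @ symbol."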
In summary, data discovery is useful for gaining insight into the current state of the data landscape; however, it is not a “silver bullet” solution for data quality, master data management, or data governance initiatives.