An event based log about a service offering
As expected, customer data quality remains at the top of the list with regard to having the most issues. Ironically, this domain has been at the forefront of the data quality industry since its inception.
One reason for the proliferation of concerns about customer data quality could be its direct link to revenue generation.
Whatever the reason, this poll seems to indicate that services built around the improvement of customer data quality will be well founded.
Once again there are no surprises when looking at what data improvements are desired. Data owners seem to be interested in a centralized, synchronized, single view of their data, most notably customer data.
The good news that can be gathered from these polls is that as an industry, data quality is focused on the right data and the right functionality. Most data quality solutions are built around the various aspects of customer data quality and ways to improve it so there is a master managed, single version of a customer. The bad news is we’ve had that focus for quite some time and data owners are still concerned.
In my opinion, this is due to the nature of customer data. Customer data is at the core of every business. It is constantly changing both in definition and scope, it is continuously used in new and complex ways, and it is the most valuable asset that an organization manages.
One thing not openly reflected in these polls is that the same issues and concerns present in the customer domain are likely also present in the employee and contact domains. However, they tend not to “bubble up” to the top of the list due to their lack of linkage to revenue and profit.
I’d encourage comments and feedback on this post. If we all weigh in on topics like this, we can all learn something valuable. Please let me know your thoughts on the poll results, my interpretation of the results and opinions.
Implementing an enterprise data quality program is a challenging endeavor that can seem overwhelming at times. It requires coordination and cooperation across the technology and business domains along with a clear understanding of the desired outcome. A data quality program is fundamental to numerous other enterprise information management initiatives, not the least of which are master data management and data governance. In fact, you’ll recognize some of the same best practices from those disciplines.
Gaining business level sponsorship for the data quality program is essential to its success for many reasons, not the least of which is that poor data quality is a business problem that negatively impacts business processes. Sponsorship from the business provides a means for the communication of these problems and impacts. Business level sponsorship should be built upon data quality ROI and provide the direction for what data resides in the scope of the data quality program.
Data stewards are business level resources that represent individual data domains and provide relevant business rules to form remediation steps. They also develop business relevant metrics and targets that enable the measurement of quality and trend reporting that establishes the rate of return on existing remediation techniques.
Data quality is not a one-and-done project. It is a cycle of activities that need to be continuously carried out over time. Enterprise data tends to be an evolving and growing asset. Assigning a leader, or set of leaders, to a data quality program ensures the data quality cycle of activities maintains consistency as the data landscape undergoes this evolutionary growth.
Defining service level agreements with the business data stewards provides a basis for operational prioritization within the data quality program. As new data defects are discovered it will be critical to determine which to schedule for implementation. Service level agreements provide direction from each business unit and enable appropriate change management scheduling.
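As a rough sketch of how service level agreements could drive scheduling, the Python below orders newly discovered defects by an SLA priority per business unit. The `Defect` class, the `SLA_PRIORITY` table, and the sample defects are all hypothetical and only meant to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Defect:
    name: str
    business_unit: str
    records_affected: int


# Hypothetical SLA priorities agreed with each business unit's data steward.
# A lower number means the defect should be remediated sooner.
SLA_PRIORITY = {"billing": 1, "marketing": 2, "hr": 3}


def schedule(defects: List[Defect]) -> List[Defect]:
    """Order defects by SLA priority, then by records affected (largest first)."""
    return sorted(
        defects,
        # Business units without an SLA entry sort last (priority 99).
        key=lambda d: (SLA_PRIORITY.get(d.business_unit, 99), -d.records_affected),
    )


backlog = [
    Defect("missing postal code", "marketing", 1200),
    Defect("invalid tax id", "billing", 300),
    Defect("duplicate employee", "hr", 50),
    Defect("bad invoice address", "billing", 900),
]

for defect in schedule(backlog):
    print(defect.business_unit, "-", defect.name)
```

Sorting on a tuple keeps the rule easy to explain to the business: the SLA decides first, and volume only breaks ties.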
Common data management practices such as data migration and data archive scheduling need to be taken into account when determining when and what data to assess and remediate. Profiling and assessing data which is scheduled for archival would be an egregious misappropriation of resources. By aligning the data quality program with these types of data management activities this type of mistake can be avoided.
A proactive approach to data quality increases data consumer and business confidence, and it reduces both the costs associated with unplanned data correction activities and the fines associated with failure to meet regulatory compliance. A proactive approach also establishes, with the business, a level of domain expertise that fosters the necessary buy-in. Fundamental to this concept are data quality assessments and data quality trend reporting (score-carding).
With the maturation of the data quality vendor market, it is now possible to implement enterprise capable data quality software that offers a full range of features to manage data defect identification, remediation and reporting. Automated tools are more comprehensive, consistent and portable, and they include built-in modules, such as address validation services, which reduce code development.
Metrics, and their associated targets, are the cornerstones of an assessment and remediation process that fosters effective change. The definition of metrics and their targets needs to be centered on data elements that are essential to the core set of business processes. Targets should be divided into three groupings that reflect the data's ability to support these processes. At a base level these groupings should be “does not support business function”, “minimally supports business function”, and “supports business function at a high level”.
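To make the three groupings concrete, here is a minimal sketch in Python that scores a completeness metric and maps the score onto a grouping. The thresholds (0.80 and 0.95), the function names, and the sample postal codes are assumptions for illustration; real targets would come from the business data stewards.

```python
from typing import Iterable, Optional

# Hypothetical thresholds; actual targets are set with the data stewards.
MINIMAL_TARGET = 0.80
HIGH_TARGET = 0.95


def completeness(values: Iterable[Optional[str]]) -> float:
    """Return the share of values that are populated (non-empty after trimming)."""
    items = list(values)
    if not items:
        return 0.0
    populated = sum(1 for v in items if v is not None and v.strip())
    return populated / len(items)


def target_group(score: float) -> str:
    """Map a metric score onto one of the three target groupings."""
    if score >= HIGH_TARGET:
        return "supports business function at a high level"
    if score >= MINIMAL_TARGET:
        return "minimally supports business function"
    return "does not support business function"


# Illustrative sample: three of five postal codes are populated.
postal_codes = ["30301", "", None, "10001", "94105"]
score = completeness(postal_codes)
print(f"{score:.2f} -> {target_group(score)}")
```

Tracking the same score over time is what turns a one-off assessment into the trend reporting mentioned earlier.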
With numerous applications consuming and delivering data across the enterprise, it can be a daunting task to decide where to start correcting quality deviations. In an effort to reduce this complexity, it is a best practice to institute data quality activities where the data originates. The origin point of data is commonly referred to as the system of record.
This practice not only provides an answer of where to begin implementing data quality practices, it also proactively addresses the issue of defect proliferation. This ensures that these activities are not duplicated numerous times and are implemented in a consistent manner. As a consequence, data needs to be measured for quality upon creation and/or migration into the data landscape.
Focusing data remediation efforts on mission critical data is the control that ensures a return on the investment of the program. Identifying the data that supports core business functions requires careful examination of the process and participation from the business data stewards. Oftentimes this process also requires prioritization of critical elements in order to schedule remediation efforts. The identification of this data is vital to the success of the data quality program.
While there are many more best practices in the data quality domain, these ten form a solid foundation for the implementation of a data quality program. These practices, as you may notice, are focused more on establishing a data quality program than on the remediation efforts within the program. In a future post, I’ll examine some common remediation techniques which are universal to data quality programs.
There are plenty of aspects of address validation to write about.
Validating addresses can be done with many different tools, each with its own specific details on how to do it. There are various ways to validate addresses within each tool to produce different outcomes. And there are various ways to manage and integrate this data back into enterprise, operational systems.
While all of this content is very helpful and important to convey, it is my belief that it is essential to understand, write about and discuss why address validation is important to an organization and its master data management (MDM) efforts.
Here are a few reasons why address validation is so important to any organization (feel free to comment and suggest others!):
I’m sure there are other reasons, but the key message to convey is that validating address information is a critical component to business operations throughout the enterprise.
Now that we’ve covered why it is important to validate address data, let’s concentrate on how. While there are many tools on the market with which to validate addresses, I use Informatica’s v9 product which integrates AddressDoctor v5 to accomplish validation.
In order to validate address data with Informatica there are three required components:
As far as address inputs are concerned, no surprises here. Typical address information such as street number and name, city or locality, state or province and postal code can be passed into the Global Address Validation (GAV) module. Of these, the street number and name along with the postal code are required.
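Before sending records to the validator, it can help to check that the required inputs are present. The sketch below is plain Python, not part of Informatica or AddressDoctor; the `AddressInput` class and its field names are hypothetical stand-ins for the inputs described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AddressInput:
    street: Optional[str] = None       # street number and name
    locality: Optional[str] = None     # city or locality
    province: Optional[str] = None     # state or province
    postal_code: Optional[str] = None


# Per the text above: street number and name plus postal code are required.
REQUIRED_FIELDS = ("street", "postal_code")


def missing_required(address: AddressInput) -> List[str]:
    """Return the names of required fields that are empty or missing."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = getattr(address, name)
        if value is None or not value.strip():
            missing.append(name)
    return missing


complete = AddressInput(street="100 Main St", postal_code="30301")
incomplete = AddressInput(locality="Atlanta", province="GA")

print(missing_required(complete))
print(missing_required(incomplete))
```

Records that fail this check can be routed to a steward for review rather than consuming validation calls.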
The Address Doctor address validation service performs several types of address processing which are beneficial to address validation and address matching. Among these are:
Delivery address validation involves verification, and if necessary correction, of the street number, street name and any sublocation information. This process ensures that the address is valid and deliverable via USPS standards. This process is a must for marketing and billing operations that want to ensure mailings reach their destination.
Formatted address validation takes the various inputs provided and arranges them into the standard mailing format. By standard mailing format, I mean the way the information should be presented on a mailing envelope. This process is particularly useful to marketing operations to process “raw” address data into suitable mailing data. Refer to the illustration below for an example of formatted address validation.
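To give a feel for what arranging inputs into a mailing format means, here is a simplified sketch in Python. It only trims, joins and upper-cases the inputs into two lines; a real validation service also corrects and standardizes the values. The function name and the two-line layout are my assumptions for illustration.

```python
from typing import List, Optional


def format_mailing_lines(
    street: str,
    locality: str,
    province: str,
    postal_code: str,
    sub_building: Optional[str] = None,
) -> List[str]:
    """Arrange raw address parts into a delivery line and a last line."""
    # Delivery line: street number and name, then any sublocation (e.g. apartment).
    delivery_parts = [street, sub_building]
    delivery_line = " ".join(p.strip() for p in delivery_parts if p and p.strip())
    # Last line: locality, province and postal code.
    last_line = f"{locality.strip()} {province.strip()} {postal_code.strip()}"
    return [delivery_line.upper(), last_line.upper()]


lines = format_mailing_lines(" 100 Main St", "Atlanta", "ga", "30301", "Apt 4")
for line in lines:
    print(line)
```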
When it comes to address validation perhaps the most important result is the verification of the deliverability of the address. The GAV offers a way to investigate and report on this critical aspect of an address. Below is an illustration of address validation results including address deliverability. As part of this process there are match codes provided that range from validating the address as deliverable to stating the address could not be validated or corrected. Refer to the illustration below for an example of delivery address validation.
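Once match codes are written out, a quick summary shows how much of the data is deliverable. The Python sketch below groups codes by their leading letter. The letter meanings in `ASSUMED_CLASSES` and the sample codes are assumptions on my part, so check them against the vendor's match code documentation before relying on them.

```python
from collections import Counter
from typing import Iterable

# Assumed meaning of the first letter of a match code (not confirmed here).
ASSUMED_CLASSES = {
    "V": "verified",
    "C": "corrected",
    "I": "could not be corrected",
    "N": "not validated",
}


def classify(match_code: str) -> str:
    """Return the assumed class for a match code, or 'unknown' if unrecognized."""
    code = match_code.strip().upper()
    if not code:
        return "unknown"
    return ASSUMED_CLASSES.get(code[0], "unknown")


def summarize(match_codes: Iterable[str]) -> Counter:
    """Count how many addresses fall into each assumed class."""
    return Counter(classify(code) for code in match_codes)


# Illustrative sample of match codes, including one blank value.
sample_codes = ["V4", "C3", "I2", "V4", "N1", ""]
summary = summarize(sample_codes)
for label, count in summary.most_common():
    print(f"{label}: {count}")
```

A summary like this feeds naturally into the score-carding described in the data quality program practices.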
Since we are performing address validation in support of a master address management initiative, it is best practice to use a mapplet to perform the validations. A mapplet is a reusable object containing a set of transformations that you can use in multiple mappings. Any change made to the mapplet is inherited by all instances of the mapplet. In this way, mapplets are an object oriented approach to performing data quality.
The basic requirements for a mapplet are an input, a transformation and an output. In our address validation example the transform is the GAV previously mentioned. Below is an illustration of a basic address validation mapplet.
Here is a step-by-step process on how to create such a mapplet.
Step 1) Define your input information
Step 2) Define transform properties
2a) Add the Global Address Validation component
2b) Define transform input parameters
1. Right click on the transform and select the Properties option
2. Select the Templates option from the Properties tab on the left
3. Click on the (+) icon next to the Basic Model option
3a. Click on the (+) icon next to the Discrete option
4. Select the following input parameters from the Discrete option: StreetComplete1, StreetComplete2, PostalPhraseComplete1, LocalityComplete1, Province1
2c) Define the transform output / validation parameters
1. Click on the (+) icon next to the Address Elements option
2. Select the following attributes from the Address Elements option: SubBuildingNumber1, SubBuildingName1, StreetNumber1, StreetName1, StreetPreDescriptor1, StreetPostDescriptor1, StreetPreDirectional1, StreetPostDirectional1, DeliveryAddress1, DeliveryAddress2
3. Select the following attributes from the Last Line Elements option: LocalityName1, Postcode1, Postcode2, ProvinceAbbreviation1
4. Select the following attribute from the Status Info option: MatchCode
5. Select the following attributes from the Formatted Address Line option: FormattedAddressLine1, FormattedAddressLine2
Step 3) Define an Output Write object
1. Select the Output Component from the Transformation Palette
2. Right click on the write object and select the Properties option
3. From the Properties tab, select the Ports option
4. Right click on the first available row and select New from the menu options
5. Enter the name of the output field
Step 4) Connect the Validation component to the Write Output
In this step you connect the output ports from the address validation component to the write output object. This is a simple drag-and-drop step connecting the appropriate ports to each other. If you keep your names consistent it’ll help keep you sane.
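If you keep a list of port names for both objects, a small script can point out mismatches before you start dragging connections. This is a plain Python sketch and not an Informatica API. The validator port names come from the steps above, while the write port list (with `Province1` as a deliberately mismatched name) is a made-up example.

```python
from typing import List, Tuple

# Output ports selected on the address validation component (from the steps above).
validator_outputs = [
    "StreetNumber1", "StreetName1", "LocalityName1",
    "Postcode1", "ProvinceAbbreviation1", "MatchCode",
]

# Hypothetical ports defined on the write object; one name does not match.
write_ports = [
    "StreetNumber1", "StreetName1", "LocalityName1",
    "Postcode1", "Province1", "MatchCode",
]


def plan_connections(
    source_ports: List[str], target_ports: List[str]
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Pair ports with identical names and report the ones left unmatched."""
    targets = set(target_ports)
    sources = set(source_ports)
    connected = [(name, name) for name in source_ports if name in targets]
    unmatched_sources = [name for name in source_ports if name not in targets]
    unmatched_targets = [name for name in target_ports if name not in sources]
    return connected, unmatched_sources, unmatched_targets


connected, unmatched_sources, unmatched_targets = plan_connections(
    validator_outputs, write_ports
)
print("connect:", [src for src, _ in connected])
print("no target for:", unmatched_sources)
print("no source for:", unmatched_targets)
```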
Now you’ve learned how to create an address validation mapplet! This is a great step toward building a consistent, repeatable address validation process which is the key to implementing a master address data management program!