The Data Quality Chronicle

An event based log about a service offering

Category Archives: MDM

Data Quality Polls: Troubled domains and what to fix

With which data domain do you have the most quality issues?

As expected, customer data quality remains at the top of the list of domains with the most issues. Ironically, this domain has been at the forefront of the data quality industry since its inception.
One reason for the proliferation of concerns about customer data quality could be its direct link to revenue generation.
Whatever the reason, this poll seems to indicate that services built around the improvement of customer data quality will be well founded.

What would you improve about your data?

Once again there are no surprises when looking at what data improvements are desired. Data owners seem to be interested in a centralized, synchronized, single view of their data, most notably customer data.

The good news that can be gathered from these polls is that as an industry, data quality is focused on the right data and the right functionality.  Most data quality solutions are built around the various aspects of customer data quality and ways to improve it so there is a master managed, single version of a customer.  The bad news is we’ve had that focus for quite some time and data owners are still concerned. 

In my opinion, this is due to the nature of customer data.  Customer data is at the core of every business.  It is constantly changing both in definition and scope, it is continuously used in new and complex ways, and it is the most valuable asset that an organization manages.

One thing not openly reflected in these polls is that the same issues and concerns present in the customer domain are likely also present in the employee and contact domains.  However, they tend not to “bubble up” to the top of the list because they lack a direct link to revenue and profit.

I’d encourage comments and feedback on this post.  If we all weigh in on topics like this, we can all learn something valuable.  Please let me know your thoughts on the poll results, my interpretation of the results and opinions.


Master Data Management: Address Validation Series: Address Validation


There are plenty of aspects of address validation to write about. 

Validating addresses can be done with many different tools, each with its own specific details on how to do it.  There are various ways to validate addresses within each tool to produce different outcomes.  And there are various ways to manage and integrate this data back into enterprise operational systems.

While all of this content is very helpful and important to convey, it is my belief that it is essential to understand, write about and discuss why address validation is important to an organization and their master data management (MDM) efforts.

Here are a few reasons why address validation is so important to any organization (feel free to comment and suggest others!):

  1. Without valid address information, return mail can impact an organization’s bulk mail status and lead to increased mailing costs
  2. Without valid address information, billing operations generate costly cycles of collections and corrections, which hurt revenue assurance
  3. Without valid address information, marketing campaigns are not fully leveraged
  4. Without valid address information, marketing techniques like house-holding cannot fully realize their potential
  5. Without valid address information, customer care operations are impeded
  6. Without valid address information, customer perception and the customer experience are negatively impacted
  7. Without valid address information, an organization is open to federal regulatory fines resulting from the failure to honor contractual obligations
  8. Without valid address information, shipping operations experience failures that trigger costly rework cycles
  9. Without valid address information, supply chain management efforts are compromised, undermining attempts to reduce costs and increase effectiveness
  10. Without valid address information, asset management efforts for business models built on locational awareness, like housewares rental providers and home security providers, are seriously compromised

I’m sure there are other reasons, but the key message to convey is that validating address information is a critical component of business operations throughout the enterprise.


Now that we’ve covered why it is important to validate address data, let’s concentrate on how.  While there are many tools on the market with which to validate addresses, I use Informatica’s v9 product which integrates AddressDoctor v5 to accomplish validation.

The basics

In order to validate address data with Informatica there are three required components:

  1. An input containing address information
  2. The Global Address Validation (GAV) component
  3. An output to write the original and validated address information

Address Inputs

As far as address inputs are concerned, no surprises here.  Typical address information such as street number and name, city or locality, state or province and postal code can be passed into the GAV module.  Of these, the street number and name along with the postal code are required.
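Before records ever reach the validation component, it can help to screen out inputs that are missing the required fields.  The sketch below is plain Python, not Informatica code; the `AddressInput` class and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddressInput:
    """Raw address fields as they might arrive from a source system (hypothetical names)."""
    street: str = ""        # street number and name, e.g. "123 Main St"
    city: str = ""          # city or locality
    state: str = ""         # state or province
    postal_code: str = ""


# Per the text above: street number/name and postal code are required.
REQUIRED_FIELDS = ("street", "postal_code")


def missing_required(address: AddressInput) -> List[str]:
    """Return the names of required fields that are empty or whitespace only."""
    return [name for name in REQUIRED_FIELDS
            if not getattr(address, name).strip()]


complete = AddressInput(street="123 Main St", postal_code="62701")
partial = AddressInput(city="Springfield", state="IL")
print(missing_required(complete))  # []
print(missing_required(partial))   # ['street', 'postal_code']
```

Records with a non-empty result could be routed to a review queue instead of being sent through validation.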

Global Address Validation

The AddressDoctor address validation service performs several types of address processing which are beneficial to address validation and address matching.  Among these are:

  1. Delivery Address validation
  2. Formatted Address validation
  3. Mailability validation

Delivery Address Validation

Delivery address validation involves verification, and if necessary correction, of the street number, street name and any sublocation information.  This process ensures that the address is valid and deliverable according to USPS standards.  This process is a must for marketing and billing operations that want to ensure mailings reach their destination.

Delivery Address Sample

Formatted Address Validation

Formatted address validation takes the various inputs provided and arranges them into the standard mailing format.  By standard mailing format, I mean the way the information should be presented on a mailing envelope.  This process is particularly useful to marketing operations to process “raw” address data into suitable mailing data.  Refer to the illustration below for an example of formatted address validation.

Formatted Address Sample
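To make the idea of formatted address lines concrete, here is a minimal Python sketch that arranges discrete inputs into two envelope-style lines.  It is not what AddressDoctor does internally; the function name and the simple US-style layout are assumptions made for illustration only.

```python
from typing import List


def _clean(value: str) -> str:
    """Trim the value and collapse internal runs of whitespace."""
    return " ".join(value.split())


def format_mailing_lines(street: str, city: str, state: str,
                         postal_code: str, unit: str = "") -> List[str]:
    """Arrange discrete fields into two mailing lines (assumed US-style layout)."""
    line1 = " ".join(part for part in (_clean(street), _clean(unit)) if part)
    line2 = f"{_clean(city)}, {_clean(state)} {_clean(postal_code)}"
    return [line1, line2]


lines = format_mailing_lines("  123   Main St ", "Springfield", "IL", "62701",
                             unit="Apt 4")
print("\n".join(lines))
# 123 Main St Apt 4
# Springfield, IL 62701
```

A real validation service also standardizes abbreviations and casing against postal reference data, which this sketch deliberately leaves out.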

Mailability Validation

When it comes to address validation, perhaps the most important result is the verification of the deliverability of the address.  The GAV offers a way to investigate and report on this critical aspect of an address.  As part of this process, match codes are provided that range from validating the address as deliverable to stating that the address could not be validated or corrected.  Refer to the illustration below for an example of address validation results, including address deliverability.

 Address DPV Sample
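Once match codes come back, a downstream step usually has to sort records into “good to mail” and “needs attention”.  The Python sketch below shows one way to do that.  The prefix-to-status table is an assumption for illustration (it treats the leading letter of the code as the status category); check the actual code list in your AddressDoctor documentation before relying on it.

```python
from collections import Counter
from typing import Dict, Iterable

# ASSUMPTION: the leading letter of a match code signals its status category.
# Confirm these letters and meanings against the vendor's match code reference.
ASSUMED_STATUS_BY_PREFIX: Dict[str, str] = {
    "V": "verified",
    "C": "corrected",
    "I": "could_not_correct",
    "N": "not_validated",
}


def triage(match_code: str) -> str:
    """Map a match code to a coarse status using its first letter."""
    prefix = match_code.strip()[:1].upper()
    return ASSUMED_STATUS_BY_PREFIX.get(prefix, "unknown")


def summarize(match_codes: Iterable[str]) -> Counter:
    """Count how many records fall into each coarse status."""
    return Counter(triage(code) for code in match_codes)


summary = summarize(["V4", "C3", "I2", "V4", ""])
print(summary["verified"])  # 2
print(summary["unknown"])   # 1
```

A summary like this gives business stakeholders a quick read on how much of an address file is mailable before a campaign or billing run.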

Address Validation Mapplet

Since we are performing address validation in support of a master address management initiative, it is best practice to use a mapplet to perform the validations.  A mapplet is a reusable object containing a set of transformations that you can use in multiple mappings.  Any change made to the mapplet is inherited by all instances of the mapplet.  In this way, mapplets are an object-oriented approach to performing data quality.

The basic requirements for a mapplet are an input, a transformation and an output.  In our address validation example the transform is the GAV previously mentioned.  Below is an illustration of a basic address validation mapplet.

 Address Validation Mapplet
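For readers who think in code rather than in canvas objects, the mapplet idea can be sketched as a reusable function: define the input, transform and output chain once, then call it from several “mappings”.  This is only an analogy in Python, not Informatica code; `make_mapplet` and `stub_validate` are hypothetical names, and the stub merely trims the postal code in place of real validation.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Transform = Callable[[Record], Record]


def make_mapplet(transform: Transform) -> Callable[[Iterable[Record]], List[Record]]:
    """Wrap a transform as input -> transform -> output, defined once."""
    def run(records: Iterable[Record]) -> List[Record]:
        # The output keeps the original fields and adds the validated ones.
        return [{**record, **transform(record)} for record in records]
    return run


def stub_validate(record: Record) -> Record:
    """Hypothetical stand-in for the GAV transform: trims the postal code."""
    return {"Postcode1": record.get("PostalCode", "").strip()}


# One definition, reused by two different "mappings".
validate_addresses = make_mapplet(stub_validate)
billing_out = validate_addresses([{"PostalCode": " 62701 "}])
marketing_out = validate_addresses([{"PostalCode": "10001"}])
print(billing_out)    # [{'PostalCode': ' 62701 ', 'Postcode1': '62701'}]
print(marketing_out)  # [{'PostalCode': '10001', 'Postcode1': '10001'}]
```

Because both callers share `validate_addresses`, a change to the transform shows up everywhere it is used, which is the same property the mapplet gives you.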

Here is a step-by-step process on how to create such a mapplet.

Step 1) Define your input information

  1. Click on the input component on the component toolbar
  2. Right-click on the input component and click on the properties option
  3. In the ports section of the properties window, click on the new icon
  4. Define the port name, data type, precision and scale

Step 2) Define transform properties

2a) Add the Global Address Validation component

2b) Define transform input parameters

  1. Right-click on the transform and select the Properties option
  2. Select the Templates option from the Properties tab on the left
  3. Click on the (+) icon next to the Basic Model option
  4. Click on the (+) icon next to the Discrete option
  5. Select the following input parameters from the Discrete option: StreetComplete1, StreetComplete2, PostalPhraseComplete1, LocalityComplete1, Province1

2c) Define the transform output / validation parameters

  1. Click on the (+) icon next to the Address Elements option
  2. Select the following attributes from the Address Elements option: SubBuildingNumber1, SubBuildingName1, StreetNumber1, StreetName1, StreetPreDescriptor1, StreetPostDescriptor1, StreetPreDirectional1, StreetPostDirectional1, DeliveryAddress1, DeliveryAddress2
  3. Select the following attributes from the Last Line Elements option: LocalityName1, Postcode1, Postcode2, ProvinceAbbreviation1
  4. Select the following attribute from the Status Info option: MatchCode
  5. Select the following attributes from the Formatted Address Line option: FormattedAddressLine1, FormattedAddressLine2

Step 3) Define an Output Write object

  1. Select the Output Component from the Transformation Palette
  2. Right-click on the write object and select the Properties option
  3. From the Properties tab, select the Ports option
  4. Right-click on the first available row and select New from the menu options
  5. Enter the name of the output field

Step 4) Connect the Validation component to the Write Output

In this step you connect the output ports from the address validation component to the write output object.  This is a simple drag-and-drop step connecting the appropriate ports to each other.  If you keep your names consistent it’ll help keep you sane.



Now you’ve learned how to create an address validation mapplet!  This is a great step toward building a consistent, repeatable address validation process which is the key to implementing a master address data management program!

Master Data Management: Address Validation Series


Address information, in particular customer address information, is a core asset of any business.  It plays a pivotal role in two fundamental business operations: revenue assurance and revenue generation.

Without valid, deliverable customer address information, collecting payment for services or products is often a process that, at best, requires repetitive efforts that cost the business labor and resources (and dollars).  At worst, the process fails to collect, creating an obvious issue costing time and resources (and dollars).

Without valid, deliverable customer address information, marketing to existing and potential customers is not possible and will, again, cost the business labor and resources (and dollars).



So what exactly needs to be validated in order to prevent the failure of revenue collection and generating events?

While it is not harmful to have the full complement of customer address data collected, stored, and validated, there are a few pieces of address information that are essential.

Postal Code is an absolute must-have in order to ensure the mailer is delivered.  Postal Code is the core element that the United States Postal Service uses to route mail.  Without it, deliverability is unachievable.

Street Number and Street Name are also essential pieces of information to collect and validate in order to ensure mail deliverability.  Logically, this information is required to know where within the postal code to deliver the mail.

It is also required, where applicable, to collect and validate additional address location information such as apartment number, suite number, building number, etc.  This enables getting the mail to the correct destination within a multi-dwelling residence.

In my opinion, this is the required data to validate and ensure deliverability.  Most address validation services can derive accurate and valid city and state information from the postal code which can be augmented and utilized moving forward.
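As a rough illustration of deriving city and state from the postal code, the Python sketch below fills those fields in from a lookup table.  The two-entry `POSTAL_REFERENCE` table and the field names are hypothetical stand-ins; a real validation service draws on complete postal reference data.

```python
from typing import Dict, Tuple

# Tiny illustrative stand-in for a postal reference database.
POSTAL_REFERENCE: Dict[str, Tuple[str, str]] = {
    "10001": ("New York", "NY"),
    "60601": ("Chicago", "IL"),
}


def augment_from_postal_code(address: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the address with city/state derived from the postal code."""
    postal = address.get("postal_code", "").strip()[:5]  # first five digits only
    match = POSTAL_REFERENCE.get(postal)
    if match is None:
        return dict(address)  # unknown code: leave the record unchanged
    city, state = match
    return {**address, "city": city, "state": state}


print(augment_from_postal_code({"street": "1 Example St", "postal_code": "10001"}))
# {'street': '1 Example St', 'postal_code': '10001', 'city': 'New York', 'state': 'NY'}
```

The point is that only the street information and postal code need to be collected reliably; the remaining fields can be augmented afterwards.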


Who should be responsible for address validation?

As I alluded to earlier, address information is a corporate asset which plays a pivotal role in many essential business operations.

For this reason, address validation belongs in a centralized group made up of representatives from those dependent parties.  In other words, address validation is the responsibility of a corporate data governance group that is aware of all the required aspects of useful address data management.

Typically there are, at a minimum, two levels to this group.  On one level there are business stakeholders that manage and advocate functional business requirements involving address information.  On the other level are the data processors that manage data sourcing, scrubbing, validation and integration of the address information.

Due to the technical requirements in managing such information, Information Technology should be responsible for management of the physical data stores that house the address information.  However, it is crucial to note that this management is around the software and hardware resources that house the data. 

It is imperative that data ownership be the responsibility of the business owners on the governance group.


Although there are various implementations of MDM, I believe address information belongs in a centralized hub that feeds dependent systems clean and valid address data.  This model ensures the delivery of consistent and valid address data throughout the enterprise.

This centralized hub needs to be managed in such a way that it is independently supported, ensuring failover, redundancy and archival.  This eliminates the failure scenarios described earlier that interfere with revenue collection and generation.


How often does address data need to be validated?

There are various factors, such as an annual change-of-address rate of 17% and quarterly marketing campaigns, that influence when address information should be validated.  In the end, the answer depends on the finest level of granularity (that is, the most frequent cycle) at which the data is used to support business operations that either collect or generate revenue.

If marketing conducts campaigns on a quarterly basis but billing occurs monthly, then validating address data should be done on a monthly basis to support accurate and efficient billing operations.
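The rule of thumb above boils down to “validate as often as your most frequent consumer runs”.  The Python sketch below picks that interval and, using the 17% annual figure quoted earlier, estimates how much of the file may go stale between validations.  The cycle lengths in days and the straight-line spread of moves across the year are simplifying assumptions.

```python
from typing import Dict

ANNUAL_CHANGE_RATE = 0.17  # annual change-of-address rate quoted in the post


def validation_interval_days(usage_cycles: Dict[str, int]) -> int:
    """Validate as often as the most frequent business cycle (smallest interval)."""
    return min(usage_cycles.values())


def expected_stale_share(interval_days: int,
                         annual_rate: float = ANNUAL_CHANGE_RATE) -> float:
    """Approximate share of addresses that change within one interval.

    Assumes moves are spread evenly over a 365-day year.
    """
    return annual_rate * interval_days / 365


cycles = {"marketing": 90, "billing": 30}  # assumed cycle lengths in days
interval = validation_interval_days(cycles)
print(interval)  # 30
print(f"{expected_stale_share(interval):.1%} of addresses may change per cycle")
```

Under these assumptions, roughly 1.4% of addresses change between monthly validations, versus roughly 4.2% if validation only ran quarterly.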


How can address validation be implemented in order to support all the benefits described?

In order to validate address information on a periodic basis, manage it across various dependent business units and integrate it into a centralized hub, you need to be able to develop validation routines, business rules, a mechanism for business stakeholder review and integration routines that can be executed on a schedule.

Within the domain of address validation there are several varieties of output.  For instance, it is possible to develop an address validation process that transforms address information into the correctly formatted address lines that would appear on the envelope.  Another implementation could be the parsing, augmenting and obtaining validation status of address information.  Yet another implementation could be to take the address information input and transform it into the valid delivery address information.

With various business units consuming address information, there will likely be various business-unit-specific rules to process the address information.  For example, marketing operations might require that the “vanity” city name be specified.  Vanity city names are usually preferred by customers due to their perception and reputation.  One such example of a vanity address is using Beverly Hills over the validated city name of Los Angeles.  However, billing operations may not have the same requirement.  In this case, and others like it, you need an address validation process that enables the building of business-specific rules that can handle variability on the same data element.
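A business-specific rule of this kind can be as small as a lookup that says which business units prefer the vanity city.  The Python sketch below uses the example from the paragraph above; the `UNIT_PREFERS_VANITY` table and the record field names are hypothetical.

```python
from typing import Dict

# Hypothetical rule table: which business units prefer the vanity city name.
UNIT_PREFERS_VANITY: Dict[str, bool] = {"marketing": True, "billing": False}


def city_for_unit(record: Dict[str, str], unit: str) -> str:
    """Pick the city name a given business unit should see for this record."""
    vanity = record.get("vanity_city", "")
    if UNIT_PREFERS_VANITY.get(unit, False) and vanity:
        return vanity
    return record["validated_city"]  # default: the validated city name


record = {"validated_city": "Los Angeles", "vanity_city": "Beverly Hills"}
print(city_for_unit(record, "marketing"))  # Beverly Hills
print(city_for_unit(record, "billing"))    # Los Angeles
```

Keeping both values on the master record and applying the rule at delivery time lets the hub stay consistent while each consumer gets the variant it needs.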

In order to enable business stakeholder ownership and help business users define and validate data-specific rules, you need to have a mechanism that presents data to these users.  Since these business stakeholders are not typically technically inclined, this mechanism needs to be built in such a way that it minimizes technical effort and enables data review and validation.

Ultimately this address information needs to be integrated into a centralized hub and distributed to the various consuming applications.  This dictates the need for enterprise capable data extraction and load features such as scheduling, monitoring and tracking.

How do you deliver on such a complex set of criteria?

It’s a challenge.  In fact, it’s such a broad topic with many details that it is not feasible to do in one blog post. I plan on addressing (no pun intended) each of these areas in more detail over the coming weeks. 

So stay tuned to The Data Quality Chronicle for more!