First, I’d like to thank everyone who submitted blog entries to the August edition of the IAIDQ Festival del IDQ Bloggers. I enjoyed reviewing the submissions and look forward to hosting more IAIDQ-related material.
This month I’d like to talk about my recent experiences with some of the data quality features of Microsoft’s Dynamics CRM package and how to put them to use in a typical enterprise environment.
One of my favorite features attempts to prevent duplication from occurring by running a match scan whenever a record is created or updated. It is a favorite of mine because it is proactive, aiming to prevent duplication where every data quality expert recommends: at the beginning. One issue I have seen with this feature, however, is that at high data volumes it can cause significant delays in record creation.
The feature can be enabled by checking the “When a record is created or updated” option in the Duplicate Detection Settings panel of the Data Management console.
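To make the idea concrete, here is a minimal Python sketch of that kind of proactive check. It is purely illustrative — the function and field names are my own, not anything from Dynamics CRM — but it shows the shape of the behavior, and why a match scan on every create can slow things down at high volumes: each new record is compared against the existing ones before it is accepted.

```python
# Illustrative sketch only -- not Dynamics CRM internals. Field names
# (first_name, last_name, zip) are hypothetical choices for this example.

def match_key(record):
    """Normalize the fields used for matching."""
    return (record["first_name"].strip().lower(),
            record["last_name"].strip().lower(),
            record["zip"].strip())

def create_record(store, record):
    """Reject the create if an existing record matches; else store it."""
    key = match_key(record)
    # Linear scan of all existing records -- this is the step that
    # grows expensive as data volumes climb.
    duplicates = [r for r in store if match_key(r) == key]
    if duplicates:
        raise ValueError(f"Potential duplicate of {len(duplicates)} record(s)")
    store.append(record)
    return record
```

The key point is that the check happens before the write, so duplicates are stopped at the point of entry rather than cleaned up later.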
Data Management Console
In the event that duplicates do make their way into your data, Microsoft Dynamics CRM also has some reactive features that allow you to identify those records and consolidate them.
One of these features is the duplicate detection rule: a set of criteria used for matching records. For instance, it is very common for organizations to accumulate more than one record for a single customer. In that case, a duplicate detection rule can be built to identify all customer records that match on the customer’s first name, last name, and zip code. As you may already know, I am a diligent advocate of requiring more than first and last name to truly identify an individual. Based on research regarding change of address (COA), it makes practical sense to limit the criteria to a geographic area, such as those from which postal codes are generated. You may also want to throw in more qualifying data elements, such as street name and number, but for the purposes of this posting we’ll stick to a customer’s full name and zip code.
This rule will identify each set of records that share the same values for these three data elements. You can require an exact match on the data values or a match on a substring of the values. Again, I am an advocate of using partial matches on data elements like last name, given the frequency of data entry errors.
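As a rough illustration of how such a rule behaves, here is a small Python sketch — my own conceptual model, not the actual CRM matching engine. It groups records by a match code built from first name, last name, and zip code, with an optional “first N characters” partial match on the last name to tolerate data entry errors.

```python
from collections import defaultdict

# Conceptual sketch of a duplicate detection rule -- not the CRM engine.
# Records that produce the same match code form a duplicate set.

def match_code(record, last_name_chars=None):
    last = record["last_name"].strip().lower()
    if last_name_chars is not None:
        last = last[:last_name_chars]   # partial (substring) match
    return (record["first_name"].strip().lower(), last,
            record["zip"].strip())

def find_duplicate_sets(records, last_name_chars=None):
    groups = defaultdict(list)
    for r in records:
        groups[match_code(r, last_name_chars)].append(r)
    return [g for g in groups.values() if len(g) > 1]
```

With exact matching, “Smith” and “Smyth” at the same zip code stay separate; with a two-character partial match on last name, they fall into the same duplicate set — which is exactly the trade-off the partial-match option lets you make.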
Once you have created your match criteria, you’ll want to initialize, or publish, the rule. With the proper permissions, you can publish a saved rule by clicking the green arrowed icon labeled “Publish” on the toolbar. Be forewarned: some rules take quite a long time and a lot of resources to publish, so you may want to perform this action as part of your off-hours operations.
Once the rule is published, it is time to schedule when it will be run. This is done by building a duplicate detection job, which includes a start time, a recurrence setting, and an option to provide an email address to be notified when the job finishes. The following is a snapshot of the interface for defining detection jobs.
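Conceptually, a detection job pairs a published rule with that schedule information. The sketch below is hypothetical — the class and field names are mine, not the CRM schema — and only illustrates how a start time and a recurrence interval determine when the job next runs.

```python
from datetime import datetime, timedelta

# Hypothetical model of a duplicate detection job definition; field
# names are illustrative, not the Dynamics CRM schema.

class DetectionJob:
    def __init__(self, rule_name, start_time,
                 recur_every_days=None, notify_email=None):
        self.rule_name = rule_name
        self.start_time = start_time
        self.recur_every_days = recur_every_days  # None = run once
        self.notify_email = notify_email          # mailed on completion

    def next_run_after(self, now):
        """First scheduled run at or after `now` (None if none remain)."""
        if now <= self.start_time:
            return self.start_time
        if self.recur_every_days is None:
            return None  # one-shot job already in the past
        period = timedelta(days=self.recur_every_days)
        periods_elapsed = (now - self.start_time) // period
        candidate = self.start_time + periods_elapsed * period
        if candidate < now:
            candidate += period
        return candidate
```

A weekly job starting at 2:00 AM, for instance, keeps landing on that same off-hours slot — which pairs nicely with the earlier advice about running resource-heavy operations outside business hours.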
Duplicate Detection Job
Once you have your rules and jobs created, you have completed the basic steps to remove duplicates from your data. After a job completes and you receive your notification email, you’ll want to review the duplicate matches.
You can do this by double-clicking the duplicate detection job in the System Jobs queue to open it. Once the job is open, you’ll see an option labeled “View Duplicates”.
Next month, I’ll dive deeper into the details on how to remove the duplicates with a posting on the merge feature. I hope this was informative and enough to get most of you started. I’ll address detailed questions if you have them, so please feel free to comment!