IBM Match 360 uses matching algorithms to resolve data records into master data entities. Data engineers can define different matching algorithms for each entity type in their data. The matching algorithms can then analyze the data to evaluate and compare records, and then collect matched records into entities.
There are two common reasons to run matching on your data:
- For record deduplication and entity resolution, the matching process analyzes your data to determine whether any duplicate records exist in your data. Suspected duplicate records are merged into master data entities to establish a single, trusted, 360-degree view of your data.
- To create other types of entity associations, the matching process analyzes your data to collect records into entities that represent different kinds of groupings, such as a household.
Watch the following video to see how to use IBM Match 360 to set up a matching algorithm for a customized data model.
This video provides a visual method to learn the concepts and tasks in this documentation.
In this topic:
- Matching to create more than one type of entity
- The matching process
- Resiliency rules
- Components of the matching algorithm
Matching to create more than one type of entity
IBM Match 360 matching algorithms are driven by the entity type of the associated data. You can define more than one entity type for each record type in the data model. For each entity type, configure and tune its corresponding matching algorithm to ensure that IBM Match 360 creates entities that meet your organization's requirements.
A single record can be part of more than one separate entity. If your data model includes more than one entity type, you can run different types of matching across the same data set. For example, consider a data set that includes person records from across your enterprise. If the Person record type includes definitions for a Person entity type and a Household entity type, then you can run the Person matching algorithm for entity resolution and deduplication, and also run the Household matching algorithm to create entities made up of person records that belong to the same household.
The matching process
The matching engine goes through a defined process to match records into entities. The matching process includes three major steps:
- Standardization. During this step, the algorithm standardizes the format of the data so that it can be processed by the matching engine.
- Bucketing. The algorithm sorts data into various categories or "buckets" so that it can compare like-to-like pieces of information.
- Comparison. The algorithm compares data to determine a final comparison score. The algorithm then uses the comparison score to determine whether the records are a match.
Each of these steps is defined and configured by the matching algorithm.
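As a rough illustration of how the three steps fit together, the following Python sketch runs a toy version of each one. It is not the IBM Match 360 implementation: the functions standardize, bucket, compare, and match, the last-name bucket key, the shared-token score, and the threshold of 2 are all hypothetical simplifications.

```python
from itertools import combinations


def standardize(record):
    """Standardization (toy): uppercase the name and split it into tokens."""
    return {"name_tokens": record["name"].upper().split()}


def bucket(standardized):
    """Bucketing (toy): group record IDs by their last name token."""
    buckets = {}
    for rec_id, std in standardized.items():
        if std["name_tokens"]:
            key = std["name_tokens"][-1]
            buckets.setdefault(key, []).append(rec_id)
    return buckets


def compare(a, b):
    """Comparison (toy): the score is the number of name tokens two records share."""
    return len(set(a["name_tokens"]) & set(b["name_tokens"]))


def match(raw_records, threshold=2):
    """Return pairs of record IDs whose comparison score reaches the threshold."""
    standardized = {rid: standardize(r) for rid, r in raw_records.items()}
    matches = []
    for members in bucket(standardized).values():
        # Only records that share a bucket are compared with each other.
        for x, y in combinations(sorted(members), 2):
            if compare(standardized[x], standardized[y]) >= threshold:
                matches.append((x, y))
    return matches


records = {
    "r1": {"name": "Jane Smith"},
    "r2": {"name": "jane  SMITH"},
    "r3": {"name": "John Smith"},
    "r4": {"name": "Jane Doe"},
}
print(match(records))  # [('r1', 'r2')]
```

In this sketch, r1 and r4 share the token JANE but land in different buckets, so they are never compared, which shows how bucketing restricts comparison to like-to-like candidates.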
Resiliency rules
You can use the IBM Match 360 API to configure resiliency rules that limit how the matching algorithm responds to record data changes.
With no resiliency rules in place, there are a number of possible entity linking changes that can occur when a master data record gets added, updated, or deleted:
- If a new record gets added, it can:
  - Join an existing entity.
  - Cause two or more existing entities to join together by acting as a glue record.
  - Form a new singleton entity.
- If a record gets updated, it can:
  - No longer belong to its current entity and become a new singleton entity.
  - No longer belong to its current entity and join another existing entity.
  - Cause its current entity to split into multiple entities.
  - Cause other entities to join its current entity by acting as a glue record.
  - Cause no changes to entity composition.
- If a record gets deleted, it can:
  - Cause its singleton entity to also be deleted.
  - Cause its current entity to be split.
By defining resiliency rules, data engineers can configure how the IBM Match 360 matching engine responds to each of these scenarios. The matching engine adjusts its linking behavior to align with the resiliency rules that you configure. Because resiliency rules can limit entity merges and splits, they help keep entity composition stable.
Define resiliency rules by using the resiliency_rules API. If a given rule is set to false, the corresponding entity linking scenario does not complete its usual entity linking changes.
To get the current set of resiliency rules, run the following API command:
GET /mdm/v1/resiliency_rules
To update the resiliency rules, run the following API command with an updated payload:
PUT /mdm/v1/resiliency_rules
{
  "link_resiliency_rules": {
    "records": {
      "person": {
        "add": {
          "join_existing_entity": "true/false",
          "merge_entities": "true/false"
        },
        "update": {
          "record_becoming_singleton": "true/false",
          "join_existing_entity": "true/false",
          "original_entity_split": "true/false",
          "merge_entities": "true/false"
        },
        "delete": {
          "singleton_entity_deletion": "true",
          "original_entity_split": "true/false"
        }
      }
    },
    "entities": {
    }
  }
}
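If you script this call, the payload can be assembled programmatically. The following Python sketch builds the request with the standard library's urllib.request but does not send it. BASE_URL, the bearer-token Authorization header, and the helper names are assumptions; substitute the host and authentication method of your own instance. The rule values are the strings "true" and "false" because that is how the template above shows them. Whether the API accepts a partial rule set like the one in the example is not stated here, so retrieving the current rules with GET /mdm/v1/resiliency_rules and editing that document may be the safer approach.

```python
import json
import urllib.request

# Placeholder: replace with the API host of your IBM Match 360 instance.
BASE_URL = "https://example.com"


def build_resiliency_payload(record_type, add=None, update=None, delete=None):
    """Build a link_resiliency_rules payload with the same shape as the template.

    Rule values are the strings "true" or "false", mirroring the template.
    """
    return {
        "link_resiliency_rules": {
            "records": {
                record_type: {
                    "add": add or {},
                    "update": update or {},
                    "delete": delete or {},
                }
            },
            "entities": {},
        }
    }


def build_put_request(payload, token):
    """Prepare, but do not send, a PUT /mdm/v1/resiliency_rules request.

    The bearer-token header is an assumption about your authentication setup.
    """
    return urllib.request.Request(
        url=f"{BASE_URL}/mdm/v1/resiliency_rules",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="PUT",
    )


# Example: stop record updates from splitting or merging person entities.
payload = build_resiliency_payload(
    "person",
    update={"original_entity_split": "false", "merge_entities": "false"},
)
request = build_put_request(payload, token="YOUR_TOKEN")
# urllib.request.urlopen(request) would send the request; that call is left out here.
```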
Components of the matching algorithm
Three main types of components define an IBM Match 360 matching algorithm:
Standardizers
As the name suggests, standardizers define how data gets standardized. Standardization enables the matching algorithm to convert the values of different attributes to a standardized representation that the matching engine can process.
The matching algorithm uses multiple standardizers. Each standardizer is suited to process specific attribute types found in record data.
Standardizers are defined by JSON objects. Each standardizer's JSON object definition contains three elements:
- label: A label that identifies this standardizer.
- inputs: The inputs list has one element, which is a JSON object. That JSON object has two elements, fields and attributes:
  - fields: The list of fields to use for standardization.
  - attributes: The list of attributes to use for standardization.
- standardizer_recipe: A list of JSON objects in which each object represents one step to be run during the standardization process of the associated standardizer. Each object in the standardizer_recipe list consists of four main elements:
  - label: A label that identifies this step in the standardizer recipe.
  - method: The internal method used. This element is just for reference and must not be edited.
  - inputs: A single element of the inputs list defined one level higher.
  - fields: A list of the fields to be used for this step. This is generally a subset of all the fields defined within the inputs list one level higher. Not every step needs to process all of the inputs fields.
  - set_resource: The name of a set type customizable resource used for this step.
  - map_resource: The name of a map type customizable resource used for this step.
Depending on the behavior of a step, there might be more configuration elements that are required in the corresponding JSON object.
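To make the nesting concrete, the following Python sketch expresses a hypothetical standardizer definition as a dictionary and checks it for the elements described above. The field and attribute names, the "<internal method>" placeholders, and the value [1] for a step's inputs are illustrative assumptions, not values taken from a generated algorithm; in a real algorithm, keep the method values exactly as they are. The fields check only produces a warning, because a step's fields are generally, not necessarily, a subset of the declared fields.

```python
import json

# Elements that the documentation lists for a standardizer and its recipe steps.
STANDARDIZER_KEYS = {"label", "inputs", "standardizer_recipe"}
STEP_KEYS = {"label", "method", "inputs", "fields"}


def lint_standardizer(defn):
    """Return warnings for a standardizer definition (a sketch, not a validator)."""
    warnings = []
    for key in sorted(STANDARDIZER_KEYS - set(defn)):
        warnings.append(f"standardizer is missing '{key}'")
    inputs = defn.get("inputs", [])
    if len(inputs) != 1:
        warnings.append("'inputs' should contain exactly one JSON object")
    declared = set(inputs[0].get("fields", [])) if inputs else set()
    for step in defn.get("standardizer_recipe", []):
        name = step.get("label", "<unlabeled step>")
        for key in sorted(STEP_KEYS - set(step)):
            warnings.append(f"step '{name}' is missing '{key}'")
        undeclared = sorted(set(step.get("fields", [])) - declared)
        if undeclared:
            warnings.append(f"step '{name}' uses fields not declared in inputs: {undeclared}")
    return warnings


# Hypothetical definition; field and attribute names are illustrative only.
example = {
    "label": "Person Name",
    "inputs": [{"fields": ["given_name", "last_name"], "attributes": ["legal_name"]}],
    "standardizer_recipe": [
        {
            "label": "Upper case",
            "method": "<internal method>",  # placeholder; never edit the real value
            "inputs": [1],  # assumption: points at the single top-level inputs element
            "fields": ["given_name", "last_name"],
        },
        {
            "label": "Stop token",
            "method": "<internal method>",
            "inputs": [1],
            "fields": ["given_name"],
            "set_resource": "person_set_name_aname",
        },
    ],
}

print(json.dumps(example, indent=2))
print(lint_standardizer(example))  # []
```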
Preconfigured standardizers
The following standardizers are ready to use in IBM Match 360. The preconfigured standardizers are also customizable.
Person Name standardizer
This standardizer is used to standardize Person Name attribute values. It contains the following recipes, in sequence:
- Upper case: Converts the input field values to use their uppercase equivalents.
- Map character: Converts Unicode input characters to equivalent English alphabet characters. Optionally, define the map in the IBM Match 360 resources.
- Tokenizer: Tokenizes the input field value into multiple tokens, based on the defined list of delimiters.
- Parse token: Parses the input field values to different tokens, depending on the predefined values in the IBM Match 360 resources. For example, you can use this recipe to parse suffix, prefix, and generation values into appropriate fields.
- Length: Discards tokens that are outside a given length range. Minimum and maximum values are defined in the IBM Match 360 resources.
- Stop token: Removes anonymous input values, as configured.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Person Name standardizer uses the following Map resources by default:
- map_character_general: Converts Unicode input characters to equivalent English alphabet characters.
- person_map_name_alignments: Parses suffix, prefix, and generation values into appropriate fields.
The Person Name standardizer uses the following Set resources by default:
- person_set_name_aname: Removes anonymous person name values.
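The recipe sequence can be approximated in a few lines of Python. This sketch is not the IBM Match 360 standardizer: the alignment map, anonymous-name set, delimiters, and length range are hypothetical stand-ins for the resources listed above, and the Map character step is approximated with Unicode decomposition (unicodedata.normalize with NFKD) instead of a configured character map.

```python
import re
import unicodedata

# Hypothetical stand-ins for the default resources named above.
NAME_ALIGNMENTS = {  # in the spirit of person_map_name_alignments
    "MR": "prefix", "MRS": "prefix", "DR": "prefix",
    "JR": "generation", "SR": "generation", "III": "generation",
    "PHD": "suffix",
}
ANONYMOUS_NAMES = {"UNKNOWN", "BABY", "TEST"}  # in the spirit of person_set_name_aname
DELIMITERS = r"[\s,.-]+"
MIN_LEN, MAX_LEN = 1, 30


def fold_to_ascii(text):
    """Map character (approximation): strip accents via Unicode decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def standardize_person_name(value):
    value = fold_to_ascii(value.upper())                    # Upper case, Map character
    tokens = [t for t in re.split(DELIMITERS, value) if t]  # Tokenizer
    parsed = {"prefix": [], "suffix": [], "generation": [], "name": []}
    for tok in tokens:                                      # Parse token
        parsed[NAME_ALIGNMENTS.get(tok, "name")].append(tok)
    parsed["name"] = [t for t in parsed["name"]
                      if MIN_LEN <= len(t) <= MAX_LEN]      # Length
    parsed["name"] = [t for t in parsed["name"]
                      if t not in ANONYMOUS_NAMES]          # Stop token
    return parsed                                           # Pick token: keep everything


print(standardize_person_name("Dr. José  García-López Jr."))
```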
Organization Name standardizer
This standardizer is used to standardize Organization Name attribute values. It contains the following recipes, in sequence:
- Upper case: Converts the input field values to use their uppercase equivalents.
- Map character: Converts Unicode input characters to equivalent English alphabet characters. Optionally, define the map in the IBM Match 360 resources.
- Stop character: Removes unwanted input characters from name values.
- Map token: Generates nicknames or alternate names for the given input and stores the information in a separate new internal field.
- Tokenizer: Tokenizes the input field value into multiple tokens, based on the defined list of delimiters.
- Stop token: Removes anonymous input values, as configured.
- Acronym: Generates an acronym for the given organization name and stores the information in a separate new internal field. This acronym value is used during comparison to handle abbreviated names.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Organization Name standardizer uses the following Map resources by default:
- map_character_general: Converts Unicode input characters to equivalent English alphabet characters.
- org_map_name_cnick_name: Generates nicknames or alternate names for the given input.
The Organization Name standardizer uses the following Set resources by default:
- org_set_name_aname: Removes anonymous organization name values.
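A comparable sketch for organization names, with the same caveats: ORG_ALT_NAMES and ANONYMOUS_ORGS are hypothetical stand-ins for org_map_name_cnick_name and org_set_name_aname, the stop-character rule (keep only the letters A to Z, digits, and spaces) is an assumption, and the acronym is simply the first letter of each remaining token. The alternate name and the acronym are kept in separate fields, as the Map token and Acronym recipes describe.

```python
import re
import unicodedata

# Hypothetical stand-ins for the default resources named above.
ORG_ALT_NAMES = {"INTERNATIONAL BUSINESS MACHINES": "IBM"}  # like org_map_name_cnick_name
ANONYMOUS_ORGS = {"NONE", "UNKNOWN"}                         # like org_set_name_aname


def standardize_org_name(value):
    value = value.upper()                                              # Upper case
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))  # Map character
    value = " ".join(re.sub(r"[^A-Z0-9 ]", " ", value).split())        # Stop character
    alt_name = ORG_ALT_NAMES.get(value)                                # Map token (separate field)
    tokens = [t for t in value.split() if t not in ANONYMOUS_ORGS]     # Tokenizer, Stop token
    acronym = "".join(t[0] for t in tokens) if len(tokens) > 1 else None  # Acronym
    return {"tokens": tokens, "alt_name": alt_name, "acronym": acronym}


print(standardize_org_name("International Business Machines"))
```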
Date standardizer
This standardizer is used to standardize Date attribute values. It supports many different date formats and contains the following recipes, in sequence:
- Map character: Converts slash characters (/) to dash characters (-).
- Date function: Converts date inputs in different formats to a standardized format.
- Stop token: Removes anonymous date values, as configured.
- Parse token: Parses the input field values to different tokens, depending on certain regular expressions. For example, you can use this recipe to parse a full date input into day, month, and year tokens.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Date standardizer uses the following Map resources by default:
- map_character_date_separators: Converts slash (/) or any other separator characters to dash characters (-).
- map_date_tokens_year_month_day: Parses the input date value to internal fields, namely birth_year, birth_month, and birth_day, based on regular expressions.
The Date standardizer uses the following Set resources by default:
- set_date_date: Removes anonymous date values.
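The following Python sketch mimics the Date recipe order with the standard library. The separator set, the accepted input formats, and the anonymous dates are assumptions standing in for map_character_date_separators, the Date function configuration, and set_date_date. An ambiguous input such as 01/02/1985 is read here as month-day-year only because of the order of INPUT_FORMATS, and the %b format relies on English month abbreviations in the default C locale.

```python
import re
from datetime import datetime

SEPARATORS = re.compile(r"[/.\s]")                    # assumed separator set
ANONYMOUS_DATES = {"1900-01-01", "9999-12-31"}        # hypothetical, like set_date_date
INPUT_FORMATS = ["%Y-%m-%d", "%m-%d-%Y", "%d-%b-%Y"]  # assumed accepted formats
DATE_TOKENS = re.compile(
    r"^(?P<birth_year>\d{4})-(?P<birth_month>\d{2})-(?P<birth_day>\d{2})$"
)


def standardize_date(value):
    value = SEPARATORS.sub("-", value.strip())  # Map character: "/" and others -> "-"
    iso = None
    for fmt in INPUT_FORMATS:                    # Date function
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        iso = f"{parsed.year:04d}-{parsed.month:02d}-{parsed.day:02d}"
        break
    if iso is None or iso in ANONYMOUS_DATES:    # Stop token
        return None
    return DATE_TOKENS.match(iso).groupdict()    # Parse token


print(standardize_date("02/14/1985"))  # {'birth_year': '1985', 'birth_month': '02', 'birth_day': '14'}
```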
Gender standardizer
This standardizer is used to standardize Gender attribute values. It contains the following recipes, in sequence:
- Map character: Converts Unicode input characters to equivalent English alphabet characters. Optionally, define the map in the IBM Match 360 resources.
- Upper case: Converts the input field values to use their uppercase equivalents.
- Stop token: Removes anonymous input gender values, as configured.
- Map token: Converts input token values to equivalent values, as configured in the IBM Match 360 resources.
- Parse token: Parses processed field values to an appropriate internal field.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Gender standardizer uses the following Map resources by default:
- map_character_general: Converts Unicode input characters to equivalent English alphabet characters.
- map_gender_gender: Maps different input gender values to standard values.
- map_gender_tokens_gender: Parses the input token value to the internal gender field based on a regular expression.
The Gender standardizer uses the following Set resources by default:
- set_gender_anon_gender: Removes anonymous input gender values.
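A minimal sketch of the Gender recipe order, assuming hypothetical contents for the equivalence map, the anonymous-value set, and the parsing pattern (stand-ins for map_gender_gender, set_gender_anon_gender, and map_gender_tokens_gender). The resource values in your instance may differ.

```python
import re
import unicodedata

# Hypothetical stand-ins for the default resources named above.
GENDER_EQUIVALENTS = {"MALE": "M", "MAN": "M", "FEMALE": "F", "WOMAN": "F"}  # like map_gender_gender
ANONYMOUS_GENDERS = {"U", "UNKNOWN", "NOT SPECIFIED"}  # like set_gender_anon_gender
GENDER_TOKEN = re.compile(r"^[MF]$")  # assumed pattern, like map_gender_tokens_gender


def standardize_gender(value):
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))  # Map character
    token = " ".join(value.upper().split())                             # Upper case
    if token in ANONYMOUS_GENDERS:                                       # Stop token
        return None
    token = GENDER_EQUIVALENTS.get(token, token)                         # Map token
    return {"gender": token} if GENDER_TOKEN.match(token) else None      # Parse token


print(standardize_gender(" female "))
```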
Address standardizer
This standardizer is used to standardize Address attribute values. Addresses can have several different formats, depending on the locales. This flexibility requires complex processing to convert addresses to a standardized form. The Address standardizer contains the following recipes, in sequence:
- Upper case: Converts the input field values to use their uppercase equivalents.
- Map character: Converts Unicode input characters to equivalent English alphabet characters. Optionally, define the map in the IBM Match 360 resources.
- Map token: Converts input token values to equivalent values, as configured in the IBM Match 360 resources. For example, "United States of America", "United States", and "US" can all be mapped to "USA". This mapping is common for country and province/state field values. In addition, delimiter characters configured in the resource are mapped to the space character.
- Tokenizer: Tokenizes the input field value into multiple tokens, based on the defined list of delimiters.
- Stop token: Removes anonymous input values, such as postal codes, as configured.
- Keep token: Allows only the defined list of values for a given field. For example, you might define a list of postal codes that are allowed during standardization. Input values that are not in the allowed list are removed.
- Parse token: Parses the input field values into appropriate internal fields depending on certain regular expressions and predefined values, as configured in the resources. You can use this recipe to truncate a given token to a certain length by using regular expressions. You can also define different alphanumeric pattern sets in the form of regular expressions to allow only certain patterns.
- Join fields: Joins two or more fields together to create a new combined value, assigned to an internal field. For example, latitude and longitude field values can be joined together to form a new internal field called lat_long.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Address standardizer uses the following Map resources by default:
- map_character_general: Converts Unicode input characters to equivalent English alphabet characters.
- map_address_country: Converts input country values to equivalent values.
- map_address_province_state: Converts input province and state values to equivalent values.
- map_address_delimiter_removal: Maps delimiter characters configured in the resource to the space character.
- map_address_addr_tok: Converts input address token values to equivalent values.
- map_address_tokens_unit_type_and_number: Parses the input field residence_number based on a regular expression into internal fields, namely unit_type and unit_number.
- map_address_tokens_street_number_name_direction_type: Parses the input field address_line1 based on a regular expression into internal fields, namely street_number, street_name, direction, and street_type.
- map_address_tokens_sub_division: Parses the input field address_line2 based on a regular expression into the internal field sub_division.
- map_address_tokens_pobox_type_and_number: Parses the input field address_line3 based on a regular expression into internal fields, namely pobox_type and pobox.
- map_address_tokens_city: Parses the input value of the city field based on a regular expression.
- map_address_tokens_province: Parses the input value of the province_state field based on a regular expression into the internal field province.
- map_address_tokens_postal_code: Parses the input value of the zip_postal_code field based on a regular expression into the internal field postal_code.
- map_address_tokens_country: Parses the input value of the country field based on a regular expression.
- map_address_tokens_latitude: Parses the input value of the latitude_degrees field based on a regular expression into the internal field latitude.
- map_address_tokens_longtitude: Parses the input value of the longitude_degrees field based on a regular expression into the internal field longitude.
The Address standardizer uses the following Set resources by default:
- set_address_postal_code: Removes anonymous input values for zip_postal_code.
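Address standardization involves many more resources than a short example can show. The following Python sketch covers only three ideas: delimiter removal plus a toy regular expression that parses address_line1 into street_number, street_name, street_type, and direction; a country Map token; and Join fields for lat_long. COUNTRY_MAP, DELIMITERS, STREET_LINE, and the comma-separated lat_long format are assumptions, not the default resource contents.

```python
import re

# Hypothetical stand-ins for the default resources named above.
COUNTRY_MAP = {"UNITED STATES OF AMERICA": "USA", "UNITED STATES": "USA", "US": "USA"}
DELIMITERS = r"[,#]"  # assumed, like map_address_delimiter_removal
STREET_LINE = re.compile(  # toy pattern, like map_address_tokens_street_number_name_direction_type
    r"^(?P<street_number>\d+)\s+(?P<street_name>.+?)\s+"
    r"(?P<street_type>ST|AVE|RD|BLVD)(?:\s+(?P<direction>[NSEW]))?$"
)


def standardize_address(addr):
    out = {}
    line1 = re.sub(DELIMITERS, " ", addr.get("address_line1", "").upper())  # Upper case, delimiters
    line1 = " ".join(line1.split())
    match = STREET_LINE.match(line1)                                          # Parse token
    if match:
        out.update({k: v for k, v in match.groupdict().items() if v})
    country = " ".join(addr.get("country", "").upper().split())
    out["country"] = COUNTRY_MAP.get(country, country)                        # Map token
    if "latitude" in addr and "longitude" in addr:                            # Join fields
        out["lat_long"] = f"{addr['latitude']},{addr['longitude']}"
    return out


print(standardize_address({
    "address_line1": "123 Main St, N",
    "country": "United States",
    "latitude": "40.7128",
    "longitude": "-74.0060",
}))
```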
Phone standardizer
This standardizer is used to standardize Phone attribute values. It contains the following recipes, in sequence:
- Stop character: Removes unwanted input characters from phone values.
- Stop token: Removes anonymous phone values, as configured.
- Phone: Parses input phone numbers with different formats from different locales into a common format. This recipe can be configured to remove area codes and country codes from phone numbers. It can also retain a certain number of digits in a standardized phone number.
- Parse token: Parses processed field values to an appropriate internal field depending on certain regular expressions, as configured in the resources.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Phone standardizer uses the following Map resources by default:
- map_phone_tokens_phone: Parses phone values to an internal field based on regular expressions.
The Phone standardizer uses the following Set resources by default:
- set_character_phone: Replaces all characters that are not alphanumeric. Enables you to specify regular expressions.
- set_phone_anon_phone: Removes anonymous phone values.
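The Phone recipes can be sketched as follows, assuming hypothetical anonymous values, a rule that keeps the last 10 digits (one way the option to retain a certain number of digits might be configured), and an assumed pattern of 7 to 15 digits for the Parse token step.

```python
import re

ANONYMOUS_PHONES = {"0000000000", "9999999999"}  # hypothetical, like set_phone_anon_phone
KEEP_DIGITS = 10  # assumption: keep the last 10 digits, dropping any country code
PHONE_TOKEN = re.compile(r"^\d{7,15}$")  # assumed pattern, like map_phone_tokens_phone


def standardize_phone(value):
    digits = re.sub(r"[^0-9]", "", value)         # Stop character
    if not digits or digits in ANONYMOUS_PHONES:  # Stop token
        return None
    digits = digits[-KEEP_DIGITS:]                # Phone: retain a fixed number of digits
    return {"phone": digits} if PHONE_TOKEN.match(digits) else None  # Parse token


print(standardize_phone("+1 (555) 010-2030"))
```

With this configuration, "+1 (555) 010-2030" and "(555) 010-2030" standardize to the same value, so the country code no longer affects comparison.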
Identification standardizer
This standardizer is used to standardize Identification attribute values. It contains the following recipes, in sequence:
- Map character: Converts Unicode input characters to equivalent English alphabet characters. Optionally, define the map in the IBM Match 360 resources.
- Upper case: Converts the input field values to use their uppercase equivalents.
- Stop character: Removes unwanted input characters from identification values.
- Stop token: Removes anonymous input values, as configured.
- Map token: Converts input token values to equivalent values, as configured in the IBM Match 360 resources.
- Parse token: Parses processed field values to an appropriate internal field depending on certain regular expressions, as configured in the resources.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Identification standardizer uses the following Map resources by default:
- map_character_general: Converts Unicode input characters to equivalent English alphabet characters.
- map_identifier_equi_identifier: Converts input token values to equivalent values.
- map_identifier_tokens_identification_number: Parses processed field values to an appropriate internal field depending on certain regular expressions, as configured in the resources.
The Identification standardizer uses the following Set resources by default:
- set_character_identification_number: Removes non-alphanumeric input characters from identification values. Enables you to specify regular expressions.
- set_identifier_anonymous: Removes anonymous identification values.
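A short sketch of the Identification recipe order, under similar assumptions: ANONYMOUS_IDS and ID_PATTERN are hypothetical stand-ins for set_identifier_anonymous and map_identifier_tokens_identification_number, and the Map token step is left out because its equivalence values depend entirely on your resources.

```python
import re
import unicodedata

ANONYMOUS_IDS = {"000000000", "999999999"}    # hypothetical, like set_identifier_anonymous
ID_PATTERN = re.compile(r"^[A-Z0-9]{5,20}$")  # assumed, like map_identifier_tokens_identification_number


def standardize_identification(value):
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))  # Map character
    value = re.sub(r"[^A-Z0-9]", "", value.upper())                     # Upper case, Stop character
    if value in ANONYMOUS_IDS:                                           # Stop token
        return None
    # Map token (map_identifier_equi_identifier) is omitted in this sketch.
    return {"identification_number": value} if ID_PATTERN.match(value) else None  # Parse token


print(standardize_identification("123-45-678a"))
```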
Email standardizer
This standardizer is used to standardize Email attribute values. It contains the following recipes, in sequence:
- Map character: Converts Unicode input characters to equivalent English alphabet characters. Optionally, define the map in the IBM Match 360 resources.
- Upper case: Converts the input field values to use their uppercase equivalents.
- Stop token: Removes anonymous input values, as configured.
- Map token: Converts input token values to equivalent values, as configured in the IBM Match 360 resources.
- Parse token: Parses processed field values to an appropriate internal field depending on certain regular expressions, as configured in the resources.
- Pick token: Selects a subset (or all) of the tokens as the standardized data to use in bucketing and comparison.
The Email standardizer uses the following Map resources by default:
- map_character_general: Converts Unicode input characters to equivalent English alphabet characters.
- map_non_phone_equi_non_phone: Converts input token values to equivalent values.
- map_non_phone_tokens_non_phone: Parses the input field email_id based on a regular expression to the internal fields email_local_part and email_domain.
The Email standardizer uses the following Set resources by default:
- set_non_phone_anon_non_phone: Removes anonymous email values.
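For email values, the Parse token step splits the address into a local part and a domain. This Python sketch assumes a simple pattern and a hypothetical anonymous-value set; the default map_non_phone_tokens_non_phone expression may be stricter or looser.

```python
import re

ANONYMOUS_EMAILS = {"NOEMAIL@NOEMAIL.COM", "NONE@NONE.COM"}  # hypothetical, like set_non_phone_anon_non_phone
EMAIL_TOKENS = re.compile(  # assumed pattern, like map_non_phone_tokens_non_phone
    r"^(?P<email_local_part>[^@\s]+)@(?P<email_domain>[^@\s]+\.[^@\s]+)$"
)


def standardize_email(value):
    value = value.strip().upper()      # Upper case (Map character omitted here)
    if value in ANONYMOUS_EMAILS:      # Stop token
        return None
    match = EMAIL_TOKENS.match(value)  # Parse token into local part and domain
    return match.groupdict() if match else None


print(standardize_email(" Jane.Doe@Example.com "))
```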
Entity types (bucketing)
Within a single matching algorithm, each record type can have multiple entity type definitions (entity_type JSON objects). For example, in an algorithm defined for a person record type, you might need to create more than one entity type definition, such as person entity, household entity, location entity, and others.
Each entity type can be used to match and link records in different ways. An entity type defines how records are bucketed and compared during the matching process.
Each entity type definition (entity_type) in the matching algorithm has several JSON elements:
- clerical_review_threshold: Records that have a comparison score lower than the clerical review threshold are considered non-matches.
- auto_link_threshold: Records that have a comparison score higher than the autolink threshold are considered strong enough matches that they are automatically matched.
- bucket_generators: This section contains the definitions of the bucket generators configured for an entity type. There are two types of bucket generators: buckets and bucket groups.
  - Buckets involve bucketing for only one attribute. Each bucket definition includes four elements:
    - label: A label that identifies the bucket generator.
    - maximum_bucket_size: A value that defines the size of large buckets. Any bucket hash with a bucket size greater than this value is not considered for candidate selection during matching.
    - inputs: For buckets, the inputs list has only one element, which is a JSON object. That JSON object has two elements, fields and attributes:
      - fields: The list of fields to use for bucketing.
      - attributes: The list of attributes to use for bucketing.
    - bucket_recipe: A bucket recipe list defines the steps for the bucket generator to complete during the bucketing process. Each bucket_recipe list has a number of subelements:
      - label: A label that identifies the bucket recipe element.
      - method: The internal method used. This element is just for reference and must not be edited.
      - inputs: A single element of the inputs list defined one level higher.
      - fields: A list of the fields to be used for this bucket. This is generally a subset of all the fields defined within the inputs list one level higher.
      - min_tokens: The minimum number of tokens to use when the recipe is forming a bucket hash.
      - max_tokens: The maximum number of tokens to use together when the recipe is forming a bucket hash.
      - count: A limit on the number of bucket hashes that a bucket generator produces for a single record. If a record generates many bucket hashes, only the number of hashes set by this element are picked up.
      - bucket_group: The sequence number for a bucket group that produces a bucket hash. Intermediary steps or recipes are not assigned a sequence number.
      - order: Specifies whether the tokens are sorted in lexicographical order when multiple tokens are combined to form a bucket hash.
      - maximum_bucket_size: A value that defines the size of large buckets. This element is the same as the one defined at the bucket generator level; defining it at the bucket recipe level as well gives you finer control over large individual buckets.
  - Bucket groups involve bucketing for more than one attribute. Each bucket_group definition includes five elements:
    - label: A label that identifies the bucket generator.
    - maximum_bucket_size: A value that defines the size of large buckets. Any bucket hash with a bucket size greater than this value is not considered for candidate selection during matching.
    - inputs: For bucket groups, the inputs list has more than one JSON object element. The JSON objects each have two elements, fields and attributes:
      - fields: The list of fields to use for bucketing.
      - attributes: The list of attributes to use for bucketing.
    - bucket_recipe: A bucket recipe list defines the steps for the bucket generator to complete during the bucketing process. Each bucket_recipe list has a number of subelements:
      - label: A label that identifies the bucket recipe element.
      - method: The internal method used. This element is just for reference and must not be edited.
      - inputs: A single element of the inputs list defined one level higher.
      - fields: A list of the fields to be used for this bucket. This is generally a subset of all the fields that are defined within the inputs list one level higher.
      - min_tokens: The minimum number of tokens to use when the recipe is forming a bucket hash.
      - max_tokens: The maximum number of tokens to use together when the recipe is forming a bucket hash.
      - count: A limit on the number of bucket hashes that a bucket generator produces for a single record. If a record generates many bucket hashes, only the number of hashes set by this element are picked up.
      - bucket_group: The sequence number for a bucket group that produces a bucket hash. Intermediary steps or recipes are not assigned a sequence number.
      - order: Specifies whether the tokens are sorted in lexicographical order when multiple tokens are combined to form a bucket hash.
      - maximum_bucket_size: A value that defines the size of large buckets. This element is the same as the one defined at the bucket generator level. Being able to define it at the bucket recipe level gives you finer control over large individual buckets.
      - set_resource: The name of a set type resource used for a bucket recipe.
      - map_resource: The name of a map type resource used for a bucket recipe.
      - output_fields: If this recipe produces new fields after it completes bucketing functions on the input fields, this element contains a list of the names of the generated fields.
    - bucket_group_recipe: A bucket group recipe section is typically used for defining buckets that consist of more than one attribute. Every element of a bucket_group_recipe list is a JSON object defining the construct for a single bucket group.
      - The inputs list within bucket_group_recipe has more than one element, which means it refers to more than one attribute defined in the inputs array one level higher.
      - The fields element is a list of lists. Every inner list of fields is associated with the respective attributes list.
      - The min_tokens and max_tokens lists have more than one element, with each element corresponding to the respective attributes list.
  Note: In some bucketing recipe definitions, there is a property that is named search_only. By default, its value is false. If set to true, this property indicates that a bucket or bucket group is used only for probabilistic search scenarios and is not used for entity resolution (matching) scenarios.
- compare_methods: Definitions of the comparison methods that are configured for an entity type. Each compare_methods JSON object consists of definitions of various compare methods. The matching algorithm adds up the scores from each compare method definition to get the final comparison score. Each compare method's JSON object contains three elements:
  - label: A label that identifies the compare method.
  - methods: A list of comparators that form a comparison group. Every element in this array represents one comparator, meant for one type of matching attribute. The matching algorithm takes the maximum of the scores from all the comparators in a methods list as the final score for this comparison group. Each comparator definition includes two elements:
    - inputs: For comparators, the inputs list has only one element, which is a JSON object. That JSON object has two elements, fields and attributes:
      - fields: The list of fields to use for comparison.
      - attributes: The list of attributes to use for comparison.
    - compare_recipe: This list is used mainly for defining the comparison steps. Typically, there is only one JSON element in this array, representing a single comparison step. This step has five elements:
      - label: A label that identifies the comparison step.
      - method: The internal method used. This element is just for reference and must not be edited.
      - inputs: A single element of the inputs list defined one level higher.
      - fields: The fields to be used for this comparison, out of all of the fields that are defined in the inputs list one level higher.
      - comparison_resource: The name of a customizable comparison resource used for this comparison step.
  - weights: Each comparison that a comparator performs results in a numeric score from 0 to 10. This number is called the distance, or dissimilarity measure. A distance of 0 indicates that the values being compared are exactly the same; a distance of 10 indicates that they are completely different. Corresponding to the 11 distinct values (0 to 10), 11 weights are defined for each comparator. After calculating the distance, the compare method looks up the corresponding weight value in the weights list to produce the comparison score. Data engineers can customize the weights as needed, based on data quality, distribution, or other factors.
- record_filter: The record filtering element enables the matching engine to select records for matching based on their entity types. Each record filter definition contains one element:
  - criteria: Includes or excludes records from matching consideration based on specific conditions. This element contains one JSON object with a key-value pair.
    The key of the criteria JSON object is an attribute name. It can be either of the following:
    - The record_source system attribute.
    - A user-defined custom attribute of a simple attribute type (string).
    The value of the criteria JSON object is another JSON object containing one element, which can be either of the following:
    - allowed: An array of string values. Records that include any of these values will be considered during matching.
    - disallowed: An array of string values. Records that include any of these values will not be considered during matching.
- source_level_thresholds: Source-level thresholds enable you to define autolink and clerical review thresholds on a source-to-source basis. Source-level thresholds override the default global threshold values. Each source-level threshold configuration contains a collection of sources with optional source-specific default thresholds or a collection of source-to-source threshold pairs that enable you to define different thresholds for each source. For more information, see Configuring source-specific matching thresholds in the Advanced matching algorithm tuning topic.
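The record_filter criteria can be read as a simple predicate. This Python sketch is one interpretation, not the engine's code: the source names CRM, ERP, and LEGACY are hypothetical, and the handling of a record that lacks the filtered attribute (excluded under allowed, kept under disallowed) is an assumption.

```python
def passes_record_filter(record, criteria):
    """Return True if a record should be considered for matching.

    criteria follows the shape described above, for example
    {"record_source": {"allowed": ["CRM", "ERP"]}}.
    """
    for attribute, rule in criteria.items():
        value = record.get(attribute)
        if "allowed" in rule and value not in rule["allowed"]:
            return False
        if "disallowed" in rule and value in rule["disallowed"]:
            return False
    return True


allow_crm_erp = {"record_source": {"allowed": ["CRM", "ERP"]}}
block_legacy = {"record_source": {"disallowed": ["LEGACY"]}}
records = [{"id": 1, "record_source": "CRM"}, {"id": 2, "record_source": "LEGACY"}]
print([r["id"] for r in records if passes_record_filter(r, allow_crm_erp)])  # [1]
print([r["id"] for r in records if passes_record_filter(r, block_legacy)])   # [1]
```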
Bucketing resources
The bucketing definitions use the following Map resources by default:
- person_map_name_nickname: Generates nicknames or alternate names for a given person name input.
- org_map_name_cnick_name: Generates nicknames or alternate names for a given organization name input.
The bucketing definitions use the following Set resources by default:
- person_set_name_bkt_anon: Removes anonymous person name values.
- org_set_name_acname: Removes anonymous organization name values.
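Bucket recipes combine standardized tokens into bucket hashes, and records that share a hash become comparison candidates. The following Python sketch imitates the min_tokens, max_tokens, order, count, and maximum_bucket_size elements described under Entity types (bucketing). It uses plain joined strings instead of real hashes, and the parameter defaults are arbitrary; it does not reproduce the engine's bucketing.

```python
from collections import defaultdict
from itertools import combinations


def bucket_hashes(tokens, min_tokens=1, max_tokens=2, order=True, count=10):
    """Combine between min_tokens and max_tokens tokens into bucket keys."""
    keys = []
    for n in range(min_tokens, max_tokens + 1):
        for combo in combinations(tokens, n):
            parts = sorted(combo) if order else list(combo)
            keys.append("+".join(parts))
    return keys[:count]  # keep at most `count` keys per record


def candidate_pairs(records, maximum_bucket_size=100, **recipe):
    """Pair up records that share a bucket, ignoring oversized buckets."""
    buckets = defaultdict(set)
    for rec_id, tokens in records.items():
        for key in bucket_hashes(tokens, **recipe):
            buckets[key].add(rec_id)
    pairs = set()
    for members in buckets.values():
        if len(members) > maximum_bucket_size:
            continue  # too large: not used for candidate selection
        pairs.update(combinations(sorted(members), 2))
    return pairs


records = {"r1": ["JOHN", "SMITH"], "r2": ["SMITH", "JOHN"], "r3": ["JANE", "DOE"]}
print(candidate_pairs(records, min_tokens=2, max_tokens=2))               # {('r1', 'r2')}
print(candidate_pairs(records, min_tokens=2, max_tokens=2, order=False))  # set()
```

The second call shows why order matters: without sorting, JOHN SMITH and SMITH JOHN produce different keys and are never paired.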
Comparison functions
Comparison functions, sometimes called comparators, are one of the key components of the matching algorithm. The matching engine uses comparison functions to compare record data during the matching process. Essentially, record matching involves comparing the values of different types of attributes across records.
For many of the commonly used attribute types in the person, organization, and location domains, the IBM Match 360 matching engine includes preconfigured comparison methods.
In IBM Match 360, comparison functions use an approach to comparison known as feature vectors. There are different customizable feature definitions in IBM Match 360 that are used for different comparison functions. Each comparison results in a measure of distance (a vector) that shows how dissimilar two given attribute values are.
In the matching algorithm, each discrete distance value is given a weight that determines how strongly to consider that value. The weight combines with the distance to produce a comparison score. The matching algorithm adds all of the comparison scores together to arrive at a final comparison score for the overall record-to-record comparison.
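The scoring flow described above can be sketched in Python. This is a minimal illustration, not the IBM Match 360 engine: the attribute names, distance values, and weight tables are hypothetical, and each weight is simply looked up by distance rather than derived the way the engine derives it.

```python
# Hypothetical weight tables: each maps a discrete distance value
# (0 = identical, larger = more dissimilar) to a weight.
WEIGHTS: dict[str, dict[int, float]] = {
    "name": {0: 5.0, 1: 3.0, 2: 0.5, 3: -2.0},
    "birth_date": {0: 4.0, 1: 1.5, 2: -1.0, 3: -3.0},
    "phone": {0: 3.5, 1: 1.0, 2: -0.5, 3: -1.5},
}

def comparison_score(attribute: str, distance: int) -> float:
    """Return the weighted score for one attribute comparison."""
    table = WEIGHTS[attribute]
    # Distances beyond the largest configured value reuse that value's weight.
    return table[min(distance, max(table))]

def record_score(distances: dict[str, int]) -> float:
    """Add the per-attribute comparison scores into one record-to-record score."""
    return sum(comparison_score(attr, d) for attr, d in distances.items())

print(record_score({"name": 0, "birth_date": 1, "phone": 3}))  # 5.0
```

In this sketch, an identical name contributes strong positive evidence while a distant phone number pulls the total down; the resulting total is the kind of value that autolink and clerical review thresholds are compared against.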
About features
A feature represents the fine-level details of a comparison function. Different types of attributes use different types of similarity checks, meaning that their features vary as well.
Feature definitions dictate the types of internal functions used for each comparison function. Examples of internal functions include exact match, edit distance, nickname, phonetic equivalent, or initial match.
Comparison resources
Each comparison method includes resources that contain the details of its internal comparison operations.
Each of the default comparison types has its own resources. See each comparison type for details of the associated resources.
For comparisons on custom attribute types that have a matching type of generic, the generic comparison method includes the following resources:
- compare_spec_generic: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_generic.
Person name comparisons
Different fields within a person name attribute are handled differently. Fields such as prefix, suffix, and generation values are checked only for an exact match or a non-match. Other fields, such as given name, last name, and middle name, primarily use the following features:
- Exact match
- Nickname match
- Edit distance
- Initials match
- Phonetic matching
- Misplacement of tokens
- Extra tokens
- Missing values
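The following Python sketch shows how a few of these features (missing values, exact match, nickname match, and initials match) might be evaluated for a pair of given-name tokens. The NICKNAMES table is a hypothetical stand-in for a nickname resource, and name_token_features is an illustrative function, not part of IBM Match 360. Edit distance, phonetic matching, and token misplacement are omitted for brevity.

```python
# Hypothetical nickname table; a real nickname resource is much larger.
NICKNAMES = {"bill": "william", "liz": "elizabeth", "bob": "robert"}

def canonical(name: str) -> str:
    """Map a nickname to its formal name, or return the name unchanged."""
    return NICKNAMES.get(name, name)

def name_token_features(a: str, b: str) -> dict[str, bool]:
    """Report which simple features hold for two given-name tokens."""
    a_l, b_l = a.lower(), b.lower()
    missing = not a_l or not b_l
    return {
        "missing": missing,
        "exact": not missing and a_l == b_l,
        # Different spellings that resolve to the same formal name.
        "nickname": not missing and a_l != b_l and canonical(a_l) == canonical(b_l),
        # One value is a single initial matching the other's first letter.
        "initials": not missing
        and (len(a_l) == 1 or len(b_l) == 1)
        and a_l[0] == b_l[0],
    }

print(name_token_features("Bill", "William"))
# {'missing': False, 'exact': False, 'nickname': True, 'initials': False}
```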
The person name comparison method includes the following resources:
- person_compare_spec_name: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_name. For example: person_person_entity_compare_spec_name.
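Because each generated comparison resource follows the same recordType_entityType_compare_spec_<type> pattern, the expected names for a given record type and entity type can be derived mechanically. The resource_name helper below is hypothetical and not part of any IBM Match 360 API; the person record type and person_entity entity type come from the example above, and the list of comparison types mirrors the comparison methods described in this topic.

```python
# Comparison types described in this topic.
COMPARISON_TYPES = ["name", "date", "gender", "address",
                    "phone", "identifier", "email", "generic"]

def resource_name(record_type: str, entity_type: str, comparison: str) -> str:
    """Build a name in the recordType_entityType_compare_spec_<type> format."""
    return f"{record_type}_{entity_type}_compare_spec_{comparison}"

for comparison in COMPARISON_TYPES:
    print(resource_name("person", "person_entity", comparison))
```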
Organization name comparisons
For organization names, there is typically one field that contains the entire business name. That field is compared primarily using the following features:
- Exact match
- Nickname match
- Edit distance
- Initials match
- Phonetic matching
- Misplacement of tokens
- Extra tokens
- Missing values
For organization names, the acronyms and nicknames are also compared for exactness.
The organization name comparison method includes the following resources:
- org_compare_spec_name: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_name.
Date comparisons
For dates, there are typically three fields to compare: day, month, and year.
The year field is compared using the following features:
- Exactness
- Edit distance
- Non-matching
- Missing
The day and month fields are compared using the following features:
- Exactness
- Non-matching
- Missing
The date comparator also checks whether the day and month fields have been transposed because of locale differences in date formatting.
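As an illustration, the following Python sketch classifies each date field as exact, non-matching, or missing, and flags a possible day/month transposition. It rests on assumptions: dates are modeled as hypothetical (year, month, day) tuples with None for a missing field, and the year edit distance is simplified to a count of differing digit positions in four-digit years.

```python
from typing import Optional

# (year, month, day); None marks a missing field.
DateParts = tuple[Optional[int], Optional[int], Optional[int]]

def field_feature(a: Optional[int], b: Optional[int]) -> str:
    """Classify one field as missing, exact, or non-matching."""
    if a is None or b is None:
        return "missing"
    return "exact" if a == b else "non-matching"

def year_digit_distance(a: int, b: int) -> int:
    """Count differing digits in two four-digit years (a simplified edit distance)."""
    return sum(x != y for x, y in zip(f"{a:04d}", f"{b:04d}"))

def compare_dates(a: DateParts, b: DateParts) -> dict[str, object]:
    (ya, ma, da), (yb, mb, db) = a, b
    features: dict[str, object] = {
        "year": field_feature(ya, yb),
        "month": field_feature(ma, mb),
        "day": field_feature(da, db),
    }
    if features["year"] == "non-matching":
        features["year_distance"] = year_digit_distance(ya, yb)
    # Day and month swapped, for example 3 July recorded as 7 March.
    features["transposed"] = (
        None not in (ma, da, mb, db) and ma != da and ma == db and da == mb
    )
    return features

print(compare_dates((1985, 7, 3), (1958, 3, 7)))
```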
The date comparison method includes the following resources:
- compare_spec_date: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_date.
Gender comparisons
The gender attribute is compared using the following features:
- Exactness
- Non-matching
The gender comparison method includes the following resources:
- compare_spec_gender: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_gender.
Address comparisons
Different fields within an address attribute are handled differently.
Fields like country, city, province/state, and subdivision are compared using the following features:
- Exactness
- Equivalency
- Edit distance
- Non-matching
- Missing
Postal code fields are compared using the following features:
- Exactness
- Edit distance
- Non-matching
- Missing
Fields like street number, street name, street type, unit number, and direction are compared using the following features:
- Exactness
- Equivalency
- Initials match
- Edit distance
- Non-matching
- Misplacement of tokens
- Missing
The address comparison method includes the following resources:
- compare_spec_address: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_address.
Phone comparisons
Phone number attributes are compared using the following features:
- Exact match
- Edit distance
- Non-matching
The phone comparison method includes the following resources:
- compare_spec_phone: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_phone.
Identifier comparisons
Identification number attributes are compared using the following features:
- Exact match
- Edit distance
- Non-matching
The identifier comparison method includes the following resources:
- compare_spec_identifier: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_identifier.
Email comparisons
Email attributes consist of two parts: the unique ID (before the @ symbol) and the email domain (after the @ symbol). Both the ID and domain parts are compared, separately, using the following features:
- Exact match
- Edit distance
- Non-matching
The outcomes of the two comparisons are combined in a weighted manner to produce an overall comparison score.
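The split-and-combine approach can be sketched in Python as follows. This is an assumption-based illustration: difflib's SequenceMatcher.ratio() stands in for an edit-distance-based similarity, and the 0.7/0.3 weights and the email_score name are hypothetical, not IBM Match 360 defaults.

```python
from difflib import SequenceMatcher

def part_similarity(x: str, y: str) -> float:
    """Similarity in [0, 1]; 0.0 when either part is missing.

    difflib's ratio() is a stand-in for an edit-distance-based score.
    """
    if not x or not y:
        return 0.0
    return SequenceMatcher(None, x, y).ratio()

def email_score(a: str, b: str, id_weight: float = 0.7,
                domain_weight: float = 0.3) -> float:
    """Compare the ID and domain parts separately, then combine them with weights."""
    id_a, _, domain_a = a.lower().partition("@")
    id_b, _, domain_b = b.lower().partition("@")
    return (id_weight * part_similarity(id_a, id_b)
            + domain_weight * part_similarity(domain_a, domain_b))

print(email_score("jsmith@example.com", "jsmyth@example.com"))
```

Weighting the ID part more heavily reflects that many unrelated people share a domain, while the ID part is closer to identifying one person.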
The email comparison method includes the following resources:
- compare_spec_email: In the generated algorithm, the name format of this resource is recordType_entityType_compare_spec_email.
Edit distance
The IBM Match 360 matching engine calculates edit distance as one of the internal functions during comparison and matching of various attributes. Edit distance is a measurement of how dissimilar two strings are from each other. It is calculated by counting the number of changes required to transform one string into the other.
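This topic does not name the standard function, but the most widely used definition is Levenshtein distance: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. The following Python sketch assumes that definition; the IBM Match 360 implementation may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b."""
    # At the start of iteration i, prev[j] is the distance between a[:i-1] and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```

Turning "kitten" into "sitting" takes three changes: substitute k with s, substitute e with i, and insert g.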
There are different ways to define edit distance by using different sets of string operations. By default, IBM Match 360 uses a standard edit distance function that is described in the public literature. As an alternative, you can use a specialized IBM Match 360 edit distance function.
- The standard edit distance function provides better performance of the matching engine. For this reason, it is the default comparison configuration for all attributes except for the Telephone attribute type.
- The specialized edit distance function is built for hyper-precision use cases. This option takes into consideration typos or similar-looking characters, such as 8 and B, 0 and O, 5 and S, or 1 and I. When two compared values differ only by similar-looking characters, the assigned dissimilarity measure is less than what a standard edit distance function would assign. As a result, these types of mismatches are not penalized as strongly by the specialized function.
Important: The specialized edit distance function includes some complex calculations. As a result, choosing this option has an impact on system performance during the matching process.
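The internal costs of the specialized function are not documented in this topic. The following Python sketch only illustrates the idea described above: a weighted edit distance in which substituting one character of a look-alike pair (the pairs listed in the text) costs less than an ordinary substitution. The 0.5 cost and the lookalike_edit_distance name are assumptions, not IBM Match 360 settings.

```python
# Look-alike pairs from the text; the reduced cost is an assumption.
LOOKALIKES = {frozenset(p) for p in [("8", "B"), ("0", "O"), ("5", "S"), ("1", "I")]}
LOOKALIKE_COST = 0.5

def substitution_cost(x: str, y: str) -> float:
    """0 for equal characters, a reduced cost for look-alikes, else 1."""
    if x == y:
        return 0.0
    if frozenset((x, y)) in LOOKALIKES:
        return LOOKALIKE_COST
    return 1.0

def lookalike_edit_distance(a: str, b: str) -> float:
    """Levenshtein-style distance with cheaper look-alike substitutions."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        cur = [float(i)]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1.0,                               # delete ca
                cur[j - 1] + 1.0,                            # insert cb
                prev[j - 1] + substitution_cost(ca, cb),     # substitute
            ))
        prev = cur
    return prev[-1]

print(lookalike_edit_distance("B0B5", "8OBS"))  # 1.5
```

Under these assumptions, "B0B5" and "8OBS" differ by three look-alike substitutions and score 1.5, whereas a standard edit distance would score 3.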
For information about customizing your matching algorithm, including using the API to customize the edit distance, see Customizing and strengthening your matching algorithm.
Learn more
- Data concepts
- Matching your data to create master data entities
- Customizing and strengthening your matching algorithm