Match types for the One-source Match stage

The match type that you select for the One-source Match stage determines the relationship among the match passes, the records that each pass processes, and how the groups are formed.

When you use the One-source Match stage, you select one of the following match types.
  • Dependent. Passes in a one-source dependent match process the data sequentially. In each pass, groups are built around master records. The groups that are formed in all the passes for the same master record are combined to create the final group for the master. Each duplicate record in a group matches the group master record in one of the match passes. The master records and nonmatched records from a pass are made available for the subsequent pass. Duplicates are taken out of consideration so that they are not assigned to more than one group. Existing master records are given priority in group construction in subsequent passes.
  • Independent. Each pass in a one-source independent match processes all the input records. Like the one-source dependent match type, in each pass, groups are built around master records. But because each pass processes all records, a record can be a member of a group from more than one of the passes. (Similarly, a record can be a master in a group that was built in one pass while being a duplicate in a group that was built in another pass.) The groups from all the passes are merged so that groups that have a record in common form a single group. If record A is in a group with record B and record B is in a different group with record C, then those two groups are merged so that records A, B, and C are all in the same group. (A record ends up in no more than one group.) Groups are merged until all groups that have records in common are merged. At the pass level, the relationship that determines group membership is that of records matching a master record. However, for the merge process, the relationship is one of group membership. Thus, members in a group can be connected by a chain of relationships and do not necessarily all match a common master.
  • Transitive. Like one-source independent matches, each pass in a one-source transitive match also processes every record. But unlike a one-source independent match, the one-source transitive match type does not create pass-level groups. Instead, all record pairs that score above the match cutoff are used to produce the groups. Creating pass-level groups would discard the information that a record pair's score was above the match cutoff in a pass if each record ends up in a different group. The one-source transitive match type does not discard that information. It builds groups so that all records that score above the match cutoff in any pass are in the same group. For example, if record A and record B scored above the match cutoff in a pass and record B and record C scored above the match cutoff in a pass (possibly the same pass), then records A, B, and C are added to the same group. (A record ends up in no more than one group.) Like one-source independent matches, members in a group can be connected by a chain of relationships and do not necessarily all match a common master. But the one-source transitive chain can extend further because it uses all the pairs that score above the match cutoff.
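
The merge step that the independent and transitive match types describe can be sketched as follows. This is a minimal illustration in Python, not the stage's actual implementation: the function name merge_groups and the union-find approach are assumptions chosen for the sketch. For an independent match the input would be the pass-level groups; for a transitive match it would be every record pair that scored above the match cutoff.

```python
def merge_groups(groups):
    """Merge groups that share at least one record (hypothetical helper).

    `groups` is an iterable of collections of record IDs. A pair of records
    is simply a two-member group.
    """
    parent = {}

    def find(x):
        # Return the representative of x's merged group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for group in groups:
        members = list(group)
        if not members:
            continue
        find(members[0])  # register single-record groups too
        for other in members[1:]:
            union(members[0], other)

    merged = {}
    for record in parent:
        merged.setdefault(find(record), set()).add(record)
    return sorted(sorted(g) for g in merged.values())


# Independent: pass-level groups {A, B} and {B, C} share record B.
print(merge_groups([{"A", "B"}, {"B", "C"}]))

# Transitive: pairs above the cutoff, from any pass.
print(merge_groups([("A", "B"), ("B", "C"), ("D", "E")]))
```

In both calls, records A, B, and C end up in one group, and each record belongs to exactly one group in the result.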

In most cases, choose the dependent match type, because you want duplicates removed from consideration so that they do not match to other records in subsequent passes.

However, the independent option is useful when you want to link people or organizations regardless of address. For example, you can link together all the locations where a doctor practices.

The transitive option is useful if you want to account for inconsistent data entry in fields that assist in duplicate identification, for example, date of birth and driver's license numbers.

An example of processing for the dependent and independent match types

The following example shows how to use the independent match type with the One-source Match stage. The table shows four records that describe the same person. You require that all records concerning the same person match without regard to address.
Table 1. Four records that describe the same person
Record  Name             Address           Tax ID
1       William Nickson  123 Rodeo Drive   123456789
2       Bill Nixon       123 Rodeo Drive
3       B Nickson        978 Sunset Blvd.  123456789
4       Nickson          456 Western Ave.  123456789
The matching process using this data yields different results depending on the match type that you choose:
  • Dependent
    • The first pass blocks and matches on Name and Address. Records 1 and 2 are considered a matched pair. Records 3 and 4 are considered nonmatched records.
    • If Record 2 (without the TaxID) is selected as the master, and Record 1 is considered a duplicate, then Record 1 is not available for the second pass.
    • If the second pass blocks and matches on Name and TaxID, then only Records 3 and 4 match. The result is two groups of matched records: Records 1 and 2, and Records 3 and 4.
  • Independent
    • The first pass results are the same as the dependent match. Records 1 and 2 are considered a matched pair. Records 3 and 4 are considered nonmatched records.
    • If Record 2 (without the TaxID) is selected as the master record, then in the second pass the duplicate record, Record 1, is still compared to the rest of the records. When the second pass blocks and matches on Name and TaxID, Records 1, 3, and 4 match. Because Record 1 matched Record 2 in the first pass, the output is one group with all four records linked.
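
The difference between the two outcomes can be traced with a small Python sketch. It is illustrative only and rests on stated assumptions: the pairwise match results for each pass are hard-coded from the narrative above rather than computed by any scoring logic, and the names build_groups, dependent, independent, and PRIORITY (a hypothetical master preference that puts Record 2 first, as in the example) do not come from the product.

```python
from itertools import combinations

# Pairwise outcomes per pass, hard-coded from the example's narrative:
# pass 1 (Name + Address) links Records 1 and 2;
# pass 2 (Name + TaxID) links Records 1, 3, and 4 to each other.
PASS_1 = {frozenset(p) for p in [(1, 2)]}
PASS_2 = {frozenset(p) for p in combinations((1, 3, 4), 2)}
RECORDS = [1, 2, 3, 4]
PRIORITY = [2, 3, 1, 4]  # hypothetical master preference (Record 2 first)


def build_groups(available, matches, priority):
    """Build {master: duplicates} for one pass, trying masters in priority order."""
    assigned, groups = set(), {}
    for candidate in priority:
        if candidate not in available or candidate in assigned:
            continue
        dups = {r for r in available
                if r != candidate and r not in assigned
                and frozenset((candidate, r)) in matches}
        if dups:
            groups[candidate] = dups
            assigned |= dups | {candidate}
    return groups


def dependent(records, passes, priority):
    available, final = set(records), {}
    for matches in passes:
        # Existing masters are tried first in later passes.
        order = ([r for r in priority if r in final]
                 + [r for r in priority if r not in final])
        for master, dups in build_groups(available, matches, order).items():
            group = final.setdefault(master, {master})
            for d in dups:
                # Assumption: a master that becomes a duplicate brings its group.
                group |= final.pop(d, {d})
            available -= dups  # duplicates leave consideration
    return sorted(sorted(g) for g in final.values())


def independent(records, passes, priority):
    merged = []
    for matches in passes:
        # Every pass sees all records.
        for master, dups in build_groups(set(records), matches, priority).items():
            group, rest = dups | {master}, []
            for other in merged:
                if other & group:
                    group |= other  # groups with a record in common are merged
                else:
                    rest.append(other)
            merged = rest + [group]
    return sorted(sorted(g) for g in merged)


print(dependent(RECORDS, [PASS_1, PASS_2], PRIORITY))    # [[1, 2], [3, 4]]
print(independent(RECORDS, [PASS_1, PASS_2], PRIORITY))  # [[1, 2, 3, 4]]
```

Under these assumptions the dependent run leaves two groups because Record 1 is removed after the first pass, while the independent run links all four records through Record 1.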

An example of processing for the dependent and transitive match types

The following example shows how to use the transitive match type option with the One-source Match stage. The table shows six records that show a difference of one day between records of the same family name. You require that records of the same family name match if the dates have a difference of one day or less.
Table 2. Records that show a difference of one day between records of the same family name
Record  Family name  Date      Given name  qsMatchType  qsMatchWeight  qsMatchPassNumber  qsMatchSetID  qsMatchDataID
5       Clifford     19530831  Benn        MP           0              1                  5             5
7       Clifford     19530829  George      DA           0              1                  5             7
6       Clifford     19530830  George      DA           0              1                  5             6
8       Clifford     19530731  Thomas      MP           0              1                  8             8
9       Clifford     19530801  David       DA           0              1                  8             9
10      Clifford     19530802  David       DA           0              1                  8             10
The matching process that uses this data yields different results depending on the match type that you choose:
  • Dependent
    • The first pass blocks on Family Name and matches on Date by using a date tolerance of one day. Records 5 and 6 are considered a matched pair.
    • If Record 5 is selected as the master record, Record 6 is not available for the second pass and no other records match.
  • Transitive
    • The first pass blocks on Family Name and matches on Date by using a date tolerance of one day. Records 5 and 6 are considered a matched pair.
    • If Record 5 is selected as the master record, Record 6 is available for subsequent passes and is compared to the rest of the records. Records 6 and 7 are considered a matched pair. Because Records 5 and 6 match and Records 6 and 7 match, Records 5, 6, and 7 are all within the same match set, even though the dates of Records 5 and 7 differ by two days.
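
The transitive result for Table 2 can be reproduced with a short Python sketch. It is a simplification under stated assumptions: a yes/no rule (same family name, dates at most one day apart) stands in for the weighted score and match cutoff that the stage actually uses, and the function names are hypothetical.

```python
from datetime import datetime
from itertools import combinations

# (record, family name, date) taken from Table 2
RECORDS = [
    (5, "Clifford", "19530831"),
    (7, "Clifford", "19530829"),
    (6, "Clifford", "19530830"),
    (8, "Clifford", "19530731"),
    (9, "Clifford", "19530801"),
    (10, "Clifford", "19530802"),
]


def days_apart(a, b):
    """Absolute difference in days between two YYYYMMDD strings."""
    fmt = "%Y%m%d"
    return abs((datetime.strptime(a, fmt) - datetime.strptime(b, fmt)).days)


def matched_pairs(records, tolerance_days=1):
    """Pairs with the same family name and dates within the tolerance."""
    return [(r1[0], r2[0]) for r1, r2 in combinations(records, 2)
            if r1[1] == r2[1] and days_apart(r1[2], r2[2]) <= tolerance_days]


def transitive_groups(pairs):
    """Put every chain of matched pairs into a single group."""
    group_of = {}
    for a, b in pairs:
        merged = group_of.get(a, {a}) | group_of.get(b, {b})
        for record in merged:
            group_of[record] = merged
    return sorted({tuple(sorted(g)) for g in group_of.values()})


pairs = matched_pairs(RECORDS)
print(pairs)                     # [(5, 6), (7, 6), (8, 9), (9, 10)]
print(transitive_groups(pairs))  # [(5, 6, 7), (8, 9, 10)]

# Records that match master Record 5 directly (the dependent view of pass 1)
print([b if a == 5 else a for a, b in pairs if 5 in (a, b)])  # [6]
```

The two transitive groups correspond to the qsMatchSetID values 5 and 8 in Table 2, whereas only Record 6 matches Record 5 directly.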