The function of a Merge node is to take multiple input records
and create a single output record containing all or some of the input fields. This is a useful
operation when you want to merge data from different sources, such as internal customer data and
purchased demographic data.
You can merge data in the following ways.
Merge by Order concatenates corresponding records
from all sources in the order of input until the smallest data source is exhausted. It is important
if using this option that you have sorted your data using a Sort node.
Merge using a Key field, such as Customer
ID, to specify how to match records from one data source with records from the other(s).
Several types of joins are possible, including inner join, full outer join, partial outer join, and
anti-join.
Merge by Condition means that you can specify a
condition to be satisfied for the merge to take place. You can specify the condition directly in the
node, or build the condition using the Expression Builder.
Types of joins
Copy link to section
When using a key field for data merging, you may find it useful to spend some
time thinking about which records will be excluded and which will be included. There are a variety
of joins, which are discussed here. The two basic types of joins are referred to as inner and outer
joins. These methods are frequently used to merge tables from related datasets based on common
values of a key field, such as Customer ID. Inner joins allow for clean merging and
an output dataset that includes only complete records. Outer joins also include complete records
from the merged data, but they also allow you to include unique data from one or more input tables.
The types of joins allowed are described in greater detail as follows.
An inner join includes only records in which a value for the key field is
common to all input tables. That is, unmatched records won't be included in the output dataset.
A full outer join includes all records, both matching and nonmatching,
from the input tables. Left and right outer joins are referred to as partial outer joins.
A partial outer join includes all records matched using the key field as
well as unmatched records from specified tables. (Or, to put it another way, all records from some
tables and only matching records from others.) You can select tables (such as A and B shown here)
for inclusion in the outer join under the Merge options of the node properties. Partial joins are
also called left or right outer joins when only two tables are being merged. Since SPSS Modeler
allows the merging of more than two tables, we refer to this as a partial outer join.
An anti-join includes only unmatched records for the first
input table (Table A shown here). This type of join is the opposite of an inner join and doesn't
include complete records in the output dataset.
For example, if you have information about farms in one dataset and
farm-related insurance claims in another, you can match the records from the first source to the
second source using the Merge options. To determine if a customer in your farm sample has filed an
insurance claim, use the inner join option to return a list showing where all IDs match from the two
samples.
Figure 1. Sample output for an inner join merge
Using the full outer join option returns both matching and nonmatching records
from the input tables. The system-missing value ($null$) will be used for any
incomplete values.
Figure 2. Sample output for a full outer join merge
A partial outer join includes all records matched using the key field as well
as unmatched records from specified tables. The table displays all of the records matched from the
ID field as well as the records matched from the first dataset.
Figure 3. Sample output for a partial outer join merge
If you're using the anti-join option, the table returns only unmatched records
for the first input table.
About cookies on this siteOur websites require some cookies to function properly (required). In addition, other cookies may be used with your consent to analyze site usage, improve the user experience and for advertising.For more information, please review your cookie preferences options. By visiting our website, you agree to our processing of information as described in IBM’sprivacy statement. To provide a smooth navigation, your cookie preferences will be shared across the IBM web domains listed here.