Lookup file set | IBM Cloud Pak for Data as a Service

Lookup file set

Use the Lookup file set to create a file set or reference one for a lookup.

Use a Lookup file set to create a new file set or reference one for a lookup. A Lookup file set can have a single input link or a single output link. The output link must be a reference link. The Lookup file set can be configured to execute in parallel or sequential mode when used with an input link.

When creating Lookup file sets, one file will be created for each partition. The individual files are referenced by a single descriptor file, which by convention has the suffix .fs.

When you use a Lookup file set as a source for lookup data, there are special considerations about column naming. If you have columns of the same name in both the source and lookup data sets, the source data set column will go to the output data. If you want this column to be replaced by the column from the lookup data source, you need to drop the source data column before you perform the lookup (you could, for example, use a Modify stage to do this).

Double-click the Lookup file set to open the properties panel. The panel has up to three tabs, depending on whether you are reading or writing a data set:

Stage tab
Input
Output

Stage tab

You can specify the following Advanced properties:

Execution Mode. The connector can execute in parallel mode or sequential mode. In parallel mode the contents of the data set are processed by the available nodes as specified in the Configuration file, and by any node constraints specified on the Advanced tab. In Sequential mode the entire contents of the data set are processed by the conductor node.
Combinability mode. This is Auto by default, which allows IBM® DataStage® to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
Preserve partitioning. You can select Propagate, Set or Clear. If you select Set file read operations will request that the next stage preserves the partitioning as is. Propagate takes the setting of the flag from the previous stage.

Input

The Input tab allows you to specify details about how the Lookup file set writes data to a file set. The Lookup file set can have only one input link.

Properties:

Lookup Keys

Specifies the name of a lookup key column. The Key property can be repeated if there are multiple key columns. The property has two dependent properties:

Case Sensitive
This is a dependent property of Key and specifies whether the parent key is case sensitive or not. Set to True by default.
Keep
This specifies whether the key column needs to be propagated to the output stream after the lookup, or used in a subsequent condition expression. Set to False by default.

Target:

Lookup file set: Defines the file set. You can type in a pathname of a file set descriptor file (by convention ending in .fs).

Lookup range:

Specifies whether a range lookup is required. A range lookup compares the value of a source column to a range of values between two lookup columns. Alternatively, you can compare the value of a lookup column to a range of values between two source columns.

When you specify range lookup, you usually must also set the property Allow Duplicates in the Options category, to True. You allow duplicates so that primary keys can return multiple rows.

This Range Lookup property has the following dependent properties:

Bounded Column
Specifies the input column to use for the lookup file set. The lookup table is built with the bounded column, which will then be compared to a range of columns from the source data set. This property is mutually exclusive with Lower Bound and Upper Bound. It has two dependent properties:
- Case Sensitive
  This is an optional property. Specifies whether the column is case sensitive or not. Set to True by default.
- Keep
  Specifies whether the column needs to be propagated to the output stream after the lookup, or used in a subsequent condition expression. Set to False by default.
Lower Bound
Specifies the input column to use for the lower bound of the range in the lookup table. This cannot be the same as the upper bound column and is mutually exclusive with Bounded Column. It has two dependent properties:
- Case Sensitive
  This is an optional property. Specifies whether the column is case sensitive or not. Set to True by default.
- Keep
  Specifies whether the column needs to be propagated to the output stream after the lookup, or used in a subsequent condition expression. Set to False by default.
Upper Bound
Specifies the input column to use for the upper bound of the range in the lookup table. This cannot be the same as the lower bound column and is mutually exclusive with Bounded Column. It has two dependent properties:
- Case Sensitive
  This is an optional property. Specifies whether the column value is case sensitive or not. Set to True by default.
- Keep
  Specifies whether the column needs to be propagated to the output stream after the lookup, or used in a subsequent condition expression. Set to False by default.

Options:

Allow duplicates: Set this to cause multiple copies of duplicate records to be saved in the lookup table without a warning being issued. Two lookup records are duplicates when all lookup key columns have the same value in the two records. If you do not specify this option, the system issues a warning message when it encounters duplicate records and discards all but the first of the matching records.

Disk pool: This is an optional property. Specify the name of the disk pool into which to write the file set. You can also specify a job parameter.

Output

The Output tab allows you to specify details about how the Lookup File Set connector references a file set.

Properties:

Lookup file set: This property defines the file set that the data will be referenced from. You can type in a path name of a file set descriptor file (by convention ending in .fs).