File set

Use a file set as a source or target. You can read data from or write data to a file set.

The file set can have a single input link, a single output link, and a single rejects link. It only executes in parallel mode.

IBM® DataStage® can generate and name exported files, write them to their destination, and list the files it has generated in a file whose extension is, by convention, .fs. The data files and the file that lists them are together called a file set. This capability is useful because some operating systems impose a 2 GB limit on the size of a file, and distributing the files among nodes prevents overruns.
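The idea of a file set (several size-limited data files plus one file that lists them) can be illustrated with a minimal Python sketch. This is a conceptual illustration only: the function names, the `.partNNNN` naming scheme, and the plain-text listing are assumptions made for the example, and they do not represent the format that DataStage actually writes.

```python
import os
import tempfile


def write_file_set(records, directory, base_name, max_bytes):
    """Write records to size-capped data files and list them in a .fs file.

    Hypothetical illustration: the file naming and the plain-text listing
    are assumptions, not the DataStage file set format.
    """
    data_paths = []
    current = None
    current_size = 0
    for record in records:
        line = (record + "\n").encode("utf-8")
        # Start a new data file when adding this record would exceed the cap.
        # A file always accepts its first record, even an oversized one.
        if current is None or (current_size > 0 and current_size + len(line) > max_bytes):
            if current is not None:
                current.close()
            path = os.path.join(directory, f"{base_name}.part{len(data_paths):04d}")
            current = open(path, "wb")
            data_paths.append(path)
            current_size = 0
        current.write(line)
        current_size += len(line)
    if current is not None:
        current.close()

    # The listing file names every data file that was generated.
    descriptor = os.path.join(directory, base_name + ".fs")
    with open(descriptor, "w", encoding="utf-8") as fh:
        for path in data_paths:
            fh.write(path + "\n")
    return descriptor, data_paths


def read_file_set(descriptor):
    """Read records back by following the data file list in the listing file."""
    with open(descriptor, encoding="utf-8") as fh:
        paths = [line.rstrip("\n") for line in fh if line.strip()]
    records = []
    for path in paths:
        with open(path, encoding="utf-8") as data:
            records.extend(line.rstrip("\n") for line in data)
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fs_path, parts = write_file_set(["alpha", "beta", "gamma", "delta"], tmp, "export", 12)
        print(os.path.basename(fs_path), len(parts))
        print(read_file_set(fs_path))
```

In this sketch a reader needs only the listing file to locate all of the data, which mirrors the role the source text gives to the .fs file. In DataStage itself, how the files are spread across nodes and disks is decided by the engine, not by a byte cap like the one assumed here.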

The amount of data that can be stored in each destination data file is limited by the characteristics of the file system and the amount of free disk space available. The number of files created by a file set depends on:

  • The number of processing nodes in the default node pool
  • The number of disks in the export or default disk pool connected to each processing node in the default node pool
  • The size of the partitions of the data set

Unlike data sets, file sets carry formatting information that describes the format of the files to be read or written.

Double-click the file set to open the properties panel. The panel has up to three tabs, depending on whether you are reading or writing a file set:

Stage tab

You can specify the following Advanced properties:
  • Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode, the contents of the data set are processed by the available nodes as specified in the configuration file and by any node constraints specified on the Advanced tab. In sequential mode, the entire contents of the data set are processed by the conductor node.
  • Combinability mode. This is Auto by default, which allows IBM DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
  • Preserve partitioning. You can select Propagate, Set, or Clear. If you select Set, file read operations request that the next stage preserve the partitioning as is. Propagate takes the setting of the flag from the previous stage.

Input tab

The Input tab allows you to specify details about how the file set writes data. The file set can have only one input link. See Input tab for additional information.

Output tab

The Output tab allows you to specify details about how data is read from a file set. See Output tab for additional information.
