File set in DataStage
Use a file set as a source or target. You can read data from or write data to a file set.
The file set can have a single input link, a single output link, and a single rejects link. It only executes in parallel mode.
IBM® DataStage® can generate and name exported files, write them to their destination, and list the files it has generated in a file whose extension is, by convention, .fs. The data files and the file that lists them are together called a file set. This capability is useful because some operating systems impose a 2 GB limit on the size of a file, so you need to distribute the data across files on multiple nodes to prevent overruns.
The amount of data that can be stored in each destination data file is limited by the characteristics of the file system and the amount of free disk space available. The number of files created by a file set depends on:
- The number of processing nodes in the default node pool
- The number of disks in the export or default disk pool connected to each processing node in the default node pool
- The size of the partitions of the data set
Unlike data sets, file sets carry formatting information that describes the format of the files to be read or written.
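The idea of a file set described above (several data files plus one file that lists them) can be illustrated with a small Python sketch. This is a conceptual illustration only, not how DataStage implements it: the function names `write_file_set` and `read_file_set`, the `.partNNNN` data file naming, the round-robin distribution, and the one-path-per-line descriptor layout are all assumptions made for the example. A real .fs file also carries formatting information, which this sketch omits.

```python
import os
import tempfile


def write_file_set(records, num_partitions, out_dir, name):
    """Distribute records round-robin over num_partitions data files and
    list those files in a descriptor named <name>.fs (hypothetical layout)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, record in enumerate(records):
        partitions[i % num_partitions].append(record)

    data_paths = []
    for p, rows in enumerate(partitions):
        # One data file per partition; the naming scheme is an assumption.
        path = os.path.join(out_dir, f"{name}.part{p:04d}")
        with open(path, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(row + "\n")
        data_paths.append(path)

    # The descriptor simply lists the data files, one path per line.
    descriptor = os.path.join(out_dir, f"{name}.fs")
    with open(descriptor, "w", encoding="utf-8") as f:
        for path in data_paths:
            f.write(path + "\n")
    return descriptor


def read_file_set(descriptor):
    """Read back every record from the data files listed in the descriptor."""
    with open(descriptor, encoding="utf-8") as f:
        data_paths = [line.strip() for line in f if line.strip()]
    records = []
    for path in data_paths:
        with open(path, encoding="utf-8") as f:
            records.extend(line.rstrip("\n") for line in f)
    return records


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        fs_path = write_file_set(["a", "b", "c", "d", "e"], 2, tmp, "demo")
        print(read_file_set(fs_path))
```

Because the reader only needs the descriptor, the caller does not have to know how many data files exist or where they are stored, which is the practical benefit of grouping them as a set.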
Stage tab
- Execution Mode. The stage can execute in parallel mode or sequential mode. In parallel mode, the contents of the file set are processed by the available nodes as specified in the configuration file, and by any node constraints specified on the Advanced tab. In sequential mode, the entire contents of the file set are processed by the conductor node.
- Combinability mode. This is Auto by default, which allows IBM DataStage to combine the operators that underlie parallel stages so that they run in the same process if it is sensible for this type of stage.
- Preserve partitioning. You can select Propagate, Set, or Clear. If you select Set, file read operations request that the next stage preserve the partitioning as is. Propagate takes the setting of the flag from the previous stage.
Input tab
The Input tab allows you to specify details about how the file set writes data. The file set can have only one input link. See Input tab (DataStage) for additional information.
Output tab
The Output tab allows you to specify details about how data is read from a file set. See Output tab (DataStage) for additional information.