Sharing DataStage artifacts with all IBM Cloud Object Storage containers

IBM Cloud Object Storage is used to store IBM® DataStage® artifacts such as sequential files, data sets, and file sets. Set up IBM Cloud Object Storage to store these artifacts. After the IBM Cloud Object Storage container is set up, it can be accessed across different runtime containers and used by different stages in your data flows.

On the Cloud, DataStage jobs can run in different runtime containers. If DataStage artifacts such as sequential files, data sets, and file sets are written to the local disk of one container, they are not accessible to jobs that run in other containers. These artifacts are therefore written to IBM Cloud Object Storage, which is accessible from any of the containers.

DataStage on Cloud reads and writes the following DataStage artifacts that are stored in IBM Cloud Object Storage:

Sequential files (text/binary)
Data sets (binary)
File sets (text)
Lookup file sets (text)
Schema files (text)
Range map files (binary)

The artifacts are automatically stored in the following bucket structure:

DataStage/data
DataStage/datasets
DataStage/files
DataStage/schemas

Data sets, file sets, and lookup file sets

Data sets, file sets, and lookup file sets are created by IBM DataStage when you work with a data flow. Each one is stored as a descriptor file, which records the names and locations of the data files that hold the actual data.

All descriptor files are written to the DataStage/datasets/ directory. All data files that belong to these data sets, file sets, or lookup file sets are stored in the DataStage/data/ directory. The names and paths of the descriptor files cannot be prefixed with cos://, because the prefix is not supported for descriptor files.
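The descriptor rule above can be sketched as a small check. This is only an illustration, not DataStage code: the function name descriptor_key and the file name customers.ds are hypothetical, and the sketch assumes that a descriptor name is simply appended to the DataStage/datasets/ directory.

```python
COS_PREFIX = "cos://"
DESCRIPTOR_DIR = "DataStage/datasets/"


def descriptor_key(name: str) -> str:
    """Return the assumed object key for a descriptor file (hypothetical helper)."""
    # Descriptor names and paths cannot carry the cos:// prefix.
    if name.startswith(COS_PREFIX):
        raise ValueError("descriptor files cannot be prefixed with cos://")
    # Assumption: the name is appended to the descriptor directory as-is.
    return DESCRIPTOR_DIR + name


print(descriptor_key("customers.ds"))  # DataStage/datasets/customers.ds
```

Rejecting the prefix early, rather than silently stripping it, keeps the sketch in line with the statement that the prefix is not supported for descriptor files.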

Sequential files

All sequential files that are created by using the Sequential File stage are stored in and read from the DataStage/files/ directory, for example, DataStage/files/sequential_file.txt. File sets and lookup file sets are among the files that are created by the Sequential File stage. If the path to the sequential file starts with “cos://”, the file is created in the top-level directory of the Cloud Object Storage bucket.

Schema files

IBM DataStage flows read and write schema files in the DataStage/schemas/ directory, unless the file path starts with “cos://”. If the path starts with “cos://”, the files are in the top-level directory of the Cloud Object Storage bucket. For example, specify schemafile.txt to access the file of that name under the DataStage/schemas/ directory.
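The path rule for sequential files and schema files can be sketched as follows. This is a minimal illustration under stated assumptions, not the DataStage implementation: the names resolve_key and DEFAULT_DIRS are hypothetical, and the sketch assumes that whatever follows the cos:// prefix is the object key relative to the top level of the bucket.

```python
COS_PREFIX = "cos://"

# Default directories taken from the text; the dictionary keys are hypothetical labels.
DEFAULT_DIRS = {
    "sequential": "DataStage/files/",
    "schema": "DataStage/schemas/",
}


def resolve_key(path: str, kind: str) -> str:
    """Map a path given in a stage to an assumed object key in the bucket."""
    if path.startswith(COS_PREFIX):
        # Assumption: the remainder is relative to the top level of the bucket.
        return path[len(COS_PREFIX):]
    # Without the prefix, the file lives under the default directory for its kind.
    return DEFAULT_DIRS[kind] + path


print(resolve_key("schemafile.txt", "schema"))
print(resolve_key("sequential_file.txt", "sequential"))
print(resolve_key("cos://sequential_file.txt", "sequential"))
```

The first two calls resolve under DataStage/schemas/ and DataStage/files/, while the third resolves to the top level of the bucket.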

Schema files are created manually, uploaded, and then read by stages. In the options section of the stage editor, you can specify the location of the schema file that you want a stage to use.

The following stages can read schema files from IBM Cloud Object Storage:

Row Generator
Sequential File
File Set
Column Import
Column Export
Transformer

File pattern

File patterns that start with a common prefix name are supported. All other file patterns are not supported.
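As a rough illustration of prefix-based matching, the sketch below selects the file names that share a common prefix. The function name match_prefix and the sample file names are hypothetical, and this section does not state how the pattern itself is written in a stage (for example, whether a trailing wildcard is required), so treat it only as a picture of the idea.

```python
from typing import Iterable, List


def match_prefix(names: Iterable[str], prefix: str) -> List[str]:
    """Return the names that start with the given common prefix."""
    return [name for name in names if name.startswith(prefix)]


# Hypothetical file names in the DataStage/files/ directory.
names = ["sales_2023.txt", "sales_2024.txt", "returns_2024.txt"]
print(match_prefix(names, "sales_"))  # ['sales_2023.txt', 'sales_2024.txt']
```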