Amazon S3 connector (DataStage)


Use the Amazon S3 connector in DataStage® to connect to the Amazon Simple Storage Service (S3) and perform various read and write functions.

Prerequisite

Create the connection. For instructions, see Connecting to a data source in DataStage and the Amazon S3 connection.

DataStage properties

In the Stage tab Properties section, select Use DataStage properties to access properties that are specific to DataStage. These properties provide more features and granular control of the flow execution, similar to the "optimized" connectors.

If you select Use DataStage properties and the file is CSV format, the column values must have double quotation marks around them. If any customization is needed, use the connector File format properties to change the file format to Delimited. Then, select the field delimiter, row delimiter, quote character, and escape character.
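As a minimal sketch of the quoting rule above: when Use DataStage properties is selected, every column value in a CSV file is wrapped in double quotation marks. The snippet below uses Python's standard csv module only to illustrate that layout; the field names are hypothetical and this is not the connector's implementation.

```python
import csv
import io

# Write every field fully quoted, matching the layout the connector expects.
buf = io.StringIO()
writer = csv.writer(buf, delimiter=",", quotechar='"',
                    quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerow(["id", "name"])          # illustrative column names
writer.writerow([1, "Smith, Jane"])      # comma inside a value stays safe
print(buf.getvalue())
# "id","name"
# "1","Smith, Jane"
```

If your data needs a different delimiter, quote character, or escape character, that is what the Delimited file format selection described above is for.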

Clear Use DataStage properties to access the Table format property selections.

Configuring the Amazon S3 connector as a source

The available properties for the Read mode depend on whether you select Use DataStage properties.

Configure the read process for when you select Use DataStage properties (default).
Table 1. Reading data from Amazon S3 with "Use DataStage properties" selected
Read mode Procedure
Read a single file Specify the bucket name that contains the file, and then specify the name of the file to read.
Read multiple files
  1. Specify the bucket name that contains the files.
  2. In the File name field, specify a prefix that the files you want to read must have in their file path.

    For example, if you enter transactions as the prefix, the connector reads all the files in the transactions folder, such as transactions/january/day1.txt, and a file named transactions.txt.

List buckets No additional configuration is needed.
List files
  1. Specify the bucket name that contains the files.
  2. Optional: In the File name field, specify a prefix that the files you want to read must have in their file path.

    For example, if you enter transactions as the prefix, the connector lists all the files in the transactions folder, such as transactions/january/day1.txt, and a file named transactions.txt.

    If you do not specify a file name prefix, all the files in the bucket are listed.
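The prefix semantics in the tables above can be sketched as a simple string comparison against object keys: any key whose full path begins with the prefix matches, whether it sits in a matching "folder" or is a plain file. This is an illustration only, not the connector's implementation.

```python
def keys_with_prefix(keys, prefix):
    """Return the S3 object keys whose path begins with `prefix`."""
    return [k for k in keys if k.startswith(prefix)]

# Hypothetical bucket contents, matching the documentation's example.
keys = [
    "transactions/january/day1.txt",
    "transactions.txt",
    "reports/summary.txt",
]
print(keys_with_prefix(keys, "transactions"))
# Matches both the file in the transactions folder and transactions.txt;
# reports/summary.txt is excluded.
```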

Configure the read process for when you clear Use DataStage properties.

Table 2. Reading data from Amazon S3 with "Use DataStage properties" not selected
Read mode Procedure
Read a single file Specify the bucket name that contains the file, and then specify the name of the file to read.
Read binary data Specify the bucket name that contains the file, and then specify the name of the file to read.
Read binary data from multiple files by using wildcards Specify a wildcard character in the file name for binary data. For example, in the File name field, enter test.*.gz.

If you use this option, you can read multiple binary files one after another, and each file will be read as a record.

If you select Read a file to a row, you must provide two column names in the Output tab of the source stage:

  • The first column must be a string data type. This column is for the file name.
  • The second column must be a binary data type. This column is for the file. The binary column precision value must be greater than or equal to the maximum file size.
Read multiple files by using a regular expression Specify the bucket name that contains the files. You can use a Java regular expression for the file name.

Examples

  • ^csv_write_datatypes_h.[0-9]$
  • csv_write_datatypes_h.[^12]
Read multiple files by using wildcards Specify an asterisk (*) to match zero or more characters. For example, specify *.txt to match all files with the .txt extension.

Specify a question mark (?) to match one character.

Examples

  • csv_write_datatypes.*
  • ?_abc_test*
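The regular-expression and wildcard examples in the table above can be checked against a set of file names. The sketch below uses Python's re and fnmatch modules; the documentation's patterns are Java regular expressions, but these particular examples behave the same in Python. The file names are hypothetical.

```python
import fnmatch
import re

files = ["csv_write_datatypes_h.1", "csv_write_datatypes_h.12",
         "csv_write_datatypes.txt", "a_abc_test1"]

# Regex example from the table: matches names ending in exactly one digit.
regex = re.compile(r"^csv_write_datatypes_h.[0-9]$")
print([f for f in files if regex.fullmatch(f)])
# ['csv_write_datatypes_h.1']  (h.12 fails: two digits after the dot)

# Wildcards: * matches zero or more characters, ? matches exactly one.
print(fnmatch.filter(files, "csv_write_datatypes.*"))  # ['csv_write_datatypes.txt']
print(fnmatch.filter(files, "?_abc_test*"))            # ['a_abc_test1']
```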

Configuring the Amazon S3 connector as a target

The available properties for the Write mode depend on whether you select Use DataStage properties.

Configure the write process for when you select Use DataStage properties (default).
Table 3. Writing data to Amazon S3 with "Use DataStage properties" selected
Write mode Procedure
Delete a file
  1. Specify the bucket name that contains the files or select Create bucket.
  2. In the File name field, specify a file name to delete.
Write to a file
  1. Specify the bucket name that contains the files.
  2. Select Append unique ID to append a unique set of identifying characters to the name of the bucket that is created.
  3. In the File name field, specify a file name to write to.
  4. Choose one of three options in If file exists: Do not overwrite file, Fail, or Overwrite file.

Configure the write process for when you clear Use DataStage properties.

Table 4. Writing data to Amazon S3 with "Use DataStage properties" not selected
Write mode Procedure
Delete a file
  1. Specify the bucket name that contains the files.
  2. In the Table action field, choose one of three options: Append, Replace, or Truncate.
  3. In the File name field, specify a file name to delete.
Write to a file
  1. Specify the bucket name that contains the files or select Create bucket.
  2. In the Table action field, choose one of three options: Append, Replace, or Truncate.
  3. In the Table format field, choose one of three options: Deltalake, Flat file, or Iceberg. If you choose Flat file, the Partitioned option is available, which enables writing the file as multiple partitions.
  4. In the File name field, specify a file name to write to.
Write binary data
  1. Specify the bucket name that contains the files or select Create bucket.
  2. In the Table action field, choose one of three options: Append, Replace, or Truncate.
  3. In the File name field, specify a file name to write to.
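The Table action choices above can be sketched against a local file, assuming Append adds new rows after the existing content while Replace and Truncate both start the write from an empty file. The helper name `apply_table_action` is illustrative, not a connector API, and the exact S3-side behavior may differ.

```python
import os

def apply_table_action(path, rows, action):
    """Hedged sketch of Append / Replace / Truncate applied to a local file."""
    if action == "Replace" and os.path.exists(path):
        os.remove(path)                     # Replace: recreate from scratch
    mode = "a" if action == "Append" else "w"  # Truncate/Replace start empty
    with open(path, mode) as f:
        f.writelines(r + "\n" for r in rows)

apply_table_action("demo.txt", ["row1"], "Replace")
apply_table_action("demo.txt", ["row2"], "Append")
with open("demo.txt") as f:
    print(f.read())   # row1, then row2 on the next line
os.remove("demo.txt")
```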