0 / 0
Input tab

Input tab

The Input tab allows you to specify details about how data is written to a file set.

Below is a description of each property on the input tab:
File set
This property defines the file set that the incoming data will be written to. You can type in a path name of, or browse for a file set descriptor file (by convention ending in .fs).
File set update policy
Specifies what action will be taken if the file set you are writing to already exists. Select:
  • Create (Error if exists)
  • Overwrite (default)
  • Use Existing (Discard records). Keeps existing files listed in a descriptor file (for example, datasetname.ds or filesetname.fs) but discards the old records. You receive an error if the data set with different schema already exists.
  • Use Existing (Discard schema & records). Keeps existing files listed in a descriptor file (for example, datasetname.ds or filesetname.fs) but discards the old schema and records.
Target additional properties:
File set schema policy
Specifies whether the schema should be written to the file set. Choose from Write or Omit. The default is Write.
Clean up on failure
This is set to True by default and specifies that the stage will delete any partially written files if the stage fails for any reason. Set this to False to specify that partially written files should be left.
Single file per partition
Set this to True to specify that one file is written for each partition. The default is False.
Reject mode
Allows you to specify behavior if a record fails to be written for some reason. Choose from Continue to continue operation and discard any rejected rows, Fail to cease reading if any rows are rejected, or Save to send rejected rows down a reject link. Defaults to Continue.
Disk pool
This is an optional property. Specify the name of the disk pool into which to write the file set. You can also specify a job parameter.
File prefix
This is an optional property. Specify a prefix for the name of the file set components. If you do not specify a prefix, the system writes the following: export.username, where username is your login. You can also specify a job parameter.
File suffix
This is an optional property. Specify a suffix for the name of the file set components. The suffix is omitted by default.
Maximum file size
This is an optional property. Specify the maximum file size in MB. The value must be equal to or greater than 1.
Schema file
This is an optional property. By default the File Set stage will use the column definitions defined on the Columns tab and formatting information from the Format tab as a schema for writing the file. You can, however, specify a file containing a schema instead (note, however, that if you have defined columns on the Columns tab, you should ensure these match the schema file). Type in a path name.

Format

If you do not alter any of the Format settings, the files are produced in the following format:

  • Files comprise variable length columns contained within double quotes.
  • All columns are delimited by a comma, except for the final column in a row.
  • Rows are delimited by a UNIX newline.

The following sections list the Property types and properties available for each type.

Record level

These properties define details about how data records are formatted in the flat file. Where you can enter a character, this can usually be an ASCII character or a multi-byte Unicode character (if you have NLS enabled). The available properties are:
  • Fill char. Specify an ASCII character or a value in the range 0 to 255. You can also choose Space or Null from a drop-down list. This character is used to fill any gaps in a written record caused by column positioning properties. Set to 0 by default (which is the NULL character). For example, to set it to space you could also type in the space character or enter 32. Note that this value is restricted to one byte, so you cannot specify a multi-byte Unicode character.
  • Final delimiter string. Specify a string to be written after the last column of a record in place of the column delimiter. Enter one or more characters, this precedes the record delimiter if one is used. Mutually exclusive with Final delimiter, which is the default. For example, if you set Delimiter to comma and Final delimiter string to `, ` (comma space - you do not need to enter the inverted commas) all fields are delimited by a comma, except the final field, which is delimited by a comma followed by an ASCII space character.
  • Final delimiter. Specify a single character to be written after the last column of a record in place of the field delimiter. Type a character or select one of whitespace, end, none, null, tab, or comma. See the following diagram for an illustration.
    • whitespace. The last column of each record will not include any trailing white spaces found at the end of the record.
    • end. The last column of each record does not include the field delimiter. This is the default setting.
    • none. The last column of each record does not have a delimiter; used for fixed-width fields.
    • null. The last column of each record is delimited by the ASCII null character.
    • comma. The last column of each record is delimited by the ASCII comma character.
    • tab. The last column of each record is delimited by the ASCII tab character.

      When writing, a space is now inserted after every field except the last in the record. Previously, a space was inserted after every field including the last. (If you want to revert to the pre-release 7.5 behavior of inserting a space after the last field, set the APT_FINAL_DELIM_COMPATIBLE environment variable.

  • Intact. The intact property specifies an identifier of a partial schema. A partial schema specifies that only the column(s) named in the schema can be modified by the stage. All other columns in the row are passed through unmodified. The file containing the partial schema is specified in the Schema File property on the Properties tab. This property has a dependent property, Check intact, but this is not relevant to input links.
  • Record delimiter string. Specify a string to be written at the end of each record. Enter one or more characters. This is mutually exclusive with Record delimiter, which is the default, record type and record prefix.
  • Record delimiter. Specify a single character to be written at the end of each record. Type a character or select one of the following:
    • UNIX Newline (the default)
    • null

    (To implement a DOS newline, use the Record delimiter string property set to "\R\N" or choose Format as > DOS line terminator from the shortcut menu.)

    Note: Record delimiter is mutually exclusive with Record delimiter string, Record prefix, and Record type.
  • Record length. Select Fixed where fixed length fields are being written. IBM® DataStage® calculates the appropriate length for the record. Alternatively specify the length of fixed records as number of bytes. This is not used by default (default files are comma-delimited). The record is padded to the specified length with either zeros or the fill character if one has been specified.
  • Record Prefix. Specifies that a variable-length record is prefixed by a 1-, 2-, or 4-byte length prefix. It is set to 1 by default. This is mutually exclusive with Record delimiter, which is the default, and record delimiter string and record type.
  • Record type. Specifies that data consists of variable-length blocked records (varying) or implicit records (implicit). If you choose the implicit property, data is written as a stream with no explicit record boundaries. The end of the record is inferred when all of the columns defined by the schema have been parsed. The varying property allows you to specify one of the following IBM blocked or spanned formats: V, VB, VS, VBS, or VR. Data is imported by using one of these formats. These formats are not available for export.

    This property is mutually exclusive with Record length, Record delimiter, Record delimiter string, and Record prefix and by default is not used.

Field defaults

Defines default properties for columns written to the file or files. These are applied to all columns written, but can be overridden for individual columns from the Columns tab using the Edit Column Metadata dialog box. Where you can enter a character, this can usually be an ASCII character or a multi-byte Unicode character (if you have NLS enabled). The available properties are:
  • Actual field length. Specifies the number of bytes to fill with the Fill character when a field is identified as null. When DataStage identifies a null field, it will write a field of this length full of Fill characters. This is mutually exclusive with Null field value.
  • Delimiter. Specifies the trailing delimiter of all fields in the record. Type an ASCII character or select one of whitespace, end, none, null, comma, or tab.
    • whitespace. Whitespace characters at the end of a column are ignored, that is, are not treated as part of the column.
    • end. The end of a field is taken as the delimiter, that is, there is no separate delimiter. This is not the same as a setting of `None' which is used for fields with fixed-width columns.
    • none. No delimiter (used for fixed-width).
    • null. ASCII Null character is used.
    • comma. ASCII comma character is used.
    • tab. ASCII tab character is used.
  • Delimiter string. Specify a string to be written at the end of each field. Enter one or more characters. This is mutually exclusive with Delimiter, which is the default. For example, specifying `, ` (comma space - you do not need to enter the inverted commas) would have each field delimited by `, ` unless overridden for individual fields.
  • Null field length. The length in bytes of a variable-length field that contains a null. When a variable-length field is written, DataStage writes a length value of null field length if the field contains a null. This property is mutually exclusive with null field value.
  • Null field value. Specifies the value written to null field if the source is set to null. Can be a number, string, or C-type literal escape character. For example, you can represent a byte value by \ooo, where each o is an octal digit 0 - 7 and the first o is < 4, or by \xhh, where each h is a hexadecimal digit 0 - F. You must use this form to encode non-printable byte values.

    This property is mutually exclusive with Null field length and Actual length. For a fixed width data representation, you can use Pad char (from the general section of Type defaults) to specify a repeated trailing character if the value you specify is shorter than the fixed width of the field.

    Null field value has a sub property named Null field value separator. This is intended for output data, and should be ignored on Format tabs belonging to input links.

  • Prefix bytes. Specifies that each column in the data file is prefixed by 1, 2, or 4 bytes containing, as a binary value, either the column's length or the tag value for a tagged field.

    You can use this option with variable-length fields. Variable-length fields can be either delimited by a character or preceded by a 1-, 2-, or 4-byte prefix containing the field length. DataStage inserts the prefix before each field.

    This property is mutually exclusive with the Delimiter, Quote, and Final Delimiter properties, which are used by default.

  • Print field. This property is not relevant for input links.
  • Quote. Specifies that variable length fields are enclosed in single quotes, double quotes, or another character or pair of characters. Choose Single or Double, or enter a character. This is set to double quotes by default.

    When writing, DataStage inserts the leading quote character, the data, and a trailing quote character. Quote characters are not counted as part of a field's length.

  • Vector prefix. For fields that are variable length vectors, specifies a 1-, 2-, or 4-byte prefix containing the number of elements in the vector. You can override this default prefix for individual vectors.

    Variable-length vectors must use either a prefix on the vector or a link to another field in order to specify the number of elements in the vector. If the variable length vector has a prefix, you use this property to indicate the prefix length. DataStage inserts the element count as a prefix of each variable-length vector field. By default, the prefix length is assumed to be one byte.

Type defaults

These are properties that apply to all columns of a specific data type unless specifically overridden at the column level. They are divided into a number of subgroups according to data type.

General

These properties apply to several data types (unless overridden at column level):
  • Byte order. Specifies how multiple byte data types (except string and raw data types) are ordered. Choose from:
    • little-endian. The high byte is on the right.
    • big-endian. The high byte is on the left.
    • native-endian. As defined by the native format of the machine. This is the default.
  • Data Format. Specifies the data representation format of a field. Applies to fields of all data types except string, ustring, and raw and to record, subrec or tagged fields containing at least one field that is neither string nor raw. Choose from:
    • binary
    • text (the default)

      A setting of binary has different meanings when applied to different data types:

    • For decimals, binary means packed.
    • For other numerical data types, binary means "not text".
    • For dates, binary is equivalent to specifying the julian property for the date field.
    • For time, binary is equivalent to midnight_seconds.
    • For timestamp, binary specifies that the first integer contains a Julian day count for the date portion of the timestamp and the second integer specifies the time portion of the timestamp as the number of seconds from midnight. A binary timestamp specifies that two 32-but integers are written.

      By default data is formatted as text, as follows:

    • For the date data type, text specifies that the data to be written contains a text-based date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
    • For the decimal data type: a field represents a decimal in a string format with a leading space or '-' followed by decimal digits with an embedded decimal point if the scale is not zero. The destination string format is: [+ | -]ddd.[ddd] and any precision and scale arguments are ignored.
    • For numeric fields (int8, int16, int32, uint8, uint16, uint32, sfloat, and dfloat): DataStage assumes that numeric fields are represented as text.
    • For the time data type: text specifies that the field represents time in the text-based form %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
    • For the timestamp data type: text specifies a text-based timestamp in the form %yyyy-%mm-%dd %hh:%nn:%ss or in the default date format if you have defined a new one on an NLS system.
  • Field max width. The maximum number of bytes in a column represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
  • Field width. The number of bytes in a field represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width charset, you can calculate the number of bytes exactly. If it's a variable length encoding, base your calculation on the width and frequency of your variable-width characters. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.

    If you specify neither field width nor field max width, numeric fields written as text have the following number of bytes as their maximum width:

    • 8-bit signed or unsigned integers: 4 bytes
    • 16-bit signed or unsigned integers: 6 bytes
    • 32-bit signed or unsigned integers: 11 bytes
    • 64-bit signed or unsigned integers: 21 bytes
    • single-precision float: 14 bytes (sign, digit, decimal point, 7 fraction, "E", sign, 2 exponent)
    • double-precision float: 24 bytes (sign, digit, decimal point, 16 fraction, "E", sign, 3 exponent)
    Important: If you are using Unicode character columns, you must calculate the field length in bytes and specify that value in the Field Width column property.
  • Pad char. Specifies the pad character used when strings or numeric values are written to an external string representation. Enter a character (single-byte for strings, can be multi-byte for ustrings) or choose null or space. The pad character is used when the external string representation is larger than required to hold the written field. In this case, the external string is filled with the pad character to its full length. Space is the default. Applies to string, ustring, and numeric data types and record, subrec, or tagged types if they contain at least one field of this type.
  • Character set. Specifies the character set. Choose from ASCII or EBCDIC. The default is ASCII. Applies to all data types except raw and ustring and record, subrec, or tagged containing no fields other than raw or ustring.

String

These properties are applied to columns with a string data type, unless overridden at column level.
  • Export EBCDIC as ASCII. Select this to specify that EBCDIC characters are written as ASCII characters. Applies to fields of the string data type and record, subrec, or tagged fields if they contain at least one field of this type.
  • Import ASCII as EBCDIC. Not relevant for input links.

Decimal

These properties are applied to columns with a decimal data type unless overridden at column level.
  • Allow all zeros. Specifies whether to treat a packed decimal column containing all zeros (which is normally illegal) as a valid representation of zero. Select Yes or No. The default is No.
  • Decimal separator. Specify the ASCII character that acts as the decimal separator (period by default).
  • Packed. Select an option to specify what the decimal columns contain, choose from:
    • Yes to specify that the decimal columns contain data in packed decimal format (the default). This has the following sub-properties:

      Check. Select Yes to verify that data is packed, or No to not verify.

      Signed. Select Yes to use the existing sign when writing decimal columns. Select No to write a positive sign (0xf) regardless of the columns' actual sign value.

    • No (separate) to specify that they contain unpacked decimal with a separate sign byte. This has the following sub-property:

      Sign Position. Choose leading or trailing as appropriate.

    • No (zoned) to specify that they contain an unpacked decimal in either ASCII or EBCDIC text. This has the following sub-property:

      Sign Position. Choose leading or trailing as appropriate.

    • No (overpunch) to specify that the field has a leading or end byte that contains a character which specifies both the numeric value of that byte and whether the number as a whole is negatively or positively signed. This has the following sub-property:

      Sign Position. Choose leading or trailing as appropriate.

  • Precision. Specifies the precision where a decimal column is written in text format. Enter a number. When a decimal is written to a string representation, DataStage uses the precision and scale defined for the source decimal field to determine the length of the destination string. The precision and scale properties override this default. When they are defined, DataStage truncates or pads the source decimal to fit the size of the destination string. If you have also specified the field width property, DataStage truncates or pads the source decimal to fit the size specified by field width.
  • Rounding. Specifies how to round a decimal column when writing it. Choose from:
    • up (ceiling). Truncate source column towards positive infinity. This mode corresponds to the IEEE 754 Round Up mode. For example, 1.4 becomes 2, -1.6 becomes -1.
    • down (floor). Truncate source column towards negative infinity. This mode corresponds to the IEEE 754 Round Down mode. For example, 1.6 becomes 1, -1.4 becomes -2.
    • nearest value. Round the source column towards the nearest representable value. This mode corresponds to the COBOL ROUNDED mode. For example, 1.4 becomes 1, 1.5 becomes 2, -1.4 becomes -1, -1.5 becomes -2.
    • truncate towards zero. This is the default. Discard fractional digits to the right of the right-most fractional digit supported by the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, truncate to the scale size of the destination decimal. This mode corresponds to the COBOL INTEGER-PART function. Using this method 1.6 becomes 1, -1.6 becomes -1.
  • Scale. Specifies how to round a source decimal when its precision and scale are greater than those of the destination. By default, when the DataStage writes a source decimal to a string representation, it uses the precision and scale defined for the source decimal field to determine the length of the destination string. You can override the default by means of the precision and scale properties. When you do, DataStage truncates or pads the source decimal to fit the size of the destination string. If you have also specified the field width property, DataStage truncates or pads the source decimal to fit the size specified by field width.

Numeric

These properties apply to integer and float fields unless overridden at column level.
  • C_format. Perform non-default conversion of data from integer or floating-point data to a string. This property specifies a C-language format string used for writing integer or floating point strings. This is passed to sprintf(). For example, specifying a C-format of %x and a field width of 8 ensures that integers are written as 8-byte hexadecimal strings.
  • In_format. This property is not relevant for input links..
  • Out_format. Format string used for conversion of data from integer or floating-point data to a string. This is passed to sprintf(). By default, DataStage invokes the C sprintf() function to convert a numeric field formatted as either integer or floating point data to a string. If this function does not output data in a satisfactory format, you can specify the out_format property to pass formatting arguments to sprintf().

Date

These properties are applied to columns with a date data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
  • Days since. Dates are written as a signed integer containing the number of days since the specified date. Enter a date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
  • Format string. The string format of a date. By default this is %yyyy-%mm-%dd.
  • Is Julian. Select this to specify that dates are written as a numeric value containing the Julian day. A Julian day specifies the date as the number of days from 4713 BCE January 1, 12:00 hours (noon) GMT.

Time

These properties are applied to columns with a time data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
  • Format string. Specifies the format of columns representing time as a string.
  • Is midnight seconds. Select this to specify that times are written as a binary 32-bit integer containing the number of seconds elapsed from the previous midnight.

Timestamp

These properties are applied to columns with a timestamp data type unless overridden at column level.
  • Format string. Specifies the format of a column representing a timestamp as a string. Defaults to %yyyy-%mm-%dd %hh:%nn:%ss. The format combines the format for date strings and time strings.
Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more