0 / 0
Column Import stage in DataStage: output link format section

Output link format section

The Format section on the Output tab allows you to supply information about the format of the column you are importing.

You use format section in the same way as you would to describe the format of a flat file you were reading.

The following sections list the property types and properties available for each type.

Record level

These properties define details about how data records are formatted in the flat file. Where you can enter a character, it can usually be an ASCII character. The available properties are shown in the following list:

  • Final delimiter. Specify the single character that is written after the last column of a record in place of the field delimiter. Type a character or select one of white space, end, none, null, tab, or comma. IBM DataStage skips the specified delimiter string when reading the file. See the following diagram for an illustration.
    • whitespace. The last column of each record will not include any trailing white spaces that are found at the end of the record.
    • end. The last column of each record does not include the field delimiter. This setting is the default.
    • none. The last column of each record does not have a delimiter, which is used for fixed-width fields.
    • null. The last column of each record is delimited by the ASCII null character.
    • comma. The last column of each record is delimited by the ASCII comma character.
    • tab. The last column of each record is delimited by the ASCII tab character.
  • Intact. The intact property specifies an identifier of a partial schema. A partial schema specifies that only one or more columns that are named in the schema can be modified by the stage. All other columns in the row are passed through unmodified. The file containing the partial schema is specified in the Schema File property on the Outputs tab. This property has a dependent property:
    • Check intact. Select this setting to force validation of the partial schema as the file or files are imported. This setting can degrade performance.
  • Record delimiter string. Specify the string at the end of each record. Enter one or more characters. This setting is mutually exclusive with Record delimiter, which is the default, and record type and record prefix.
  • Record delimiter. Specify the single character at the end of each record. Type a character or select one of the following options:
    • UNIX Newline (the default)
    • null

      (To specify a DOS newline, use the Record delimiter string property set to "\R\N" or choose Format as > DOS line terminator from the menu.)

      Record delimiter is mutually exclusive with Record delimiter string, Record prefix, and record type.

  • Record length. Select Fixed where fixed-length fields are being read. IBM DataStage calculates the appropriate length for the record. Alternatively specify the length of fixed records as number of bytes. This setting is not used by default (default files are comma-delimited).
  • Record Prefix. Specifies that a variable-length record is prefixed by a 1-, 2-, or 4-byte length prefix. It is set to 1 by default. This setting is mutually exclusive with Record delimiter, which is the default, and record delimiter string and record type.
  • Record type. Specifies that data consists of variable-length blocked records (varying) or implicit records (implicit). If you choose the implicit property, data is written as a stream with no explicit record boundaries. The end of the record is inferred when all of the columns that are defined by the schema are parsed. You can use the varying property to specify one of the following IBM blocked or spanned formats: V, VB, VS, VBS, or VR. Data is imported by using one of these formats. These formats are not available for export.

    This property is mutually exclusive with Record length, Record delimiter, Record delimiter string, and Record prefix and by default is not used.

Field Defaults

Defines default properties for columns that are read from the file or files. These properties are applied to all columns, but can be overridden for individual columns from the Columns tab by using the Edit Column Metadata dialog box. A common reason to override a property for an individual column occurs when comma-separated values (CSV) files are being read. CSV files often enclose fields in quotation marks when the fields might contain a special character, such as the field delimiter. In this case, the Quote property for the columns in question should be overridden.

Where you can enter a character, this can usually be an ASCII character. The available properties are:

  • Actual field length. Specifies the actual number of bytes to skip if the field's length equals the setting of the null field length property.
  • Delimiter. Specifies the trailing delimiter of all fields in the record. Type an ASCII character or select one of whitespace, end, none, null, comma, or tab. IBM DataStage skips the delimiter when reading.
    • whitespace. Whitespace characters at the end of a column are ignored, that is, are not treated as part of the column.
    • end. The end of a field is taken as the delimiter, that is, there is no separate delimiter. This is not the same as a setting of `None', which is used for fields with fixed-width columns.
    • none. No delimiter (used for fixed-width).
    • null. ASCII Null character is used.
    • comma. ASCII comma character is used.
    • tab. ASCII tab character is used.
  • Delimiter string. Specify the string at the end of each field. Enter one or more characters. This is mutually exclusive with Delimiter, which is the default. For example, specifying `, ` (comma space - you do not need to enter the inverted commas) specifies each field is delimited by `, ` unless overridden for individual fields. IBM DataStage skips the delimiter string when reading.
  • Null field length. The length in bytes of a variable-length field that contains a null. When a variable-length field is read, a length of null field length in the source field indicates that it contains a null. This property is mutually exclusive with null field value.
  • Null field value. Specifies the value that is given to a null field if the source is set to null. Can be a number, string, or C-type literal escape character. For example, you can represent a byte value by \ooo, where each o is an octal digit 0 - 7 and the first o is < 4, or by \xhh, where each h is a hexadecimal digit 0 - F. You must use this form to encode nonprintable byte values.

    This property is mutually exclusive with Null field length and Actual length. For a fixed width data representation, you can use Pad char (from the general section of Type defaults) to specify a repeated trailing character if the value you specify is shorter than the fixed width of the field.

    You can specify a list of null values that a column might contain that represent null. Specify a separator character in the dependent Null field value separator property, and then use this separator to delimit the null values in the Null field value property. For example, if you set Null field value separator to contain the slash character (/), then you could specify NULL/null/NUL/nul to specify that any of these strings could represent a null value in this column.

  • Null field value separator

    This is a dependent property of Null field value. You can specify a separator that can be used in the Null field value property to specify a range of values that might represent the null value. You can specify a number, string, or C-type literal escape character (as for Null field value) as a separator, but a single character such as a comma (,) or slash (/) character is the best choice. You must specify a separator only if you specify multiple values in Null field value; specifying a separator without using it will cause a runtime error.

  • Prefix bytes. You can use this option with variable-length fields. Variable-length fields can be either delimited by a character or preceded by a 1-, 2-, or 4-byte prefix that contains the field length. IBM DataStage reads the length prefix but does not include the prefix as a separate field in the data set it reads from the file.

    This property is mutually exclusive with the Delimiter, Quote, and Final Delimiter properties, which are used by default.

  • Print field. This property is intended for use when debugging jobs. Set it to have IBM DataStage produce a message for every field it reads. The message has the format:
    Importing N: D
    where:
    • N is the field name.
    • D is the imported data of the field. Nonprintable characters that are contained in D are prefixed with an escape character and written as C string literals; if the field contains binary data, it is output in octal format.
  • Quote. Specifies that variable length fields are enclosed in single quotation marks, double quotation marks, or another character or pair of characters. Choose Single or Double, or enter a character. This is set to double quotation marks by default.

    When reading, IBM DataStage ignores the leading quotation character and reads all bytes up to but not including the trailing quotation character.

  • Vector prefix. For fields that are variable length vectors, specify that a 1-, 2-, or 4-byte prefix contains the number of elements in the vector. You can override this default prefix for individual vectors.

    Variable-length vectors must use either a prefix on the vector or a link to another field to specify the number of elements in the vector. If the variable length vector has a prefix, you use this property to indicate the prefix length. IBM DataStage reads the length prefix but does not include it as a separate field in the data set. By default, the prefix length is assumed to be 1 byte.

Type Defaults

These properties apply to all columns of a specific data type unless specifically overridden at the column level. They are divided into a number of subgroups according to data type.

General

These properties apply to several data types (unless overridden at column level):
  • Byte order. Specifies how multiple byte data types (except string and raw data types) are ordered. Choose from:
    • little-endian. The high byte is on the right.
    • big-endian. The high byte is on the left.
    • native-endian. As defined by the native format of the machine. This is the default.
  • Data Format. Specifies the data representation format of a field. Applies to fields of all data types except string, ustring, and raw and to record, subrec or tagged fields containing at least one field that is not string or raw. Choose from:
    • binary
    • text (the default)

      A setting of binary has different meanings when applied to different data types:

    • For decimals, binary means packed.
    • For other numerical data types, binary means "not text".
    • For dates, binary is equivalent to specifying the julian property for the date field.
    • For time, binary is equivalent to midnight_seconds.
    • For timestamp, binary specifies that the first integer contains a Julian day count for the date portion of the timestamp and the second integer specifies the time portion of the timestamp as the number of seconds from midnight. A binary timestamp specifies that two 32-but integers are written.

      By default data is formatted as text, as follows:

    • For the date data type, text specifies that the data read, contains a text-based date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
    • For the decimal data type: a field represents a decimal in a string format with a leading space or '-' followed by decimal digits with an embedded decimal point if the scale is not zero. The destination string format is: [+ | -]ddd.[ddd] and any precision and scale arguments are ignored.
    • For numeric fields (int8, int16, int32, uint8, uint16, uint32, sfloat, and dfloat): IBM DataStage assumes that numeric fields are represented as text.
    • For the time data type: text specifies that the field represents time in the text-based form %hh:%nn:%ss.
    • For the timestamp data type: text specifies a text-based timestamp in the form %yyyy-%mm-%dd %hh:%nn:%ss.
  • Field max width. The maximum number of bytes in a column that is represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width character set, you can calculate the length exactly. If you are using variable-length character set, calculate an adequate maximum width for your fields. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.
  • Field width. The number of bytes in a field that is represented as a string. Enter a number. This is useful where you are storing numbers as text. If you are using a fixed-width charset, you can calculate the number of bytes exactly. If it's a variable length encoding, base your calculation on the width and frequency of your variable-width characters. Applies to fields of all data types except date, time, timestamp, and raw; and record, subrec, or tagged if they contain at least one field of this type.

    If you specify neither field width nor field max width, numeric fields written as text have the following number of bytes as their maximum width:

    • 8-bit signed or unsigned integers: 4 bytes
    • 16-bit signed or unsigned integers: 6 bytes
    • 32-bit signed or unsigned integers: 11 bytes
    • 64-bit signed or unsigned integers: 21 bytes
    • single-precision float: 14 bytes (sign, digit, decimal point, 7 fraction, "E", sign, 2 exponent)
    • double-precision float: 24 bytes (sign, digit, decimal point, 16 fraction, "E", sign, 3 exponent)
  • Pad char. This property is ignored for output links.
  • Character set. Specifies the character set. Choose from ASCII or EBCDIC. The default is ASCII. Applies to all data types except raw and ustring and record, subrec, or tagged containing no fields other than raw or ustring.

String

These properties are applied to columns with a string data type, unless overridden at column level.
  • Export EBCDIC as ASCII. Not relevant for output links.
  • Import ASCII as EBCDIC. Select this to specify that ASCII characters are read as EBCDIC characters.

Decimal

These properties are applied to columns with a decimal data type unless overridden at column level.
  • Allow all zeros. Specifies whether to treat a packed decimal column containing all zeros (which is normally illegal) as a valid representation of zero. Select Yes or No. The default is No.
  • Decimal separator. Specify the ASCII character that acts as the decimal separator (period by default).
  • Packed. Select an option to specify what the decimal columns contain, choose from:
    • Yes to specify that the decimal fields contain data in packed decimal format (the default). This has the following subproperties:

      Check. Select Yes to verify that data is packed, or No to not verify.

      Signed. Select Yes to use the existing sign when reading decimal fields. Select No to write a positive sign (0xf) regardless of the fields' actual sign value.

    • No (separate) to specify that they contain unpacked decimal with a separate sign byte. This has the following subproperty:

      Sign Position. Choose leading or trailing as appropriate.

    • No (zoned) to specify that they contain an unpacked decimal in either ASCII or EBCDIC text. This has the following subproperty:

      Sign Position. Choose leading or trailing as appropriate.

    • No (overpunch) to specify that the field has a leading or end byte that contains a character that specifies both the numeric value of that byte and whether the number as a whole is negatively or positively signed. This has the following subproperty:

      Sign Position. Choose leading or trailing as appropriate.

  • Precision. Specifies the precision of a packed decimal. Enter a number.
  • Rounding. Specifies how to round the source field to fit into the destination decimal when reading a source field to a decimal. Choose from:
    • up (ceiling). Truncate source column toward positive infinity. This mode corresponds to the IEEE 754 Round Up mode. For example, 1.4 becomes 2, -1.6 becomes -1.
    • down (floor). Truncate source column toward negative infinity. This mode corresponds to the IEEE 754 Round Down mode. For example, 1.6 becomes 1, -1.4 becomes -2.
    • nearest value. Round the source column toward the nearest representable value. This mode corresponds to the COBOL ROUNDED mode. For example, 1.4 becomes 1, 1.5 becomes 2, -1.4 becomes -1, -1.5 becomes -2.
    • truncate toward zero. This is the default. Discard fractional digits to the right of the rightmost fractional digit supported by the destination, regardless of sign. For example, if the destination is an integer, all fractional digits are truncated. If the destination is another decimal with a smaller scale, truncate to the scale size of the destination decimal. This mode corresponds to the COBOL INTEGER-PART function. Using this method 1.6 becomes 1, -1.6 becomes -1.
  • Scale. Specifies the scale of a source packed decimal.

Numeric

These properties apply to integer and float fields unless overridden at column level.
  • C_format. Perform nondefault conversion of data from string data to a integer or floating-point. This property specifies a C-language format string that is used for reading integer or floating point strings. This is passed to sscanf(). For example, specifying a C-format of %x and a field width of 8 ensures that a 32-bit integer is formatted as an 8-byte hexadecimal string.
  • In_format. Format string that is used for conversion of data from string to integer or floating-point data This is passed to sscanf(). By default, IBM DataStage launches the C sscanf() function to convert a numeric field formatted as a string to either integer or floating point data. If this function does not output data in a satisfactory format, you can specify the in_format property to pass formatting arguments to sscanf().
  • Out_format. This property is not relevant for output links.

Date

These properties are applied to columns with a date data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
  • Days since. Dates are written as a signed integer that contains the number of days since the specified date. Enter a date in the form %yyyy-%mm-%dd or in the default date format if you have defined a new one on an NLS system.
  • Format string. The string format of a date. By default this is %yyyy-%mm-%dd.
  • Is Julian. Select this to specify that dates are written as a numeric value containing the Julian day. A Julian day specifies the date as the number of days from 4713 BCE January 1, 12:00 hours (noon) GMT.

Time

These properties are applied to columns with a time data type unless overridden at column level. All of these are incompatible with a Data Format setting of Text.
  • Format string. Specifies the format of columns representing time as a string. By default this is %hh-%mm-%ss.
  • Is midnight seconds. Select this to specify that times are written as a binary 32-bit integer containing the number of seconds elapsed from the previous midnight.

Timestamp

These properties are applied to columns with a timestamp data type unless overridden at column level.
  • Format string. Specifies the format of a column representing a timestamp as a string. The format combines the format for date strings and time strings.
Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more