Data types

Table of contents

  1. Value formats
  2. Input and output values in Code operators
  3. Numbers
  4. Dates

Value formats

This section describes how data is parsed and formatted, depending on its type.

Parsing values from JSON

The following table gives examples of valid values for the supported data types. Messages with invalid values are skipped and reported in the Notifications pane of the Metrics page. This data type handling applies to all Source operators except the Code operator in Sources.

For valid output values in Code operators, see section Expected Code operator output values.

| Flow column data type | Valid value examples | Default when the value is missing |
|---|---|---|
| Text | "Hello World" | "" (empty string) |
| Number | 12, -7, 3.14, "12", "-7", "3.14" | 0 (zero) |
| Date | "2018-03-26T11:38:47", "2018 03 26 11:38:47", "2018-03-26T11:38:47.123456". For more information, see section Dates. | No default value, because a missing Date value is not valid. Messages with missing or empty Date values are skipped and reported in the Notifications pane of the Metrics page. |
| Boolean | true, false, "true", "false". Note: These values are the only valid Boolean values, and they are case-sensitive. | false |
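The parsing rules in the table can be sketched in Python as follows. This is only an illustration of the documented behavior, not the runtime's implementation: `parse_value`, `DEFAULTS`, and `DATE_PATTERN` are hypothetical names, the sketch assumes that only strings are valid Text values, and it ignores time zone offsets in Date values.

```python
import json
import re
from datetime import datetime

# Defaults from the table above. Date has no default: a missing Date is invalid.
DEFAULTS = {"Text": "", "Number": 0, "Boolean": False}

# Six numeric fields split by any single non-digit separator, plus optional
# fractional seconds of up to 6 digits. Time zone offsets are not handled here.
DATE_PATTERN = re.compile(
    r"(\d{4})\D(\d{2})\D(\d{2})\D(\d{2})\D(\d{2})\D(\d{2})(?:\.(\d{1,6}))?"
)


def parse_value(raw, column_type):
    """Return a Python value for one JSON field, or raise ValueError."""
    if raw is None:
        if column_type == "Date":
            raise ValueError("missing Date value")
        return DEFAULTS[column_type]
    if column_type == "Text":
        if isinstance(raw, str):  # assumption: only strings are valid Text
            return raw
    elif column_type == "Number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(raw, (int, float, str)) and not isinstance(raw, bool):
            return float(raw)  # float("abc") raises ValueError
    elif column_type == "Boolean":
        if isinstance(raw, bool):
            return raw
        if raw in ("true", "false"):  # case-sensitive
            return raw == "true"
    elif column_type == "Date":
        match = DATE_PATTERN.fullmatch(raw) if isinstance(raw, str) else None
        if match:
            *fields, fraction = match.groups()
            micros = int((fraction or "").ljust(6, "0"))
            return datetime(*map(int, fields), micros)
    raise ValueError(f"invalid {column_type} value: {raw!r}")


message = json.loads('{"name": "Hello World", "temp": "3.14", "ok": "true"}')
parsed = {
    "name": parse_value(message.get("name"), "Text"),
    "temp": parse_value(message.get("temp"), "Number"),
    "ok": parse_value(message.get("ok"), "Boolean"),
    "count": parse_value(message.get("count"), "Number"),  # missing: default 0
}
```

A caller would treat a `ValueError` as the signal to skip the message, which corresponds to the skipped messages that are reported in the Notifications pane.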

Serializing values to JSON and CSV

Formatted values are enclosed in quotation marks (") only for the Text and Date data types. The following table shows examples:

| Flow column data type | Output value examples |
|---|---|
| Text | "Hello World" |
| Number | 12, -7, 3.14 |
| Date (*) | "2018-03-26T15:38:47" (no fractional seconds), "2018-03-26T15:38:47.123456" |
| Boolean | true, false |

(*) For more information about inserting Date values into schema-less or schema-based Target operators, see Formatting Date output for Target operators.
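As a rough illustration of the quoting rule, the following Python sketch produces JSON in which only Text and Date values are quoted. The helper names `to_json_ready` and `serialize_row` are hypothetical, and the sketch mirrors only the table examples; it does not reproduce any Target-specific formatting.

```python
import json
from datetime import datetime


def to_json_ready(value):
    """Convert one flow value into something json.dumps can serialize."""
    if isinstance(value, datetime):
        # isoformat() omits the fractional part when microsecond == 0.
        return value.isoformat()
    return value


def serialize_row(row):
    """Serialize a dictionary of column values to one JSON line."""
    return json.dumps({name: to_json_ready(value) for name, value in row.items()})


line = serialize_row({
    "text": "Hello World",
    "number": 3.14,
    "date": datetime(2018, 3, 26, 15, 38, 47),
    "flag": True,
})
print(line)
# {"text": "Hello World", "number": 3.14, "date": "2018-03-26T15:38:47", "flag": true}
```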

Input and output values in Code operators

Data values at Code operator input

The following table describes the data type mapping between flow columns and Python dictionary values at Code operator inputs. It applies to the Code operator and to the Python Machine Learning operator in Processing and Analytics.

| Flow column data type | Python dictionary value data type | Python value examples |
|---|---|---|
| Text | str | "Hello World" |
| Number | float | -2.5, 3.14 |
| Number | int | -5, 451 |
| Date | datetime.datetime | A datetime.datetime object, for example datetime.datetime(2018, 3, 27, 10, 42, 0, 892960) |
| Boolean | bool | True, False |
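To make the mapping concrete, the following sketch inspects the Python types in an input dictionary. The dictionary `event`, its column names, and the helper `describe_types` are hypothetical and serve only to illustrate the table above.

```python
import datetime


def describe_types(event):
    """Map each column name to the Python type name of its value."""
    return {name: type(value).__name__ for name, value in event.items()}


# Hypothetical input dictionary with one column per flow data type.
event = {
    "message": "Hello World",                                         # Text
    "temperature": 3.14,                                              # Number (float)
    "count": 451,                                                     # Number (int)
    "timestamp": datetime.datetime(2018, 3, 27, 10, 42, 0, 892960),   # Date
    "active": True,                                                   # Boolean
}

print(describe_types(event))
# {'message': 'str', 'temperature': 'float', 'count': 'int', 'timestamp': 'datetime', 'active': 'bool'}
```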

Expected Code operator output values

This section applies to the Code operator in Sources, and to the Code operator and the Python Machine Learning operator in Processing and Analytics.

The following table gives examples of valid output values, by data type. Outputs with invalid values are skipped and reported in the Notifications pane of the Metrics page.

| Flow column data type | Output value data type | Valid output value examples | Default value |
|---|---|---|---|
| Text | str. Any non-string value is auto-converted to a string. | "Hello World", "78". Also valid: 78 (converts to "78"), True (converts to "True") | "" (empty string) |
| Number | int | 0, -4, 451 | 0 |
| Number | float | -3.14, 0.0, 17.9 | 0.0 |
| Date | datetime.datetime | A datetime.datetime object, for example: datetime.datetime.now(); datetime.datetime(2018, 3, 27, 10, 42, 0, 892960); datetime.datetime.strptime("2018-03-27T07:58:35", '%Y-%m-%dT%H:%M:%S') | datetime.datetime(1970, 1, 1, 0, 0, 0) |
| Boolean | bool. Any non-bool value except 0, None, empty strings, and empty objects is auto-converted to True. | True, False. The following values are also accepted: "true" (converts to True), "False" (converts to True) (*), "go" (converts to True), 451 (converts to True), "" (converts to False), 0 (converts to False) | False |

(*) False is a Boolean literal, but "False" is a string literal. Any non-empty string is interpreted as True, whether it’s “True”, “False”, or “yes sir”.
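The conversion described above matches Python's built-in truth testing, so a sketch of it can lean on `bool()`. The function names here are hypothetical; `parse_boolean_text` is one possible way to avoid the "False" pitfall when a Boolean arrives as text, not something the runtime does for you.

```python
def auto_convert(value):
    """Mimic the documented auto-conversion with Python truth testing."""
    return bool(value)


def parse_boolean_text(text):
    """Interpret the words 'true' and 'false' explicitly (hypothetical helper)."""
    normalized = text.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"not a Boolean word: {text!r}")


print(auto_convert("False"))        # True, because the string is not empty
print(auto_convert(""))             # False
print(auto_convert(451))            # True
print(parse_boolean_text("False"))  # False
```

Converting text explicitly before assigning it to a Boolean output column keeps a string such as "False" from being stored as True.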

When your output dictionary has no entry for an output column, or when its value is None, then the streaming runtime acts in the following way:

  • If a same-name input column with the same column data type exists, its value is used (the value is passed through).

  • Otherwise, the default value for the output column’s data type is used.
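The fallback rule can be sketched as a small lookup. This is an illustration under assumptions, not the runtime's code: `resolve_output`, `DEFAULTS`, and the `input_types` mapping of column names to flow data types are hypothetical, and the Number default is shown as 0 although the table lists 0 for int and 0.0 for float.

```python
import datetime

# Default values per flow data type, taken from the table above.
DEFAULTS = {
    "Text": "",
    "Number": 0,
    "Date": datetime.datetime(1970, 1, 1, 0, 0, 0),
    "Boolean": False,
}


def resolve_output(column, column_type, output, input_event, input_types):
    """Pick the value for one output column of the given flow data type."""
    value = output.get(column)
    if value is not None:
        return value
    # No entry, or None: pass through a same-name input column of the same type.
    if column in input_event and input_types.get(column) == column_type:
        return input_event[column]
    # Otherwise fall back to the default for the column's data type.
    return DEFAULTS[column_type]


input_event = {"city": "Oslo", "reading": 17.9}
input_types = {"city": "Text", "reading": "Number"}
output = {"reading": None}

print(resolve_output("city", "Text", output, input_event, input_types))       # Oslo
print(resolve_output("reading", "Number", output, input_event, input_types))  # 17.9
print(resolve_output("valid", "Boolean", output, input_event, input_types))   # False
```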

Numbers

  • When Number column values are parsed from JSON messages of Source operators such as Event Streams or Watson IoT, the values are always mapped to float values in input dictionaries.

  • When Number column values come from other operators, they can be mapped to Python float or int values, depending on the operator logic. For example, the count function in the Aggregation operator produces values of type int.

  • In the Code operator, you can output either int or float values for output columns of type Number.
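Because a Number can arrive as either float or int, code that needs whole numbers may want to normalize the value first. The following is a minimal sketch; `to_int_if_whole` is a hypothetical helper, not part of the product.

```python
def to_int_if_whole(value):
    """Return an int for whole-number floats, otherwise the value unchanged."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"not a Number value: {value!r}")
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


print(to_int_if_whole(12.0))  # 12
print(to_int_if_whole(3.14))  # 3.14
print(to_int_if_whole(451))   # 451
```

Both return types are acceptable for an output column of type Number, so the conversion is optional and matters only when later logic depends on the Python type.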

Dates

The following traits apply to Date values:

  • Date values express a point in time, and as such, include date and time information similar to a UNIX timestamp.

  • The fractional-second precision of Date values is up to microseconds (up to 6 digits after the decimal point).

  • Date values do not carry time zone information.

Parsing dates from Source operator messages

  • When parsing data that comes from Source operators such as IBM Event Streams or IBM Watson IoT, a streams flow assumes that all Date values are given in ISO 8601 format. The streams flow normalizes the date values internally to Coordinated Universal Time (UTC).

  • Separator characters other than -, T, and : are also accepted. For example, the date might use a space as a separator: 2018-03-26 15:38:47.123456

  • This parsing is applicable to all Source operators except Code (in Sources).

The following table gives examples of supported formats:

| Format example | Comments | Internal time representation |
|---|---|---|
| 2018-03-26T11:38:47 | No time zone. Up to full seconds. | 11:38:47 (*) |
| 2018-03-26T11:38:47.123456 | No time zone. With microseconds. | 11:38:47.123456 (*) |
| 2018-03-26T11:38:47.123 | No time zone. With fractional seconds other than microseconds. | 11:38:47.123 (*) |
| 2018-03-26T11:38:47+01:00 | With time zone. The time zone is UTC+1, so the value is normalized to UTC. | 10:38:47 (the time in UTC when it is 11:38:47 in a time zone that is UTC+01:00) |
| 2018-03-26T11:38:47-01:00 | With time zone. The time zone is UTC-1, so the value is normalized to UTC. | 12:38:47 (the time in UTC when it is 11:38:47 in a time zone that is UTC-01:00) |
| 2018-03-26T11:38:47.123456-05:00 | With time zone and microseconds. The time zone is UTC-5, so the value is normalized to UTC. | 16:38:47.123456 (the time in UTC when it is 11:38:47 in a time zone that is UTC-05:00) |

(*) When no time zone is indicated, the time is assumed to be given as UTC.
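The normalization in the table can be reproduced with the Python standard library. The following sketch assumes Python 3.7 or later for `datetime.fromisoformat`; the function name `to_internal_utc` is hypothetical, and the code illustrates the documented behavior rather than the actual parser.

```python
from datetime import datetime, timezone


def to_internal_utc(text):
    """Parse an ISO 8601 string and return a naive datetime in UTC."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # No offset given: the value is assumed to be UTC already.
        return parsed
    # Shift to UTC, then drop the time zone information.
    return parsed.astimezone(timezone.utc).replace(tzinfo=None)


print(to_internal_utc("2018-03-26T11:38:47"))                # 2018-03-26 11:38:47
print(to_internal_utc("2018-03-26T11:38:47+01:00"))          # 2018-03-26 10:38:47
print(to_internal_utc("2018-03-26T11:38:47.123456-05:00"))   # 2018-03-26 16:38:47.123456
```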

Tip

When you process dates from multiple time zones, their textual representation in source messages ideally includes the respective time zone offset. If it does not, you can use a Code operator to add the time zone information that is not explicit in the source. You can also use a Code operator to parse formats other than ISO 8601.
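A Code operator could apply both ideas from this tip along the following lines. The sketch assumes that the source's UTC offset is known in advance and fixed; the function name `parse_local`, the sample format, and the offset are hypothetical. A fixed offset does not account for daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def parse_local(text, fmt, utc_offset_hours):
    """Parse a non-ISO date string given in a known fixed UTC offset.

    Returns a naive datetime that expresses the same instant in UTC.
    """
    source_zone = timezone(timedelta(hours=utc_offset_hours))
    local = datetime.strptime(text, fmt).replace(tzinfo=source_zone)
    return local.astimezone(timezone.utc).replace(tzinfo=None)


# A day-first format from a source that is known to report in UTC-05:00.
print(parse_local("26/03/2018 11:38:47", "%d/%m/%Y %H:%M:%S", -5))
# 2018-03-26 16:38:47
```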

Formatting Date output for Target operators

  • Outputting Date values to schema-less Target operators such as Event Streams, Redis, or Cloud Object Storage

    The values are formatted according to ISO 8601:

    • 2018-03-26T11:38:47Z (value has no fractional seconds)

    • 2018-03-26T11:38:47.123400Z (value contains fractional seconds)

  • Outputting Date values to schema-based Target operators such as dashDB tables

    When a streams flow formats data for output to schema-based Target operators, it formats all Date values according to ISO 8601 format and as UTC, without a time zone offset. As a result, a runtime error occurs if a Target operator expects Date values in a string format that is not the ISO 8601 format.

    Tip

    The workaround is to insert a Code operator in front of the Target operator and, in that operator, format the Date value as the Target operator's schema expects.
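Both output styles can be sketched with the standard library. The function names and the sample pattern are hypothetical; the first function only reproduces the ISO 8601 examples shown for schema-less targets, and the second shows the kind of formatting a Code operator might apply when a schema expects a different string layout.

```python
from datetime import datetime


def format_iso_utc(value):
    """Format a naive UTC datetime like the schema-less examples above."""
    # isoformat() includes six fractional digits only when microsecond != 0.
    return value.isoformat() + "Z"


def format_for_schema(value, pattern="%Y-%m-%d %H:%M:%S"):
    """Format a datetime with a pattern that a target schema expects.

    The default pattern is only an example; use the one your target requires.
    """
    return value.strftime(pattern)


print(format_iso_utc(datetime(2018, 3, 26, 11, 38, 47)))          # 2018-03-26T11:38:47Z
print(format_iso_utc(datetime(2018, 3, 26, 11, 38, 47, 123400)))  # 2018-03-26T11:38:47.123400Z
print(format_for_schema(datetime(2018, 3, 26, 11, 38, 47)))       # 2018-03-26 11:38:47
```

When the formatted value is a string, the output column that carries it would presumably need the Text data type rather than Date.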