The Code operator can be of type Source or of type Processing and Analytics.


  • The code must be created with the same version of packages that are listed here.
  • Regardless of the operator type, you must declare all output attributes in the Edit Schema window.



Passed datetime objects must not be timezone-aware.

When you return or submit event tuples with datetime objects, make sure that they are "naive", that is "timezone-unaware". For more information about "timezone aware" and "naive" in Python coding, see datetime — Basic date and time types.

Passing datetime objects that are "timezone-aware" results in runtime errors, and such events are ignored.


In Python, you can convert a timezone-aware datetime object to a "naive", UTC-based datetime object. The following code snippet is an example.

from dateutil.tz import tzutc

event['dt_utc'] = dt.astimezone(tzutc()).replace(tzinfo=None)

See Example 2 for a full example.
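
As a quick check before returning an event, the following sketch tests whether a datetime is timezone-aware and, if so, converts it to naive UTC. It uses the standard library's datetime.timezone rather than dateutil. The helper names is_aware and to_naive_utc are assumptions made for illustration and are not part of the operator's API.

```python
from datetime import datetime, timedelta, timezone

def is_aware(dt):
    # Per the Python documentation, a datetime is aware when tzinfo is set
    # and utcoffset() does not return None.
    return dt.tzinfo is not None and dt.utcoffset() is not None

def to_naive_utc(dt):
    # Hypothetical helper: aware values are shifted to UTC and stripped
    # of tzinfo; naive values are returned unchanged.
    if is_aware(dt):
        return dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt

ist = timezone(timedelta(hours=5, minutes=30))
aware = datetime(2024, 1, 15, 10, 0, tzinfo=ist)
naive = to_naive_utc(aware)
print(naive)  # 2024-01-15 04:30:00
```

Calling to_naive_utc on every datetime attribute just before the event is returned keeps the conversion in one place.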


Code as a Source operator

The Code as Source operator has no input parameter, but does have a return value.

You might use Code as a Source operator when you want to generate some test data without having to set up an Event Streams instance. Another use is when you want to bring in data from a web socket or from your proprietary data source.

A. Asynchronous or synchronous cases: produce

The produce approach, with its background queue, is suitable for asynchronous use cases (such as web sockets) as well as for synchronous ones.

The following code shows an example of produce. The code generates some test data to check your Cloud Object Storage instance.

import time

def produce(submit, state):
    counter = 0
    while counter <= 10000:
        # Submit a tuple in each iteration:
        submit({"number": counter, "square": counter * counter})
        counter += 1
        time.sleep(0.5)  # Simulates a delay of 0.5 seconds between emitted events


The produce() function is called when the job starts to run. It is called on a background thread, and it typically invokes the submit() callback whenever a tuple of data is ready to be emitted from this operator. The produce() function allows for using asynchronous data services as well as synchronous data generation or retrieval.

@submit is a Python callback function that takes one argument. The argument is a dictionary that represents a single tuple.

@state is a Python dictionary object for keeping state.
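
To try a produce function outside a streams flow, you can call it yourself with a stand-in for submit. The following is a minimal sketch that assumes nothing about the real runtime beyond what is described above: it runs produce on a background thread and collects the submitted tuples in a queue. The harness name run_produce_locally is hypothetical, and the produce function here is a shortened variant of the example, without the delay.

```python
import queue
import threading

def produce(submit, state):
    # Shortened variant of the example above: five tuples, no sleep.
    for counter in range(5):
        submit({"number": counter, "square": counter * counter})

def run_produce_locally(produce_fn):
    # Hypothetical local harness; not part of the streams flow runtime.
    events = queue.Queue()
    # The queue's put method plays the role of the submit callback.
    worker = threading.Thread(target=produce_fn, args=(events.put, {}))
    worker.start()
    worker.join()
    collected = []
    while not events.empty():
        collected.append(events.get())
    return collected

events = run_produce_locally(produce)
print(events[2])  # {'number': 2, 'square': 4}
```

Because submit is an ordinary callable, any function that accepts one dictionary can stand in for it during local testing.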


In the Edit Output Schema window, you declare two output attributes, number and square, of type Number.

When the streams flow is running, the Flow of Events shows the generated test data.

B. Optimized for synchronous cases: generate_sync

The produce approach relies on a background queue, and the interthread communication that this requires has a performance cost. For synchronous use cases with high data throughput, use the faster generate_sync approach instead.

To do so, implement the generate_sync function instead of the produce function. The code structure is very similar, except that the code emits each event with a yield statement instead of calling submit.

The following code shows an example of generate_sync:

import time

# The generate_sync() function will be called when the job starts to run.
# It emits events by calling 'yield' on the output event dictionary.
# @state a Python dictionary object for keeping state
# You must declare all output attributes in the Edit Schema window.
def generate_sync(state):
    counter = 0
    while counter <= 10000:
        # Yield a tuple in each iteration:
        yield {"number": counter, "square": counter * counter}
        counter += 1
        time.sleep(0.5)  # Simulates a delay of 0.5 seconds between emitted events

The output schema and runtime output data are the same as in the example for produce.
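
Because generate_sync is a Python generator, the caller pulls events from it one at a time, so no queue between threads is involved. How the runtime consumes the generator is not described here; the following sketch only illustrates the generator semantics locally, using itertools.islice to take the first three events from an unbounded variant of the example without the delay.

```python
import itertools

def generate_sync(state):
    # Unbounded variant of the example above, without the sleep.
    counter = 0
    while True:
        yield {"number": counter, "square": counter * counter}
        counter += 1

# Pull exactly three events; the generator is paused after the third yield.
first_three = list(itertools.islice(generate_sync({}), 3))
print(first_three)
# [{'number': 0, 'square': 0}, {'number': 1, 'square': 1}, {'number': 2, 'square': 4}]
```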

Comparison of produce and generate_sync

The following table compares the produce and generate_sync approaches:

Aspect              | produce                                  | generate_sync
--------------------|------------------------------------------|-------------------------
Use cases           | Async, sync                              | Sync
Performance         | Slower, due to interthread communication | Faster
Function signature  | def produce(submit, state)               | def generate_sync(state)
Emitting event      | submit(...)                              | yield ...


Code as a Processing and Analytics operator

The Code as a Processing and Analytics operator has both an input parameter and a return value.

Example 1

Goal: You need to return to the output schema two new attributes that are not present in the input schema.

The following code snippet shows an example for two attributes, “friendly_greeting” and “formal_greeting”, in the output schema of the Code operator.

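def process(event):
  # Fall back to a default when the input event has no 'name' attribute
  if 'name' in event:
    name = event['name']
  else:
    name = 'stranger'
  friendly_greeting = 'Hey ' + name + '!'
  formal_greeting = 'Dear ' + name + ','
  return {'friendly_greeting': friendly_greeting, 'formal_greeting': formal_greeting}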

The returned attributes ‘friendly_greeting’ and ‘formal_greeting’ must be included in the output schema of the Code operator. To check that they are present, click Edit Output Schema in the Code Properties pane, and add any missing attributes and their types.
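
Before you deploy, it can help to confirm locally that the code returns only attributes that the schema declares. The following is a sketch under stated assumptions: DECLARED_OUTPUT_ATTRIBUTES is a hand-maintained copy of the names entered in the Edit Output Schema window, and undeclared_attributes is a hypothetical helper. Neither is provided by the operator.

```python
# Hypothetical local mirror of the attributes declared in Edit Output Schema.
DECLARED_OUTPUT_ATTRIBUTES = {"friendly_greeting", "formal_greeting"}

def process(event):
    # Compact equivalent of Example 1: default to 'stranger' when no name is given.
    name = event.get("name", "stranger")
    return {
        "friendly_greeting": "Hey " + name + "!",
        "formal_greeting": "Dear " + name + ",",
    }

def undeclared_attributes(output, declared):
    # Names that the code returns but the schema does not list.
    return sorted(set(output) - set(declared))

result = process({"name": "Ada"})
missing = undeclared_attributes(result, DECLARED_OUTPUT_ATTRIBUTES)
print(missing)  # []
```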

Example 2

Goal: You need to return to the output schema the time of an event, converted from the IST time zone to UTC.

from dateutil.parser import parse
from dateutil import tz

def process(event):
  # dateutil's parser doesn't understand "IST" as a time zone indicator, so swap it for +05:30
  dt = parse(event['event_time'].replace('IST', '+05:30'))

  # Convert to UTC, then drop tzinfo so that the returned datetime is naive
  event['dt_utc'] = dt.astimezone(tz.gettz('UTC')).replace(tzinfo=None)
  return event

For more information about date formats, see Date formats.


Example 3

Goal: You need to return every other tuple to the output schema.

counter = 0  # Module-level count of the events seen so far


def process(event):
    global counter
    counter += 1
    event['counter'] = counter
    if counter % 2 == 0:
        # Drop every second tuple
        return None
    return event
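
Example 4 uses the process(event, state) signature together with init(state). Assuming that the same state dictionary is passed to every call, and that init runs once before the first event, the counter can live in state instead of in a global variable. The following is a sketch of that variant, with a small local driver that is not part of the operator.

```python
def init(state):
    # Assumed to run once before the first event arrives.
    state["counter"] = 0

def process(event, state):
    state["counter"] += 1
    event["counter"] = state["counter"]
    if state["counter"] % 2 == 0:
        return None  # Drop every second tuple
    return event

# Local driver for illustration only.
state = {}
init(state)
results = [process({"id": i}, state) for i in range(4)]
print(results)
# [{'id': 0, 'counter': 1}, None, {'id': 2, 'counter': 3}, None]
```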


Example 4

Goal: You want to report the geographic movement of mobile GPS-enabled devices (phones, tablets, cars). This scenario is an example for a use case that requires state.

def init(state):
    # Nothing to initialize, in this example
    pass

def rememberDeviceLocation(event, state):
    # Minimal implementation, inferred from how process() reads the record
    state[event['deviceId']] = {'last_lat': event['lat'], 'last_long': event['long']}

def process(event, state):
    deviceId = event['deviceId']  # Extract the device ID from the event tuple
    if deviceId not in state:
        # No previous record for this device ID, meaning it's the first event for it
        message = "Detected initial location"
    else:
        record = state[deviceId]  # Extract the device record of last (previous) location
        directions = []  # Array for gathering directions: north or south, east or west

        if event['lat'] > record['last_lat']:
            directions.append("north")
        elif event['lat'] < record['last_lat']:
            directions.append("south")

        if event['long'] > record['last_long']:
            directions.append("east")
        elif event['long'] < record['last_long']:
            directions.append("west")

        message = "Moved " + "-".join(directions)

    output = {
        'deviceId': deviceId,
        'message': message
    }
    rememberDeviceLocation(event, state)  # For comparing with future locations
    return output

Example output

Input event 1 {'deviceId': 'p008', 'lat': '32.678611', 'long': '35.576944'}

Resulting output {'deviceId': 'p008', 'message': 'Detected initial location'}

Input event 2 {'deviceId': 'p008', 'lat': '32.67862', 'long': '35.576944'}

Resulting output {'deviceId': 'p008', 'message': 'Moved north'}

Input event 3 {'deviceId': 'p008', 'lat': '32.6786', 'long': '35.57694'}

Resulting output {'deviceId': 'p008', 'message': 'Moved south-west'}

Input event 4 {'deviceId': 'x123', 'lat': '32.6786', 'long': '35.57694'}

Resulting output {'deviceId': 'x123', 'message': 'Detected initial location'}
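
In the sample events, lat and long are strings. Python compares strings character by character, which gives the expected results for the values above but not in general: as strings, '10.2' sorts before '9.5'. The following sketch converts the coordinates to float before comparing them. The function name describe_movement is hypothetical, and the sketch assumes that every coordinate parses as a decimal number.

```python
def describe_movement(previous, current):
    # Convert to float first; string comparison is lexicographic.
    lat_prev, lat_curr = float(previous["lat"]), float(current["lat"])
    long_prev, long_curr = float(previous["long"]), float(current["long"])

    directions = []
    if lat_curr > lat_prev:
        directions.append("north")
    elif lat_curr < lat_prev:
        directions.append("south")

    if long_curr > long_prev:
        directions.append("east")
    elif long_curr < long_prev:
        directions.append("west")

    return "-".join(directions)

moved = describe_movement({"lat": "9.5", "long": "35.0"},
                          {"lat": "10.2", "long": "35.0"})
print(moved)  # north
```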


Example 5

Goal: The Watson IoT operator ingests electricity usage readings from smart meters to bill users. You need to take that data and implement time-based billing. You also want to send an email alert to customers whose usage for the current month is high.

Watch the video Use Python code in a streams flow.
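
The video covers the actual implementation. As a rough illustration of the shape such code could take, the following sketch applies a time-of-day rate to each reading and keeps a running monthly total per meter in state. Everything specific in it is an assumption: the input attributes meter_id, kwh, and reading_time, the rates, the peak hours, and the alert threshold are all invented for the example, and sending the email is left to a downstream operator.

```python
from datetime import datetime

# Hypothetical tariff and threshold, chosen only for illustration.
PEAK_HOURS = range(8, 20)    # 08:00 through 19:59
PEAK_RATE = 0.30             # Price per kWh during peak hours
OFF_PEAK_RATE = 0.10         # Price per kWh at other times
MONTHLY_ALERT_KWH = 500.0    # Monthly usage above this raises the alert flag

def init(state):
    pass  # Monthly totals are created on first use

def process(event, state):
    # Assumed input: 'meter_id' (str), 'kwh' (float), 'reading_time' (naive datetime)
    ts = event["reading_time"]
    rate = PEAK_RATE if ts.hour in PEAK_HOURS else OFF_PEAK_RATE

    # One running total per meter and calendar month
    key = "{}:{}-{:02d}".format(event["meter_id"], ts.year, ts.month)
    state[key] = state.get(key, 0.0) + event["kwh"]

    return {
        "meter_id": event["meter_id"],
        "cost": round(event["kwh"] * rate, 2),
        "month_kwh": state[key],
        "alert": state[key] > MONTHLY_ALERT_KWH,
    }

state = {}
init(state)
first = process({"meter_id": "m1", "kwh": 300.0,
                 "reading_time": datetime(2024, 3, 5, 9, 0)}, state)
second = process({"meter_id": "m1", "kwh": 250.0,
                  "reading_time": datetime(2024, 3, 20, 22, 0)}, state)
print(second["alert"])  # True
```

As with the other examples, the four returned attributes would have to be declared in the output schema.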


Learn more