Event Streams (Source) operator

Introduction

The Event Streams operator uses IBM Event Streams for IBM Cloud to provide a fully managed messaging service.

Event Streams implements publish-and-subscribe messaging by using topics. Applications send data by creating a message, and then publish it to a topic. To receive messages, applications subscribe to a topic, and choose to either receive all the topic’s messages or to share the messages between them.

Apache Kafka forms the messaging core of Event Streams.

Message retention and lost messages

If a streams flow stops while Event Streams producers continue to send messages to the topic, those messages are retained only for the topic's message retention period. Messages that are purged before the streams flow is restarted are lost; the restarted flow cannot go back in time and consume them.
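The time-based retention described above can be sketched with a minimal, hypothetical model (stdlib Python only; the two-day default mirrors the text, and none of these names belong to any Event Streams or Kafka API):

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=2)  # default message retention in Event Streams

def purge(messages, now):
    """Keep only messages that are still within the retention window."""
    return [m for m in messages if now - m["timestamp"] <= RETENTION]

now = datetime(2024, 1, 10)
topic = [
    {"offset": 0, "timestamp": now - timedelta(days=3)},    # older than retention
    {"offset": 1, "timestamp": now - timedelta(hours=12)},  # still retained
]
print([m["offset"] for m in purge(topic, now)])  # → [1]
```

A flow that restarts after offset 0 was purged can no longer consume that message, regardless of where it tries to resume.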

Configuring how to read lost messages

The Initial Offset parameter determines where to begin reading from when the streams flow runs for the first time or when the resumption offset is lost.

The resumption offset can be lost in the following cases:

  • The messages at the offset became older than the message retention setting and were purged. The default message retention is two days.

  • The offset retention time for the topic elapsed and the offset was purged. The default offset retention is 24 hours.

Choose to start from the oldest retained message or from the newest message that is added to the topic after the streams flow starts. Your choice is passed to the Kafka broker as the auto.offset.reset property.
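How the broker applies that choice can be sketched as follows (an illustrative stdlib-Python model of the auto.offset.reset semantics, not part of any Kafka client API):

```python
def starting_offset(committed_offset, earliest, latest, auto_offset_reset):
    """Return the offset a consumer starts reading from.

    committed_offset: the resumption offset, or None if it was purged or never set.
    earliest/latest:  offsets of the oldest retained message and the end of the topic.
    auto_offset_reset: "earliest" (oldest retained message) or "latest".
    """
    if committed_offset is not None and earliest <= committed_offset <= latest:
        return committed_offset  # resumption offset is still valid; no reset needed
    if auto_offset_reset == "earliest":
        return earliest          # start from the oldest retained message
    return latest                # start from the newest messages only

# A consumer whose resumption offset (5) was purged; oldest retained offset is 100:
print(starting_offset(5, 100, 250, "earliest"))  # → 100
print(starting_offset(5, 100, 250, "latest"))    # → 250
```

Note that auto.offset.reset only takes effect when no valid resumption offset exists; while the offset is valid, the flow simply resumes from it.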

Partitions and parallel workers

A Kafka topic can be divided into partitions, which hold the topic's messages. Splitting a topic's data across multiple partitions, which can reside on different brokers, parallelizes the topic: consumers pull messages from the partitions in parallel. As a result, multiple partitions can increase the ingestion rate.
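The way partitions are shared among parallel consumers can be illustrated with a simple round-robin assignment (a hypothetical sketch; real Kafka consumer groups use pluggable assignment strategies):

```python
def assign(partitions, consumers):
    """Distribute partition ids across consumers round-robin."""
    out = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        out[consumers[i % len(consumers)]].append(p)
    return out

# Four partitions shared by two workers: each worker reads two partitions in parallel.
print(assign([0, 1, 2, 3], ["w1", "w2"]))  # → {'w1': [0, 2], 'w2': [1, 3]}
```

Note that with more consumers than partitions, some consumers receive no partitions at all, which is why extra workers beyond the partition count do not help.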

When the Event Streams instance is created in IBM Cloud, the number of partitions is defined for each topic.

(Figure: Number of partitions in an Event Streams topic)

Workers are parallel physical streams operators that share the load of consuming the data in a topic. In general, more workers means faster message consumption, up to the lower of the number of topic partitions and the number of CPU cores available to your Streaming Analytics instance. Do not increase the number of workers beyond that limit, because doing so yields no performance benefit.

Examples

Number of partitions   Number of CPUs   Maximum number of workers for best performance
3                      4                3
10                     4                4
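The sizing rule behind the table above reduces to taking the minimum of the two limits (a one-line sketch, not an API of the product):

```python
def max_useful_workers(partitions: int, cpus: int) -> int:
    """Workers beyond min(partitions, CPU cores) add no throughput."""
    return min(partitions, cpus)

print(max_useful_workers(3, 4))   # → 3
print(max_useful_workers(10, 4))  # → 4
```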

Learn more