Distributed Transaction stage in DataStage

Use the Distributed Transaction stage to run transactions that span multiple data sources, with IBM® MQ or Apache Kafka acting as the transaction manager. You can use IBM Db2 for DataStage, Oracle Database for DataStage, IBM MQ, or Teradata as input connectors.

Overview

A transaction is a series of actions that are completed as a single operation. A transaction ends with a commit action that makes the changes permanent. If any of the changes cannot be committed, the transaction rolls back all changes.

A distributed transaction is a transaction that spans multiple data sources, such as one or more databases and a message queue on IBM MQ, which can also act as the transaction manager. The transaction commits only if every participating data source commits. If any resource cannot commit, the entire transaction is rolled back. For example, a distributed transaction might consist of a money transfer between two bank accounts that are on different databases. The transaction is committed only if both the withdrawal from one account and the deposit into the other account complete successfully.
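The bank transfer example can be sketched in Python with two in-memory SQLite databases. This is an illustration of the all-or-nothing behavior only, not how the stage is implemented: the table layout, the 1000 balance cap (added so that a deposit can fail), and the function names are all assumptions. Committing two connections one after the other is also not truly atomic; a real transaction manager adds a prepare phase to close that gap.

```python
import sqlite3


def open_bank(balance):
    """Create an in-memory database with one account (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0 AND balance <= 1000))"
    )
    conn.execute("INSERT INTO account (id, balance) VALUES (1, ?)", (balance,))
    conn.commit()
    return conn


def balance_of(conn):
    """Return the current balance of the single account."""
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


def transfer(source, target, amount):
    """Withdraw from source and deposit into target; keep both changes or neither."""
    try:
        source.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
        )
        target.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 1", (amount,)
        )
    except sqlite3.Error:
        # One side failed, so undo the uncommitted work on both sides.
        source.rollback()
        target.rollback()
        return False
    source.commit()
    target.commit()
    return True
```

If the deposit would push the target account over the cap, the withdrawal that already succeeded on the source database is rolled back as well, so neither balance changes.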

The Distributed Transaction stage follows the X/Open XA standard for distributed transaction processing, which uses a processing model that consists of the following components:

An application program that defines transaction boundaries and specifies actions that constitute a transaction
Resource managers, such as databases or file systems, that provide access to shared data sources
A transaction manager that assigns identifiers to transactions, monitors their progress, and manages transaction completion and failure recovery
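The three roles can be illustrated with a toy two-phase commit in Python. This is a minimal sketch under stated assumptions, not the XA interface and not DataStage code: the class names, the `can_commit` flag (which simulates a resource that cannot commit), and the account keys are hypothetical.

```python
class ResourceManager:
    """Toy resource manager: committed data plus per-transaction pending writes."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # set to False to simulate a failing resource
        self.data = {}
        self.pending = {}

    def write(self, xid, key, value):
        # Changes stay pending until the transaction manager decides the outcome.
        self.pending.setdefault(xid, {})[key] = value

    def prepare(self, xid):
        # Phase 1: vote on whether this resource is able to commit.
        return self.can_commit

    def commit(self, xid):
        self.data.update(self.pending.pop(xid, {}))

    def rollback(self, xid):
        self.pending.pop(xid, None)


class TransactionManager:
    """Toy transaction manager: assigns identifiers and drives completion."""

    def __init__(self):
        self._next_id = 0

    def begin(self):
        self._next_id += 1
        return self._next_id

    def end(self, xid, resources):
        # Phase 1: every resource must vote yes. Phase 2: commit all or roll back all.
        if all(rm.prepare(xid) for rm in resources):
            for rm in resources:
                rm.commit(xid)
            return "committed"
        for rm in resources:
            rm.rollback(xid)
        return "rolled back"


def run_transfer(tm, debit_db, credit_db):
    """Application program: defines the transaction boundaries and its actions."""
    xid = tm.begin()
    debit_db.write(xid, "alice", -100)
    credit_db.write(xid, "bob", 100)
    return tm.end(xid, [debit_db, credit_db])
```

Calling `run_transfer` with two healthy resource managers returns "committed" and applies both writes. If either resource manager is created with `can_commit=False`, the call returns "rolled back" and neither resource keeps its pending write.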

Supported resource managers include IBM Db2 for DataStage and Oracle Database for DataStage. Supported transaction managers include IBM MQ and Apache Kafka.

In a typical distributed transaction flow, an IBM MQ connector consumes source messages from a message queue and moves them to a persistent work queue. The connector copies the data, message ID, and other message header fields from the source message to the target message. The connector also sends the message data to an output link. One or more stages process the message data together with additional data from an Oracle Database for DataStage connector, and send the processed data to the Distributed Transaction stage over one or more links. Each input link to the stage represents output to a target database. The links also carry the message ID of the original source message, which is consumed from the work queue as part of the distributed transaction.
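The flow can be sketched in Python with in-memory stand-ins for the queues and target databases. This is only an illustration of the pattern: the function names, the message and row fields, and the rule that a missing `amount` counts as a failed write are assumptions, and no IBM MQ or database API is involved.

```python
from collections import deque


def move_to_work_queue(source_queue, work_queue):
    """Consume one source message, copy it to the work queue, and return
    the copy that would travel down the output link."""
    message = source_queue.popleft()
    work_queue[message["id"]] = dict(message)  # message ID and data are preserved
    return dict(message)


def commit_unit(msg_id, rows_by_link, targets, work_queue):
    """Apply the rows for every input link and consume the work-queue
    message as one unit: either everything happens or nothing does."""
    staged = {link: list(table) for link, table in targets.items()}
    for link, row in rows_by_link.items():
        if row.get("amount") is None:  # hypothetical validation failure
            return False  # nothing applied; the message stays on the work queue
        staged[link].append(row)
    for link, table in staged.items():
        targets[link][:] = table  # make the staged writes visible
    del work_queue[msg_id]  # consume the original message
    return True
```

Because the source message stays on the work queue until the unit commits, a failed unit leaves both the targets and the message untouched, so the message can be processed again later.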

Input tab

Configure the connector properties for each input link. Select a data source and specify the associated connection properties. Select the write method for the target and specify the properties that the write method and the target require.

Stage tab

Specify a transaction manager and a connection. Select whether to enable global transactions and IBM MQ messaging. Specify a work queue to move messages into. You can select whether to reject failing units, which rolls back transactions that include failed records. You can specify a reject queue to store failed records and set other reject properties. You can specify the order in which to process input links. You can also set up record ordering to control the order of record processing regardless of which link the records are on.
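The difference between link ordering and record ordering can be illustrated with a short Python sketch. The link names, the `seq` key column, and both functions are hypothetical, and the stage's actual ordering rules might differ in detail.

```python
def by_link_order(links, link_order):
    """Process every record on one link before moving to the next link."""
    return [record for name in link_order for record in links[name]]


def by_record_order(links, key):
    """Merge the records from all links and order them by a key column,
    regardless of which link each record arrived on."""
    merged = [record for records in links.values() for record in records]
    return sorted(merged, key=lambda record: record[key])


links = {
    "accounts": [{"seq": 1, "op": "insert"}, {"seq": 3, "op": "update"}],
    "orders": [{"seq": 2, "op": "insert"}],
}
```

With link ordering set to `["orders", "accounts"]`, the records are processed in `seq` order 2, 1, 3. With record ordering on `seq`, they are processed in order 1, 2, 3, with records from the two links interleaved.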