ELT run mode in DataStage | IBM Cloud Pak for Data as a Service

ELT run mode in DataStage

Use DataStage® in the Extract, Load, and Transform (ELT) run mode to more efficiently run SQL queries in target databases.

What is the ELT process?
What is the ELT run mode in DataStage?
Availability
Supported connectors
Supported stages
Limitations
Enabling ELT run mode
Selecting a materialization policy
Setting database connector permissions for ELT run mode
Compiling with the dsjob command
Example

What is the ELT process?

The primary process that DataStage uses is Extract, Transform, and Load (ETL), in which data is read into memory, processed, then written to a target. In DataStage, all jobs run in ETL mode by default.

The Extract, Load, and Transform (ELT) process transfers data from a source to a target database and then prepares the information in the database itself. In certain situations, it is advantageous to use the ELT process to transform the data in the target database. An example might be when you need to transform large data sets that are already in specific data sources or targets.

What is the ELT run mode in DataStage?

You can set up DataStage to run flows in ELT mode. When flows are run in ELT mode, DataStage analyzes the items in the flow, such as stages and connectors, and determines whether the flow can be run in ELT mode. The following list describes the possible actions when ELT run mode is enabled.

No ELT mode: When the analysis determines that the DataStage flow cannot be converted to SQL, ETL mode is used, and DataStage compiles the flow with the PX runtime engine.
ELT mode: When the analysis determines that the DataStage flow can be converted to SQL, ELT mode is used, and DataStage compiles the flow to SQL.
Mixed ETL and ELT mode: When the analysis determines that the DataStage flow can be only partially converted to SQL, both ETL and ELT modes are used as needed.

Availability

ELT run mode is available in both DataStage Enterprise and DataStage Enterprise Plus.

Supported connectors

The following connectors are supported as sources and targets in ELT run mode:

Amazon RDS for PostgreSQL
Amazon Redshift
Google BigQuery
IBM Cloud® Databases for PostgreSQL
IBM Db2®
IBM Db2 for DataStage
IBM Db2 on Cloud
IBM® Db2 Warehouse
Oracle
PostgreSQL
Snowflake
Teradata
Teradata database for DataStage

The following table shows which connectors are supported in ELT run mode or mixed ELT and ETL, and any limitations that apply. For connectors listed as running in Mixed mode, a subset of properties are supported for ELT. If a property does not support ELT, compilation falls back to ETL. See the log in the DataStage canvas console for information on which components were able to compile and run.

Table 1. Connectors with limitations in ELT run mode or mixed ELT and ETL mode
Connector	Mode	Limitations
Amazon Redshift	Mixed	Sampling Random sampling type is not supported Sampling percentage is not supported Sampling seed is not supported Before/after SQL statements Before (node) and after (node) are not supported Write mode Load is not supported Update statement is not supported Read mode Select statement is not supported
Google BigQuery	ELT	Only supports one authentication method: Account key (full JSON snippet) Complex flows may run into nested view limits when materialization policy is set to Link as view Some parallel transform functions are not supported. See limitations for the Transformer stage
IBM Db2	ELT	Jobs with large data volumes may fail with error `SQL0964C The transaction log for the database is full.` To resolve this, a database admin can increase the transaction log size. Data conversion functions in SQL throw an exception when an invalid date is provided Some parallel transform functions are not supported. See limitations for the Transformer stage
IBM Db2 optimized	ELT	Default schema name is not supported When a table name is specified, the schema name must be included in the form `SchemaName.TableName`
Oracle	ELT	Some parallel transform functions are not supported. See limitations for the Transformer stage

Supported stages

The following table shows which stages are supported in ELT run mode or mixed ELT and ETL, and any limitations that apply. For stages listed as running in Mixed mode, a subset of properties are supported for ELT. If a property does not support ELT, compilation falls back to ETL. See the log in the DataStage canvas console for information on which components were able to compile and run.

Table 2. Stages with limitations in ELT run mode or mixed ELT and ETL mode
Stage	Mode	Limitations
Aggregator	Mixed	Aggregation type: Re-calculation is not supported Calculation type Summary is not supported NLS locale and advanced properties are ignored Slight differences in the fraction component of decimal numbers may occur between ETL and mixed modes due to rounding error handling
Copy	ELT	NLS locale and advanced properties are ignored
Filter	Mixed	Predicates (where clause) are not supported Job parameters (where clause) are not supported Output row only once False Output rejects True NLS locale and advanced properties are ignored
Funnel	ELT	NLS locale and advanced properties are ignored
Join	ELT	NLS locale and advanced properties are ignored
Lookup	Mixed	Constraint conditions: Not all parallel transform functions are supported. For function IsValid, only `decimal[x,y]` is supported as the first argument. Most functions are supported with Google BigQuery and IBM Db2. With PostgreSQL and Snowflake, only the following functions are supported: IsNull IsNotNull Compare Abs Acos Asin Atan Atan2 Ceil Cos Cosh Exp Fabs Floor Llabs Ln Lg (only PostgreSQL) Max Min Mod Neg Pwr Sin Sinh Sqrt Tan Tanh Alnum Alpha LowerCase UpperCase Length CompactWhiteSpace StringNumConcatenate Change CompareNoCase CompareNum CompareNumNoCase Dquote Squote Ereplace FindReplace Left Right Reverse Soundex (only Snowflake) Space Str StrCmp StripWhitespace Trim (only PostgreSQL) Lookup failure and Condition not met must be set to the same value Fail is not supported Range lookup is not supported on both the primary and reference link, you can only define range lookup on one link NLS locale and advanced properties are ignored
Remove Duplicates	Mixed	Currently always case-sensitive remove duplicates Duplicates to retain first/last is not supported NLS locale and advanced properties are ignored
Sort	Mixed	Case insensitive sorting is not supported No Sort as EBCDIC No key change columns or cluster key change columns Don't sort is ignored Output statistics is ignored Stable sort is ignored Both sort utility DataStage and Unix are normal database sorting NLS locale and advanced properties are ignored
Transformer	Mixed	Most functions are supported with Google BigQuery, Db2, and Oracle. The following functions are not supported: ForceError GetEnvironment GetNumOfPartitions GetPartitionNum GetSavedInputRecord NextSKChain NextSurrogateKey PrevSKChain PrintMessage PrintWarning SaveInputRecord SendCustomInstanceReport SetCustomMetadataInfo SendCustomReport SetCustomSummaryInfo SetUserStatus Loop variables are not supported Surrogate keys are not supported Triggers (routines) are not supported Advanced null handling settings are not supported: Legacy null processing Abort on unhandled null

Limitations

The following limitations apply to ELT run mode:

Flows with runtime column propagation enabled are not supported.
Implicit data truncation is not performed when the table action is CREATE or REPLACE.

Enabling ELT run mode

To enable ELT run mode in DataStage:

Open a DataStage flow.
On the toolbar, click the Settings icon .
Click the Run tab.
Click Extract, load, transform (ELT). Then, click Save.

Selecting a materialization policy

Select a materialization policy to define your query processing. The following options are available.

Generate nested SQL: Only output models (target tables) are materialized. Each output model is represented by a single query that includes all transformations from input nodes to outputs.
Link as table: All integration flow links are materialized as tables. These tables are temporary and are deleted once target tables are built.
Link as view: All integration flow links are materialized as views. These views are temporary and are deleted once target tables are built.
Advanced: Cardinality changers are materialized as tables. A cardinality changer is a subset of connected nodes, based around stages such as Lookup, Join, and Aggregator. These nodes are combined to conserve resources by building more efficiently and avoiding repetitive computations.

Setting database connector permissions for ELT run mode

To ensure that ELT mode runs successfully, you must set certain permissions for the database connectors that you use in the DataStage ELT process. The following permissions must be set for all database connectors:

Create views from select statements
Drop views
Create tables from select statements
Drop tables
Alter tables for adding null constraints and primary keys

If a connector is configured with custom SQL statements, then more permissions are needed for the SQL operations that are used in the custom SQL statements.

The permissions must be set within the corresponding database by a database administrator, which is outside of the scope of DataStage and Cloud Pak for Data.

Compiling with the dsjob command

You can compile a DataStage flow by using the command line:

cpdctl dsjob compile --project <project name> --enable-elt-mode

Example

You have a DataStage flow with four objects in it:

A PostgreSQL data source
A Sort stage
A Filter stage
A PostgreSQL data target

You enable ELT mode by opening the run settings and selecting it.

After you compile the job successfully, the Compile log has a single message of "full pushdown". This message indicates that the entire flow was compiled in ELT mode.

You run the job. During the job run, the PostgreSQL source data is converted by using SQL statements that apply the sorting and filtering. The result is persisted as a table that is defined in the PostgreSQL target connector.

The job run finishes and the target database now holds all the transformed data.