0 / 0
ELT run mode in DataStage

ELT run mode in DataStage

Use DataStage® in the Extract, Load, and Transform (ELT) run mode to more efficiently run SQL queries in target databases.

What is the ELT process?

The primary process that DataStage uses is Extract, Transform, and Load (ETL), in which data is read into memory, processed, then written to a target. In DataStage, all jobs run in ETL mode by default.

The Extract, Load, and Transform (ELT) process transfers data from a source to a target database and then prepares the information in the database itself. In certain situations, it is advantageous to use the ELT process to transform the data in the target database. An example might be when you need to transform large data sets that are already in specific data sources or targets.

What is the ELT run mode in DataStage?

You can set up DataStage to run flows in ELT mode. When flows are run in ELT mode, DataStage analyzes the items in the flow, such as stages and connectors, and determines whether the flow can be run in ELT mode. The following list describes the possible actions when ELT run mode is enabled.
No ELT mode
When the analysis determines that the DataStage flow cannot be converted to SQL, ETL mode is used, and DataStage compiles the flow with the PX runtime engine.
ELT mode
When the analysis determines that the DataStage flow can be converted to SQL, ELT mode is used, and DataStage compiles the flow to SQL.
Mixed ETL and ELT mode
When the analysis determines that the DataStage flow can be only partially converted to SQL, both ETL and ELT modes are used as needed.

Availability

ELT run mode is available in both DataStage Enterprise and DataStage Enterprise Plus.

Supported connectors

The following connectors are supported as sources and targets in ELT run mode:
  • Amazon RDS for PostgreSQL
  • Amazon Redshift
  • Google BigQuery
  • IBM Cloud® Databases for PostgreSQL
  • IBM Db2®
  • IBM Db2 for DataStage
  • IBM Db2 on Cloud
  • IBM® Db2 Warehouse
  • Oracle
  • PostgreSQL
  • Snowflake
  • Teradata
  • Teradata database for DataStage
The following table shows which connectors are supported in ELT run mode or mixed ELT and ETL, and any limitations that apply. For connectors listed as running in Mixed mode, a subset of properties are supported for ELT. If a property does not support ELT, compilation falls back to ETL. See the log in the DataStage canvas console for information on which components were able to compile and run.
Table 1. Connectors with limitations in ELT run mode or mixed ELT and ETL mode
Connector Mode Limitations
Amazon Redshift Mixed
  • Sampling
    • Random sampling type is not supported
    • Sampling percentage is not supported
    • Sampling seed is not supported
  • Before/after SQL statements
    • Before (node) and after (node) are not supported
  • Write mode
    • Load is not supported
    • Update statement is not supported
  • Read mode
    • Select statement is not supported
Google BigQuery ELT
  • Only supports one authentication method: Account key (full JSON snippet)
  • Complex flows may run into nested view limits when materialization policy is set to Link as view
IBM Db2 ELT
  • Jobs with large data volumes may fail with error SQL0964C The transaction log for the database is full. To resolve this, a database admin can increase the transaction log size.
IBM Db2 optimized ELT
  • Default schema name is not supported
    • When a table name is specified, the schema name must be included in the form SchemaName.TableName

Supported stages

The following table shows which stages are supported in ELT run mode or mixed ELT and ETL, and any limitations that apply. For stages listed as running in Mixed mode, a subset of properties are supported for ELT. If a property does not support ELT, compilation falls back to ETL. See the log in the DataStage canvas console for information on which components were able to compile and run.
Table 2. Stages with limitations in ELT run mode or mixed ELT and ETL mode
Stage Mode Limitations
Aggregator Mixed
  • Aggregation type:
    • Re-calculation is not supported
  • Calculation type
    • Summary is not supported
  • NLS locale and advanced properties are ignored
  • Slight differences in the fraction component of decimal numbers may occur between ETL and mixed modes due to rounding error handling
Copy ELT
  • NLS locale and advanced properties are ignored
Filter Mixed
  • Predicates (where clause) are not supported
  • Job parameters (where clause) are not supported
  • Output row only once
    • False
  • Output rejects
    • True
  • NLS locale and advanced properties are ignored
Funnel ELT
  • NLS locale and advanced properties are ignored
Join ELT
  • NLS locale and advanced properties are ignored
Lookup Mixed
  • Constraint conditions:
    • Not all parallel transform functions are supported.
    • For function IsValid, only decimal[x,y] is supported as the first argument.
    • Most functions are supported with Google BigQuery and IBM Db2. With PostgreSQL and Snowflake, only the following functions are supported:

      • IsNull

      • IsNotNull

      • Compare

      • Abs

      • Acos

      • Asin

      • Atan

      • Atan2

      • Ceil

      • Cos

      • Cosh

      • Exp

      • Fabs

      • Floor

      • Llabs

      • Ln

      • Lg (only PostgreSQL)

      • Max

      • Min

      • Mod

      • Neg

      • Pwr

      • Sin

      • Sinh

      • Sqrt

      • Tan

      • Tanh

      • Alnum

      • Alpha

      • LowerCase

      • UpperCase

      • Length

      • CompactWhiteSpace

      • StringNumConcatenate

      • Change

      • CompareNoCase

      • CompareNum

      • CompareNumNoCase

      • Dquote

      • Squote

      • Ereplace

      • FindReplace

      • Left

      • Right

      • Reverse

      • Soundex (only Snowflake)

      • Space

      • Str

      • StrCmp

      • StripWhitespace

      • Trim (only PostgreSQL)

    • Lookup failure and Condition not met must be set to the same value
    • Fail is not supported
  • Range lookup is not supported on both the primary and reference link, you can only define range lookup on one link
  • NLS locale and advanced properties are ignored
Remove Duplicates Mixed
  • Currently always case-sensitive remove duplicates
  • Duplicates to retain first/last is not supported
  • NLS locale and advanced properties are ignored
Sort Mixed
  • Case insensitive sorting is not supported
  • No Sort as EBCDIC
  • No key change columns or cluster key change columns
  • Don't sort is ignored
  • Output statistics is ignored
  • Stable sort is ignored
  • Both sort utility DataStage and Unix are normal database sorting
  • NLS locale and advanced properties are ignored
Transformer Mixed
  • Most functions are supported with Google BigQuery and Db2
  • Loop variables are not supported
  • Surrogate keys are not supported
  • Triggers (routines) are not supported
  • Advanced null handling settings are not supported:
    • Legacy null processing
    • Abort on unhandled null

Limitations

The following limitations apply to ELT run mode:
  • Flows with runtime column propagation enabled are not supported.
  • Implicit data truncation is not performed when the table action is CREATE or REPLACE.

Enabling ELT run mode

To enable ELT run mode in DataStage:
  1. Open a DataStage flow.
  2. On the toolbar, click the Settings icon Settings icon image.
  3. Click the Run tab.
  4. Click Extract, load, transform (ELT). Then, click Save.

Selecting a materialization policy

Select a materialization policy to define your query processing. The following options are available.
Generate nested SQL
Only output models (target tables) are materialized. Each output model is represented by a single query that includes all transformations from input nodes to outputs.
Link as table
All integration flow links are materialized as tables. These tables are temporary and are deleted once target tables are built.
Link as view
All integration flow links are materialized as views. These views are temporary and are deleted once target tables are built.
Advanced
Cardinality changers are materialized as tables. A cardinality changer is a subset of connected nodes, based around stages such as Lookup, Join, and Aggregator. These nodes are combined to conserve resources by building more efficiently and avoiding repetitive computations.

Setting database connector permissions for ELT run mode

To ensure that ELT mode runs successfully, you must set certain permissions for the database connectors that you use in the DataStage ELT process. The following permissions must be set for all database connectors:
  • Create views from select statements
  • Drop views
  • Create tables from select statements
  • Drop tables
  • Alter tables for adding null constraints and primary keys

If a connector is configured with custom SQL statements, then more permissions are needed for the SQL operations that are used in the custom SQL statements.

The permissions must be set within the corresponding database by a database administrator, which is outside of the scope of DataStage and Cloud Pak for Data.

Compiling with the dsjob command

You can compile a DataStage flow by using the command line:
cpdctl dsjob compile --project <project name> --enable-elt-mode

Example

You have a DataStage flow with four objects in it:
  • A PostgreSQL data source
  • A Sort stage
  • A Filter stage
  • A PostgreSQL data target

You enable ELT mode by opening the run settings and selecting it.

After you compile the job successfully, the Compile log has a single message of "full pushdown". This message indicates that the entire flow was compiled in ELT mode.

You run the job. During the job run, the PostgreSQL source data is converted by using SQL statements that apply the sorting and filtering. The result is persisted as a table that is defined in the PostgreSQL target connector.

The job run finishes and the target database now holds all the transformed data.

Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more