Troubleshoot IBM DataStage

Last updated: Dec 09, 2024

Use these solutions to help resolve problems that you might encounter with IBM® DataStage®.

Getting help and support for DataStage

If you have problems or questions when you use DataStage, you can get help by searching for information or by asking questions through a forum. You can also open a support ticket.

When you ask a question on the forums, tag your question so that it is seen by the DataStage development teams.

For questions about the service and getting started instructions, use the forum at https://stackoverflow.com/questions/tagged/datastage.

If you still can't resolve the problem, you can open an IBM support ticket. For information about opening an IBM support ticket or about support levels, see "Contact support" at https://cloud.ibm.com/unifiedsupport/supportcenter.

General

Jobs fail because SQL and Before SQL statements run in incorrect order
The mailx command fails to run in before-job and after-job subroutines without SMTP server info
Properties selections not preserved if you deselect "Use DataStage properties"
Routine fails when CEL function ds.getUserStatus is run on an external flow
Job fails when loading a large Excel file
Exported flows generate JSON connection files that contain plaintext passwords
Issues browsing database tables with columns that contain special characters
Incorrect inferences assigned to a schema read by the Asset Browser
Using sequential files as a source
Error running jobs with a parquet file format
Migration pod getting evicted for exceeding its ephemeral storage limits
Error occurs during the upgrade of the Cloud Pak for Data from 5.0.0 to 5.0.1 version
Compilation of a flow that contains the transformer stage is timed out

Connectors

Netezza® connector: Duplicate records occur when partitioned reads are enabled
MySQL connector: Jobs might fail if you use Write mode "Update" for the target without a primary key
FTP connector: The home directory path is prepended to the path
Jobs fail with error "The connector could not establish a connection to Db2® database"
Job with source data from a SAP OData connector fails
Cannot run transactional SQL on data from Apache Hive version 2.0 or earlier
IBM Db2 for DataStage connection with SSL certificate fails with "Protocol specific error code 414" error
Error parameterizing the credential field for a flow connection in IBM Cloud® Object Storage
PostgreSQL connector times out on large tables
Schema changes that originate in data from the HTTP connector can cause the job to fail
Not able to create successful database connection that uses the SSL certificate
Snowflake connector: Jobs are failing with "java/lang/OutOfMemoryError" error
Snowflake connector: Jobs are failing with "Fork failed: Resource temporarily unavailable" error

Runtime

Out of memory issue for the DataStage operator
Jobs queue when compute pods fail to start
No existing jobs getting stuck or display in a Starting or Running state

General

Jobs fail because SQL and Before SQL statements run in incorrect order

On the Teradata connector set to ANSI transaction mode, the Before SQL statement may run after the SQL statement instead of before, causing the job to fail.

Workaround: Add a commit statement after each Before SQL statement.

The mailx command fails to run in before-job and after-job subroutines without SMTP server info: If the mailx command is used in a before-job or after-job subroutine, you must provide the SMTP server info or it will forward execution to sendmail and fail.

Properties selections not preserved if you deselect "Use DataStage properties"

If you enter other properties (for example for tables or schemas) with the default Use DataStage properties option selected, and then deselect Use DataStage properties, the properties are not preserved.

Workaround: Deselect the default Use DataStage properties if you intend not to use them before you enter other properties. Otherwise, reselect the properties.

Routine fails when CEL function ds.getUserStatus is run on an external flow

When built-in CEL function ds.getUserStatus is run on a target that is not within the same pipeline, it fails and cannot retrieve the user status. Use the dsjob CLI in your Run Bash script node instead.

For an example of how to rewrite this, see the dsjob command used by DSGetUserStatus() in Routine replacement examples in DataStage.

Job fails when loading a large Excel file

A job with a connector that is processing a large Excel file might fail with this error:

"CDICO9999E: Internal error occurred: IO error: The Excel file is too large. (error code: DATA_IO_ERROR)"

Try increasing the heap size. The Heap size properties option is in the Other properties section of the connector's Stage tab.

Exported flows generate JSON connection files that contain plaintext passwords

Downloaded flows might include connection assets that have credentials or other sensitive information. You can run the following command to change the export behavior so that all future exports remove credentials by default.


oc -n ${PROJECT_CPD_INST_OPERANDS} patch datastage datastage --patch '{"spec":{"migration_export_remove_secrets":true}}' --type=merge

Issues browsing database tables with columns that contain special characters

You might have issues when you use the Asset Browser to browse database tables if the selected table contains a column with special characters such as ., $, or #, and you add that table into a DataStage flow. DataStage does not support column names that contain special characters. DataStage flows that reference columns with names that include these special characters will not work.

To work around this problem, create a view over the database table and redefine the column name in the view. For example:

create view view1 as select column1$ as column1, column2# as column2 ... from table

Then, when you use the Asset Browser, find the view and add it to the DataStage flow.

Incorrect inferences assigned to a schema read by the Asset Browser

The Asset Browser will read the first 1000 records and infer the schema, such as column name, length, data type, and nullable, based on these first 1000 records in the files in IBM Cloud Object Storage, Amazon S3, Google Cloud Storage, Azure File Storage, Azure Blob Storage, or the Azure Data Lake service. For instance, the Asset Browser might identify a column as an integer based on what is detected in the first 1000 records, however, later records in the file might show that this column ought to be treated as varchar data type. Similarly, the Asset Browser might infer a column as varchar(20) even though later records show that the column ought to be varchar(100).

Workaround:

Profile the source data to generate better metadata.
Change all columns to be varchar(1024) and gradually narrow down the data type.

Using sequential files as a source

To use sequential files as a source, you must load files into a project bucket in a specific location. To determine the project bucket location:

Find the project Cloud Object Storage instance.
In the project instance, find the bucket corresponding to the current project. The location is usually: <lowercase-project-name>-donotdelete-<random-string>
For example: project2021mar01-donotdelete-pr-ifpkjcbk71s36j

Then, upload the files to by specifying DataStage/files/ in the Prefix for object field.

Error running jobs with a parquet file format

You might receive the following error when you try to run a job with a parquet file format:

Error: CDICO9999E: Internal error occurred: Illegal 
state error: INTEGER(32,false) can only annotate INT32.

The unsigned 32-bit integer(uint32) and unsigned 64-bit integer(uint64) data types are not supported in the Parquet format that DataStage is using for all the file connectors.

Workaround: You must use supported data types.

Migration pod getting evicted for exceeding its ephemeral storage limits

During import, pod usage of ephemeral local storage can exceed the total limit of containers. You might receive the following message:


Status: Failed
Reason: Evicted
Message: Pod ephemeral local storage usage exceeds the total limit of containers 900Mi.

Workaround: To avoid this problem, you need to increase the ephemeral storage limit to 4Gi from default of 900Mi by running the following command:


oc -n ${PROJECT_CPD_INST_OPERANDS} patch datastage datastage --type merge -p '{"spec": {"custom": {"resources":{"components":{"migration":{"limits":{"ephemeral":"4Gi"}}}}}}}'

Error occurs during the upgrade of the Cloud Pak for Data from 5.0.0 to 5.0.1 version

You may face this error while performing the upgrade of Cloud Pak for Data from 5.0.0 to 5.0.1 The upgrade fails in new upgrade tasks for the remote instances.

Workaround: When the DataStage CR is alternating between Failed and InProgress during the 5.0.1 upgrade, go through the following steps:

Log in to the Red Hat® OpenShift Container Platform cluster by oc and set the default project path to where the Cloud Pak for Data is installed.
```
oc project $PROJECT_CPD_INST_OPERANDS
```
Verify if the PXRruntime instances have been successfully upgraded to version 5.0.1.
```
oc get pxruntime
```

If the PXRruntime CR is not successfully upgraded to version 5.0.1, then run the following commands:


echo "Adding installedVersion to DataStage CR"
oc patch datastage datastage --type='json' -p='[{"op": "add", "path": "/spec/installedVersion", "value": "5.0.1" }]'
while true; do echo "Waiting for DataStage CR to be in Completed state"; sleep 30; if [ $(oc get datastage datastage -o=jsonpath="{.status.dsStatus}") = "Completed" ]; then break; fi; done
echo "Removing installedVersion from DataStage CR"
oc patch datastage datastage --type='json' -p='[{"op": "remove", "path": "/spec/installedVersion"}]'
while true; do echo "Waiting for DataStage CR to be in Completed state"; sleep 30; if [ $(oc get datastage datastage -o=jsonpath="{.status.dsStatus}") = "Completed" ]; then break; fi; done

Flows that contain the Transformer stage time out during compilation

A timeout may occur while compiling flows that contain the transformer stage.

Default value of the APT_COMPILEOPT environment variable:

-c -O -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small

Workaround: Disable compile time optimization by changing -O to -O0 in compile options in the APT_COMPILEOPT environment variable:

-c -O0 -fPIC -Wno-deprecated -m64 -mtune=generic -mcmodel=small

Connectors

Netezza connector: Duplicate records occur when partitioned reads are enabled

When partitioned reads are enabled on the Netezza connector in parallel execution mode, duplicate records may occur. To avoid duplicate records, add partition placeholders into the SQL or set the execution mode to sequential. To add partition placeholders, add the string mod(datasliceid,[[node-count]])=[[node-number]], as in the following example.

SELECT * FROM table WHERE mod(datasliceid,[[node-count]])=[[node-number]]

MySQL connector: Jobs might fail if you use Write mode "Update" for the target without a primary key

If you create a table in a MySQL database without specifying a primary key in the WHERE column, and then you try to run a job that uses that table with the Write mode Update for the target, the job might fail.

Solution: Specify a primary key name in the Key column names field. If the table is big and does not have a primary column, you can create a separate column with auto-increment values to use as the primary key.

FTP connector: The home directory path is prepended to the path

When you run a job that uses data from an FTP data source, the home or login directory is prepended to the path that you specified. This action happens regardless if you specify an absolute path (with a leading forward slash) or a relative path (without a leading forward slash). For example, if you specify the directory as /tmp/SampleData.txt, the path resolves to /home/username/tmp/SampleData.txt.

Workaround: Edit the File name in the FTP connector. Specify the absolute path to the source or target file.

Jobs fail with error "The connector could not establish a connection to Db2 database"

Jobs may fail with error "The connector could not establish a connection to Db2 database".

Workaround: Go to the connection properties and set the Options property to connectTimeout=0.

Job with source data from a SAP OData connector fails

If your flow includes source data from SAP OData, the flow might fail if you created the flow by manually adding columns that do not follow the SAP naming convention.

Workaround: Update the flow by or adding the columns with the Asset browser or by renaming the columns according to the SAP naming convention. The SAP naming convention follows the SAP object hierarchy with two underscore characters (__) as a separator. For example, if the PurchaseOrder column belongs to PurchaseOrderNote, then the column name should be specified as PurchaseOrderNote__PurchaseOrder.

Cannot run transactional SQL on data from Apache Hive version 2.0 or earlier

If your data is from Apache Hive version 2.0 or earlier and your DataStage flow executes UPDATE or DELETE statements, the job might fail. Make sure that the target table has been created according to Hive transactions requirements and that the Apache Hive server is configured to support ACID operations.

The minimum set of parameters (configured in the hive-site.xml file) that you must enable for ACID tables in Apache Hive is:

hive.support.concurrency = true
hive.enforce.bucketing = true (not required as of Hive 2.0)
hive.exec.dynamic.partition.mode = nonstrict
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

hive.compactor.initiator.on = true
hive.compactor.worker.threads = <a positive number>

For more information, refer to Hive Transactions.

IBM Db2 for DataStage connection with SSL certificate fails with "Protocol specific error code 414" error

If you use an SSL certificate in the IBM Db2 for DataStage connection and the connection fails with a "Protocol specific error code 414" error, use this workaround:

Identify the root certificate on the Db2 server. You can use this command to view the certificate chain:
```
openssl s_client -connect <hostname>:<port> -showcerts
```
Ensure that the certificate has the same subject and issuer.
In the Create connection: IBM IBM Db2 for DataStage page, enter the root certificate in the SSL certificate (arm) field.

Error parameterizing the credential field for a flow connection in IBM Cloud Object Storage: When the Authentication method property is set to Service credentials (full JSON snippet), do not parameterize the Service credentials field. If a parameter is provided for that field, the flow will not compile.

PostgreSQL connector times out on large tables: The PostgreSQL connector might fail with a timeout error when a large table (100,000+ rows) is used as a source. To fix this error, try setting a higher timeout value for the APT_RECORD_TIMEOUT environment variable. See Managing environment variables in DataStage Managing environment variables in DataStage.

Schema changes that originate in data from the HTTP connector can cause the job to fail

When you use the HTTP connector to download a file and then upload the same file into IBM Cloud Object Storage or a database, if the file's schema changes over time, the job might fail.

Workaround: Re-create the stage.

Not able to create successful database connection that uses the SSL certificate

The OpenSSL 3.0.9 version does not allow weak ciphers to be used while generating the SSL certificate. When ValidateServerCertificate (VSC) attribute is set to 0, the connection treats the certificate as invalid. The value must be set to 1 to create a connection.

Workaround: Generate a new SSL certificate which has strong ciphers using OpenSSL 3.0.x version.

Check ciphers in your SSL certificate with the following command. If the certificate uses sha1WithRSAEncryption signature algorithm, it is considered as a weak cipher:
```
openssl x509 -in cert.pem -text -noout
```

Generate the SSL certificate with the following command:

openssl.exe pkcs12 -in certificate_name -export -out truststore_filename -nokeys -keypbe cryptographic_algorithm -certpbe cryptographic_algorithm -password pass:truststore_password -nomac

Runtime

Out of memory issue for the DataStage operator

When more than 5 PX runtime instances are deployed on the cluster, the operator may run out of memory. To resolve this issue, update the CSV to increase the memory limits:

Retrieve the DataStage CSV:


oc -n ${PROJECT_CPD_INST_OPERATORS} oc get csv | grep datastage

Patch the DataStage CSV to increase the operator memory from 1Gi to 2Gi:




oc -n ${PROJECT_CPD_INST_OPERATORS} patch csv <DataStage-CSV-name> --type='json' -p='[{"op": "replace", "path": "/spec/install/spec/deployments/0/spec/template/spec/containers/0/resources/limits/memory", "value": "2Gi" }]'

Jobs queue when compute pods fail to start

If the runtime instance compute pods don't start, all jobs will run on the px-runtime pod. Resource limits lead to jobs being queued.

Workaround: Fix any issues that are preventing the compute pods from starting.

No existing jobs getting stuck or display in a Starting or Running state

When your stale jobs are running, you can have a problem with deleting them from the UI. You can face similar problem with the projects because they can possibly using resources (for example memory). To clean up those processes, use one of the following commands:

cpdctl asset delete --asset-id ASSET-ID --purge-on-delete=true`

cpdctl dsjob jobrunclean {{--project PROJECT | --project-id PROJID} | {--space SPACE | --space-id SPACEID}} {--name NAME | --id ID} [--run-id RUNID] [--dry-run] [--threads n] [--all-spaces] [--before YYYY-MM-DD:hh:mm:ss]

Note: Using those commands cleans up all active jobs in a project. Make sure that you quiesced your operating system.