DataStage

IBM® DataStage® is an ETL tool that you can use to transform and integrate data in projects.

 

DataStage is designed for ease of use and is fully integrated into Cloud Pak for Data. You can import legacy parallel jobs into DataStage by using ISX files, create, edit, and test flows on the DataStage design canvas, and run the jobs that are generated from those flows. The DataStage service is part of the data fabric.

DataStage is a data integration tool that moves and transforms data between operational, transactional, and analytical target systems. Data integration specialists use DataStage to develop flows that process and transform data. Hundreds of prebuilt transformation functions, parallel processing capabilities, and platform connectivity are available, so flows can connect directly to enterprise applications, cloud data sources, relational and NoSQL systems, REST endpoints, and more. You can administer, manage, deploy, and reuse these flows to integrate data across many systems throughout your organization.

Data format
Tabular: Avro, CSV, JSON, Parquet, TSV (read only), or delimited text files
Data size
Any
Required services
DataStage
Connectors
Example connectors include: Db2®, Netezza® Performance Server, Microsoft SQL Server, Oracle, Teradata, Snowflake, Microsoft Azure File Storage, Amazon S3, and other Amazon Web Services and Google Cloud Platform services.

See DataStage connectors for the list of connectors that DataStage supports.

Stages
This service provides stages, each of which describes a particular process, such as accessing a database or transforming data in some way. DataStage stages provide common functions for moving and transforming data. QualityStage stages support data quality tasks that include, but are not limited to, eliminating redundant, obsolete, or inaccurate data, standardizing data, and verifying address data.

See DataStage stages and QualityStage stages for information on the stages that DataStage supports.

For more information, see QualityStage stages.
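
As a conceptual illustration only, the idea of a flow built from stages can be sketched in plain Python. This is not the DataStage API: the function names (`read_csv_stage`, `standardize_stage`, `dedupe_stage`, `run_flow`) and the sample data are hypothetical, chosen to mirror a source stage, a standardizing stage, and a stage that eliminates redundant records.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, str]
Stage = Callable[[Iterable[Row]], Iterator[Row]]


def read_csv_stage(text: str) -> Iterator[Row]:
    """Hypothetical source stage: yield one dict per CSV record."""
    yield from csv.DictReader(io.StringIO(text))


def standardize_stage(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical transform stage: trim whitespace and normalize casing."""
    for row in rows:
        yield {
            "name": row["name"].strip().title(),
            "city": row["city"].strip().upper(),
        }


def dedupe_stage(rows: Iterable[Row]) -> Iterator[Row]:
    """Hypothetical quality stage: drop records already seen."""
    seen = set()
    for row in rows:
        key = (row["name"], row["city"])
        if key not in seen:
            seen.add(key)
            yield row


def run_flow(source: Iterable[Row], stages: List[Stage]) -> List[Row]:
    """Chain the stages in order and collect the final records."""
    rows: Iterable[Row] = source
    for stage in stages:
        rows = stage(rows)
    return list(rows)


# Made-up sample input: the first two records differ only in spacing and casing.
SAMPLE = (
    "name,city\n"
    " ada lovelace ,london\n"
    "Ada Lovelace,London\n"
    "grace hopper,new york\n"
)

result = run_flow(read_csv_stage(SAMPLE), [standardize_stage, dedupe_stage])
```

In this sketch, standardizing runs before deduplication so that records differing only in formatting collapse into one. In DataStage itself, the equivalent ordering would be expressed by linking stages on the design canvas rather than by writing code.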

Learn more

To review a quick start tutorial for DataStage, see Quick start: Transform data.

To review a tutorial for DataStage in the data fabric context, see Multicloud data integration tutorial: Integrate data.

For more information about using DataStage, see the following topics:
