DataStage
IBM® DataStage® is an ETL tool that you can use to transform and integrate data in projects.
DataStage is designed for ease of use and is fully integrated into Cloud Pak for Data. You can import your existing legacy parallel jobs into DataStage by using ISX files, use the DataStage design canvas to create, edit, and test flows, and run jobs that are generated from the flows. The DataStage service is a part of the data fabric.
DataStage is a data integration tool that moves and transforms data between operational, transactional, and analytical target systems. Data integration specialists use DataStage to develop flows that process and transform data. Hundreds of prebuilt transformation functions, parallel processing capabilities, and platform connectivity are available, so flows can connect directly to enterprise applications, cloud data sources, relational and NoSQL systems, REST endpoints, and more. You can administer, manage, deploy, and reuse these flows to integrate data across many systems throughout your organization.
- Data format
- Tabular: Avro, CSV, JSON, Parquet, TSV (read only), or delimited text files
- Data size
- Any
- Required services
- DataStage
- Connectors
- Example connectors include Db2®, Netezza® Performance Server, Microsoft SQL Server, Oracle, Teradata, Snowflake, Microsoft Azure File Storage, Amazon S3, and other Amazon Web Services and Google Cloud Platform services.
See DataStage connectors for the list of connectors that DataStage supports.
- Stages
- This service provides stages, each of which describes a particular process, such as accessing a database or transforming data in some way. DataStage stages provide common functions for moving and transforming data. QualityStage stages support data quality tasks that include, but are not limited to, eliminating redundant, obsolete, or inaccurate data, standardizing data, and verifying address data.
See DataStage stages and Quality stages in DataStage for information on the stages that DataStage supports.
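The idea of a flow as a sequence of stages can be illustrated with a minimal conceptual sketch. It is plain Python and does not use DataStage or any IBM API. In DataStage itself, flows are built on the design canvas. The function names (`read_stage`, `standardize_stage`, `deduplicate_stage`, `run_flow`), the column names, and the sample data are all hypothetical and chosen only for illustration.

```python
# Conceptual sketch only: plain Python, not the DataStage API.
# Every function name, column name, and sample value here is hypothetical.
import csv
import io


def read_stage(csv_text):
    """Source stage: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def standardize_stage(rows):
    """Transform stage: trim whitespace and normalize the case of each value."""
    return [
        {"name": row["name"].strip().title(), "city": row["city"].strip().upper()}
        for row in rows
    ]


def deduplicate_stage(rows):
    """Quality-style stage: drop rows that repeat an earlier (name, city) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"], row["city"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def run_flow(csv_text):
    """Run the stages in order, passing each stage's output to the next."""
    return deduplicate_stage(standardize_stage(read_stage(csv_text)))


SAMPLE = (
    "name,city\n"
    " ada lovelace ,london\n"
    "Ada Lovelace,London\n"
    "grace hopper,new york\n"
)

if __name__ == "__main__":
    for row in run_flow(SAMPLE):
        print(row)
```

In this sketch, standardization runs before deduplication so that rows that differ only in spacing or letter case are recognized as the same record.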
Learn more
To review a quick start tutorial for DataStage, see Quick start: Transform data.
To review a tutorial for DataStage in the data fabric context, see Multicloud data integration tutorial: Integrate data.
For more information about using DataStage, see the following topics: