Configuring metadata import for data integration assets

Last updated: Dec 03, 2024

Data integration uses technical and business processes to combine data from different sources and create information that is meaningful and valuable for your enterprise. Data integration tools can use extract, transform, and load (ETL) patterns to transform and organize data based on business needs. Such ETL jobs move the data from source systems to target systems.

You can import lineage metadata from InfoSphere DataStage. You cannot import assets from this source.

InfoSphere DataStage

When you create a lineage metadata import for InfoSphere DataStage, you can set options specific to this data source, and define the scope of data for which lineage is generated. For details about metadata import, see Designing metadata imports.

External inputs

You can provide InfoSphere DataStage assets such as jobs, stages, or operational metadata file as external inputs in a .zip file. The folder must have the following structure:

<job_name.xml> - The parallel DataStage jobs that you want to analyze and the parameter sets that you use in those jobs as one XML file, exported from the designer client.
omd_files - A folder for operational metadata (OMD) files.
- <omd_file_name> - Operational metadata files that are collected during job runs to resolve any type of parameter in jobs.
sql_files - A folder that contains all SQL files.
- <sql_file_name> - A file with SQL queries.
DSParams - A file that contains the project- or environment-level parameters if applicable. You can get this file from the project directory.
datastageParameterOverride.txt - A file with parameter-set overrides if your jobs use parameter sets.
connection_definition/odbcConnectionDefinition.ini - A file with connection definitions for ODBC connections. Definitions of ODBC connections are not included in the DataStage XML exports and must be specified separately.
datastageComponentOverrides.csv - A file with component-lineage overrides.

The datastageParameterOverride.txt file format
The datastageParameterOverride.txt file has the following content:

[ENVIRONMENT]
PARAM1_NAME = "param1_value"
PARAM2_NAME = "param2_value"
PARAM3_NAME = "param3_value"
[PARAMETER_SET/parameter_set_name]
param4_name  = "default_param4_value"
param5_name  = "default_param5_value"
$PARAM3_NAME = "$PROJDEF"
[VALUE_FILE/parameter_set_name/value_file1_name]
param4_name  = "some_param4_value"
param5_name  = "some_param5_value"
$PARAM3_NAME = "some_param3_value"
[VALUE_FILE/parameter_set_name/value_file2_name]
param4_name  = "other_param4_value"
param5_name  = "other_param5_value"
$PARAM3_NAME = "other_param3_value"
[JOB/job1_name]
param6_name = "param6_value"
param7_name = "param7_value"
[JOB/job2_name]
param7_name = "param8_value"

The connection_definition/odbcConnectionDefinition.ini file format
The connection_definition/odbcConnectionDefinition.ini file has the following content. Create a separate [Shortcut_Name] secion for each connection.

[<Shortcut_Name>]
Type=<connection_type>
Connection_String=<connection_string>
Server_Name=<server_name>
Database_Name=<database_name>
Schema_Name=<schema_name>
User_Name=<user_name>

Shortcut_Name: The name of the connection or data server that is used by the data integration tool.
connection_type: The type of data source.
connection_string: A JDBC connection string or any identification of the database such as the system ID (SID) or the host name.
server_name: The value depends on the type of data source:
- Db2, Microsoft SQL Server, Netezza Performance Server, SAP ASE (formerly Sybase), or Teradata: The server name.
- FTP: The hostname.
- Oracle and other databases: The value is ignored.
database_name: The value depends on the type of data source:
- Oracle: The global database name.
- Db2, Microsoft SQL Server, Netezza Performance Server, SAP ASE (formerly Sybase), Teradata, and other databases: The name of the default database.
- user_name: The name of the user that logs in to the database.

Add a new line at the end of the parameters for each section.

The datastageComponentOverrides.csv file format
The datastageComponentOverrides.csv file has the following content:

"Full path to Stage";"Input Link name";"Input Column name";"Output Link name";"Output Column name";"Edge Type (DIRECT | FILTER)";"Description (optional)"
"manual_mapping_job/Generic_3";"DSLink2";"a";"DSLink5";"b";"DIRECT";""

The path to the stage is in the format Job/[Shared and Local containers optional]/Stage.

Advanced import options

Oracle proxy user authentication: You can use Oracle proxy user authentication. Set the value to true to change Oracle usernames in \"USERNAME[SCHEMA_OWNER]\" format to \"SCHEMA_OWNER\" format. In other cases, set the value to false.
Value files: Specify the names of value files to use in Parameter Sets in order of priority. For example, DEV1,TEST,PROD.

Learn more

What is ETL?

Parent topic: Designing metadata imports

Was the topic helpful?

0/1000