To access your data in OpenLineage, create a connection asset for it.
OpenLineage is an open framework that can be used to collect and analyze data lineage.
Create a connection to OpenLineage
To create the connection asset, you need the following connection details:
- Hostname or IP address
- Port number
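Before you create the connection, it can help to confirm that the hostname and port are reachable from your network. The following is a minimal sketch that only tests whether a TCP connection can be opened. The function name `is_reachable` is an illustrative assumption and not part of the platform or of OpenLineage.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, `is_reachable("<hostname>", <port>)` with your own connection details returns `True` when the port accepts connections. A successful check does not guarantee that the service behind the port is an OpenLineage endpoint.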
Choose the method for creating a connection based on where you are in the platform:
- In a project
  - Click Assets > New asset > Connect to a data source. See Adding a connection to a project.
- In a catalog
  - Click Add to catalog > Connection. See Adding a connection asset to a catalog.
- In the Platform assets catalog
  - Click New connection. See Adding platform connections.
Next step: Add data assets from the connection
Where you can use this connection
You can use the OpenLineage connection in the following workspaces and tools:
Projects
- Metadata import (IBM Knowledge Catalog)
Catalogs
- Platform assets catalog
- Other catalogs (IBM Knowledge Catalog)
Data lineage
- Metadata import (lineage) (IBM Knowledge Catalog and IBM Manta Data Lineage)
Configuring lineage metadata import for OpenLineage
When you create a metadata import for the OpenLineage connection, you can set options specific to this data source, and define the scope of data for which lineage is generated. For details about metadata import, see Designing metadata imports.
To import lineage metadata for OpenLineage, complete these steps:
- Create a data source definition. Select OpenLineage as the data source type.
- Create a connection to the data source in a project.
- Create a metadata import. The following options are specific to the OpenLineage data source:
  - When you define a scope, you can analyze the entire data source or use the include and exclude options to define the exact job namespaces that you want to analyze. See Include and exclude lists.
  - Optionally, you can provide external input. Add the file in the Add inputs from file field. The file must have a supported structure. See External inputs.
Include and exclude lists
You can include or exclude assets by using job namespaces in OpenLineage events. The whole input is evaluated as a regular expression. Example values:
- myPrestoApp1Namespace: all events with the job namespace myPrestoApp1Namespace.
- .mySparkApp[1-5]Namespace: all events with a job namespace such as mySparkApp1Namespace, where the digit before Namespace is between 1 and 5.
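To preview which job namespaces an include or exclude value would select, you can test the expression locally. The following is a minimal sketch, assuming that the expression must match the entire namespace (`re.fullmatch`). The exact matching semantics of the metadata import are not stated here, and the helper `select_namespaces` and the sample namespaces are illustrative only.

```python
import re


def select_namespaces(namespaces, include=None, exclude=None):
    """Keep namespaces that match `include` (if set) and do not match `exclude` (if set)."""
    selected = []
    for ns in namespaces:
        # Assumption: the whole namespace must match the expression.
        if include is not None and not re.fullmatch(include, ns):
            continue
        if exclude is not None and re.fullmatch(exclude, ns):
            continue
        selected.append(ns)
    return selected


# Hypothetical job namespaces taken from OpenLineage events.
namespaces = [
    "myPrestoApp1Namespace",
    "mySparkApp1Namespace",
    "mySparkApp5Namespace",
    "mySparkApp7Namespace",
]

print(select_namespaces(namespaces, include=r"mySparkApp[1-5]Namespace"))
# ['mySparkApp1Namespace', 'mySparkApp5Namespace']
```

Because the input is treated as a regular expression, characters such as `.` and `[` have their regex meaning. Escape them if you want to match them literally.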
External inputs
You can add OpenLineage events as external inputs. The file can have the following structure:
<event_file_name>.json
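As an illustration, the following sketch builds one OpenLineage run event and writes it to a JSON file. It assumes that the file contains a single event that follows the OpenLineage specification. The producer URL, the spec version in `schemaURL`, the job and dataset names, and the helpers `build_run_event` and `write_event_file` are hypothetical, so check them against the OpenLineage specification and the supported file structure before you use the file.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(namespace, job_name, inputs, outputs):
    """Build a minimal OpenLineage run event as a dictionary.

    `inputs` and `outputs` are lists of (dataset_namespace, dataset_name) pairs.
    """
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Hypothetical producer identifier; use the URI of your own producer.
        "producer": "https://example.com/my-producer",
        # Assumed spec version; use the version that your events follow.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
    }


def write_event_file(event, path):
    """Write the event to `path` as formatted JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(event, f, indent=2)
```

For example, `write_event_file(build_run_event("mySparkApp1Namespace", "daily_load", [("warehouse", "sales.orders")], [("warehouse", "sales.orders_agg")]), "my_event.json")` produces a file whose job namespace can then be selected with the include and exclude lists.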