OpenLineage connection
Last updated: Dec 13, 2024

To access your data in OpenLineage, create a connection asset for it.

OpenLineage is an open framework that can be used to collect and analyze data lineage.

Create a connection to OpenLineage

To create the connection asset, you need the following connection details:

  • Hostname or IP address
  • Port number

Choose the method for creating a connection based on where you are in the platform:

In a project
Click Assets > New asset > Connect to a data source. See Adding a connection to a project.

In a catalog
Click Add to catalog > Connection. See Adding a connection asset to a catalog.

In the Platform assets catalog
Click New connection. See Adding platform connections.

Next step: Add data assets from the connection

Where you can use this connection

You can use the OpenLineage connection in the following workspaces and tools:

Projects

  • Metadata import (IBM Knowledge Catalog)

Catalogs

  • Platform assets catalog
  • Other catalogs (IBM Knowledge Catalog)

Data lineage

  • Metadata import (lineage) (IBM Knowledge Catalog and IBM Manta Data Lineage)

Configuring lineage metadata import for OpenLineage

When you create a metadata import for the OpenLineage connection, you can set options specific to this data source, and define the scope of data for which lineage is generated. For details about metadata import, see Designing metadata imports.

To import lineage metadata for OpenLineage, complete these steps:

  1. Create a data source definition. Select OpenLineage as the data source type.
  2. Create a connection to the data source in a project.
  3. Create a metadata import. The following options are specific to the OpenLineage data source:
    • When you define a scope, you can analyze the entire data source or use the include and exclude options to define the exact job namespaces that you want to be analyzed. See Include and exclude lists.
    • Optionally, you can provide external input. You add this file in the Add inputs from file field. The file must have a supported structure. See External inputs.

Include and exclude lists

You can include or exclude assets by using job namespaces in OpenLineage events. The whole input is evaluated as a regular expression. Example values:

  • myPrestoApp1Namespace: all events with job namespace myPrestoApp1Namespace.
  • mySparkApp[1-5]Namespace: all events whose job namespace consists of mySparkApp, a single digit from 1 to 5, and then Namespace (that is, mySparkApp1Namespace through mySparkApp5Namespace).
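
To preview which job namespaces a pattern would select before you enter it, you can test it locally. The following Python snippet is a minimal sketch. It assumes that the pattern must match the entire namespace, which this page does not confirm, and that the product's regular expression dialect agrees with Python's for simple patterns like these. The helper name namespace_selected and the candidate list are hypothetical.

```python
import re


def namespace_selected(pattern: str, namespace: str) -> bool:
    """Return True if the whole namespace matches the pattern.

    Assumption: full-match semantics. The product may evaluate
    the expression differently.
    """
    return re.fullmatch(pattern, namespace) is not None


# Hypothetical job namespaces taken from OpenLineage events.
candidates = [
    "mySparkApp1Namespace",
    "mySparkApp5Namespace",
    "mySparkApp6Namespace",
    "myPrestoApp1Namespace",
]

pattern = "mySparkApp[1-5]Namespace"
selected = [ns for ns in candidates if namespace_selected(pattern, ns)]
print(selected)  # ['mySparkApp1Namespace', 'mySparkApp5Namespace']
```

Under these assumptions, mySparkApp6Namespace is not selected because 6 falls outside the character class [1-5].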

External inputs

You can add OpenLineage events as external inputs. The file can have the following structure:

<event_file_name>.json
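
As an illustration, the following Python sketch writes one OpenLineage run event to a .json file. The top-level fields (eventType, eventTime, producer, schemaURL, run, job, inputs, outputs) come from the OpenLineage event specification. The job, dataset, and producer values, the file name my_event, and the helper functions are hypothetical. This page does not state whether a file can hold more than one event, so the sketch writes a single event per file as an assumption.

```python
import json
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path


def build_run_event(job_namespace: str, job_name: str) -> dict:
    """Build a minimal OpenLineage run event (all values are illustrative)."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Hypothetical producer URI that identifies the emitting tool.
        "producer": "https://example.com/my-producer",
        # Spec version shown for illustration; use the one your producer emits.
        "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": "example-db", "name": "public.orders"}],
        "outputs": [{"namespace": "example-db", "name": "public.orders_summary"}],
    }


def write_event_file(event: dict, directory: str, file_stem: str) -> Path:
    """Write the event to <file_stem>.json in the given directory."""
    path = Path(directory) / f"{file_stem}.json"
    path.write_text(json.dumps(event, indent=2), encoding="utf-8")
    return path


event = build_run_event("mySparkApp1Namespace", "daily_orders_job")
event_path = write_event_file(event, tempfile.mkdtemp(), "my_event")
print(event_path.name)  # my_event.json
```

The job namespace in the event is the value that the include and exclude lists are evaluated against.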

Parent topic: Supported connections