Managing metadata imports

You can import technical and process metadata associated with the data assets in your organization into a project or a catalog to inventory, evaluate, and catalog these assets.

Technical metadata describes the structure of data objects. Process metadata includes operative information about the lineage of a data asset. This metadata helps users decide whether the data is appropriate for the task at hand, whether they can trust the data, and how to work with the data.

The metadata that you import can later be enriched with other information to help users find data faster and use it with confidence. Such information includes, for example, terms that define the meaning of the data, rules that document ownership or determine quality standards, or reviews.

When you import metadata, you add data assets to a project or a catalog. If you import the assets to a project, they are not visible in any catalog until you publish them to a catalog. After you publish them, other catalog users can work with these assets.

Importing metadata involves the following steps:

  • Identify the data source from which you want to import. You might already have a connection to this data source defined. Otherwise, ensure that you have the credentials to connect to it. For a list of supported connections, see step 4 of the instructions for creating a metadata import asset.
  • In a project, create a metadata import asset to configure the import details, such as the scope and the target of the import and the schedule for the import job.
  • Import assets to the project or the catalog. When you access an imported data asset, the data is dynamically retrieved from the data source.
  • Analyze and preview the imported metadata, and share it to the catalog if you imported the metadata to a project.

Creating a metadata import asset and importing metadata

To create a metadata import asset for importing metadata into a project or a catalog:

  1. Open a project and click Add to project > Metadata Import. After you create the first metadata import in this way, you can add new metadata import assets from the project’s Assets page.
  2. Specify a name for the metadata import. Optionally, you can provide a description.
  3. Select the import target. You can import metadata into the project that you’re working in or into a catalog. If you choose to import into a catalog, you can select any catalog that is available to you.

    Import metadata into a project if you want to analyze it before you decide which assets to share to a catalog for other users to work with. If you know the contents of the data assets well, you can import their metadata directly into the catalog.

  4. Select an existing connection asset as the source of the data, or click Add connection and create a connection asset.

    You can import metadata from these data sources:

    IBM:

      • Analytics Engine HDFS
      • Cloud Object Storage
      • Cloud Object Storage (infrastructure)
      • Compose for MySQL
      • Databases for PostgreSQL
      • Db2
      • Db2 Big SQL
      • Db2 for i
      • Db2 for z/OS
      • Db2 Hosted
      • Db2 on Cloud
      • Db2 Warehouse
      • Informix

    Third-party:

      • Amazon RDS for MySQL
      • Amazon RDS for PostgreSQL
      • Apache HDFS
      • Apache Hive
      • Microsoft Azure Data Lake Store
      • Microsoft Azure SQL Database
      • Microsoft SQL Server
      • MySQL
      • Oracle
      • PostgreSQL
      • Sybase
      • Sybase IQ

  5. Click Next.
  6. Define a scope for the metadata import. Depending on the size and contents of your data source, you might want to import only a subset of the assets rather than all of them. You can include complete schemas or folders, or drill down to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains, which helps you decide whether to include the whole set or only a subset.

    Scoping is not supported when importing copybook assets.

    Click Add to scope for each item that you want to include in the import. When you’re done selecting items, click Next.

  7. Define whether you want to run scheduled import jobs. If you don’t set a schedule, you run the import when you save the metadata import asset. You can rerun the import manually at any time. If you choose to run the import on a specific schedule, define the date and time you want the job to run. You can schedule single and recurring runs.

    Optionally, change the name of the import job. The default name is metadata_import_name job.

    You can later access the import job you create from within the metadata import asset or from the project’s Jobs page.

  8. Review the metadata import configuration. If you need to make changes, go back and change settings as required.
  9. Click Save. If you didn’t configure a schedule, the metadata import asset is saved, and the import is run immediately. If you configured a schedule, the metadata import asset is also saved, but the import will run on the defined schedule.

    Important: Assets that were already imported through a different metadata import are not imported again and do not show up in the current metadata import. Thus, the new metadata import might not contain any assets at all.
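
To make the settings from the preceding steps easier to reason about, the following Python sketch models them as a plain data structure. It is an illustration only and does not use any product API: the class MetadataImport, its fields, and the helper assets_to_import are hypothetical names. The sketch shows the default job name, the rule that an import without a schedule runs when it is saved, and the way assets that another metadata import already brought in are skipped.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class MetadataImport:
    """Hypothetical model of the settings collected in the wizard."""
    name: str
    target: str                        # "project" or the name of a catalog
    connection: str                    # name of the connection asset
    scope: list[str] = field(default_factory=list)  # schemas, folders, tables, or files
    description: str = ""
    schedule: Optional[datetime] = None  # None means that no schedule was set
    job_name: Optional[str] = None

    def __post_init__(self) -> None:
        # The default job name is the metadata import name followed by " job".
        if self.job_name is None:
            self.job_name = f"{self.name} job"

    def runs_on_save(self) -> bool:
        # Without a schedule, the import runs as soon as the asset is saved.
        return self.schedule is None


def assets_to_import(in_scope: list[str], already_imported: set[str]) -> list[str]:
    """Skip assets that a different metadata import already brought in."""
    return [asset for asset in in_scope if asset not in already_imported]


# Example values are made up for illustration.
mi = MetadataImport(name="sales_db", target="project",
                    connection="Db2 on Cloud", scope=["SALES"])
print(mi.job_name)        # sales_db job
print(mi.runs_on_save())  # True
print(assets_to_import(["SALES.ORDERS", "SALES.CUSTOMERS"], {"SALES.ORDERS"}))
# ['SALES.CUSTOMERS']
```

If every asset in the scope was already imported through a different metadata import, assets_to_import returns an empty list, which corresponds to a new metadata import that contains no assets.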

Viewing the metadata import

Metadata import assets are listed in the Metadata imports section of the Assets page. To view an asset, click its name.

When you view the metadata import asset, you can see the list of assets imported with a run of the associated import job. You can work with these assets, rerun the import to refresh the imported assets, or delete all assets imported with this metadata import from the project.

You can work with imported data assets in exactly the same way as with connected data assets. However, imported assets have some tags automatically assigned: the tag discovered and a tag reflecting the asset’s parent, if applicable.

Available import actions are:

  • Import again

    This action refreshes the assets. Existing assets are updated, which means that any content changes are merged. New assets in the data source might be added, depending on the defined scope. If you removed an asset from the metadata import asset, project, or catalog, that asset is imported again. Removal of an asset from the data source is not reflected in the metadata import. Such an asset might still show up in the metadata import asset, project, or catalog, but it will be stale.

  • Remove all assets

    This action removes all imported assets from the list within the metadata import and, if imported into the project, from the project. If assets were imported into the catalog, this action has no effect.
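
The refresh behavior of Import again can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the product's implementation: the function import_again and its dictionary arguments are hypothetical, and merging content changes is reduced to replacing the stored metadata with the current metadata from the data source.

```python
def import_again(source: dict[str, str],
                 imported: dict[str, str],
                 scope: set[str]) -> dict[str, str]:
    """Return the asset list after a rerun (hypothetical model).

    source   maps asset name to its current metadata in the data source
    imported maps asset name to the metadata held by the metadata import
    scope    is the set of asset names covered by the import
    """
    # Start from what is already there: assets that were deleted from the
    # data source are not removed, so they stay in the list but are stale.
    refreshed = dict(imported)
    for name, metadata in source.items():
        if name in scope:
            # Updates existing assets, adds new in-scope assets, and brings
            # back assets that a user removed earlier.
            refreshed[name] = metadata
    return refreshed


# Example values are made up for illustration.
source = {"ORDERS": "v2", "RETURNS": "v1"}    # LEGACY was dropped at the source
imported = {"ORDERS": "v1", "LEGACY": "v1"}   # state before the rerun
result = import_again(source, imported, scope={"ORDERS", "RETURNS"})
print(result)  # {'ORDERS': 'v2', 'LEGACY': 'v1', 'RETURNS': 'v1'}
```

In the example, ORDERS is updated, RETURNS is added because it is in scope, and LEGACY remains in the list even though it no longer exists in the data source.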

To view metadata import asset details, click the information icon. You can edit the asset name and the description. Note that changing the asset name does not change the name of the associated import job.

Rerunning the import

If you did not configure a schedule, you can manually rerun the metadata import at any time in several ways:

  • Open the metadata import asset and select Import actions > Import again.
  • Open the metadata import asset and click the job name beneath the asset name, which takes you to the job page. Click the run icon on this page.
  • Go to the project’s Jobs page and run the import job from there.

Any rerun of an import refreshes asset information as described under Import again.