Creating metadata imports
You can import technical metadata to add data assets to a project or a catalog. In a project, you can prepare and analyze the data before you publish it to a catalog.
In a project, you can profile the imported data assets, analyze data quality, and assign terms to provide business context by running metadata enrichment. For a deeper quality analysis, run data quality rules on the data assets. If the data is ready to be shared without further preparation, you can also add data assets directly to a catalog.
You can use APIs instead of the user interface to retrieve the list of supported connections or to create a metadata import asset. The links to these APIs are listed in the Learn more section.
- Asset types
Data assets that represent tables or files from a connection to an external data source.
Note: For Microsoft Excel workbooks, each sheet is imported as a separate data asset. The data asset name equals the name of the Excel sheet.
- Supported connections
See the Metadata import column in Supported connectors.
- Required permissions
To create, manage, and run a metadata import, you must have these roles and permissions:
- The Admin or the Editor role in the project.
- The Admin or the Editor role in the catalog to which you want to import or publish the assets.
- Access to the connections to the data sources of the data assets to be imported and the SELECT or a similar permission on the corresponding databases.
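The permission requirements above can be read as a simple checklist. The following is a minimal sketch of that checklist, not product code: the `Caller` type, its fields, and `missing_requirements` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Role names come from the list above; everything else here is hypothetical.
ALLOWED_ROLES = {"Admin", "Editor"}


@dataclass
class Caller:
    project_role: str
    catalog_role: Optional[str]  # None if the user is not a catalog member
    has_connection_access: bool
    has_select_permission: bool


def missing_requirements(caller: Caller, target_is_catalog: bool) -> List[str]:
    """Return the unmet requirements; an empty list means nothing is missing."""
    problems = []
    if caller.project_role not in ALLOWED_ROLES:
        problems.append("Admin or Editor role in the project")
    if target_is_catalog and caller.catalog_role not in ALLOWED_ROLES:
        problems.append("Admin or Editor role in the target catalog")
    if not caller.has_connection_access:
        problems.append("access to the connection")
    if not caller.has_select_permission:
        problems.append("SELECT or a similar permission on the database")
    return problems
```

The catalog role is checked only when the catalog is the import or publish target, which mirrors the wording of the second requirement.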
Overview
Importing metadata for discovery involves the following process:
- Identify the data source from which you want to import. You might already have a connection to this data source defined. Otherwise, ensure that you have the credentials to connect to it. For a list of supported connections, see Supported connectors.
- In a project, create a metadata import asset to configure the import details like the scope and the target of the import and the schedule for the import job.
- Import assets to the project or the catalog. When you access an imported data asset, the data is dynamically retrieved from the data source.
- Analyze and preview the metadata that you imported to a project, and share it to the catalog. You can create profiles for individual assets one at a time from each asset’s Profile tab. You can also create profiles for multiple data assets in parallel and add business context to them by creating and running a metadata enrichment asset.
Creating a metadata import asset and importing metadata
To create a metadata import asset and a job for importing metadata into a project or a catalog:
- Open a project, go to the project's Asset page and click New asset > Import metadata for data assets.
- Specify a name for the metadata import. Optionally, you can provide a description.
- Optional: Select tags to be assigned to the metadata import asset to simplify searching. You can create new tags by entering the tag name and pressing Enter.
- Select the import target. You can import metadata into the project that you're working in or to any catalog that you are a member of.
Import metadata into a project for analysis before you decide which assets to share to a catalog for other users to work on them. In a project, you can run metadata enrichment and data quality rules on the imported data assets.
If you know the contents of the data assets well, you can import their metadata directly into the catalog.
If your project is marked as sensitive, you can import only to the project, not to a catalog.
- Define a scope for the metadata import.
  - Select an existing connection asset as the source of the data, or click Create a new connection and create a connection asset. You can import metadata from the data sources that are listed in Supported connectors.
  - Select the items that you want to include in the import and click Select. Depending on the size and contents of your data source, you might want to import only a subset of the assets. You can include complete schemas or folders, or drill down to individual tables or files. When you select a schema or a folder, you can immediately see how many items it contains, so you can decide whether to include the whole set or only a subset.
    Note that you can't import data from schemas where the name contains special characters.
  - Review the selected scope. You can delete assets directly from the data scope, or you can rework the entire scope by clicking Edit data scope. When you're done refining the data scope, click Next.
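To make the scope selection more concrete, here is a minimal sketch of how a selection of whole schemas and individual tables might expand into a list of assets. It is an illustration only: `SOURCE`, `item_count`, and `resolve_scope` are hypothetical, and the definition of "special characters" (anything other than letters, digits, and underscores) is an assumption, because the documentation does not list the exact characters.

```python
import re
from typing import List

# Hypothetical data source: schema name -> table names.
SOURCE = {
    "SALES": ["ORDERS", "CUSTOMERS", "RETURNS"],
    "HR": ["EMPLOYEES"],
    "TMP#1": ["SCRATCH"],
}


def item_count(schema: str) -> int:
    """Number of items in a schema, comparable to the count shown on selection."""
    return len(SOURCE[schema])


def resolve_scope(selection: List[str]) -> List[str]:
    """Expand entries such as "SALES" (whole schema) or "HR/EMPLOYEES" (one table)."""
    assets = []
    for entry in selection:
        schema, _, table = entry.partition("/")
        # Assumption: a name is "plain" if it has only letters, digits, underscores.
        if not re.fullmatch(r"\w+", schema):
            raise ValueError(f"schema {schema!r} contains special characters")
        tables = [table] if table else SOURCE[schema]
        assets.extend(f"{schema}/{name}" for name in tables)
    return assets
```

For example, `resolve_scope(["SALES", "HR/EMPLOYEES"])` yields the three SALES tables followed by the single HR table, while selecting `"TMP#1"` raises an error in this sketch.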
- Define whether you want to run scheduled import jobs. If you don't set a schedule, the import runs when you save the metadata import asset. You can rerun the import manually at any time. If you choose to run the import on a specific schedule, define the date and time when you want the job to run. You can schedule single and recurring runs. If you schedule a single run, the job runs exactly one time at the specified day and time. If you schedule recurring runs, the job runs for the first time at the timestamp indicated in the Repeat section.
Optionally, change the name of the import job. The default name is metadata_import_name job.
You can later access the import job you create from within the metadata import asset or from the project's Jobs page. See Jobs.
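The difference between single and recurring runs, and the default job name, can be sketched as follows. This is an illustration under assumptions: `default_job_name` and `run_times` are hypothetical helpers, and the fixed repeat interval `every` stands in for whatever is configured in the Repeat section.

```python
from datetime import datetime, timedelta
from itertools import islice
from typing import Iterator, Optional


def default_job_name(import_name: str) -> str:
    """Default name pattern described above: the import name followed by " job"."""
    return f"{import_name} job"


def run_times(start: datetime, every: Optional[timedelta] = None) -> Iterator[datetime]:
    """Yield one run time for a single run, or an unbounded series for recurring runs."""
    yield start  # the first (or only) run happens at the configured timestamp
    while every is not None:
        start += every
        yield start


first = datetime(2024, 1, 1, 2, 0)
single = list(run_times(first))
daily = list(islice(run_times(first, timedelta(days=1)), 3))
```

Here `single` contains only the configured timestamp, and `daily` contains the first three daily runs starting at that timestamp.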
- Optional: Customize the import behavior. You can choose to prevent specific properties from being updated and to delete existing assets that are not included in the reimport.
- Update on reimport
By default, all asset properties are updated when assets are reimported. If you don't want the asset names, asset descriptions, or any column descriptions to be updated on reimport, clear the respective checkboxes.
- Delete on reimport
By default, no assets are deleted from the target project or catalog when you rerun the import. To clean up the target project or catalog on reimport, you can choose to delete assets that are no longer available in the data source or that were removed from the import scope.
- Exclude from import
For metadata imports that you run on relational databases, you can import all types of relational assets, or you can exclude either tables, or views, aliases, and synonyms. The two exclusion options are mutually exclusive.
- Import additional properties
For metadata imports that you run on relational databases, you can select whether primary and foreign keys that might be defined in the database are imported.
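The interplay of the Update on reimport and Delete on reimport settings can be sketched as a merge of the existing target assets with the metadata that is read again from the source. This is a simplified model, not the product's implementation: `ReimportOptions`, `reimport`, and the asset ids are hypothetical, and column descriptions are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class ReimportOptions:
    update_names: bool = True         # "Update on reimport": asset names
    update_descriptions: bool = True  # "Update on reimport": asset descriptions
    delete_missing: bool = False      # "Delete on reimport"


def reimport(target: dict, source: dict, opts: ReimportOptions) -> dict:
    """Merge source metadata into target assets; both are keyed by a hypothetical id."""
    result = {}
    for asset_id, old in target.items():
        if asset_id not in source:
            if not opts.delete_missing:
                result[asset_id] = dict(old)  # kept although gone from the source
            continue
        new = source[asset_id]
        result[asset_id] = {
            "name": new["name"] if opts.update_names else old["name"],
            "description": (
                new["description"] if opts.update_descriptions else old["description"]
            ),
        }
    for asset_id, new in source.items():
        if asset_id not in target:
            result[asset_id] = dict(new)  # newly discovered asset
    return result
```

With the defaults, every property is overwritten and nothing is deleted. Clearing `update_descriptions` preserves a description that was curated in the target, and setting `delete_missing` drops assets that no longer appear in the source.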
Additional import options:
- Incremental import
Enable incremental imports to import only new or modified data assets when you rerun the import. This option is available only for data sources that support incremental imports.
Updating or removing the description of an asset in the data source does not change the asset's modification date. The modification date also doesn't change for assets that are removed from the list of imported assets. Therefore, such assets are not considered for incremental imports. In addition, assets that are deleted from the data source or from the scope are not detected with incremental imports. Thus, such assets are not marked as Removed or deleted as specified with the Delete on reimport settings. To see such changes reflected, disable incremental imports to reimport all assets in the data scope.
Important: Incremental imports might not work if the data source and the place from which you access your Cloud Pak for Data account are in different time zones. If you access your Cloud Pak for Data account in a time zone that is ahead of the data source's time zone, the metadata import job might not detect assets that were added or modified after the last import run. In this case, disable incremental import so that all assets are included when you rerun the import.
For incremental imports to work, the data source must be in the GMT time zone regardless of the Cloud Pak for Data account's time zone.
- Collect metadata from database catalog
For metadata imports that you run on relational databases, you can choose to import metadata from the database catalog. Thus, the user who runs the import needs access only to the database catalog but doesn't need to have SELECT permission on the actual data. The imported assets cannot be profiled or used in metadata enrichment.
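The time zone caveat for incremental imports can be illustrated with a small sketch. The documentation does not describe how the comparison is implemented, so the mechanism shown here is an assumption: assets are selected by comparing their modification time with the time of the last run, and a last-run time that was recorded as local wall-clock time in a zone ahead of GMT pushes the cutoff into the future. `changed_since` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def changed_since(assets: dict, last_run: datetime) -> list:
    """Names of assets whose modification time is later than the last run."""
    return sorted(name for name, modified in assets.items() if modified > last_run)


# Modification times as reported by a data source in GMT/UTC.
assets = {
    "ORDERS": datetime(2024, 5, 1, 9, 0, tzinfo=timezone.utc),
    "RETURNS": datetime(2024, 5, 1, 11, 30, tzinfo=timezone.utc),
}

# The last import ran at 10:00 UTC, so RETURNS (11:30 UTC) is detected.
last_run_utc = datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)
detected = changed_since(assets, last_run_utc)

# The same instant read as wall-clock time in UTC+2 (12:00) and then treated
# as if it were UTC: the cutoff moves two hours ahead and RETURNS is missed.
client_zone = timezone(timedelta(hours=2))
wall_clock = last_run_utc.astimezone(client_zone).replace(tzinfo=timezone.utc)
missed = changed_since(assets, wall_clock)
```

In this sketch `detected` contains RETURNS while `missed` is empty, which matches the symptom described above: assets that were added or modified after the last run are not picked up until incremental import is disabled.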
- Review the metadata import configuration. To make changes, click the Edit icon on the tile and update the settings.
- Click Create. The metadata import asset is added to the project, and a metadata import job is created. If you didn't configure a schedule, the import is run immediately. If you configured a schedule, the import runs on the defined schedule.
Important: Assets from the same connection that were already imported through a different metadata import are not imported again but are updated. Such assets no longer show up in the initial metadata import. Only the most recently run metadata import contains the assets.
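The effect of overlapping metadata imports can be modeled as each asset being listed under the import that ran most recently. The sketch below is a simplified mental model, not the product's data model: `run_import`, `assets_listed_in`, and the import names are hypothetical.

```python
from typing import Dict, List


def run_import(owner_of: Dict[str, str], import_name: str, scope: List[str]) -> None:
    """Record that every asset in scope now belongs to the import that just ran."""
    for asset in scope:
        owner_of[asset] = import_name  # existing entries are updated, not duplicated


def assets_listed_in(owner_of: Dict[str, str], import_name: str) -> List[str]:
    """Assets that show up in the asset list of the given metadata import."""
    return sorted(a for a, owner in owner_of.items() if owner == import_name)


owner_of: Dict[str, str] = {}
run_import(owner_of, "import_a", ["ORDERS", "CUSTOMERS"])
run_import(owner_of, "import_b", ["ORDERS"])

listed_in_a = assets_listed_in(owner_of, "import_a")
listed_in_b = assets_listed_in(owner_of, "import_b")
```

After the second run, ORDERS appears only under `import_b`, and `import_a` lists only CUSTOMERS.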
Depending on the outcome of the metadata import job run, a completion message or an error notification is displayed.
A completion message is displayed when the job run completed successfully, completed with warnings, or completed with errors. An error notification is displayed if the entire job run failed. Either type of notification contains a link to the job run log that provides details about the specific job run.
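The mapping from job outcome to notification type can be summarized in a few lines. The outcome names come from the paragraph above. The `Outcome` enum and `notification_kind` are hypothetical and exist only to make the rule explicit.

```python
from enum import Enum


class Outcome(Enum):
    COMPLETED = "completed successfully"
    COMPLETED_WITH_WARNINGS = "completed with warnings"
    COMPLETED_WITH_ERRORS = "completed with errors"
    FAILED = "failed"


def notification_kind(outcome: Outcome) -> str:
    """Only a failure of the entire job run produces an error notification."""
    if outcome is Outcome.FAILED:
        return "error notification"
    return "completion message"
```

Note that a run that completed with errors still results in a completion message. In either case, the job run log is the place to look for details.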
When the import is complete, you can see the list of assets with the following information:
- The asset name, which provides a link to the asset in the project or catalog.
- The asset type, such as Data, and the format, such as Relational table.
- The asset context, such as the parent or file path.
- The date and time that the asset was last imported.
- The import status, which can be Imported for successfully imported data, In progress, or Removed if the asset couldn't be reimported.
You can work with most imported data assets in the same way as with connected data assets. Imported assets have a tag automatically assigned that reflects the asset's parent if applicable.
To profile, analyze, and provide business context to imported data assets, create a metadata enrichment asset and include the metadata import asset in the data scope.
Learn more