Supported data sources for metadata import, metadata enrichment, and data quality rules

The following table lists the data sources from which you can import metadata, against which you can run metadata enrichment or data quality rules, and to which you can write the output of data quality rules.

Required permissions
Users must be authorized to access the connections to the data sources. For metadata import, the user who runs the import must have SELECT or a similar permission on the databases in question.

Connection assets must exist in the project for connections that are used in these cases:

  • For running metadata enrichment
  • For running advanced analysis on assets in a metadata enrichment: in-depth primary key analysis, in-depth relationship analysis, or advanced data profiling
  • For running data quality rules
  • For creating query-based data assets (dynamic views)
  • For writing output of data quality checks

If the asset types imported from a specific connection don't allow for enrichment or running data quality rules, not applicable (abbreviated to N/A) is shown in the Metadata enrichment, Metadata enrichment: Advanced analysis, and rules-related columns. A dash (—) in a column indicates that the data source is not supported for this purpose.

By default, data quality rules and the underlying DataStage flows support standard platform connections. Not all connectors that were supported in traditional DataStage and potentially used in custom DataStage flows are supported in IBM Knowledge Catalog.

In general, the following data formats are supported:

  • All: Tables from relational and nonrelational data sources
  • Metadata import: Any format from file-based connections to the data sources. For Microsoft Excel workbooks, each sheet is imported as a separate data asset. The data asset name equals the name of the Excel sheet.
  • Metadata enrichment: Tabular: CSV, TSV, Avro, Parquet, Microsoft Excel (For workbooks uploaded from the local file system, only the first sheet in a workbook is profiled.)
  • Data quality rules: Tabular: Avro, CSV, Parquet, ORC
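
As a rough illustration of the format list above, the following Python sketch encodes the file formats per feature as a lookup. The names `FILE_FORMATS` and `supports_format` are hypothetical and not part of any IBM Knowledge Catalog API. The sketch covers file formats only, not tables or per-connector support.

```python
# Hypothetical lookup derived from the format list above; not a product API.
# None means "any file format" (metadata import from file-based connections).
FILE_FORMATS = {
    "metadata_import": None,
    "metadata_enrichment": {"csv", "tsv", "avro", "parquet", "excel"},
    "data_quality_rules": {"avro", "csv", "parquet", "orc"},
}


def supports_format(feature: str, file_format: str) -> bool:
    """Return True if the listed formats for `feature` include `file_format`."""
    formats = FILE_FORMATS[feature]  # raises KeyError for an unknown feature
    return formats is None or file_format.strip().lower() in formats


if __name__ == "__main__":
    # Print, for a few formats, which features list them as supported.
    for fmt in ("csv", "tsv", "orc"):
        usable = [f for f in FILE_FORMATS if supports_format(f, fmt)]
        print(fmt, "->", ", ".join(usable))
```

For example, `supports_format("data_quality_rules", "tsv")` returns `False`, because TSV is listed for metadata enrichment but not for data quality rules.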

Supported connections
Table columns: Connector; Metadata import; Metadata enrichment; Metadata enrichment: Advanced analysis; Bindings in rules created from data quality definitions; SQL-based rules; SQL-based data assets; Output tables
Amazon RDS for MySQL
Amazon RDS for PostgreSQL
Amazon Redshift
Amazon S3
Apache Cassandra
Apache HDFS
Apache Hive (see note 4)
Apache Kafka
Box
Cloudera Impala
Generic S3
Google BigQuery
Greenplum
IBM Cloud Data Engine
IBM Cloud Databases for MongoDB
IBM Cloud Databases for MySQL
IBM Cloud Databases for PostgreSQL
IBM Cloud Object Storage
IBM Data Virtualization Manager for z/OS (see note 1)
IBM Db2
IBM Db2 Big SQL
IBM Db2 for i
IBM Db2 for z/OS
IBM Db2 on Cloud
IBM Db2 Warehouse
IBM Informix
IBM Match 360
IBM Netezza Performance Server
IBM Data Virtualization
IBM watsonx.data
MariaDB
Microsoft Azure Data Lake Storage
Microsoft Azure SQL Database
Microsoft SQL Server
MongoDB
MySQL
Oracle (see note 2)
PostgreSQL
Presto
Salesforce.com (see note 3)
SAP ASE
SAP IQ
SingleStoreDB
Snowflake
Teradata

Notes:

1 With Data Virtualization Manager for z/OS, you add data assets and COBOL copybook assets from mainframe systems to catalogs in IBM Cloud Pak for Data. Copybooks are files that describe the data structure of a COBOL program. Data Virtualization Manager for z/OS helps you create virtual tables and views from COBOL copybook maps. You can then use these virtual tables and views to import and catalog mainframe data in IBM Cloud Pak for Data in the form of data assets and COBOL copybook assets.

The following types of COBOL copybook maps are not imported: ACI, Catalog, and Natural.

Restriction: You can't import COBOL copybooks larger than 1 MB.
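
A pre-check along the following lines could flag oversized copybooks before an import is attempted. This is only a sketch: the function names are hypothetical, and it assumes that 1 MB means 1,048,576 bytes, which the restriction above does not specify. Pass a different `limit_bytes` if the product counts 1 MB as 1,000,000 bytes.

```python
from pathlib import Path

# Assumption: 1 MB = 1024 * 1024 bytes; the source does not define the unit.
ASSUMED_LIMIT_BYTES = 1024 * 1024


def exceeds_limit(size_bytes: int, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> bool:
    """Return True if a copybook of this size is larger than the limit."""
    return size_bytes > limit_bytes


def oversized_copybooks(paths, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> list:
    """Return the paths whose file size on disk is larger than the limit."""
    return [
        p for p in map(Path, paths)
        if exceeds_limit(p.stat().st_size, limit_bytes)
    ]
```

Keeping the size comparison in `exceeds_limit` separate from the file access makes the threshold easy to adjust and to check without touching the file system.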

When the import is finished, you can go to the catalog to review the imported assets, including the COBOL copybook maps, virtual tables, and views. You can use these assets in the same ways as other assets in Cloud Pak for Data.

For more information, see Adding COBOL copybook assets.

2 Table and column descriptions are imported only if the connection is configured with one of the following Metadata discovery options:

  • No synonyms
  • Remarks and synonyms

3 Some objects in the SFORCE schema are not supported. See Salesforce.com.

4 To create metadata enrichment output tables in a version of Apache Hive earlier than 3.0.0, you must apply the workaround described in Writing metadata enrichment output to an earlier version of Apache Hive than 3.0.0.
