Duplicate asset handling
You specify how to handle duplicate assets when you create a catalog and when you publish assets to a catalog.
What is a duplicate?
An asset is considered a duplicate of another asset in these situations:
- The asset was copied from an existing asset within the platform.
- The new asset coming into the platform has the same asset type and name as an existing asset.
Duplicate detection for copied assets
You create copies of assets within the platform with these actions:
- Copying an asset from a catalog to a project or a deployment space
- Publishing an asset to a catalog from a project or a deployment space
For example, you copy an asset from a catalog into a project. Then, you publish that asset from the project back to the same catalog. The incoming asset from the project is considered a duplicate of the original asset in the catalog. You must choose how to handle the duplicate.
Alternatively, suppose you publish an asset from a project into a catalog. Then, you copy that asset from the catalog back to the same project. The incoming asset from the catalog is considered a duplicate of the original asset in the project.
If you copy or publish the same asset more than once, the most recent copy of the asset is considered the original asset. For example, you copy an asset from a catalog into a project. Then, you copy the same asset from the catalog into the same project again. The newer copy of the asset is considered the original asset.
Duplicate detection for new assets
If you add an asset to a project, catalog, or deployment space that already contains an asset with the same asset type and the same name, the new asset is considered a duplicate of the existing (original) asset.
For example, you have a notebook that is named "Sales" and you add a data asset that is named "Sales". These assets are not duplicates because their asset types are different.
For data assets in catalogs, the origin of the data is considered along with the asset name. For example, the following data assets are not considered duplicates:
- A data asset that is named "Sales" from a CSV file
- A data asset that is named "Sales" from a Db2 connection named "db2_100"
- A data asset that is named "Sales" from a Db2 connection named "db2_5000"
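The name-based rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the platform's implementation: the Asset class, its field names, and the origin strings are hypothetical, and the sketch does not cover duplicates that are detected because an asset was copied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Asset:
    """Hypothetical, simplified asset record used only for this illustration."""
    asset_type: str               # for example "notebook" or "data_asset"
    name: str
    origin: Optional[str] = None  # for example "csv" or "db2:db2_100"


def duplicate_key(asset: Asset, in_catalog: bool) -> tuple:
    """Return the values that are compared to decide whether two assets match."""
    # Assumption: the origin is compared only for data assets in a catalog.
    if in_catalog and asset.asset_type == "data_asset":
        return (asset.asset_type, asset.name, asset.origin)
    return (asset.asset_type, asset.name)


def is_duplicate(new: Asset, existing: Asset, in_catalog: bool = True) -> bool:
    """True if the new asset matches the existing asset by type and name."""
    return duplicate_key(new, in_catalog) == duplicate_key(existing, in_catalog)


notebook = Asset("notebook", "Sales")
sales_csv = Asset("data_asset", "Sales", origin="csv")
sales_db2_100 = Asset("data_asset", "Sales", origin="db2:db2_100")
sales_db2_5000 = Asset("data_asset", "Sales", origin="db2:db2_5000")

print(is_duplicate(sales_csv, notebook))            # False: different asset types
print(is_duplicate(sales_db2_100, sales_db2_5000))  # False: different origins
print(is_duplicate(sales_csv, Asset("data_asset", "Sales", origin="csv")))  # True
```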
Duplicate asset handling methods
You can specify one of these duplicate handling methods as the default for a catalog:
- Update original assets
Replace the values of the original assets with the values of the new assets. Where the new assets do not have a value, the corresponding values of the original assets are kept. The privacy level, ownership, membership, and activities of the original assets are retained.
- Overwrite original assets
Overwrite all values of the original assets with the values of the new assets. However, the privacy level, ownership, membership, and activities of the original assets are not affected.
- Allow duplicates (default)
Add the new assets as duplicates of the original assets.
- Preserve original assets and reject duplicates
Reject the new duplicate assets and preserve the original assets.
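To make the difference between the four methods concrete, the following Python sketch applies each one to a pair of dictionaries. It is a simplified model and not the catalog's actual logic: the field names, the method keywords, and the choice to treat None as "no value" are assumptions made for illustration.

```python
# Assumption: these keys stand in for the privacy level, ownership,
# membership, and activities that the original asset always keeps.
PROTECTED_FIELDS = {"privacy", "owner", "members", "activities"}


def handle_duplicate(original: dict, new: dict, method: str) -> list:
    """Return the assets that exist after the duplicate is handled."""
    kept = {k: v for k, v in original.items() if k in PROTECTED_FIELDS}
    if method == "update":
        # New values win; where the new asset has no value, the original remains.
        incoming = {k: v for k, v in new.items()
                    if v is not None and k not in PROTECTED_FIELDS}
        return [{**original, **incoming}]
    if method == "overwrite":
        # Every unprotected value comes from the new asset, even an empty one.
        incoming = {k: v for k, v in new.items() if k not in PROTECTED_FIELDS}
        return [{**incoming, **kept}]
    if method == "allow":
        # Both assets exist side by side.
        return [original, new]
    if method == "reject":
        # The new asset is discarded and the original is untouched.
        return [original]
    raise ValueError(f"unknown duplicate handling method: {method}")


original = {"name": "Sales", "description": "Q1 figures",
            "tags": ["finance"], "owner": "alice", "privacy": "private"}
new = {"name": "Sales", "description": None,
       "tags": ["sales"], "owner": "bob"}

for method in ("update", "overwrite", "allow", "reject"):
    print(method, handle_duplicate(original, new, method))
```

In this model, "update" keeps the original description because the new asset has none, while "overwrite" replaces it with the empty value. In both cases the owner and privacy level of the original asset are unchanged.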
You set the default duplicate handling method for a catalog when you create the catalog. If you have the Admin role in the catalog, you can change the method at any time on the catalog Settings page.
When users add assets to a catalog with API calls, the default duplicate asset handling method for the catalog is used.
However, the catalog duplicate handling setting can be superseded in these circumstances:
- Publishing from a project or adding to a catalog directly. Project collaborators can select how to handle duplicate assets when they publish assets from the project to a catalog.
If the default catalog setting is to allow duplicates, project collaborators can choose to update or overwrite the existing assets, to create duplicates, or to reject duplicate assets.
To add assets, whether unique or duplicate, the user must have the Admin or the Editor role in the catalog. To update or overwrite an asset, the user must either have the Admin role in the catalog, or have the Editor role and be the asset owner or an asset member.
When metadata enrichment results are published, data class and business term assignments are published as follows:
- If a duplicate asset is created, the entire set is published along with the asset.
- If an existing asset is updated or overwritten, any business terms that aren't yet available on the asset are added. No business terms are removed. If the data class on a column changed, the assignment on the catalog asset is updated.
- Importing metadata. Irrespective of the catalog setting, duplicate assets might be added to the catalog during an initial metadata import. On reruns of a metadata import, previously imported assets either remain unchanged or are updated, depending on changes in the data source. Additional metadata, such as terms or classifications, remains unchanged.
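The role requirements for publishing that are described in the list above can be expressed as two small checks. This Python sketch is an illustration only: the function names and the way that roles, owners, and members are represented are assumptions, and "Viewer" is used here simply as an example of a role other than Admin or Editor.

```python
def can_add_asset(role: str) -> bool:
    """Adding an asset, unique or duplicate, needs the Admin or Editor role."""
    return role in {"Admin", "Editor"}


def can_update_or_overwrite(role: str, user: str, owner: str, members: set) -> bool:
    """Updating or overwriting needs Admin, or Editor plus owner or member status."""
    if role == "Admin":
        return True
    return role == "Editor" and (user == owner or user in members)


# Hypothetical asset that is owned by alice, with bob as an asset member.
owner, members = "alice", {"bob"}

print(can_add_asset("Editor"))                                     # True
print(can_add_asset("Viewer"))                                     # False
print(can_update_or_overwrite("Admin", "carol", owner, members))   # True
print(can_update_or_overwrite("Editor", "bob", owner, members))    # True
print(can_update_or_overwrite("Editor", "carol", owner, members))  # False
```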