Adding data and mapping it to your data model (IBM Match 360)
Each data source or asset must be mapped and published into the data model before it can be used in IBM Match 360 functions such as matching.
- Required permissions
- To add, map, and publish data assets into a master data instance, you must be a member of the DataEngineer user group for the IBM Match 360 service.
- If you are working with a governed catalog, you can view or add only the catalog assets for which you are the data asset owner.
IBM Match 360 includes a powerful automapping capability that removes the need for data engineers to manually map each column of data into the model. The automapping feature detects, analyzes, and classifies each column of data, then maps it to the corresponding attribute or field in the data model. Before you can run automapping, you must profile your data. Profiling analyzes and classifies your data so that automapping can take place.
Profiling and automapping are supported only for person and organization record types. For other types of records, map your columns to the data model manually.
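The rules above (profile before automapping, and automapping only for person and organization record types) can be summarized as a small decision function. This is a minimal sketch, not part of any IBM Match 360 API; the function name `mapping_method` and the lowercase record type names are assumptions made for illustration.

```python
# Hypothetical helper, not an IBM Match 360 API: it only encodes the rules
# stated in this topic (profiling before automapping; person and
# organization record types only).
AUTOMAP_RECORD_TYPES = {"person", "organization"}


def mapping_method(record_type: str, profiled: bool) -> str:
    """Return how the columns of an asset with this record type can be mapped."""
    if record_type.lower() not in AUTOMAP_RECORD_TYPES:
        # Any other record type must be mapped column by column.
        return "manual"
    if not profiled:
        # Automapping depends on the results of profiling.
        return "profile first"
    return "automap"


print(mapping_method("person", profiled=False))       # profile first
print(mapping_method("organization", profiled=True))  # automap
print(mapping_method("contract", profiled=True))      # manual ("contract" is a made-up record type)
```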
In this topic:
- Adding record data from a flat data file
- Adding data or sources through your project
- Mapping your data into the data model
- Applying a mapping pattern to a data asset
- Adding relationship data from a flat data file
- Publishing sample data
Adding record data from a flat data file
To add record data into IBM Match 360 from a CSV or TSV data file:
1. From the navigation menu, click Data setup to open the Data setup screen. Click Start with data assets or select the Assets tab.
2. Click the Upload asset to project icon in the action bar.
3. From the Data panel that opens, choose whether to add data by upload, from the project, or from the catalog. To upload a data file, choose Load.
4. On your local computer, select a flat data file in CSV or TSV format and drag it into the Data panel. When the file finishes uploading, it is added to your assets summary list.
5. On the Assets tab, use the assets summary table to review the details of your newly added asset and the other data assets in the system.
If your asset does not have any information in the Asset content column, you must define the type of data that the asset contains, such as a specific record type. It is important to assign the correct content type to each asset so that IBM Match 360 can find the part of the data model that best fits the data. Assets that do not have an asset content type defined have a Missing asset data type status. You cannot map an asset without first defining its content type.
To define an asset's content type, select the type from the drop-down list in the Asset content column of the asset's row. The asset's status changes to Ready-for-mapping.
If an appropriate data type is not in the asset content list, then you might have to customize your data model. For more information, see Customizing your data model.
Next step: Map your data into the data model
Adding data or sources through your project
You can add data assets, sources, or connections to IBM Match 360 through your project.
You can use IBM Match 360 as a connected data source or target. For information about setting up and using the IBM Match 360 connection, see IBM Match 360 connection.
Any data asset files that you want to load into IBM Match 360 must include a file extension of a supported type, such as .csv or .tsv. This requirement includes assets that are already in your project. If an asset name in your project does not include a supported file extension, it appears greyed out when you try to add it. To edit the name of an asset in your project, go to your project's Assets tab, select your asset, then edit the name in the About this asset panel.
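If you prepare many files outside the product, a quick local check can flag names that would appear greyed out. The following is a hypothetical Python sketch, assuming that .csv and .tsv are the only extensions you use and that the extension check is case-insensitive; neither the helper `delimiter_for` nor that behavior comes from IBM Match 360.

```python
from pathlib import Path
from typing import Optional

# Extensions named in this topic, with the delimiter each implies.
# Treating the check as case-insensitive is an assumption.
SUPPORTED_EXTENSIONS = {".csv": ",", ".tsv": "\t"}


def delimiter_for(asset_name: str) -> Optional[str]:
    """Return the delimiter implied by the name's extension, or None if unsupported."""
    return SUPPORTED_EXTENSIONS.get(Path(asset_name).suffix.lower())


for name in ["customers.csv", "accounts.tsv", "customers_export"]:
    # A name without a supported extension would appear greyed out.
    status = "can be added" if delimiter_for(name) else "rename before adding"
    print(f"{name}: {status}")
```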
Watch this video to see how to create a connection and add connected data to a project.
This video provides a visual method to learn the concepts and tasks in this documentation.
Video transcript
00:00 This video shows you how to set up a connection to a data source and add connected data to a Watson Studio project.
00:08 If you have data stored in a data source, you can set up a connection to that data source from any project.
00:16 From here, you can add different elements to the project.
00:20 In this case, you want to add a connection.
00:24 You can create a new connection to an IBM service, such as IBM Db2 and Cloud Object Storage, or to a service from third parties, such as Amazon, Microsoft or Apache.
00:39 And you can filter the list based on compatible services.
00:45 You can also add a connection that was created at the platform level, which can be used across projects and catalogs.
00:54 Or you can create a connection to one of your provisioned IBM Cloud services.
00:59 In this case, select the provisioned IBM Cloud service for Db2 Warehouse on Cloud.
01:08 If the credentials are not prepopulated, you can get the credentials for the instance from the IBM Cloud service launch page.
01:17 First, test the connection and then create the connection.
01:25 The new connection now displays in the list of data assets.
01:30 Next, add connected data assets to this project.
01:37 Select the source - in this case, it's the Db2 Warehouse on Cloud connection just created.
01:43 Then select the schema and table.
01:50 You can see that this will add a reference to the data within this connection and include it in the target project.
01:58 Provide a name and a description and click "Create".
02:06 The data now displays in the list of data assets.
02:09 Open the data set to get a preview; and from here you can move directly into refining the data.
02:17 Find more videos in the Cloud Pak for Data as a Service documentation.
For more information about adding data directly to your project, see Adding data to a project.
After adding data, you must map it into the IBM Match 360 data model. For details, see Map your data into the data model.
Mapping your data into the data model
Before you can publish a data asset to be used in IBM Match 360, you must map it. Each of an asset's columns must either be mapped to a corresponding data model attribute or be excluded.
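The rule that every column must end up either mapped or excluded, together with the fact that automapping never overwrites manual mapping, can be modeled with a plain dictionary. This is a hypothetical sketch of the bookkeeping only: the `EXCLUDED` marker, the function names, and the column and attribute names are made up and do not reflect how IBM Match 360 stores mappings internally.

```python
from typing import Dict, Optional

# Hypothetical bookkeeping, not how IBM Match 360 stores mappings.
# Each asset column maps to a data model attribute name, to EXCLUDED,
# or to None when it is still "Not Mapped".
EXCLUDED = "<excluded>"
ColumnMap = Dict[str, Optional[str]]


def is_fully_mapped(columns: ColumnMap) -> bool:
    """True when no column is left unmapped; otherwise the asset stays 'Mapping in progress'."""
    return all(target is not None for target in columns.values())


def apply_automap(columns: ColumnMap, suggestions: Dict[str, str]) -> ColumnMap:
    """Fill in only columns that are still unmapped, never replacing a manual choice.

    Treating exclusions as manual choices that automapping keeps is an assumption.
    """
    merged = dict(columns)
    for name, target in suggestions.items():
        if name in merged and merged[name] is None:
            merged[name] = target
    return merged


# Column and attribute names below are made up for illustration.
manual = {"fname": "given_name", "notes": EXCLUDED, "zip": None, "dob": None}
suggested = {"fname": "family_name", "zip": "postal_code"}
merged = apply_automap(manual, suggested)
print(merged["fname"])          # given_name (manual mapping kept)
print(merged["zip"])            # postal_code
print(is_fully_mapped(merged))  # False, because "dob" is still unmapped
```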
To map a data asset, you have several options:
- You can manually map each column.
- You can map each column with the assistance of profiling and automapping.
- You can apply a mapping pattern.
Remember: To use the profiling and automapping features of IBM Match 360, your IBM Cloud Pak for Data deployment must include IBM Knowledge Catalog.
For details about manually mapping or using automapping, read the following procedure. For details about applying a mapping pattern, see Applying a mapping pattern to a data asset.
To map a data asset into the IBM Match 360 data model:
1. On the Data setup screen, click the Mapping tab.
2. From the Asset list, click the data asset that you want to map into the system.
To help you find the asset you are looking for, you can search by asset name or filter based on the number of columns, record type, mapping status, publishing status, or available mapping patterns. Click the Filter icon to apply a filter.
The data from the asset you select displays in tabular format with a number of rows and columns. Each column represents an attribute that must be mapped to a corresponding attribute type in the data model. When you first open a data source or asset, each column is marked with a Not Mapped tag.
Tip: You can manually map each column if you choose, but you can greatly speed up the mapping process by taking advantage of the automapping feature.
3. In the mapping details panel, review the mapping statistics for this asset. At a glance, you can see how many data columns from this asset have been mapped, if any.
4. To enable automapping for this source or asset, you must first profile the data. Click Profile data.
Profiling analyzes and classifies your data to enable the automapping process to take place. Profiling can take some time to complete, so it runs in the background and you can continue working. While you wait, you might want to start reviewing and manually mapping some columns.
Automapping never overwrites any manual mapping that you have done.
5. When profiling completes, click Automap asset. IBM Match 360 with Watson analyzes your data and automatically maps as many columns as possible into the data model. Even if it cannot map a specific column, the automap function can suggest some of the most likely mapping selections.
6. Review the automapping results. If any of the mappings are incorrect, or if a column remains unmapped, then map it manually. Alternatively, if a specific column is not required, you can exclude it from being loaded into IBM Match 360 by selecting Exclude column.
7. To manually map a column, select it, then use the Mapping targets panel to search for and select the appropriate attribute or field from the data model. Click Map and save to data model.
If an appropriate attribute or field does not exist in the data model, you can create one from the Mapping targets panel. Click either New > Create attribute or New > Create field to define and provide the details of a new attribute or field.
If you choose to create a Simple attribute instead of assigning an existing attribute type, then the new simple attribute is added directly to the record type in the Modeling tab. It is not categorized under Attribute types.
8. Scroll horizontally through the columns to ensure that every column in your data source or asset is mapped. If any column is neither mapped (manually or by automapping) nor explicitly excluded, then the data asset remains in a Mapping in progress state.
To exclude a column from being loaded into IBM Match 360, select the column, then select Exclude column.
Important: If the record_source attribute is mapped to any field in the asset, then that field must be populated in all of the asset's records. If any record is missing a value for the record_source field, then the asset cannot load successfully. If you do not map the record_source attribute to an existing field, then a default record source name is derived from the asset name.
9. When you finish mapping the data asset, you're ready to publish the data into the system.
- If your data model is new or changed, publish your model first by clicking the publish model icon in the action bar. Wait for the publish job to complete.
- To publish your data, click the publish data icon in the action bar. Wait for the publish job to complete.
10. Return to the configuration overview page by selecting Configuration overview from the navigation menu.
11. On the configuration overview page, confirm that you have at least one data source or asset that is added and mapped.
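Because an asset cannot load if the field mapped to the record_source attribute is empty in any record, it can help to check the file before you upload it. The following Python sketch uses only the standard csv module and reports the data rows that have an empty value in the column you plan to map to record_source. It is a local pre-check, not an IBM Match 360 feature, and the column name src_system is hypothetical.

```python
import csv
import io
from typing import List


def rows_missing_record_source(text: str, column: str, delimiter: str = ",") -> List[int]:
    """Return 1-based data row numbers whose value in `column` is empty or absent."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} is not in the header")
    missing = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(column)
        # Short rows give None for trailing fields; blank strings also count as missing.
        if value is None or not value.strip():
            missing.append(row_number)
    return missing


# "src_system" is a hypothetical column that you intend to map to record_source.
sample = "id,name,src_system\n1,Ann,CRM\n2,Bob,\n3,Cy,ERP\n"
print(rows_missing_record_source(sample, "src_system"))  # [2]
```

If you do not map record_source at all, this check does not apply, because a default record source name is derived from the asset name.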
Applying a mapping pattern to a data asset
Mapping patterns help you to maintain consistency across similar data assets by making it easy to repeat your data mapping selections for compatible assets.
A mapping pattern is automatically created when you manually map a data asset. The pattern saves your column mapping selections so that they can be reused by other data assets that share the same column format and record type. By applying a mapping pattern, you can avoid manually mapping data assets that are similar to existing assets that you have already mapped.
IBM Match 360 identifies when a new asset is compatible with an existing mapping pattern in the system and then notifies you that you can use a pattern to avoid manual mapping work.
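The compatibility idea can be sketched as follows. This is an illustration under stated assumptions, not IBM Match 360's actual matching logic: it treats two column layouts as compatible when they contain the same column names in any order, and it flags patterns whose record type differs from the asset's, because applying such a pattern would change the asset's record type. The `MappingPattern` class, pattern names, and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class MappingPattern:
    """Hypothetical stand-in for a saved mapping pattern."""
    name: str
    record_type: str
    column_mappings: Dict[str, str]  # asset column -> data model attribute


def review_patterns(patterns: List[MappingPattern], columns: Sequence[str],
                    record_type: str) -> Dict[str, str]:
    """Classify patterns whose columns match the asset's columns (order ignored, an assumption)."""
    results = {}
    for pattern in patterns:
        if set(pattern.column_mappings) != set(columns):
            continue  # different column format: the pattern cannot be reused
        if pattern.record_type == record_type:
            results[pattern.name] = "recommended"
        else:
            # Applying this pattern would change the asset's record type.
            results[pattern.name] = "changes record type"
    return results


patterns = [
    MappingPattern("crm_person", "person", {"name": "person_name", "phone": "telephone"}),
    MappingPattern("crm_org", "organization", {"name": "organization_name", "phone": "telephone"}),
    MappingPattern("erp_account", "organization", {"acct_no": "account_id"}),
]
print(review_patterns(patterns, ["phone", "name"], "person"))
# {'crm_person': 'recommended', 'crm_org': 'changes record type'}
```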
Mapping patterns can be created by manually mapping data assets or they can be imported by using configuration snapshots.
For information about managing and applying mapping patterns by using configuration snapshots, see Saving and loading configuration snapshots.
To apply a mapping pattern to a mapped or unmapped data asset:
1. On the Data setup screen, click the Mapping tab.
2. From the Asset list, locate one or more data assets that you want to apply a mapping pattern to.
To help you find the assets you are looking for, you can search by name or filter based on the number of columns, record type, mapping status, publishing status, or available mapping patterns. Click the Filter icon to apply a filter.
3. Select the data assets that you want to apply a mapping pattern to.
- To apply a mapping pattern to a single data asset, click the Apply mapping pattern icon next to the asset name.
- To apply a mapping pattern to one or more data assets, select the checkbox next to each asset name, then click Apply mapping in the Assets list. The selected data assets must share the same structure and column format to be able to share a mapping pattern.
The Apply mapping patterns page shows a list of available mapping patterns. If there is a recommended mapping pattern for this asset, it has a badge icon next to it.
4. Review the mapping patterns. You can see what other assets the mapping pattern currently applies to, along with the applicable record type, last updated date, and original source.
For more details and to compare mapping patterns to each other, select a primary mapping pattern, then click Compare mapping patterns. Scroll horizontally through the patterns to compare them to the one that you selected. Scroll vertically on the page to view more details, such as snapshot details and column mappings.
5. Select the mapping pattern that you want to apply to the selected data assets, then click Next.
6. Review the mapping changes that you have selected. Confirm that you have chosen the correct assets and mapping pattern.
Applying a mapping pattern can change an asset's record type. Be careful not to change the record type by mistake. Mapping changes made by applying a mapping pattern cannot be undone without manually remapping the asset.
7. Click Finish to apply the mapping pattern.
8. After applying the mapping pattern, you're ready to publish the data into the system.
- If your data model is new or changed, publish your model first by clicking the publish model icon in the action bar. Wait for the publish job to complete.
- To publish your data, click the publish data icon in the action bar. Wait for the publish job to complete.
Adding relationship data from a flat data file
Before you can load a relationship data asset into IBM Match 360, you must first define the corresponding relationship type in the data model. For details, see Customizing your data model.
Relationship data assets are formatted as delimited rows (CSV or TSV) and must include the following data columns:
- Record IDs for both parties in each relationship
- Record types for both parties in each relationship
- Record sources for both parties in each relationship
You can create the relationship data asset manually, through an ETL process, or by using the application where your relationships are stored.
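Before uploading a relationship file, you can confirm that its header contains a column for each required value. In this hypothetical sketch the header names (from_record_id and so on) are assumptions chosen for illustration; adjust `REQUIRED_COLUMNS` to match your own file, because you map the relationship columns manually after upload.

```python
import csv
import io
from typing import List

# Hypothetical header names: adjust them to match your own relationship file.
REQUIRED_COLUMNS = [
    "from_record_id", "to_record_id",
    "from_record_type", "to_record_type",
    "from_record_source", "to_record_source",
]


def missing_relationship_columns(text: str, delimiter: str = ",") -> List[str]:
    """Return the required columns that are absent from the file's header row."""
    header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
    present = {name.strip() for name in header}
    return [column for column in REQUIRED_COLUMNS if column not in present]


tsv = (
    "from_record_id\tto_record_id\tfrom_record_type\tto_record_type\tfrom_record_source\n"
    "1001\t2002\tperson\torganization\tCRM\n"
)
print(missing_relationship_columns(tsv, delimiter="\t"))  # ['to_record_source']
```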
To add relationship data into IBM Match 360 from a CSV or TSV data file:
1. From the navigation menu, click Data setup to open the Data setup screen. Click Start with data assets or select the Assets tab.
2. Load, map, and publish the record data assets into IBM Match 360. These data assets should contain the record data that you want to associate by using relationships. For details, see Adding record data from a flat data file and Mapping your data into the data model.
3. Load your relationship data asset file:
a. Click the Upload asset to project icon in the action bar.
b. From the Data panel that opens, choose whether to add data by upload, from the project, or from the catalog. To upload a data file, choose Load.
Note: If your data includes governed catalogs, you might be unable to view or add some catalog assets. Depending on your permissions, you might only be able to view catalog assets that you own or manage.
c. On your local computer, select a flat data file that contains relationship data in CSV or TSV format and drag it into the Data panel. When the file finishes uploading, it is added to your assets summary list.
4. Review the details of your newly added relationships asset.
5. Hover over your relationships asset's row in the assets summary list and click the three-dot overflow menu.
6. Click Set asset properties.
7. Select Relationships from the Asset data category list.
8. Select the correct relationship type from the Asset data type list and click Save.
If the appropriate relationship type is not in the asset data type list, then you might have to customize your data model. For more information, see Customizing your data model.
9. Map and publish your relationship data asset. For details, see Mapping your data into the data model. Be sure to map each of the required data columns: from record ID, to record ID, from record type, to record type, from record source, and to record source.
Restriction: Relationship data does not support profiling and automapping. Manually map your columns to the data model.
Publishing sample data
If you don't have your own data assets ready to go but want to get started using the IBM Match 360 service, load the provided sample data and model instead.
To load the IBM Match 360 sample data:
- Go to the master data home page.
- From the Master data tile, click Publish sample model.
- After the sample model publish is complete, click Publish sample data.
- Optionally, go to the Jobs tab to watch the progress of your sample load jobs. If you don't want to watch the progress, you can go to another screen and the jobs will continue working in the background.
Next steps
Learn more
- Saving and loading configuration snapshots
- Working with governed data in IBM Match 360
- Master Data Management tutorial: Configure a 360-degree view