Adding data and mapping it to your data model (IBM Match 360)

Each data source or asset must be mapped and published into the data model before it can be used in IBM Match 360 functions such as matching.

Required permissions
To add, map, and publish data assets into a master data instance, you must be a member of the DataEngineer user group for the IBM Match 360 service.
If you are working with a governed catalog, you can view or add only the catalog assets for which you are the data asset owner.

IBM Match 360 includes a powerful automapping capability that removes the need for data engineers to manually map each column of data into the model. The automapping feature detects and analyzes each column of data and matches it to the corresponding attribute or field in the data model. Before you can run automapping, you must profile your data. Profiling analyzes and classifies your data so that automapping can take place.

Profiling and automapping are supported only for person and organization record types. For other record types, map your columns to the data model manually.

Adding record data from a flat data file

To add record data into IBM Match 360 from a CSV or TSV data file:

  1. From the navigation menu, click Data setup to open the Data setup screen. Click Start with data assets or select the Assets tab.

  2. Click the Upload asset to project icon in the action bar.

  3. From the Data panel that opens, choose whether to add data by upload, from the project, or from the catalog. To upload a data file, choose Load.

  4. On your local computer, select a flat data file in CSV or TSV format and drag it into the Data panel. When the file finishes uploading, it is added to your assets summary list.

  5. Review the details of your newly added asset. If your asset does not have any information in the Asset data type column, you must define the record type. The asset data type provides information about the type of data that each asset contains. It is important to assign the record type to each asset so that IBM Match 360 can find the part of the data model that best fits the data.

    a. Hover over your asset's row in the assets summary list and click the three-dot overflow menu. Alternatively, to edit asset properties for more than one asset at a time, select the checkbox beside multiple assets in the assets summary list.

    b. Click Set asset properties.

    c. Select Records from the Asset data category list.

    d. Select the correct entity type from the Asset data type list and click Save.

    If the appropriate entity type is not in the asset data type list, then you might have to customize your data model. For more information, see Customizing your data model.

Next step: Mapping your data into the data model

Adding data or sources through your project

You can add data assets, sources, or connections to IBM Match 360 through your project.

You can use IBM Match 360 as a connected data source or target. For information about setting up and using the IBM Match 360 connection, see IBM Match 360 connection.

Any data asset files that you want to load into IBM Match 360 must include a file extension of a supported type, such as .csv or .tsv. This requirement includes assets that are already in your project. If an asset name in your project does not include a supported file extension, it will appear greyed out when you try to add it. To edit the name of an asset in your project, go to your project's Assets tab, select your asset, then edit the name in the About this asset panel.
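If you prepare many assets, a quick local check before adding them can catch names that would appear greyed out. The following Python sketch assumes that .csv and .tsv are the only extensions you use (other supported types might exist), and it compares extensions case-sensitively, which is also an assumption. The function names are illustrative only and are not part of IBM Match 360.

```python
from pathlib import PurePath

# Assumption: only the .csv and .tsv extensions named in this topic are checked.
# The comparison is case-sensitive; adjust it if your names use other casing.
SUPPORTED_EXTENSIONS = {".csv", ".tsv"}


def has_supported_extension(asset_name: str) -> bool:
    """Return True if the asset name ends in one of SUPPORTED_EXTENSIONS."""
    return PurePath(asset_name).suffix in SUPPORTED_EXTENSIONS


def with_supported_extension(asset_name: str, default_ext: str = ".csv") -> str:
    """Return the name unchanged if it is supported, else append default_ext."""
    if has_supported_extension(asset_name):
        return asset_name
    return asset_name + default_ext


if __name__ == "__main__":
    # Hypothetical asset names.
    for name in ["customers.csv", "orders.tsv", "suppliers"]:
        print(name, "->", with_supported_extension(name))
```

Renaming still happens in the About this asset panel; the sketch only tells you which names need attention.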

Watch this video to see how to create a connection and add connected data to a project.

  • Video transcript
    Time Transcript
    00:00 This video shows you how to set up a connection to a data source and add connected data to a Watson Studio project.
    00:08 If you have data stored in a data source, you can set up a connection to that data source from any project.
    00:16 From here, you can add different elements to the project.
    00:20 In this case, you want to add a connection.
    00:24 You can create a new connection to an IBM service, such as IBM Db2 and Cloud Object Storage, or to a service from third parties, such as Amazon, Microsoft or Apache.
    00:39 And you can filter the list based on compatible services.
    00:45 You can also add a connection that was created at the platform level, which can be used across projects and catalogs.
    00:54 Or you can create a connection to one of your provisioned IBM Cloud services.
    00:59 In this case, select the provisioned IBM Cloud service for Db2 Warehouse on Cloud.
    01:08 If the credentials are not prepopulated, you can get the credentials for the instance from the IBM Cloud service launch page.
    01:17 First, test the connection and then create the connection.
    01:25 The new connection now displays in the list of data assets.
    01:30 Next, add connected data assets to this project.
    01:37 Select the source - in this case, it's the Db2 Warehouse on Cloud connection just created.
    01:43 Then select the schema and table.
    01:50 You can see that this will add a reference to the data within this connection and include it in the target project.
    01:58 Provide a name and a description and click "Create".
    02:06 The data now displays in the list of data assets.
    02:09 Open the data set to get a preview; and from here you can move directly into refining the data.
    02:17 Find more videos in the Cloud Pak for Data as a Service documentation.

For more information about adding data directly to your project, see Adding data to a project.

After adding data, you must map it into the IBM Match 360 data model. For details, see Mapping your data into the data model.

Mapping your data into the data model

To map a data asset into the IBM Match 360 data model:

  1. On the Data setup screen, click the Mapping tab.

  2. From the Asset list, select the data asset that you want to map into the system. The data from the asset displays in tabular format with a number of rows and columns. Each column represents an attribute that must be mapped to a corresponding attribute type in the data model. When you first open a data source or asset, each column is marked with a Not Mapped tag.

    Tip: You can manually map each column if you choose, but you can greatly speed up the mapping process by taking advantage of the automapping feature.
  3. To enable automapping for this source or asset, you must first profile the data. Click Profile. Profiling analyzes and classifies your data so that automapping can take place. Because profiling can take some time to complete, it runs in the background and you can continue working; for example, you might start reviewing and manually mapping some columns.

    Automapping never overwrites any manual mappings that you have made.

  4. When profiling completes, click Automap. IBM Match 360 analyzes your data and automatically maps as many columns as possible into the data model. For columns that it cannot map, automapping can still suggest the most likely mapping targets.

  5. Review the automapping results. If any mapping is incorrect, or if a column remains unmapped, map it manually. Alternatively, if a specific column is not required, you can exclude it from your IBM Match 360 data load.

  6. To manually map a column, select it, then use the Mapping targets panel to search for and select the appropriate attribute or field from the data model. Click Map and save to data model. If an appropriate attribute or field does not exist in the data model, you can create one from the Mapping targets panel: click + and provide the details of the new field or attribute.

    If you choose to create a Simple attribute instead of assigning an existing attribute type, then the new simple attribute is added directly to the record type in the Modeling tab. It is not categorized under Attribute types.

  7. Scroll horizontally through the columns to ensure that every column in your data source or asset is either mapped (manually or by automapping) or explicitly excluded from mapping. If any column is neither mapped nor excluded, the data asset remains in a Mapping in progress state.

    To exclude a column from being mapped, select the column, then select Exclude this column from mapping.

    Important: If the record_source attribute is mapped to a field in the asset, then that field must be populated in every record of the asset. If any record is missing a value for that field, the asset cannot load successfully. If you do not map the record_source attribute to a field, then a default record source name is derived from the asset name.

  8. When you finish mapping the data source, you're ready to publish the data into the system.

    • If your data model is new or changed, publish your model first by clicking the Publish data model icon in the action bar. Wait for the publish job to complete.
    • To publish your data, click the Publish data icon in the action bar. Wait for the publish job to complete.
  9. Return to the configuration overview page by selecting Configuration overview from the navigation menu.

  10. On the configuration overview page, confirm that you have at least one data source or asset that is added and mapped.
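Two conditions in the steps above can be checked locally before you publish: every column must be mapped or excluded, and the field mapped to record_source must be populated in every record. The following Python sketch is an offline pre-check only. The function names and the sample column names (id, name, src) are hypothetical, IBM Match 360 does not expose these functions, and treating whitespace-only values as empty is a conservative assumption.

```python
import csv
import io


def unresolved_columns(columns, mapped, excluded):
    """Return columns that are neither mapped (manually or by automapping)
    nor excluded. Any such column keeps the asset in Mapping in progress."""
    return [c for c in columns if c not in mapped and c not in excluded]


def rows_missing_value(csv_text, column, delimiter=","):
    """Return 1-based data row numbers whose `column` value is empty or absent.

    Run this on the field that you mapped to record_source: any row listed
    here would prevent the asset from loading.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in header")
    missing = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(column)  # None if the row has too few fields
        # Assumption: whitespace-only values count as missing.
        if value is None or not value.strip():
            missing.append(row_number)
    return missing


if __name__ == "__main__":
    # Hypothetical asset: "src" is the field mapped to record_source.
    sample = "id,name,src\n1,Ann,crm\n2,Bob,\n"
    print(unresolved_columns(["id", "name", "src"], mapped={"id", "src"}, excluded=set()))
    print(rows_missing_value(sample, "src"))
```

For a TSV asset, pass delimiter="\t".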

Adding relationship data from a flat data file

Before you can load a relationship data asset into IBM Match 360, you must first define the corresponding relationship type in the data model. For details, see Customizing your data model.

Tip: Be sure to publish the data model after defining a new relationship type.

Relationship data assets are delimited files (CSV or TSV) with one relationship per row. Each asset must include the following data columns:

  • Record IDs for both parties in each relationship
  • Record types for both parties in each relationship
  • Record sources for both parties in each relationship

You can create the relationship data asset manually, through an ETL process, or by using the application where your relationships are stored.
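If you generate the relationship asset with a script or ETL job, a sketch like the following Python example writes the six required values for each relationship and rejects rows with gaps. The column headers (from_record_id, to_record_type, and so on) are hypothetical: this topic does not prescribe header names, because you map the columns yourself. Record type and source values such as person, organization, crm, and erp are placeholders.

```python
import csv
import io

# Hypothetical header names. The six values are required, but the names
# are up to you because you map each column manually.
RELATIONSHIP_COLUMNS = [
    "from_record_id", "from_record_type", "from_record_source",
    "to_record_id", "to_record_type", "to_record_source",
]


def write_relationships(relationships, delimiter=","):
    """Serialize relationship dicts to CSV (or TSV with delimiter="\\t").

    Raises ValueError if any required value is missing or empty, so that
    incomplete rows are caught before the file is uploaded.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=RELATIONSHIP_COLUMNS,
        delimiter=delimiter, lineterminator="\n",
    )
    writer.writeheader()
    for index, rel in enumerate(relationships, start=1):
        missing = [c for c in RELATIONSHIP_COLUMNS if not rel.get(c)]
        if missing:
            raise ValueError(f"relationship {index} is missing {missing}")
        # Write only the required columns, in a fixed order.
        writer.writerow({c: rel[c] for c in RELATIONSHIP_COLUMNS})
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder values: a person record in a "crm" source related to an
    # organization record in an "erp" source.
    rows = [{
        "from_record_id": "1001", "from_record_type": "person",
        "from_record_source": "crm",
        "to_record_id": "2001", "to_record_type": "organization",
        "to_record_source": "erp",
    }]
    print(write_relationships(rows), end="")
```

Failing fast on missing values keeps incomplete relationships out of the file instead of surfacing them only after the load job runs.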

To add relationship data into IBM Match 360 from a CSV or TSV data file:

  1. From the navigation menu, click Data setup to open the Data setup screen. Click Start with data assets or select the Assets tab.

  2. Load, map, and publish the record data assets into IBM Match 360. These data assets should contain the record data that you want to associate using relationships. For details, see Adding record data from a flat data file and Mapping your data into the data model.

  3. Load your relationship data asset file:

    a. Click the Upload asset to project icon in the action bar.

    b. From the Data panel that opens, choose whether to add data by upload, from the project, or from the catalog. To upload a data file, choose Load.

    Note: If your data includes governed catalogs, you might be unable to view or add some catalog assets. Depending on your permissions, you might only be able to view catalog assets that you own or manage.

    c. On your local computer, select a flat data file containing relationship data in CSV or TSV format and drag it into the Data panel. When the file finishes uploading, it is added to your assets summary list.

  4. Review the details of your newly added relationships asset.

  5. Hover over your relationships asset's row in the assets summary list and click the three-dot overflow menu.

  6. Click Set asset properties.

  7. Select Relationships from the Asset data category list.

  8. Select the correct relationship type from the Asset data type list and click Save. If the appropriate relationship type is not in the asset data type list, then you might have to customize your data model. For more information, see Customizing your data model.

  9. Map and publish your relationship data asset. For details, see Mapping your data into the data model. Be sure to map each of the required data columns: from record ID, to record ID, from record type, to record type, from record source, and to record source.

    Restriction: Relationship data does not support profiling and automapping. Manually map your columns to the data model.
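Because relationships associate records that you load in step 2, it can help to confirm before publishing that every relationship endpoint refers to a record in those assets. This Python sketch is a local check under stated assumptions: the relationship file uses hypothetical from_/to_ column headers, and a record is identified by its record type, record source, and record ID together. It is not an IBM Match 360 API.

```python
import csv
import io


def dangling_endpoints(relationship_csv, known_records, delimiter=","):
    """Return (row, side, record_type, record_source, record_id) for every
    relationship endpoint that is not in known_records.

    Assumptions: headers are named from_record_type, from_record_source,
    from_record_id, and the matching to_ columns; known_records is a set of
    (record_type, record_source, record_id) tuples collected from the record
    assets that you loaded and published.
    """
    reader = csv.DictReader(io.StringIO(relationship_csv), delimiter=delimiter)
    problems = []
    for row_number, row in enumerate(reader, start=1):
        for side in ("from", "to"):
            key = (
                row[f"{side}_record_type"],
                row[f"{side}_record_source"],
                row[f"{side}_record_id"],
            )
            if key not in known_records:
                problems.append((row_number, side) + key)
    return problems


if __name__ == "__main__":
    # Placeholder data: record 1002 was never loaded as a record asset.
    rels = (
        "from_record_id,from_record_type,from_record_source,"
        "to_record_id,to_record_type,to_record_source\n"
        "1001,person,crm,2001,organization,erp\n"
        "1002,person,crm,2001,organization,erp\n"
    )
    known = {("person", "crm", "1001"), ("organization", "erp", "2001")}
    print(dangling_endpoints(rels, known))
```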

Publishing sample data

If you don't have your own data assets ready to go but want to get started using the IBM Match 360 service, load the provided sample data and model instead.

To load the IBM Match 360 sample data:

  1. Go to the master data home page.
  2. From the Master data tile, click Publish sample model.
  3. After the sample model publish is complete, click Publish sample data.
  4. Optionally, go to the Jobs tab to watch the progress of your sample load jobs. If you don't want to watch, you can go to another screen; the jobs continue running in the background.
