Master Data Management tutorial: Configure a 360-degree view
Take this tutorial to configure a 360-degree view of customers and explore these customers with the Master Data Management use case of the data fabric trial. The goal of this tutorial is to combine customer data with credit score data to resolve entities across the data and create a consolidated 360-degree view of customers. You also identify the highest-value customers to target in the campaign and determine the best rates to offer them.
Tech preview: This is a technology preview and is not yet supported for use in production environments.
The story for the tutorial is that Golden Bank wants to run a campaign to offer lower mortgage rates. As a data engineer, you must use IBM Match 360 to set up, map, and model your data for a 360-degree view of the customer.
The following animated image provides a quick preview of what you'll accomplish by the end of this tutorial. You will set up and add assets to master data, map the data asset attributes, publish the data model and run matching, publish the matched data to a catalog, and then explore and visualize the matched data.
Preview the tutorial
In this tutorial, you can complete the following tasks:
- Set up the prerequisites.
- Task 1: Create a catalog for the matched data.
- Task 2: Set up and add assets to master data.
- Task 3: Map the data asset attributes.
- Task 4: Publish the data model and run matching.
- Task 5: Publish the matched data to a catalog.
- Task 6: Preview your matched data.
- Task 7: Tune matching algorithm and run matching.
- Task 8: Gain insight on the matching results.
- Task 9: Visualize records of entities.
- Cleanup (Optional)
Watch this video to preview the steps in this tutorial. There might be slight differences in the user interface shown in the video. The video is intended to be a companion to the written tutorial.
This video provides a visual method to learn the concepts and tasks in this documentation.
Tips for completing this tutorial
Here are some tips for successfully completing this tutorial.
Use the video picture-in-picture
The following animated image shows how to use the video picture-in-picture and table of contents features:
Get help in the community
If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.
Set up your browser windows
For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window so that you can switch easily between them. Consider arranging the two browser windows side by side to make it easier to follow along.
Set up the prerequisites
Sign up for Cloud Pak for Data as a Service
You must sign up for Cloud Pak for Data as a Service and provision the necessary services for the Master Data Management use case.
- If you have an existing Cloud Pak for Data as a Service account, then you can get started with this tutorial. If you have a Lite plan account, only one user per account can run this tutorial.
- If you don't have a Cloud Pak for Data as a Service account yet, then sign up for a data fabric trial.
Watch the following video to learn about data fabric in Cloud Pak for Data.
This video provides a visual method to learn the concepts and tasks in this documentation.
Verify the necessary provisioned services
To preview this task, watch the video beginning at 00:50.
Follow these steps to verify or provision the necessary services.
-
In Cloud Pak for Data, verify that you are in the Dallas region. If you are not, click the region drop-down list, and then select Dallas.
-
From the Navigation menu, choose Services > Service instances.
-
Use the Product drop-down box to determine whether an IBM Match 360 with Watson service instance exists.
-
If you need to create an IBM Match 360 with Watson service instance, click Add service.
-
Select IBM Match 360 with Watson.
-
For the region, select Dallas.
-
Select the Lite plan.
-
Optional: Type a name for your IBM Match 360 with Watson service instance.
-
Click Create.
-
-
Repeat these steps to verify or provision the following services:
- IBM Knowledge Catalog
- Cloud Object Storage
Check your progress
The following image shows the provisioned service instances:
Create the sample project
To preview this task, watch the video beginning at 01:29.
Follow these steps to create the sample project for this tutorial:
-
Access the Master Data Management sample project in the Resource hub.
-
Click Create project.
-
If prompted to associate the project to a Cloud Object Storage instance, select a Cloud Object Storage instance from the list.
-
Click Create.
-
Wait for the project import to complete, and then click View new project to verify that the project and assets were created successfully.
Note: If this is your first time accessing a project, a guided tour asks whether you want a tour of projects. For now, click Maybe later.
-
Click the Assets tab to view the project's assets.
Check your progress
The following image shows the sample project. You are now ready to start the tutorial.
Task 1: Create a catalog for the matched data
To preview this task, watch the video beginning at 02:08.
You need a catalog for the master data and for access to the matched data. With the IBM Knowledge Catalog Lite plan, you can create two catalogs. If you already have two catalogs, you can use one of your existing catalogs and verify that you are an editor of the catalog that you wish to use.
Option 1: Use the default catalog
Follow these steps to verify that you have the appropriate access to use the default catalog:
-
From the Navigation menu, choose Catalogs > View all catalogs.
-
Open the catalog that you wish to use for this tutorial.
-
Click the Access control tab.
-
Verify that your account has the Editor role. If you have the Viewer role, contact your administrator to request Editor access.
Option 2: Create a new catalog
Otherwise, follow these steps to create the catalog:
-
On the Catalogs page, click Create Catalog.
-
For the Name, copy and paste the catalog name exactly as shown with no leading or trailing spaces:
Mortgage Approval Catalog
-
Select Enforce data protection rules, confirm the selection, and accept the defaults for the other fields.
-
Click Create to use the default settings. Your new catalog opens.
Check your progress
The following image shows your catalog. Now that you have a catalog, you can set up master data and add the data assets.
Task 2: Set up and add assets to master data
To preview this task, watch the video beginning at 02:48.
You must add to master data all of the data assets that you want to consolidate. The data can come from files on your computer or from data assets in a project or catalog.
-
From the Navigation menu, choose Data > Master data.
-
If you need to set up master data, click Set up master data and follow the steps to associate the required project and services with master data. Otherwise, click Go to configuration and continue to the next step.
-
Select your Cloud Object Storage service, then click Next.
-
Select your Master Data Management project, accept the default name for the configuration asset, and then click Next.
-
Select your catalog, select the Enforce data protection rules option, and then click Next.
-
Accept the default workflow configuration name, and click Finish.
-
Click Continue with configuration to complete the setup.
-
-
Click Start with data assets.
-
Click Add data.
-
Insert all three of the data assets in the project:
-
Select the Project tab.
-
Select all three CSV files (Campaign Prospects.csv, Customers.csv, and Experiancc.csv), and then click the Insert Data icon.
-
Click Add data.
-
-
Assign the Person record type to your data assets. The record type describes the kind of data that an asset contains. Each asset must have an assigned record type so that IBM Match 360 can find the part of the model that best fits the data.
-
Select the checkboxes for the three assets, Campaign Prospects.csv, Customers.csv, and Experiancc.csv, and click Set asset properties.
-
For each asset, click the Select data asset type drop-down list, and select the Person data asset type.
-
Click Save.
-
Check your progress
The following image shows the assets added to master data. Now that you set up master data and added the three data assets, you are ready to begin mapping the data asset attributes.
Task 3: Map the data asset attributes
To preview this task, watch the video beginning at 03:22.
For IBM Match 360 to match all of your data, you must specify which columns of each data set are mapped to specific attributes that are understood by IBM Match 360. Follow these steps to map the data asset attributes.
-
Click the Mapping tab to begin mapping the columns of your data assets to the appropriate attributes.
-
In the Asset list panel, select Campaign Prospects.csv.
-
If you need to profile your data, click Profile and, when prompted, click Start profiling. Profiling your data is a prerequisite for automatically mapping columns of your data to attributes of the IBM Match 360 data model. Profiling takes 2-5 minutes. When profiling finishes, the message Profiling is complete is displayed.
-
When profiling is complete, you can automatically map columns of your data by clicking Yes, automap in the prompt or Automap from the mapping menu of your asset.
-
Use Table 1. Campaign Prospects.csv suggested mapping to manually map every column that has the status Not mapped or that is mapped differently from the table. To map a column to an attribute, follow Example 1: Map an existing attribute. To exclude a column, follow Example 2: Exclude columns from mapping.
-
Ensure that all of the columns in your asset have a status of either Mapped, Automapped, or Excluded, and click Map and save to data model. Otherwise, repeat Task 3, step 5.
-
Repeat steps 2 through 6 of Task 3 for your Customers.csv and Experiancc.csv assets. Map their columns to the IBM Match 360 data model as suggested in Table 2. Customers.csv suggested mapping and Table 3. Experiancc.csv suggested mapping. The following examples explain how to manually map individual columns: you can either map a column to an existing attribute or exclude a column from mapping.
Example 1: Map an existing attribute
To preview this task, watch the video beginning at 04:07.
This example explains how to map the legal_name.full_name column in the Campaign Prospects.csv data asset to the existing attribute legal_name.full_name - Legal name - Full name. IBM Match 360 provides a set of attributes that are commonly associated with customer records, and you can map the columns in your data set to them.
-
Click the column legal_name.full_name.
-
From the Mapping targets panel, in the search field, type Legal name - Full name.
-
Click Map and Save to data model to map the column to the attribute. The column displays as Mapped and Mapped to: Legal name - Full name.
You can repeat these steps to map other columns of your data assets to existing attributes, whether you created those attributes previously or IBM Match 360 provides them.
Example 2: Exclude columns from mapping
To preview this task, watch the video beginning at 05:15.
This example explains how to exclude a column from the data asset mapping. You can exclude columns from the mapping if they are not useful to IBM Match 360 during the matching process or if you do not want to include them in your matched data output.
-
Click the column that is named Source.
-
Click the checkbox Exclude this column from mapping.
-
Click Map and Save to data model to save the exclusion. The column displays as Excluded.
You can repeat these steps to exclude other columns of your data assets.
Table 1. Campaign Prospects.csv suggested mapping
Column | Target | Method |
---|---|---|
Source | Exclude this column from mapping | Exclude column from mapping |
ID | Exclude this column from mapping | Exclude column from mapping |
birth_date.value | Birth date | Map an existing attribute |
gender.value | Gender | Map an existing attribute |
legal_name.full_name | Legal name - Full name | Map an existing attribute |
mobile_telephone.phone_number | Mobile telephone - Phone number | Map an existing attribute |
personal_email.email_id | Personal email - Email address | Map an existing attribute |
Lead Quality | Exclude this column from mapping | Exclude column from mapping |
Table 2. Customers.csv suggested mapping
Column | Target | Method |
---|---|---|
Customer Number | Exclude this column from mapping | Exclude column from mapping |
NAME | Legal name - Full name | Map an existing attribute |
COUNTRY | Exclude this column from mapping | Exclude column from mapping |
STREET_ADDRESS | Primary residence - Address line 1 | Map an existing attribute |
CITY | Primary residence - City | Map an existing attribute |
STATE | Primary residence - State/Province value | Map an existing attribute |
ZIP_CODE | Primary residence - Postal code | Map an existing attribute |
EMAIL_ADDRESS | Personal email - Email address | Map an existing attribute |
PHONE_NUMBER | Home telephone - Phone number | Map an existing attribute |
GENDER | Gender | Map an existing attribute |
CREDITCARD_NUMBER | Exclude this column from mapping | Exclude column from mapping |
Table 3. Experiancc.csv suggested mapping
Column | Target | Method |
---|---|---|
source | Exclude this column from mapping | Exclude column from mapping |
Experian_ID | Exclude this column from mapping | Exclude column from mapping |
birth_date.value | Birth date | Map an existing attribute |
gender.value | Gender | Map an existing attribute |
home_telephone.phone_number | Home telephone - Phone number | Map an existing attribute |
legal_name.given_name | Legal name - Given name | Map an existing attribute |
legal_name.last_name | Legal name - Last name | Map an existing attribute |
mobile_telephone.phone_number | Mobile telephone - Phone number | Map an existing attribute |
personal_email.email_id | Personal email - Email address | Map an existing attribute |
primary_residence.address_line1 | Primary residence - Address line 1 | Map an existing attribute |
primary_residence.address_line2 | Primary residence - Address line 2 | Map an existing attribute |
primary_residence.city | Primary residence - City | Map an existing attribute |
primary_residence.province_state | Exclude this column from mapping | Exclude column from mapping |
primary_residence.zip_postal_code | Primary residence - Postal code | Map an existing attribute |
Credit score | Exclude this column from mapping | Exclude column from mapping |
CREDITCARD_NUMBER | Exclude this column from mapping | Exclude column from mapping |
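You can check a suggested mapping outside the user interface before you click Map and save to data model in Task 3, where every column must be Mapped, Automapped, or Excluded. One way is to represent the table as a plain dictionary. The following Python sketch encodes Table 1 and reports any column that is neither mapped nor excluded. The names EXCLUDE, CAMPAIGN_PROSPECTS_MAPPING, unmapped_columns, and mapped_targets are hypothetical helpers for illustration. They are not part of IBM Match 360, which stores the mapping in its own data model.

```python
# Hypothetical offline copy of Table 1; this is NOT an IBM Match 360 API.
EXCLUDE = None  # stands for "Exclude this column from mapping"

CAMPAIGN_PROSPECTS_MAPPING = {
    "Source": EXCLUDE,
    "ID": EXCLUDE,
    "birth_date.value": "Birth date",
    "gender.value": "Gender",
    "legal_name.full_name": "Legal name - Full name",
    "mobile_telephone.phone_number": "Mobile telephone - Phone number",
    "personal_email.email_id": "Personal email - Email address",
    "Lead Quality": EXCLUDE,
}


def unmapped_columns(columns, mapping):
    """Return the columns that are neither mapped to an attribute nor excluded."""
    return [column for column in columns if column not in mapping]


def mapped_targets(mapping):
    """Return only the column-to-attribute pairs, dropping excluded columns."""
    return {column: target for column, target in mapping.items() if target is not EXCLUDE}


if __name__ == "__main__":
    header = list(CAMPAIGN_PROSPECTS_MAPPING)  # column names as listed in Table 1
    print(unmapped_columns(header, CAMPAIGN_PROSPECTS_MAPPING))  # []
    print(len(mapped_targets(CAMPAIGN_PROSPECTS_MAPPING)))  # 5
```

The same structure works for Table 2 and Table 3. If you pass the actual header row of a CSV file, any column the table does not cover shows up in the result of unmapped_columns.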
Check your progress
The following image shows all of the mapped data assets. Now that you mapped the attributes for all three data assets, you can publish the data model and run matching.
Task 4: Publish the data model and run matching
Publish the data model and all data
To preview this task, watch the video beginning at 05:51.
The data model is created after you map all of the columns from your data assets to attributes. Your published data model is used by IBM Match 360 to resolve single entities from all of your data sources. Follow these steps to publish the data model.
-
After you map the last column of your last data asset, click Publish model in the window that is displayed, or click the Publish model icon. This option is available only after you finish mapping all of the columns in all three data assets. Publishing your model takes up to 1 minute. You receive a notification when your data model is published successfully.
-
Click the Publish all data icon, and then click Publish data to load the mapped data assets into the IBM Match 360 data model based on the mapping. The statuses of the assets change from Publishing data to Ready to match. The data takes 5-10 minutes to load into the service.
Check your progress
The following image shows the data assets loaded into the service, which indicates that the data model and data were published successfully. Next, you can run matching.
Complete matching setup and run matching
To preview this task, watch the video beginning at 06:23.
IBM Match 360 uses your published data model to consolidate all of the records of your data sources into single entities to create a data asset with more complete records. Follow these steps to run matching:
-
Click the Data setup drop-down, and select Matching setup from the menu.
-
Click the Match Settings tab, and then click Got it on the Attribute selection screen. On this page, you choose attributes that help distinguish records from each other, such as birth dates, email addresses, or phone numbers, to guide the matching algorithm. For this tutorial, accept the default attributes that are already selected.
-
Select the Match results tab, and click Run matching. You receive a notification when the matching process is complete and the matching results are displayed.
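Conceptually, the attributes that you select on the Attribute selection screen are the ones that are compared between pairs of records. The following Python sketch illustrates that idea with a deliberately simplified comparison: each attribute that agrees in both records after normalization adds a weight to the pair's score. The weights, the normalization rule, and the sample records are assumptions made up for illustration. IBM Match 360 uses its own comparison functions and weights, which this sketch does not reproduce.

```python
# Simplified illustration only; not the IBM Match 360 comparison engine.

# Hypothetical weights: more distinguishing attributes count for more.
WEIGHTS = {"birth_date": 5.0, "email": 6.0, "phone": 4.0, "full_name": 3.0}


def normalize(value):
    """Lowercase and keep only letters and digits, so '555-0100' equals '5550100'."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def pair_score(record_a, record_b, weights=WEIGHTS):
    """Sum the weights of attributes that are present in both records and equal after normalization."""
    score = 0.0
    for attribute, weight in weights.items():
        a, b = record_a.get(attribute), record_b.get(attribute)
        if a and b and normalize(a) == normalize(b):
            score += weight
    return score


# Made-up sample records.
prospect = {"full_name": "Alex Smith", "email": "ASmith@example.com", "phone": "555-0100"}
customer = {"full_name": "alex smith", "email": "asmith@example.com", "birth_date": "1980-01-01"}
print(pair_score(prospect, customer))  # 9.0: name and email agree; phone and birth date exist on one side only
```

An attribute that is missing from either record neither adds to nor subtracts from the score in this sketch. That is one reason to prefer attributes that are usually populated and that distinguish people well.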
Check your progress
The following image shows the results after you ran matching. Now that you published the data model and ran matching, you are ready to publish the matched data to a catalog.
Task 5: Publish the matched data to a catalog
To preview this task, watch the video beginning at 06:54.
Create a connection asset for IBM Match 360
To access the matched data in a project, you need to create a connection asset to IBM Match 360. The IBM Match 360 connection asset connects data that is matched with the IBM Match 360 service to a connected data asset. Follow these steps to create the connection asset.
-
From the Navigation menu, choose Projects > View all projects.
-
Choose your Master Data Management sample project.
-
On the Assets tab, click New asset > Connect to a data source.
-
Select the IBM Match 360 connector, and click Next.
-
Type the connection asset name: Match 360 Connection.
-
Retrieve the CRN of your IBM Match 360 with Watson service instance:
-
From the IBM Cloud console resource list page, click Analytics to expand the list of your service instances.
-
In the Product column, click IBM Match 360 with Watson.
-
In the details panel that opens, click the Copy to clipboard icon for the CRN of your selected IBM Match 360 with Watson service.
-
-
In the Connection details, paste the CRN that corresponds with your IBM Match 360 with Watson service instance.
-
Create an IBM Match 360 API key:
-
From the IBM Cloud console, click Manage > Access (IAM).
-
Click the API keys page.
-
Click Create an IBM Cloud API key. If you have existing API keys, the button might be labeled Create.
-
Type a name and description.
-
Click Create.
-
Copy the API key.
-
Download the API key for future use.
-
-
Complete the API key field with the API key that you created.
-
Click Create.
-
If you are asked to confirm that you want to create the connection without setting location and sovereignty, click Create.
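This tutorial assumes that your services are in the Dallas region, so you might want to confirm that the CRN you pasted belongs to a Dallas instance. IBM Cloud CRNs have the form crn:version:cname:ctype:service-name:location:scope:service-instance:resource-type:resource, and Dallas appears as the location us-south. The following Python sketch parses a CRN and checks its location. The function names parse_crn and is_dallas are hypothetical. The example CRN is a placeholder with a made-up service name and IDs. It is not a real IBM Match 360 with Watson CRN.

```python
from typing import NamedTuple


class Crn(NamedTuple):
    service_name: str
    location: str
    scope: str
    service_instance: str


def parse_crn(crn: str) -> Crn:
    """Split crn:version:cname:ctype:service-name:location:scope:service-instance:resource-type:resource."""
    parts = crn.strip().split(":")  # strip() drops stray whitespace picked up when pasting
    if len(parts) != 10 or parts[0] != "crn":
        raise ValueError(f"not a CRN: {crn!r}")
    return Crn(service_name=parts[4], location=parts[5], scope=parts[6], service_instance=parts[7])


def is_dallas(crn: str) -> bool:
    """The Dallas region corresponds to the IBM Cloud location us-south."""
    return parse_crn(crn).location == "us-south"


# Placeholder CRN: service name, account, and instance IDs are invented for illustration.
example = "crn:v1:bluemix:public:example-service:us-south:a/0123456789abcdef:11111111-2222-3333-4444-555555555555::"
print(parse_crn(example).location)  # us-south
print(is_dallas(example))  # True
```

If is_dallas returns False for your CRN, the instance was probably created in a different region. Return to Verify the necessary provisioned services and check the region.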
Check your progress
The following image shows the Match 360 connection asset. Now you can create a connected data asset from this connection.
Import connected data asset
To preview this task, watch the video beginning at 08:32.
Now use the IBM Match 360 connection to create a new connected data asset of your consolidated data from IBM Match 360. Follow these steps to create a connected data asset.
-
Click Import assets.
-
On the Import assets page, select Connected data.
-
Select Match 360 Connection > records > person > person_entity.
-
Click Import.
Check your progress
The following image shows the connected data asset. Now that you created the connected data asset for your consolidated, matched data, you can publish that asset to a catalog.
Publish the connected data asset to your catalog
To preview this task, watch the video beginning at 08:55.
Follow these steps to publish the consolidated, matched data to your catalog.
-
In your Master Data Management project, verify that you are on the Assets tab.
-
Click the Overflow menu for your connected data asset person_entity, and choose Publish to catalog.
-
Select the Mortgage Approval Catalog (or your catalog name) from the list, and click Next.
-
Optionally, select the option to Go to the catalog after publishing it, and click Next.
-
Review the assets, and click Publish.
-
-
View and update the asset in the catalog:
-
From the Navigation menu, choose Catalogs > View all catalogs.
-
Click the catalog that you published your connected data asset to.
-
In your catalog, click the person_entity connected data asset.
-
Click the Edit name icon, and type the new name for your connected data asset: Golden Bank 360 View.
-
Click the Asset tab to preview the data.
-
Check your progress
The following image shows the data asset in the catalog.
As a data engineer for Golden Bank, you successfully used IBM Match 360 to set up, map, and model your data for a 360-degree view of the customer. You then published the complete 360-degree view of your matched data to your catalog for others in your organization to access.
Task 6: Preview your matched data
To preview this task, watch the video beginning at 09:28.
Now that you published your data model and data to IBM Match 360, set your matching parameters, and ran matching, you can use the master data explorer to query your matched data. The master data explorer lets you find, view, compare, and edit matching results. Now, as a data analyst for Golden Bank, you must analyze, explore, and validate IBM Match 360 results to identify and select the best qualifying customers to target for marketing campaign offers. Follow these steps to explore and tune your matched data.
-
From the Navigation menu, choose Data > Master data.
-
Click Search master data.
-
In the search bar, type Branden Banks, and press Enter to add it as a search criterion. Two entities are returned for Branden Banks. The number in the first column is the count of source records that make up each entity: 2 for one entity and 1 for the other.
-
Click the arrow icon to expand both entities. You can see that these separate entities for Branden Banks likely represent the same person. To join them into a single entity, you can tune the matching algorithm.
Check your progress
The following image shows the search results in Master data explorer. Next, you can tune the matching algorithm and run matching again.
Task 7: Tune matching algorithm and run matching
To preview this task, watch the video beginning at 10:09.
After you explore the matched data, you might need to fine-tune the matching algorithm and run matching again to obtain better results.
-
Click the Master data explorer drop-down, and select Matching setup from the menu.
-
Click the Matching Settings tab, and then select the Algorithm tuning page.
-
In the Autolink threshold field, type 20. Lowering the threshold to 20 lets record pairs with lower comparison scores link automatically, which results in more matches between records across your sources.
-
Click Apply threshold > Next > Run matching to run matching with your tuned algorithm.
-
Click the Match results tab. The results are displayed when matching is finished.
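The following simplified model shows why lowering the autolink threshold can merge entities. Assume that every pair of records has a comparison score and that pairs scoring at or above the threshold are linked. Links are transitive, so records that are connected through a chain of links form one entity. The following Python sketch implements that grouping with a union-find structure. The record names, the scores, and the threshold of 30 used for the before-tuning case are hypothetical, and the at-or-above rule is an assumption. The sketch only mirrors the Branden Banks example qualitatively. It is not the IBM Match 360 matching engine.

```python
def link_records(record_ids, pair_scores, threshold):
    """Group records into entities: pairs scoring at or above the threshold are linked,
    and linked records are merged transitively with union-find."""
    parent = {record: record for record in record_ids}

    def find(record):
        # Walk up to the root, halving the path as we go.
        while parent[record] != record:
            parent[record] = parent[parent[record]]
            record = parent[record]
        return record

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    entities = {}
    for record in record_ids:
        entities.setdefault(find(record), []).append(record)
    return sorted(sorted(members) for members in entities.values())


# Hypothetical source records of one person and hypothetical pair scores.
records = ["prospect-1", "customer-7", "experian-3"]
scores = {("prospect-1", "customer-7"): 35.0, ("customer-7", "experian-3"): 22.0}

print(link_records(records, scores, threshold=30))  # [['customer-7', 'prospect-1'], ['experian-3']]
print(link_records(records, scores, threshold=20))  # [['customer-7', 'experian-3', 'prospect-1']]
```

With the hypothetical higher threshold, the weaker link (22.0) is ignored and the three records form two entities, which resembles what you saw in Task 6. At 20, both links count and a single entity results. A lower threshold also makes false matches more likely, which is why you validate the results in the next task.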
Check your progress
The following image shows the results of matching setup. Next, you can view the matched data again to see how tuning changed the results.
Task 8: Gain insight on the matching results
To preview this task, watch the video beginning at 10:45.
You can return to the master data explorer to see how algorithm tuning changed your match results.
-
Click the Matching setup drop-down, and select Master data explorer from the menu.
-
In the search bar, type Branden Banks, and press Enter to add it as a search criterion. The number 3 next to the entity that is displayed means that three source records now make up the Branden Banks entity, whereas before tuning they were split across separate entities.
-
Click the arrow icon in the first column to expand the entity. You can see the three records that were matched to this entity.
Check your progress
The following image shows the search results in Master data explorer. Next, you can gain insight by visualizing the matching results.
Task 9: Visualize records of entities
To preview this task, watch the video beginning at 11:11.
You can also visualize your tuned matching results as nodes to gain insights.
-
Click Show graph to see which records are contributing to queried entities.
-
Click any of the nodes that are connected to the person entity to view its details. From here, you can visualize and manually modify which records are associated with each entity from your query to make corrections as needed.
Check your progress
The following image shows the search results as a graph.
As a data analyst, you analyzed, explored, and validated IBM Match 360 results to identify and select the best qualifying customers to target for marketing campaign offers.
Cleanup (Optional)
If you would like to retake the tutorials in the Master Data Management use case, delete the following artifacts.
Artifact | How to delete |
---|---|
Mortgage Approval Catalog | Delete a catalog |
Master Data Management sample project | Delete a project |
Next steps
-
Try these tutorials:
-
Sign up for another Data fabric use case.
Learn more