AutoAI Tutorial: Data join multiclass classification model
This tutorial demonstrates how to merge data files to create a single data source for training an AutoAI experiment.
Attention: The AutoAI experiment feature for joining multiple data sources to create a single training data set is deprecated. Support for joining data in an AutoAI experiment will be removed on Dec 7, 2022. After Dec 7, 2022, AutoAI experiments with joined data and deployments of resulting models will no longer run. To join multiple data sources, use a data preparation tool such as Data Refinery or DataStage to join and prepare data, then use the resulting data set for training an AutoAI experiment. Redeploy the resulting model.
The objective of the analysis is to gain more insight into the factors that affect customer experience so that customer service can be improved. The data consists of historical information about customer interactions with call agents, call types, customer wireless plans, and call resolution types. Each source of information is kept in a separate table (a CSV file).
Using the data join capabilities of AutoAI, you connect the tables by using common columns, or keys, to create a single data source without writing SQL-like queries. Additionally, AutoAI performs automated data preparation, or feature engineering, on the merged data before using it to train the model.
Overview of the data sets
This figure shows the relationship between the data. You use the data join canvas to create the data connections that are required to combine the data for the experiment.
The data that you join contains the following information:
- User_experience: Records the daily satisfaction feedback from customers for each call agent.
- Call_log: Records historical information about the calls from customers to the call center over the last 3 years.
- Call_Type: Records call type information.
- Wireless_Plans: Records the kind of wireless plan that each customer is subscribed to.
- Call_Resolution_Type: Records the type of call resolution.
Tasks overview:
This tutorial presents the basic steps for building and training a machine learning model by using AutoAI:
- Create a Watson Studio project
- Create an AutoAI experiment
- Configure the experiment
- Train the experiment
Task 1: Create a Watson Studio project
- From the Gallery, download the Call Center data set file to your local computer.
- On the Projects page, select New Project to create a new project.
  a. Select Create an empty project.
  b. Enter a name for your project.
  c. Click Create.
Task 2: Create an AutoAI experiment
- On the Assets tab of your project, click New asset and choose AutoAI.
- Specify a name and an optional description for your new experiment, then click Create.
- Select the Associate a Machine Learning service instance link to associate the IBM Watson Machine Learning service instance with your project. Click Reload to confirm your configuration.
- To add a data source, choose one of the following options:
  a. If you downloaded the files to your local computer, upload the five CSV files in the Call Center data set: drag the files onto the data pane, or click browse and follow the prompts.
  b. If you already uploaded the files to your project, click select from project, then select the data asset tab and add the five tables from the Call Center data set.
Task 3: Configure the experiment
Step 1: Select main data source
- Choose User_experience.csv as the main source (the table with a prediction target column).
- Click Save Join to open the data join canvas.
Step 2: Connect the data tables
To connect data tables, drag from the plus button on the end of one source to the source you want to connect. For each connection, you are prompted to specify a key, which is the common column. You can choose from suggested keys or specify the keys manually.
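As a rough illustration of what a suggested key is, the sketch below lists the column names that two CSV files have in common. This is only an assumption about how candidate keys could be found; it is not a description of how AutoAI generates its suggestions. The function name `suggest_keys`, the sample rows, and the `Call_ID` column are hypothetical.

```python
import csv
import io


def suggest_keys(main_csv: str, other_csv: str) -> list:
    """Return the column names present in both CSV headers, in main-table order."""
    main_cols = next(csv.reader(io.StringIO(main_csv)))
    other_cols = set(next(csv.reader(io.StringIO(other_csv))))
    return [col for col in main_cols if col in other_cols]


# Hypothetical sample content. The Call_ID column and the row values are
# invented for illustration; the key names come from the tutorial.
user_experience = "Agent_ID,Call_Date,User_Experience\nA1,2021-03-01,Good\n"
call_log = (
    "Call_ID,Agent_ID,Call_Date,Call_Type_ID,Plan_ID\n"
    "C1,A1,2021-03-01,T1,P1\n"
)

suggested = suggest_keys(user_experience, call_log)
print(suggested)
```

With these sample headers, the shared columns are `Agent_ID` and `Call_Date`, the same pair used as the key for the first join in the steps that follow.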
1. Starting from User_Experience.csv, drag the node to the Call_log table to create a connection.
2. In the pane for configuring the join, click (+) to add the suggested key Agent_ID as a key and Call_Date as a second key.
3. Click Done to complete the join.
4. Using the details in this table, repeat steps 1-3 to create the remaining joins:
| Main source | Joined source | Key |
|---|---|---|
| User_Experience | Call_log | Agent_ID, Call_Date |
| Call_log | Call_Resolution_Type | Call_resolution_ID |
| Call_log | Call_Type | Call_Type_ID |
| Call_log | Wireless_Plan | Plan_ID |
Your canvas looks like this when you complete the data joins:
Click the button Done and Save Join to finish the data join.
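To make the join table concrete, the following minimal sketch applies the same four joins to small in-memory tables in plain Python. It is not how AutoAI implements data joins. It assumes that each key matches at most one row in the joined table, so it does not show how one-to-many relationships are handled. The helper `left_join`, the sample rows, and the non-key columns (`Resolution`, `Call_Type_Name`, `Plan_Name`) are hypothetical. The lookup tables are joined onto Call_log first so that the enriched Call_log can then be joined onto the main source.

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def left_join(main: List[Row], other: List[Row], keys: Sequence[str]) -> List[Row]:
    """Left-join `other` onto `main` by using one or more key columns."""
    # Index the joined table by its key tuple (assumes keys are unique there).
    index = {tuple(row[k] for k in keys): row for row in other}
    joined = []
    for row in main:
        match = index.get(tuple(row[k] for k in keys), {})
        # Values from the main table win if a non-key column name collides.
        joined.append({**match, **row})
    return joined


# (main source, joined source, key columns), taken from the join table.
JOIN_PLAN = [
    ("Call_log", "Call_Resolution_Type", ["Call_resolution_ID"]),
    ("Call_log", "Call_Type", ["Call_Type_ID"]),
    ("Call_log", "Wireless_Plan", ["Plan_ID"]),
    ("User_Experience", "Call_log", ["Agent_ID", "Call_Date"]),
]

# Hypothetical one-row tables; only the key names come from the tutorial.
tables = {
    "User_Experience": [
        {"Agent_ID": "A1", "Call_Date": "2021-03-01", "User_Experience": "Good"}
    ],
    "Call_log": [
        {"Agent_ID": "A1", "Call_Date": "2021-03-01",
         "Call_resolution_ID": "R1", "Call_Type_ID": "T1", "Plan_ID": "P1"}
    ],
    "Call_Resolution_Type": [{"Call_resolution_ID": "R1", "Resolution": "Resolved"}],
    "Call_Type": [{"Call_Type_ID": "T1", "Call_Type_Name": "Billing"}],
    "Wireless_Plan": [{"Plan_ID": "P1", "Plan_Name": "Unlimited"}],
}

for main_name, joined_name, keys in JOIN_PLAN:
    tables[main_name] = left_join(tables[main_name], tables[joined_name], keys)

training_rows = tables["User_Experience"]
print(training_rows[0])
```

The result is a single flat table in which each User_Experience row carries the call, resolution, call type, and plan columns reached through the keys.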
Task 4: Train the experiment
To train the model, choose a prediction column in the main source. AutoAI then uses the combined data source to train a model that predicts the values in that column.
- In Configuration details, select No for the option to create a Time Series Forecast.
- Choose User_Experience as the column to predict.
- Click Run experiment. As the model trains, you see an infographic that shows the process of building the pipelines.
- After all the pipelines are created, you can compare their accuracy on the Pipeline leaderboard.
- Click Pipeline comparison to see how the pipelines differ.
- Select the pipeline with Rank 1 and click Save as to create your model. Then, select Create. This saves the pipeline under the Models section in the Assets tab.
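The idea behind ranking pipelines by accuracy can be sketched in a few lines of Python. This is an illustration only, not AutoAI's scoring code. The class labels, the pipeline names, and the predictions are hypothetical. Accuracy here is the fraction of holdout rows whose predicted class matches the true class.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that exactly match the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical holdout labels for a three-class target.
y_true = ["Good", "Bad", "Neutral", "Good", "Bad"]

# Hypothetical predictions from three pipelines.
predictions = {
    "Pipeline 1": ["Good", "Bad", "Neutral", "Good", "Good"],
    "Pipeline 2": ["Good", "Bad", "Good", "Bad", "Good"],
    "Pipeline 3": ["Good", "Bad", "Neutral", "Good", "Bad"],
}

# Sort from highest to lowest accuracy to form a simple leaderboard.
leaderboard = sorted(
    ((name, accuracy(y_true, preds)) for name, preds in predictions.items()),
    key=lambda item: item[1],
    reverse=True,
)

for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"Rank {rank}: {name} accuracy={score:.2f}")
```

In this sample, Pipeline 3 matches every label and takes Rank 1, which corresponds to the pipeline you would save as a model.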