AutoAI Tutorial: Data join multiclass classification model

This tutorial demonstrates how to merge data files to create a single data source for training an AutoAI experiment.

Attention: The AutoAI experiment feature for joining multiple data sources to create a single training data set is deprecated. Support for joining data in an AutoAI experiment will be removed on Dec 7, 2022. After Dec 7, 2022, AutoAI experiments with joined data and deployments of resulting models will no longer run. To join multiple data sources, use a data preparation tool such as Data Refinery or DataStage to join and prepare data, then use the resulting data set for training an AutoAI experiment. Redeploy the resulting model.

The objective of the analysis is to gain more insight into the factors that affect customer experience so that customer service can be improved. The data consists of historical information about customer interactions with call agents, call types, customer wireless plans, and call resolution types. Each source of information is kept in a separate table (a CSV file).

Using the data join capabilities of AutoAI, you connect the tables by using common columns, or keys, to create a single data source, without needing to write SQL-like queries. Additionally, AutoAI performs automated data preparation, or feature engineering, on the merged data before using it to train the model.
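
As a rough illustration of what joining two tables on a common key means, the following sketch merges two tiny in-memory tables on Call_Type_ID by using only the Python standard library. The sample rows and the Call_ID and Call_Type_Name columns are invented for illustration; only the key name Call_Type_ID comes from this tutorial. This is a conceptual sketch and does not describe how AutoAI implements the join.

```python
import csv
import io

# Hypothetical sample data; only the Call_Type_ID key name comes from the tutorial.
CALL_LOG_CSV = """Call_ID,Call_Type_ID
1,10
2,20
3,10
"""

CALL_TYPE_CSV = """Call_Type_ID,Call_Type_Name
10,Billing
20,Technical
"""


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left_rows, right_rows, key):
    """Attach matching right-hand columns to each left-hand row.

    Assumes the key is unique in right_rows (a lookup table).
    Left rows without a match are kept unchanged.
    """
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        merged = dict(row)
        merged.update(lookup.get(row[key], {}))
        joined.append(merged)
    return joined


joined_rows = left_join(
    read_rows(CALL_LOG_CSV), read_rows(CALL_TYPE_CSV), "Call_Type_ID"
)
for row in joined_rows:
    print(row)
```

Each call-log row keeps its own columns and gains the columns of the call-type row that shares the same Call_Type_ID value, which is the effect that a key-based connection on the data join canvas is meant to achieve.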

Overview of the data sets

This figure shows the relationship between the data. You use the data join canvas to create the data connections that are required to combine the data for the experiment.

An image of tables connected by foreign keys

The data that you join contains the following information:

  • User_experience: Records the daily satisfaction feedback that customers give for each call agent.
  • Call_log: Records historical information about the calls from customers to the call center in the last 3 years.
  • Call_Type: Records call type information.
  • Wireless_Plans: Records the kind of wireless plans customers are subscribed to.
  • Call_Resolution_Type: Records the type of call resolution.

Tasks overview:

This tutorial presents the basic steps for building and training a machine learning model by using AutoAI:

  1. Create a Watson Studio project
  2. Create an AutoAI experiment
  3. Configure the experiment
  4. Train the experiment

Task 1: Create a Watson Studio project

  1. From the Gallery, download the Call Center data set file to your local computer.

  2. On the Projects page, select New Project to create a new project.
    a. Select Create an empty project.
    b. Enter a name for your project.
    c. Click Create.

Task 2: Create an AutoAI experiment

  1. On the Assets tab of your project, click New asset and choose AutoAI.

  2. Specify a name and optional description for your new experiment, then click Create.

  3. Select the Associate a Machine Learning service instance link to associate the IBM Watson Machine Learning service instance with your project. Click Reload to confirm your configuration.

  4. To add the data sources, choose one of the following options:
    a. If you downloaded the files to your local computer, upload the five CSV files in the Call Center data set: drag the files onto the data pane, or click browse and follow the prompts.
    b. If you already uploaded the files to your project, click select from project, then select the data asset tab and add the five tables from the Call Center data set.

Task 3: Configure the experiment

Step 1: Select main data source

  1. Choose User_experience.csv as the main source (the table with a prediction target column).

  2. Click Save Join to open the data join canvas.

Joining configuration of the five tables

Step 2: Connect the data tables

To connect data tables, drag from the plus button on the end of one source to the source you want to connect. For each connection, you are prompted to specify a key, which is the common column. You can choose from suggested keys or specify the keys manually.

  1. Starting from User_Experience.csv, drag the node to the Call_log table to create a connection.

    Dragging node of the main source to the call log table.

  2. In the pane for configuring the join, click (+) to add the suggested key Agent_ID as a key and Call_Date as a second key.

    Adding Agent ID and Call Date as keys

  3. Click Done to complete the join.

  4. Using the details in this table, repeat steps 1-3 to create the remaining joins:

Main source       Joined source          Key
User_Experience   Call_log               Agent_ID, Call_Date
Call_log          Call_Resolution_Type   Call_resolution_ID
Call_log          Call_Type              Call_Type_ID
Call_log          Wireless_Plan          Plan_ID
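
The join between User_Experience and Call_log uses two keys. One way to picture a two-column key is as a tuple of values that must match in both tables. The following sketch counts the call-log rows for each (Agent_ID, Call_Date) pair and attaches that count to the matching User_Experience rows. The row values and the Calls_That_Day column are hypothetical, and aggregating by count is an assumption chosen for illustration; it is not a description of the features that AutoAI generates.

```python
from collections import Counter

# Hypothetical rows; only the column names Agent_ID, Call_Date, and
# User_Experience come from the tutorial.
user_experience = [
    {"Agent_ID": "A1", "Call_Date": "2020-01-01", "User_Experience": "Good"},
    {"Agent_ID": "A2", "Call_Date": "2020-01-01", "User_Experience": "Poor"},
]
call_log = [
    {"Agent_ID": "A1", "Call_Date": "2020-01-01"},
    {"Agent_ID": "A1", "Call_Date": "2020-01-01"},
    {"Agent_ID": "A2", "Call_Date": "2020-01-01"},
    {"Agent_ID": "A2", "Call_Date": "2020-01-02"},
]

KEYS = ("Agent_ID", "Call_Date")


def composite_key(row, keys=KEYS):
    """Build a tuple from the key columns so that both must match."""
    return tuple(row[k] for k in keys)


# Many call-log rows can share one (agent, date) pair, so summarize them
# before attaching them to the one-row-per-pair main table.
calls_per_key = Counter(composite_key(row) for row in call_log)

training_rows = [
    {**row, "Calls_That_Day": calls_per_key[composite_key(row)]}
    for row in user_experience
]

for row in training_rows:
    print(row)
```

Because several calls can belong to the same agent and date, the sketch summarizes the call log before attaching it, so the main table still has one row per agent and date with the prediction target intact.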

Your canvas looks like this when you complete the data joins:

Five tables after they are joined

Click the button Done and Save Join to finish the data join.

Task 4: Train the experiment

To train the model, you choose a prediction column in the main source. AutoAI then uses the combined data source to train a model that predicts the values in that column.

  1. In Configuration details, select No for the option to create a Time Series Forecast.

  2. Choose User_Experience as the column to predict.

    Configuring experiment details. No to time series forecast and User Experience as the column to predict.

  3. Click Run experiment. As the model trains, you see an infographic that shows the process of building the pipelines.

    Experiment summary generating pipelines

  4. After all the pipelines are created, you can compare their accuracy on the Pipeline leaderboard.

    Ranked pipeline leaderboard based on accuracy

  5. Click Pipeline comparison to see how the pipelines differ. For example:

    Metric chart of pipeline comparison

  6. Select the pipeline with Rank 1 and click Save as to create your model. Then, select Create. This saves the pipeline under the Models section in the Assets tab.
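
To make the idea of ranking pipelines by accuracy concrete, the following sketch computes multiclass accuracy (the fraction of rows whose predicted class equals the actual class) for three hypothetical pipelines and sorts them into a leaderboard. The pipeline names, class labels, and predictions are invented for illustration and are not AutoAI output.

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical holdout labels and per-pipeline predictions.
actual = ["Good", "Poor", "Average", "Good"]
predictions = {
    "Pipeline 1": ["Good", "Poor", "Good", "Good"],
    "Pipeline 2": ["Good", "Poor", "Average", "Good"],
    "Pipeline 3": ["Poor", "Poor", "Good", "Good"],
}

# Highest accuracy first, as on a ranked leaderboard.
leaderboard = sorted(
    ((name, accuracy(actual, preds)) for name, preds in predictions.items()),
    key=lambda item: item[1],
    reverse=True,
)

for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"Rank {rank}: {name} accuracy={score:.2f}")
```

Accuracy treats every class equally and counts only exact matches, so with unevenly distributed classes it can be worth comparing other metrics as well before choosing a pipeline to save.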

Learn more

AutoAI overview
