Quick start: Build and deploy a machine learning model with AutoAI

Last updated: Nov 27, 2024

You can automate the process of building a machine learning model with the AutoAI tool. Read about the AutoAI tool, then watch a video and take a tutorial that’s suitable for beginners and does not require coding.

Required services: watsonx.ai Studio; watsonx.ai Runtime

Your basic workflow includes these tasks:

Create a project. Projects are where you can collaborate with others to work with data.
Add your data to the project. You can add CSV files or data from a remote data source through a connection.
Create an AutoAI experiment in the project.
Review the model pipelines and save the desired pipeline as a model to deploy or as a notebook to customize.
Deploy and test your model.

Read about AutoAI

The AutoAI graphical tool automatically analyzes your data and generates candidate model pipelines customized for your predictive modeling problem. These model pipelines are created iteratively as AutoAI analyzes your dataset and discovers data transformations, algorithms, and parameter settings that work best for your problem setting. Results are displayed on a leaderboard, showing the automatically generated model pipelines ranked according to your problem optimization objective.

Watch a video about creating a model using AutoAI

Watch this video to see how to create and run an AutoAI experiment based on the bank marketing sample.

Note: This video shows tasks 2-5 of this tutorial.

This video provides a visual method to learn the concepts and tasks in this documentation.

Video transcript
Time	Transcript
00:00	This video shows you how to run a sample AutoAI experiment to create a Machine Learning model.
00:08	Start in a project and add to that project a new AutoAI experiment.
00:16	To run an AutoAI experiment, you'll need the Machine Learning service.
00:22	Here you have the option to associate a Machine Learning service with this project.
00:29	You can either create a new service instance or select an existing service instance.
00:39	When you return to the page where you're creating the experiment, just reload the page and you'll see the new service instance listed.
00:48	For this first experiment, you will select a sample.
00:52	The "Bank marketing" sample contains text data collected from phone calls to a bank in response to a marketing campaign.
01:01	When you select a sample, the experiment name and description are filled in for you, so you're ready to create the experiment.
01:11	Next, the AutoAI experiment builder displays.
01:15	Since this experiment is from a sample, the bank marketing source file is already selected.
01:22	And the column to predict is also already selected.
01:26	In this case, it's the "y" column, which represents whether a user will sign up for a term deposit as part of the marketing campaign.
01:35	Based on the data set and the selected column to predict, AutoAI analyzes a subset of the data and chooses a prediction type and metric to optimize.
01:47	In this case, since the column to predict contains values of "Y" or "N" (for yes or no) the binary classification was chosen.
01:57	The positive class is "Yes" and the optimized metric is ROC AUC.
02:03	The ROC AUC metric balances precision, accuracy, and recall.
02:10	Now, run the experiment and wait as the "Pipeline leaderboard" fills in.
02:17	During AutoAI training, your data set is split into two parts: training data and holdout data.
02:24	The training data is used by the AutoAI training stages to generate the model pipelines and cross validation scores are used to rank them.
02:34	After training, the holdout data is used for the resulting pipeline model evaluation and computation of performance information, such as the ROC curves and confusion matrices.
02:48	Next, AutoAI generates pipelines using different estimators, such as the XGBoost classifier, or enhancements, such as hyperparameter optimization and feature engineering, with the pipelines ranked based on the accuracy metric.
03:06	Hyperparameter optimization is a mechanism for automatically exploring a search space of potential hyperparameters, building a series of models, and comparing the models using metrics of interest.
03:20	Feature engineering attempts to transform the raw data into the combination of features that best represents the problem to achieve the most accurate prediction.
03:31	Okay, the run has completed.
03:34	The legend explains where to find the data, top algorithm, pipelines, and feature transformers on the relationship map.
03:44	You can view the full log to see complete details.
03:48	By default, you'll see the "Relationship map", but you can swap views to see the "Progress map".
03:57	Scroll down to take a look at the leaderboard.
04:01	You may want to start with comparing the pipelines.
04:05	This chart provides metrics for the eight pipelines, viewed by cross-validation score, or by holdout score.
04:13	You can see the pipelines ranked based on other metrics, such as average precision.
04:21	Back on the "Experiment summary" tab, expand a pipeline to view the model evaluation measures and ROC curve.
04:30	You can view an individual pipeline to see more details in addition to the confusion matrix, precision recall curve, model information, feature transformations, and feature importance.
04:49	This pipeline had the highest ranking, so you can save this as a machine learning model.
04:55	Just accept the defaults and save the model.
05:01	Now that you've trained the model, you're ready to view the model and deploy it.
05:06	The "Overview" tab shows a model summary and the input schema.
05:12	To deploy the model, you'll need to promote it to a deployment space.
05:17	Since this project doesn't have a deployment space associated with it yet, you'll need to set up the association first.
05:25	You can either select an existing deployment space or create a new deployment space.
05:31	When you create a new space, just provide a name and description and select the Cloud Object Storage and Machine Learning service.
05:41	Then create the space.
05:45	Now, select this new space, add a description for the model, and click "Promote".
05:53	Use the link to go to the deployment space.
06:00	Here's the model you just created, which you can now deploy.
06:04	In this case, it will be an online deployment.
06:08	Just provide a name for the deployment and click "Create".
06:12	Then wait while the model is deployed.
06:16	When the model deployment is complete, view the deployment.
06:20	On the "API reference" tab, you'll find the scoring endpoint for future reference.
06:26	You'll also find code snippets for various programming languages to utilize this deployment from your application.
06:35	On the "Test" tab, you can test the model prediction.
06:40	You can either enter test input data or paste JSON input data, then click "Predict".
06:52	This shows that there's a very high probability that the first person will not subscribe to a term deposit and a high probability that the second person will subscribe to a term deposit.
07:06	And back in the project, on the "Assets" tab, you'll find the AutoAI experiment and the model.
07:17	Find more videos in the Cloud Pak for Data as a Service documentation.

Try a tutorial to create a model using AutoAI

This tutorial guides you through training a model to predict whether a customer is likely to subscribe to a term deposit based on a marketing campaign.

In this tutorial, you will complete these tasks:

Task 1: Open a project.
Task 2: Build and train the model.
Task 3: Promote the model to a deployment space and deploy the trained model.
Task 4: Test the deployed model.
Task 5: Create a batch job to score the model.

This tutorial will take approximately 30 minutes to complete.

Sample data

The sample data that is used in the guided experience is UCI: Bank marketing data used to predict whether a customer enrolls in a marketing promotion. The data is automatically uploaded and available for your use when you select Resource hub sample as the basis for your experiment.

Spreadsheet of the Bank marketing data set

Tips for completing this tutorial

Here are some tips for successfully completing this tutorial.

Use the video picture-in-picture

Tip: Start the video, then scroll through the tutorial. The video moves to picture-in-picture mode, so you can follow it as you complete the tasks. Close the video table of contents for the best experience with picture-in-picture. Click the timestamps for each task to follow along.

The following animated image shows how to use the video picture-in-picture and table of contents features:

How to use picture-in-picture and chapters

Get help in the community

If you need help with this tutorial, you can ask a question or find an answer in the Cloud Pak for Data Community discussion forum.

Set up your browser windows

For the optimal experience completing this tutorial, open Cloud Pak for Data in one browser window, and keep this tutorial page open in another browser window to switch easily between the two applications. Consider arranging the two browser windows side-by-side to make it easier to follow along.

Side-by-side tutorial and UI

Tip: If you encounter a guided tour while completing this tutorial in the user interface, click Maybe later.

Task 1: Open a project

You need a project to store the data and the AutoAI experiment. You can use an existing project or create a project.

From the Navigation Menu, choose Projects > View all projects.
Open an existing project. If you want to use a new project:
1. Click New project.
2. Select Create an empty project.
3. Enter a name and optional description for the project.
4. Choose an existing object storage service instance or create a new one.
5. Click Create.
When the project opens, click the Manage tab and select the Services and integrations page.
1. On the IBM services tab, click Associate service.
2. Select your watsonx.ai Runtime instance. If you don't have a watsonx.ai Runtime service instance provisioned yet, follow these steps:
  1. Click New service.
  2. Select watsonx.ai Runtime.
  3. Click Create.
  4. Select the new service instance from the list.
3. Click Associate service.
4. If necessary, click Cancel to return to the Services and integrations page.

For more information or to watch a video, see Creating a project. For more information on associated services, see Adding associated services.

Check your progress

The following image shows the new project.

Task 2: Build and train the model

To preview this task, watch the video beginning at 00:08.

Now that you have a project, you are ready to build and train the model using AutoAI. Follow these steps to create the AutoAI experiment, review the model pipelines, and select a pipeline to save as a model:

Click the Assets tab in your project, and then click New asset > Build machine learning models automatically.
On the Create an AutoAI Experiment page, complete the basic fields:
1. Click the Resource hub sample panel.
2. Select Bank marketing sample data, and click Next. The experiment name and description are filled in for you.
3. Confirm that the watsonx.ai Runtime service instance that you associated with your project is selected in the watsonx.ai Runtime Service Instance field.
Click Create.
In this sample AutoAI experiment, you will see that the Bank marketing sample data is already selected for your experiment.
Review the preset experiment settings. Based on the data set and the selected column to predict, AutoAI analyzes a subset of the data and chooses a prediction type and metric to optimize. In this case, the prediction type is Binary Classification, the positive class is Yes, and the optimized metric is ROC AUC & run time.
Click Run experiment. As the model trains, you see an infographic that shows the process of building the pipelines.

For a list of algorithms, or estimators, available with each machine learning technique in AutoAI, see: AutoAI implementation detail.
After the experiment run is complete, you can view and compare the ranked pipelines in a leaderboard.
You can click Pipeline comparison to see how they differ.
Click the highest ranked pipeline to see the pipeline details.
Click Save as, select Model, and click Create. This saves the pipeline as a model in your project.
When the model is saved, click the View in project link in the notification to view the model in your project. Alternatively, you can navigate to the Assets tab in the project, and click the model name in the Models section.
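
To make the optimized metric from the experiment settings more concrete, the following minimal sketch computes ROC AUC in plain Python as the fraction of (positive, negative) pairs that a model orders correctly, with ties counted as half a pair. It illustrates the metric only and is not how AutoAI computes it; the function name `roc_auc` and the sample labels and scores are made up for this example.

```python
def roc_auc(labels, scores):
    """Return ROC AUC for binary labels (1 = positive class, 0 = negative class)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")

    # Count the pairs in which the positive example is scored above the
    # negative one. A tie counts as half a correctly ordered pair.
    ordered = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                ordered += 1.0
            elif p == n:
                ordered += 0.5
    return ordered / (len(positives) * len(negatives))


# Hypothetical holdout labels (1 = "yes") and predicted probabilities of "yes".
example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(example_labels, example_scores))
```

A value of 1.0 means that every positive example is ranked above every negative example, while 0.5 is what random scoring achieves on average.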

Check your progress

The following image shows the model.

Task 3: Promote the model to a deployment space and deploy the trained model

To preview this task, watch the video beginning at 04:57.

Before you can deploy the model, you need to promote the model to a deployment space. Follow these steps to promote the model to a deployment space and then deploy it:

Click the Promote to deployment space icon.
Choose an existing deployment space. If you don't have a deployment space:
1. Click Create a new deployment space.
2. Provide a space name and optional description.
3. Select a storage service.
4. Select a machine learning service.
5. Click Create.
6. Click Close.
Select your new deployment space from the list.
Select the Go to the model in the space after promoting it option.
Click Promote.

Note: If you didn't select the option to go to the model in the space after promoting it, you can use the navigation menu to navigate to Deployments to select your deployment space and model.
With the model open, click New deployment.
1. Select Online as the Deployment type.
2. Specify a name for the deployment.
3. Click Create.
When the deployment is complete, click the deployment name to view the deployment details page.

Check your progress

The following image shows the new deployment.

Task 4: Test the deployed model

To preview this task, watch the video beginning at 06:22.

Now that you have the model deployed, you can test the online deployment by using the user interface or through the watsonx.ai Runtime APIs. Follow these steps to use the user interface to test the model with new data:

Click the Test tab. You can test the deployed model from the deployment details page in two ways: test with a form or test with JSON code.

Click the JSON input tab, copy the following test data, and paste it to replace the existing JSON text:

{
   "input_data": [
      {
         "fields": [
               "age",
               "job",
               "marital",
               "education",
               "default",
               "balance",
               "housing",
               "loan",
               "contact",
               "day",
               "month",
               "duration",
               "campaign",
               "pdays",
               "previous",
               "poutcome"
            ],
         "values": [
               [
               27,
               "unemployed",
               "married",
               "primary",
               "no",
               1787,
               "no",
               "no",
               "cellular",
               19,
               "oct",
               79,
               1,
               -1,
               0,
               "unknown"
               ]
            ]
      }
   ]
}

Click Predict to predict whether a customer with the specified attributes is likely to sign up for a particular kind of account. The resulting prediction indicates that this customer has a high probability of not enrolling in the marketing promotion.
Click the X to close the Prediction results window.
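
If you prefer to prepare the same test from code, the following Python sketch builds the JSON input shown above from a dictionary and prepares, without sending, a POST request for the deployment. It is a rough illustration under stated assumptions: the helper names `build_payload` and `build_scoring_request` are hypothetical, the scoring URL is whatever you copy from the API reference tab of your deployment, and bearer-token authorization is assumed.

```python
import json
import urllib.request

# Column order taken from the "fields" list in the test data above.
FIELDS = [
    "age", "job", "marital", "education", "default", "balance", "housing", "loan",
    "contact", "day", "month", "duration", "campaign", "pdays", "previous", "poutcome",
]


def build_payload(records):
    """Turn a list of dicts into the input_data structure used by the Test tab."""
    values = []
    for record in records:
        missing = [name for name in FIELDS if name not in record]
        if missing:
            raise ValueError(f"record is missing fields: {missing}")
        # Emit the values in the same order as the field names.
        values.append([record[name] for name in FIELDS])
    return {"input_data": [{"fields": list(FIELDS), "values": values}]}


def build_scoring_request(scoring_url, bearer_token, payload):
    """Prepare (but do not send) a POST request for an online deployment.

    Assumption: the endpoint accepts JSON and a bearer token.
    """
    return urllib.request.Request(
        scoring_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {bearer_token}",
        },
        method="POST",
    )


# The same customer as in the JSON test data above.
customer = {
    "age": 27, "job": "unemployed", "marital": "married", "education": "primary",
    "default": "no", "balance": 1787, "housing": "no", "loan": "no",
    "contact": "cellular", "day": 19, "month": "oct", "duration": 79,
    "campaign": 1, "pdays": -1, "previous": 0, "poutcome": "unknown",
}
payload = build_payload([customer])
print(json.dumps(payload, indent=2))
```

To send the request, you would pass the result of `build_scoring_request` to `urllib.request.urlopen`, using the real scoring endpoint and a valid token for your account.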

Check your progress

The following image shows the results of testing the deployment. The values for your prediction might differ from the values in the following image.

Task 5: Create a batch job to score the model

Now that you have tested the deployed model with a single prediction, you can create a batch deployment to score multiple records at the same time.

Task 5a: Set up batch deployment

To preview this task, watch the video beginning at 07:00.

For a batch deployment, you provide input data, also known as the model payload, in a CSV file. The data must be structured like the training data, with the same column headers. The batch job processes each row of data and creates a corresponding prediction. Follow these steps to upload the payload data to the deployment space:

Copy and paste the following text into a text editor, and save the file as bank-payload.csv.

age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome
30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown
33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure
35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure
30,management,married,tertiary,no,1476,yes,yes,unknown,3,jun,199,4,-1,0,unknown
59,blue-collar,married,secondary,no,0,yes,no,unknown,5,may,226,1,-1,0,unknown
35,management,single,tertiary,no,747,no,no,cellular,23,feb,141,2,176,3,failure
36,self-employed,married,tertiary,no,307,yes,no,cellular,14,may,341,1,330,2,other
39,technician,married,secondary,no,147,yes,no,cellular,6,may,151,2,-1,0,unknown
41,entrepreneur,married,tertiary,no,221,yes,no,unknown,14,may,57,2,-1,0,unknown
43,services,married,primary,no,-88,yes,yes,cellular,17,apr,313,1,147,2,failure
39,services,married,secondary,no,9374,yes,no,unknown,20,may,273,1,-1,0,unknown
43,admin.,married,secondary,no,264,yes,no,cellular,17,apr,113,2,-1,0,unknown
36,technician,married,tertiary,no,1109,no,no,cellular,13,aug,328,2,-1,0,unknown
20,student,single,secondary,no,502,no,no,cellular,30,apr,261,1,-1,0,unknown
31,blue-collar,married,secondary,no,360,yes,yes,cellular,29,jan,89,1,241,1,failure
40,management,married,tertiary,no,194,no,yes,cellular,29,aug,189,2,-1,0,unknown
56,technician,married,secondary,no,4073,no,no,cellular,27,aug,239,5,-1,0,unknown
37,admin.,single,tertiary,no,2317,yes,no,cellular,20,apr,114,1,152,2,failure
25,blue-collar,single,primary,no,-221,yes,no,unknown,23,may,250,1,-1,0,unknown
31,services,married,secondary,no,132,no,no,cellular,7,jul,148,1,152,1,other

Click your deployment space in the navigation trail.
Click the Assets tab.
Drag the bank-payload.csv file into the side panel, and wait for the file to upload.
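
Because the payload must have the same column headers as the training data, it can help to check the file before you upload it. The following Python sketch does that with the standard `csv` module. The function name `check_payload_csv` is hypothetical, and the expected columns are copied from the header row of the sample payload above.

```python
import csv
import io

# Header row copied from the sample bank-payload.csv content above.
EXPECTED_COLUMNS = [
    "age", "job", "marital", "education", "default", "balance", "housing", "loan",
    "contact", "day", "month", "duration", "campaign", "pdays", "previous", "poutcome",
]


def check_payload_csv(text):
    """Return the number of data rows, or raise ValueError if the structure is off."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    count = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        if len(row) != len(EXPECTED_COLUMNS):
            raise ValueError(
                f"line {reader.line_num}: expected {len(EXPECTED_COLUMNS)} values, "
                f"found {len(row)}"
            )
        count += 1
    return count


# First two data rows of the sample payload, used here as a quick check.
sample = (
    "age,job,marital,education,default,balance,housing,loan,"
    "contact,day,month,duration,campaign,pdays,previous,poutcome\n"
    "30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown\n"
    "33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure\n"
)
print(check_payload_csv(sample))
```

To check the saved file instead of the inline sample, read its text first, for example with `pathlib.Path("bank-payload.csv").read_text()`, and pass the result to the function.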

Check your progress

The following image shows the Assets tab in the deployment space.

Task 5b: Create the batch deployment

To preview this task, watch the video beginning at 07:30.

To process a batch of inputs and have the output written to a file instead of displayed in real time, create a batch deployment job.

Go to the Assets tab in the deployment space.
Click the Overflow menu for your model, and choose Deploy.
For the Deployment type, select Batch.
Type a name for the deployment.
Choose the smallest hardware specification.
Click Create.

Check your progress

The following image shows the batch deployment.

Task 5c: Create the batch job

To preview this task, watch the video beginning at 07:44.

The batch job runs the deployment. To create the job, you specify the input data and the name for the output file. You can set up a job to run on a schedule, or run immediately. Follow these steps to create a batch job:

On the deployment page, click New job.
Specify a name for the job, and click Next.
Select the smallest hardware specification, and click Next.
Optional: Set a schedule, and click Next.
Optional: Choose to receive notifications, and click Next.
On the Choose data screen, select the Input data:
1. Click Select data source.
2. Select Data asset > bank-payload.csv.
3. Click Confirm.
Back on the Choose data screen, specify the Output file:
1. Click Add.
2. Click Select data source.
3. Ensure that the Create new tab is selected.
4. For the Name, type bank-output.csv.
5. Click Confirm.
Click Next for the final step.
Review the settings, and click Create and run to run the job immediately.

Check your progress

The following image shows the job details for the batch deployment.

Task 5d: View the output

To preview this task, watch the video beginning at 08:42.

Follow these steps to review the output file from the batch job.

Click the job name to see the status.
When the status changes to Completed, click your deployment space name in the navigation trail.
Click the Assets tab.
Click the bank-output.csv file to review the prediction results for the customer information that was submitted for batch processing. For each case, the returned prediction indicates that the customer is unlikely to subscribe to the bank promotion.

Check your progress

The following image shows the results of the batch deployment job.

Next steps

Now you can use this data set for further analysis. For example, you or other users can do any of these tasks:

Additional resources

Try these additional tutorials to get more hands-on experience with building models using AutoAI:
- Build a univariate time series experiment
- Build a text analysis experiment
Try these other methods to build models:
View more videos
Find sample data sets to gain hands-on experience building models in the Resource hub

Parent topic: Quick start tutorials