Selecting an AutoAI model
AutoAI automatically prepares data, applies algorithms, and attempts to build model pipelines best suited for your data and use case. This topic describes how to evaluate the model pipelines.
During AutoAI training, your data set is split into a training part and a hold-out part. The training part is used by the AutoAI training stages to generate the model pipelines and the cross-validation scores that rank them. After training, the hold-out part is used to evaluate the resulting pipeline models and to compute performance information such as ROC curves and confusion matrices, shown in the leaderboard. The training/hold-out split ratio is 90/10.
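The 90/10 split can be sketched as follows. This is a simplified, hypothetical illustration in plain Python; AutoAI performs the split internally and you do not need to code it yourself.

```python
import random

def train_holdout_split(rows, holdout_fraction=0.10, seed=42):
    """Shuffle rows and split them into training and hold-out parts.

    A simplified sketch of what AutoAI does internally: the training part
    feeds pipeline generation and cross-validation; the hold-out part is
    reserved for the final evaluation metrics.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * (1 - holdout_fraction)))
    return rows[:cut], rows[cut:]

train, holdout = train_holdout_split(range(100))
print(len(train), len(holdout))  # 90 10
```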
As the training progresses, you are presented with a dynamic tree infographic and leaderboard. The tree infographic shows the sequences of AutoAI training stages (pre-processing, model selection, feature engineering, and HPO) that create the resulting model pipelines, which are shown as leaves of the tree. The leaderboard contains model pipelines ranked by cross-validation scores.
Hovering over a pipeline name on a leaf of the tree infographic displays the pipeline structure. Each AutoAI model pipeline is defined by a sequence of data transformations applied to the initial data set, ending with an estimator algorithm that generates the predictions. The sequence consists of a pre-processing transformer, followed by feature-engineering transformers if feature engineering was performed for that pipeline. The estimator is determined by the model selection and HPO stages of AutoAI training.
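A pipeline of this shape can be expressed with scikit-learn's Pipeline class. The transformers and estimator below are hypothetical stand-ins; actual AutoAI pipelines choose their steps automatically.

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch of the pipeline shape described above:
# a pre-processing transformer, an optional feature-engineering
# transformer, and a final estimator that makes the predictions.
pipeline = Pipeline(steps=[
    ("preprocess", StandardScaler()),        # pre-processing transformer
    ("feature_eng", PolynomialFeatures(2)),  # feature-engineering transformer
    ("estimator", LogisticRegression()),     # estimator from model selection/HPO
])

# Tiny toy data set, just to show that the pipeline fits and predicts.
pipeline.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
print(pipeline.predict([[2.5]]))
```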
View the pipeline transformations
Hover over a pipeline name in the infographic to view the transformations for that pipeline.
See Implementation details to review the technical details for the pipelines.
View the leaderboard
Each model pipeline is scored on a variety of metrics and then ranked. The default ranking metric is area under the ROC curve for binary classification models, accuracy for multi-class classification models, and root mean squared error (RMSE) for regression models. The highest-ranked pipelines are displayed in a leaderboard, where you can view more information about them. The leaderboard also provides the option to save selected model pipelines after you review them.
You can evaluate the pipelines as follows:
- Click a pipeline in the leaderboard to view more detail about the metrics and performance.
- Click Compare to view how the top pipelines compare.
- Sort the leaderboard by a different metric.
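Sorting the leaderboard by a different metric amounts to re-ordering the pipelines by that metric's score. A minimal sketch, with hypothetical pipeline names and scores:

```python
# Hypothetical cross-validation scores for four generated pipelines.
scores = {
    "Pipeline 1": {"roc_auc": 0.91, "accuracy": 0.86},
    "Pipeline 2": {"roc_auc": 0.89, "accuracy": 0.88},
    "Pipeline 3": {"roc_auc": 0.93, "accuracy": 0.85},
    "Pipeline 4": {"roc_auc": 0.90, "accuracy": 0.87},
}

def rank(scores, metric):
    """Return pipeline names ordered best-first by the given metric.

    For error metrics such as RMSE, where lower is better, you would
    sort in ascending order instead.
    """
    return sorted(scores, key=lambda name: scores[name][metric], reverse=True)

print(rank(scores, "roc_auc"))   # ['Pipeline 3', 'Pipeline 1', 'Pipeline 4', 'Pipeline 2']
print(rank(scores, "accuracy"))  # ['Pipeline 2', 'Pipeline 4', 'Pipeline 1', 'Pipeline 3']
```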
Viewing the confusion matrix
One of the details you can view for a pipeline for a classification experiment is a Confusion matrix.
The confusion matrix is based on the hold-out data: the portion of the data set that is not used to train the model pipeline, but only to measure its performance on data that the pipeline did not see during training.
In a binary classification problem with a positive class and a negative class, the confusion matrix summarizes the pipeline model's positive and negative predictions in four quadrants, depending on their correctness with respect to the positive or negative labels of the hold-out data set.
For example, the Bank sample experiment seeks to identify customers who will accept the promotions offered to them. The confusion matrix for the top-ranked pipeline is:
The positive class is ‘yes’ (meaning a customer will accept the promotion), so you can see that the count of true negatives, that is, customers whom the model correctly predicted would decline the promotion, is fairly high.
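The four quadrants can be computed by comparing hold-out labels against predictions. The labels and predictions below are hypothetical, just to make the counting concrete:

```python
from collections import Counter

# Hypothetical hold-out labels and predictions, where the positive
# class is 'yes' (the customer accepts the promotion).
actual    = ["no", "no", "yes", "no", "yes", "no", "no", "yes", "no", "no"]
predicted = ["no", "no", "yes", "no", "no",  "no", "yes", "yes", "no", "no"]

counts = Counter(zip(actual, predicted))
tp = counts[("yes", "yes")]  # true positives: correctly predicted accepts
tn = counts[("no", "no")]    # true negatives: correctly predicted declines
fp = counts[("no", "yes")]   # false positives
fn = counts[("yes", "no")]   # false negatives
print(tp, tn, fp, fn)  # 2 6 1 1
```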
Click the items in the navigation menu to view other details about the selected pipeline. For example, Feature importance shows which data features contribute most to your prediction output.
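The idea behind Feature importance can be illustrated with a scikit-learn estimator that exposes importance scores. This is a hypothetical sketch, not how AutoAI computes its chart: the toy data is built so that the second feature fully determines the label, so it should dominate the scores.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data: feature_b fully determines the label, so it
# should receive nearly all of the importance.
X = [[i % 3, i % 2] for i in range(60)]
y = [row[1] for row in X]

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
for name, importance in zip(["feature_a", "feature_b"], model.feature_importances_):
    print(name, round(importance, 2))
```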
Save a pipeline as a model
When you are satisfied with a pipeline, click Save model to save it as a model in your project so that you can test and deploy it. A notification confirms that the model is saved to the space associated with the project. Open the space to configure, train, test, and deploy the model.
Follow the steps in Deploying an AutoAI model for details on how to deploy your model and make predictions with it.