Selecting an AutoAI model

AutoAI automatically prepares data, applies algorithms, and attempts to build model pipelines best suited for your data and use case. This topic describes how to evaluate the model pipelines.

During AutoAI training, your data set is split into a training part and a hold-out part. The training part is used by the AutoAI training stages to generate the model pipelines and the cross-validation scores that rank them. After training, the hold-out part is used to evaluate the resulting pipelines and to compute performance information, such as ROC curves and confusion matrices, that is shown in the leaderboard. The training/hold-out split ratio is 90/10.
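To make the 90/10 split concrete, here is a minimal sketch in plain Python. It is an illustration only, not AutoAI's implementation: the function name holdout_split, the shuffling step, and the seed are assumptions made for this example.

```python
import random


def holdout_split(rows, holdout_fraction=0.1, seed=42):
    """Shuffle rows and split them into (training, holdout) lists.

    Hypothetical helper for illustration; AutoAI performs its own split.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    # The last n_holdout rows are set aside; the rest is used for training.
    split_at = len(shuffled) - n_holdout
    return shuffled[:split_at], shuffled[split_at:]


rows = list(range(100))  # stand-in for 100 data records
training, holdout = holdout_split(rows)
print(len(training), len(holdout))  # 90 10
```

In this sketch only the training list would feed cross-validation; the holdout list stays unseen until the final evaluation.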

As the training progresses, you are presented with a dynamic tree infographic and a leaderboard. The tree infographic shows the sequences of AutoAI training stages (pre-processing, model selection, feature engineering, and hyperparameter optimization, or HPO) that create the resulting model pipelines, which are shown as leaves of the tree. The leaderboard contains the model pipelines ranked by cross-validation scores.

Hovering over a pipeline name on a leaf of the tree infographic displays the pipeline structure. Each AutoAI model pipeline is defined by a sequence of data transformations applied to the initial data set, ending with an estimator algorithm that generates predictions. The sequence consists of a pre-processing transformer and, if feature engineering was performed for this pipeline, a sequence of data transformers. The estimator is determined by the model selection and HPO steps during AutoAI training.
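The pipeline structure described above can be sketched in plain Python. Every name here (fill_missing, add_product_feature, ThresholdEstimator, Pipeline) is hypothetical and chosen for illustration; the sketch shows only the general shape of transformers followed by an estimator, not the transformers or estimators that AutoAI actually generates.

```python
from statistics import mean


def fill_missing(rows):
    """Pre-processing step: replace None with the column mean."""
    columns = list(zip(*rows))
    means = [mean(v for v in col if v is not None) for col in columns]
    return [[m if v is None else v for v, m in zip(row, means)] for row in rows]


def add_product_feature(rows):
    """Feature engineering step: append the product of the first two columns."""
    return [row + [row[0] * row[1]] for row in rows]


class ThresholdEstimator:
    """Toy estimator: predicts 1 when the last feature exceeds a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        return [1 if row[-1] > self.threshold else 0 for row in rows]


class Pipeline:
    """Applies each transformer in order, then hands the result to the estimator."""

    def __init__(self, transformers, estimator):
        self.transformers = transformers
        self.estimator = estimator

    def predict(self, rows):
        for transform in self.transformers:
            rows = transform(rows)
        return self.estimator.predict(rows)


pipeline = Pipeline([fill_missing, add_product_feature], ThresholdEstimator(5.0))
predictions = pipeline.predict([[1.0, 2.0], [3.0, None], [4.0, 4.0]])
print(predictions)  # [0, 1, 1]
```

A pipeline without feature engineering would, in this sketch, simply carry a shorter transformer list.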

View the pipeline transformations

Hover over a pipeline name in the infographic to view the transformations for that pipeline.

Pipeline transformation for AutoAI models

See Implementation details to review the technical details for the pipelines.

View the leaderboard

Each model pipeline is scored on a variety of metrics and then ranked. The default ranking metric is the area under the ROC curve for binary classification models, accuracy for multi-class classification models, and the root mean-squared error (RMSE) for regression models. The highest-ranked pipelines are displayed in a leaderboard, so you can view more information about them. The leaderboard also provides the option to save selected model pipelines after you review them.
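As a rough illustration of how such a ranking could work, the sketch below orders pipelines by the default metric for a problem type. The metric keys, the rank_pipelines function, and the scores are all made up for this example and are not AutoAI output or an AutoAI API. Note the assumption that RMSE is an error measure, so lower values rank first, while the other metrics rank higher values first.

```python
# Default ranking metric per problem type, as described above.
DEFAULT_METRIC = {
    "binary": "roc_auc",
    "multiclass": "accuracy",
    "regression": "rmse",
}

# For error metrics such as RMSE, lower is better; otherwise higher is better.
LOWER_IS_BETTER = {"rmse"}


def rank_pipelines(scores, problem_type, metric=None):
    """Return pipeline names ordered best-first for the chosen metric."""
    metric = metric or DEFAULT_METRIC[problem_type]
    descending = metric not in LOWER_IS_BETTER
    return sorted(scores, key=lambda name: scores[name][metric], reverse=descending)


# Hypothetical cross-validation scores for three regression pipelines.
scores = {
    "Pipeline 1": {"rmse": 4.2, "r2": 0.81},
    "Pipeline 2": {"rmse": 3.7, "r2": 0.85},
    "Pipeline 3": {"rmse": 3.9, "r2": 0.86},
}

print(rank_pipelines(scores, "regression"))        # ['Pipeline 2', 'Pipeline 3', 'Pipeline 1']
print(rank_pipelines(scores, "regression", "r2"))  # ['Pipeline 3', 'Pipeline 2', 'Pipeline 1']
```

The optional metric argument mirrors sorting the leaderboard by a different metric: the same pipelines can change order depending on which score you rank by.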

Leaderboard AutoAI models

You can evaluate the pipelines as follows:

  • Click a pipeline in the leaderboard to view more detail about the metrics and performance.
  • Click Compare to view how the top pipelines compare.
  • Sort the leaderboard by a different metric.

Expanding an AutoAI pipeline

Save a pipeline as a model

When you are satisfied with a pipeline, click Save model to save the candidate as a model to your project so you can test and deploy it. A notification confirms that you saved the model to the space associated with the project. Click the space to configure, train, test, and deploy the model.

Next step

Follow the steps in Deploying an AutoAI model to deploy your model and make predictions with it.