KNN Visualizations

The following panels, tables, and charts are available for KNN visualizations.

Model Evaluation panel

For classification models, the Model Evaluation panel shows a bar graph of the overall prediction accuracy (the proportion of correct predictions) and a table of evaluation statistics (if the prediction accuracy is exactly 0, the graph is not shown). The evaluation statistics include the overall accuracy and a series of measures computed by treating each category of the target field in turn as the category of interest (or positive response) and averaging the per-category results with weights proportional to the observed proportions of instances in each category. The weighted measures include true and false positive rates (TPR and FPR), precision, recall, and the F1 measure, which is the harmonic mean of precision and recall. When weighted in this manner (based on observed proportions), the weighted true positive rate and weighted recall are identical to the overall accuracy.
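The weighting can be illustrated with a short sketch. The following Python fragment (a minimal illustration with a hypothetical confusion matrix, not the product's implementation) computes per-category recall, precision, and F1, averages them with weights proportional to the observed category proportions, and confirms that the weighted recall equals the overall accuracy.

    import numpy as np

    # Hypothetical 3-class confusion matrix: rows = observed, columns = predicted
    cm = np.array([[50,  5,  5],
                   [ 4, 30,  6],
                   [ 6,  4, 40]])

    n = cm.sum()
    weights = cm.sum(axis=1) / n               # observed proportion of each category

    recall = np.diag(cm) / cm.sum(axis=1)      # per-category recall (true positive rate)
    precision = np.diag(cm) / cm.sum(axis=0)   # per-category precision
    f1 = 2 * precision * recall / (precision + recall)

    weighted_recall = np.sum(weights * recall)
    weighted_precision = np.sum(weights * precision)
    weighted_f1 = np.sum(weights * f1)

    accuracy = np.trace(cm) / n
    assert np.isclose(weighted_recall, accuracy)   # weighted recall equals overall accuracy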

For regression models, the panel shows a table with the mean squared error (MSE) and root mean squared error (RMSE).
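As a reminder of the definitions, MSE is the mean of the squared differences between observed and predicted target values, and RMSE is its square root; a minimal sketch with hypothetical values:

    import numpy as np

    y_true = np.array([3.0, 5.5, 2.0, 8.0])   # hypothetical observed target values
    y_pred = np.array([2.5, 6.0, 2.5, 7.0])   # hypothetical predictions

    mse = np.mean((y_true - y_pred) ** 2)      # mean squared error
    rmse = np.sqrt(mse)                        # root mean squared error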

For models involving only identification of the k nearest neighbors, this panel will not appear.

Model Information table

This table identifies the type of model fitted (classification, regression, or clustering), the target field (if specified), the number of nearest neighbors (k), and the distance measure used.

Predictor Importance chart

Shown for models with a target, this chart displays bars representing the predictors in descending order of relative importance for predicting the target, as determined by a variance-based sensitivity analysis algorithm. The importance values are scaled so that they sum to 1 across predictors.
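The scaling step itself is simple normalization; a minimal sketch with hypothetical raw sensitivity scores (the sensitivity analysis that produces them is not shown):

    import numpy as np

    raw_importance = np.array([0.8, 0.5, 0.2, 0.1])              # hypothetical raw sensitivity scores
    relative_importance = raw_importance / raw_importance.sum()  # scaled so the values sum to 1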

Predictor or K Selection Error Log chart

Shown for analyses with a target when automatic selection of the number of nearest neighbors (k), automatic feature selection, or both are specified. When only k selection is in effect, a line chart plots the error measure (misclassification rate for classification models or mean squared error for regression models) on the vertical axis against k on the horizontal axis. When only feature selection is in effect, the error measure is plotted against a sequence of models, each adding one more feature. When both k and feature selection are in effect, a multiple-line chart plots the error measure against the sequence of models, with a separate line for each value of k.
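The k selection curve can be reproduced conceptually by fitting a model for each candidate k and recording the error measure on held-out data. The following sketch uses scikit-learn's KNeighborsClassifier for illustration only; the product's automatic selection procedure may differ.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    errors = {}
    for k in range(1, 11):
        model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        errors[k] = 1.0 - model.score(X_test, y_test)   # misclassification rate for this k

    best_k = min(errors, key=errors.get)   # k with the lowest held-out error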

Confusion Matrix (Classification Table)

The confusion matrix, or classification table, contains a cross-classification of observed by predicted labels or groups. The numbers of correct predictions appear in the cells along the main diagonal. Correct percentages are shown for each row, for each column, and overall, as illustrated in the sketch after this list:

  • The percent correct for each row shows what percentage of the observations with that observed label were correctly predicted by the model. If a given label is considered a target label, this is known as sensitivity, recall or true positive rate (TPR). In a 2 x 2 confusion matrix, if one label is considered the non-target label, the percentage for that row is known as the specificity or true negative rate (TNR).
  • The percent correct for each column shows the percentage of observations with that predicted label that were correctly predicted. If a given predicted label is considered a target label, this is known as precision or positive predictive value (PPV). For a 2 x 2 confusion matrix, if one label is considered the non-target label, the percentage for that column is known as the negative predictive value (NPV).
  • The percent correct at the bottom right of the table gives the overall percentage of correctly classified observations, known as the overall accuracy.
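The following sketch shows how these percentages are computed for a hypothetical 2 x 2 confusion matrix; it is illustrative only.

    import numpy as np

    # Hypothetical 2 x 2 confusion matrix: rows = observed, columns = predicted,
    # with the first row/column treated as the target label
    cm = np.array([[80, 20],    # observed target:     80 true positives, 20 false negatives
                   [10, 90]])   # observed non-target: 10 false positives, 90 true negatives

    tp, fn = cm[0, 0], cm[0, 1]
    fp, tn = cm[1, 0], cm[1, 1]

    sensitivity = tp / (tp + fn)    # row percent correct for the target label (recall, TPR)
    specificity = tn / (fp + tn)    # row percent correct for the non-target label (TNR)
    ppv = tp / (tp + fp)            # column percent correct for the predicted target label (precision)
    npv = tn / (fn + tn)            # column percent correct for the predicted non-target label
    overall_accuracy = (tp + tn) / cm.sum()   # bottom-right percent correct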

Predictor Space Scatterplot

Assuming at least two features or predictors are input, the data points are plotted in a two-dimensional subspace of the predictors. If predictor importance has been used to compute weighted distances, the two most important predictors define the plot axes; otherwise (including models without a target, where only the k nearest neighbors of each instance are identified), the first two features define the plot axes.

If a field identifying focal records was used in fitting the model, these records are highlighted and connected to their nearest neighbors by lines. Clicking a point clears the currently identified focal records and marks the clicked point as a focal record; Ctrl-clicking adds the clicked point to the set of identified focal records. This interactivity links the chart to the Peers chart and the K Nearest Neighbors and Distances table.

Peers chart

This parallel coordinates plot shows the target and up to five predictors (or up to six features if there is no target) for each identified focal record and its k nearest neighbors, letting you see the directions of the differences between focal records and their nearest neighbors. This chart is shown only when at least one focal record is identified.

K Nearest Neighbors and Distances table

If no focal records have been identified in the Predictor Space scatterplot, this table shows one row for each record in the data, displaying the record ID along with the IDs of its k nearest neighbors and the distance from each neighbor to that record. If one or more focal records have been identified in the Predictor Space scatterplot, only the rows for those records are shown.
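The contents of this table correspond to a standard nearest-neighbor query. A minimal sketch using scikit-learn's NearestNeighbors, with hypothetical records and IDs:

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Hypothetical feature matrix, one row per record, with matching record IDs
    X = np.array([[1.0, 2.0], [1.2, 1.9], [5.0, 4.8], [5.1, 5.0], [0.9, 2.1]])
    record_ids = ["A", "B", "C", "D", "E"]

    k = 3
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # k + 1 because each record is its own closest match
    distances, indices = nn.kneighbors(X)

    for i, rid in enumerate(record_ids):
        # Skip the first neighbor (the record itself, at distance 0)
        neighbors = [(record_ids[j], round(d, 3))
                     for j, d in zip(indices[i][1:], distances[i][1:])]
        print(rid, neighbors)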

Next steps

Like your visualization? Why not deploy it? For more information, see Deploy a model.