GLE Visualizations

The following tables and charts are available for GLE visualizations.

Model Evaluation panel

For classification models, the Model Evaluation panel shows a bar graph of the overall prediction accuracy (the proportion of correct predictions) and a table containing a set of evaluation statistics. If the prediction accuracy is exactly 0, the graph is not shown.

The evaluation statistics include the overall accuracy and a set of measures obtained by treating each category of the target field in turn as the category of interest (or positive response) and then averaging the resulting statistics across categories, with weights proportional to the observed proportion of instances in each category. The weighted measures include the true and false positive rates (TPR and FPR), precision, recall, and the F1 measure, which is the harmonic mean of precision and recall. When weighted in this manner, based on observed proportions, the weighted true positive rate and the weighted recall are the same as the overall accuracy.
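For concreteness, the sketch below (not the GLE implementation itself) shows how such weighted averages can be computed with the scikit-learn library on a small hypothetical example; note that the weighted recall reproduces the overall accuracy.

```python
# Illustrative sketch (not the GLE implementation): weighted-average
# classification metrics for a small hypothetical multi-class example.
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = np.array(["a", "a", "a", "b", "b", "c"])
y_pred = np.array(["a", "a", "b", "b", "b", "a"])

accuracy = accuracy_score(y_true, y_pred)

# average="weighted" weights each category's statistic by its observed
# proportion of instances, mirroring the description above.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

# The weighted recall (weighted true positive rate) equals the overall accuracy.
print(accuracy, precision, recall, f1)
```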

For regression models, the panel shows a bar graph displaying the R2 as a measure of prediction accuracy, and a table with R2, mean squared error (MSE) and root mean squared error (RMSE).
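A minimal sketch of these regression statistics, computed here with scikit-learn on hypothetical observed and predicted values:

```python
# Illustrative sketch of the regression evaluation statistics,
# using hypothetical observed (y_true) and predicted (y_pred) values.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

y_true = np.array([3.1, 2.4, 5.0, 4.2])
y_pred = np.array([2.9, 2.7, 4.6, 4.4])

r2 = r2_score(y_true, y_pred)              # proportion of variance explained
mse = mean_squared_error(y_true, y_pred)   # mean squared error
rmse = np.sqrt(mse)                        # root mean squared error
print(r2, mse, rmse)
```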

Model Information table

Displays the target name and the number of features or predictors included in the model.

This table contains information on the type of model and how the model was fitted, so you can confirm that the model you have is the one you intended. It covers input settings such as the target field, the probability distribution and link function, how the generalized linear model’s scale (or dispersion) parameter was handled, whether a model selection or regularization method was used, and the number of features or predictors input as well as the number in the final model.

Each of the four information criterion measures (Akaike Information Criterion – AIC, Bayesian Information Criterion – BIC, Finite Sample Corrected AIC – AICc, and Consistent AIC – CAIC) can be used to compare models with different numbers of parameters when they are fitted to the same target variable with the same data. They differ in the relative penalties assigned to the number of parameters in the model; in all cases, smaller values are preferred. Like the log-likelihood value on which they are largely based, these measures are functions of the target variable values, so unlike R2 measures they cannot be used to compare models for different targets or different sets of data.
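The following sketch shows the standard formulas for these criteria, computed from a hypothetical log-likelihood, parameter count, and record count (the exact definitions used by a particular implementation may differ slightly):

```python
# Illustrative sketch: the four information criteria computed from a model's
# log-likelihood (LL), number of estimated parameters (k), and number of
# records (n). The values below are hypothetical.
import math

LL, k, n = -512.3, 6, 250

aic  = -2 * LL + 2 * k                        # Akaike Information Criterion
bic  = -2 * LL + k * math.log(n)              # Bayesian Information Criterion
aicc = aic + (2 * k * (k + 1)) / (n - k - 1)  # finite sample corrected AIC
caic = -2 * LL + k * (math.log(n) + 1)        # Consistent AIC
print(aic, bic, aicc, caic)                   # smaller values are preferred
```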

Records Summary table

This table shows you how many records were used to fit the model and whether any records were excluded due to missing data. If frequency weighting is in effect, it shows information about both unweighted and weighted numbers of records. If events/trials input format is used for a binary target, only the number of physical observations input is shown.

Predictor Importance chart

This chart displays bars representing the predictors in descending order of relative importance for predicting the target, as determined by a variance-based sensitivity analysis algorithm. The values for each predictor are scaled so that they add to 1. Hovering over the bar for a particular predictor shows a table with its importance value and descriptive statistics about the predictor.
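As a simple illustration of this scaling, the hypothetical raw sensitivity values below are divided by their sum so that the displayed importances add to 1:

```python
# Minimal sketch: scaling raw (hypothetical) sensitivity values so that the
# displayed importance values sum to 1 while preserving the ranking.
raw = {"age": 0.42, "income": 0.27, "region": 0.09}   # hypothetical values

total = sum(raw.values())
importance = {name: value / total for name, value in raw.items()}
print(importance)   # values now add to 1
```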

Tests of Model Effects table

This table gives one or two Wald chi-square tests for each term in the model, including effects representing multiple parameters for categorical predictors. The Sig. column provides the probability of observing a χ2 statistic as large as or larger than the one observed in a sample drawn from a population where the predictor has no effect, and can be used to identify “statistically significant” predictors. In large samples, predictors may be identified as statistically significant even though they are not important in practical terms.
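The Sig. value is the upper-tail probability of a chi-square distribution with the effect's degrees of freedom. A minimal sketch using SciPy and a hypothetical Wald statistic:

```python
# Illustrative sketch: the Sig. value for a Wald chi-square test is the
# upper-tail chi-square probability; the statistic and df are hypothetical.
from scipy.stats import chi2

wald_chi_square = 9.83   # hypothetical Wald statistic for one effect
df = 2                   # degrees of freedom (e.g., a 3-category predictor)

p_value = chi2.sf(wald_chi_square, df)   # P(chi-square with df >= observed)
print(p_value)
```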

The two types of tests for each effect are known as Type I and Type III tests; Type III tests are the default. In models that include only the main effects of all predictors, Type III tests produce the same p values as those given for the parameter estimates for single degree of freedom effects. In such models they assess the effect of adding a given predictor after all of the others in the model, and they are unique for a given model. Type I tests are equivalent to testing each effect as it is added to the model after those entered previously. They thus depend on the order of entry of effects into the model and are therefore not unique for a given model.

If a regularization method (Lasso, ridge regression or Elastic Net) has been used to fit the model, or a special estimation algorithm designed for models with very large numbers of parameters has been used, this table will not appear.

Parameter Estimates table

This table displays the parameter estimates (also known as regression coefficients, beta coefficients or beta weights) for the fitted model in the metric of the link function (for example, the logit or log-odds metric for logistic models), along with measures of sampling variation, tests of statistical significance and confidence intervals. These coefficients combine to form the linear predictor, which typically consists of a constant or intercept coefficient plus each regression coefficient multiplied by its predictor value. For models other than those with an identity link (a linear model), the linear predictor values are back-transformed via the inverse of the link function to produce predicted values for scale targets or predicted probabilities for binary targets.
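A minimal sketch of this calculation for a single record, assuming a logit link and hypothetical coefficient and predictor values:

```python
# Illustrative sketch: forming the linear predictor and back-transforming it
# through the inverse link. A logit link is assumed; coefficients and
# predictor values are hypothetical.
import numpy as np

intercept = -1.2
coefs = np.array([0.8, -0.4])        # hypothetical regression coefficients
x = np.array([2.0, 1.5])             # predictor values for one record

eta = intercept + np.dot(coefs, x)   # linear predictor value
prob = 1.0 / (1.0 + np.exp(-eta))    # inverse logit -> predicted probability
print(eta, prob)
```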

Exponentiated values of the coefficients and confidence intervals for these values may also appear for models such as logistic regression, where they are often referred to as (estimated) odds ratios. These values are typically considered easier to interpret than the coefficients for the linear predictor in a logistic regression. Confidence intervals for odds ratios are not symmetric and can range anywhere between 0 and infinity. For predictors that are statistically significant according to the Wald tests shown in the table, at a Type I (α) error level corresponding to the specified confidence interval coverage (for example, α=0.05 for 95% confidence intervals), the interval bounds will exclude the value 1.
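For illustration, the sketch below exponentiates a hypothetical coefficient and its Wald confidence limits to obtain an odds ratio and its (asymmetric) interval:

```python
# Illustrative sketch: odds ratios and their confidence limits obtained by
# exponentiating a logistic coefficient and its interval bounds
# (hypothetical values shown).
import numpy as np

coef, se = 0.8, 0.25                  # hypothetical coefficient and std. error
z = 1.959964                          # normal quantile for 95% coverage

lower, upper = coef - z * se, coef + z * se
odds_ratio = np.exp(coef)
or_ci = (np.exp(lower), np.exp(upper))   # asymmetric, bounded below by 0
print(odds_ratio, or_ci)
```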

If a regularization method (Lasso, ridge regression or Elastic Net) has been used to fit the model, or a special estimation algorithm designed for models with very large numbers of parameters has been used, only the estimated regression coefficients and possibly their exponentiated values will be displayed.

Residuals by Predicted chart

This chart is a scatterplot of standardized deviance residuals vs. predicted linear predictor values. For binary data it will only be shown when data are input via the events/trials format, where events is the number of “successes” or observations in the category of interest in the response and trials is the number of possible successes or outcomes. For a model that is appropriate for the data, you expect to see no apparent pattern or relationship between the plotted residuals and predicted values, essentially residual values randomly distributed around 0, and no values that are too far from the 0 line in magnitude.

This chart is not shown for binary data entered in 0/1 response format because such a plot would always show systematic patterns. Also, if a regularization method (Lasso, ridge regression or Elastic Net) has been used to fit the model, or a special estimation algorithm designed for models with very large numbers of parameters has been used, this chart will not appear.

Confusion Matrix (Classification Table)

The confusion matrix or classification table contains a cross-classification of observed by predicted labels or groups, where predictions are based on the predicted probabilities from the model and the specified probability threshold (typically, but not always, 0.50). If events/trials input has been used, the labels on the categories will always indicate Events and Non-Events rather than specific values of a target variable. The numbers of correct predictions are shown in the two cells along the main diagonal. Correct percentages are shown for each row, column and overall (a sketch of these calculations follows the list):

  • The percent correct for the target category row shows what percentage of the observed target category observations were correctly predicted by the model, which is commonly known as sensitivity, recall or true positive rate (TPR).
  • The percent correct for the reference or non-response category row shows the percentage of the non-response observations correctly predicted by the model, which is known as specificity or true negative rate (TNR).
  • The percent correct for the column with predicted positive responses gives the percentage of observations predicted by the model to be positive responses that are actually positive, known as precision or positive predictive value (PPV).
  • The percent correct for the column with predicted negative or non-responses gives the percentage of observations predicted to be non-responses that are actually negative, known as the negative predictive value (NPV).
  • The percent correct at the bottom right of the table gives the overall percentage of correctly classified observations, known as the overall accuracy.
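The following sketch computes these row, column and overall percentages from a hypothetical 2x2 confusion matrix:

```python
# Illustrative sketch of the row, column, and overall percentages described
# above, computed from a hypothetical 2x2 confusion matrix.
# Rows: observed (positive, negative); columns: predicted (positive, negative).
tp, fn = 40, 10   # observed positives: correctly and incorrectly predicted
fp, tn = 15, 35   # observed negatives: incorrectly and correctly predicted

sensitivity = tp / (tp + fn)                  # recall / TPR (positive row %)
specificity = tn / (tn + fp)                  # TNR (negative row %)
precision   = tp / (tp + fp)                  # PPV (positive column %)
npv         = tn / (tn + fn)                  # NPV (negative column %)
accuracy    = (tp + tn) / (tp + fn + fp + tn) # overall percent correct
print(sensitivity, specificity, precision, npv, accuracy)
```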

ROC Curve chart

For binary outcome predictions, the ROC (Receiver Operating Characteristic) Curve chart plots the true positive rate (TPR) on the vertical axis against the false positive rate (FPR) on the horizontal axis, as the threshold for positive classification is varied across the probability range. The true positive rate is the proportion of positive outcomes that are correctly predicted, also known as sensitivity, recall, or probability of detection. The false positive rate is the proportion of negative outcomes that are falsely predicted to be positive, also known as one minus specificity (1-specificity), fall-out, or probability of false alarm.

Since the predicted probabilities from a binary classification model such as a logistic regression fall in the open interval between 0 and 1, setting the classification threshold to 1 produces no true or false positives, so the curve begins at the (0,0) point at the lower left. Setting the threshold to 0 predicts all observations to be responses, so both the true and false positive rates are 1 and the curve ends at the (1,1) point at the upper right. Intermediate threshold values produce different combinations of true and false positive rates. The diagonal line running from the lower left to the upper right of the chart represents the expected curve when classification is performed randomly, assigning positive or negative response labels to all observations with various fixed probabilities.

While a theoretical ROC curve is a continuous function that varies over the 0 to 1 probability threshold range, the nonparametric ROC curve plotted in the IBM SPSS Spark Machine Learning Library is a finite set of points connected by straight-line interpolations. The points correspond to critical probability thresholds that divide the 0 to 1 probability range into 400 equally spaced intervals of width 0.0025; points defining intervals that contain no predicted probabilities for the data are removed. The number of plotted points is thus the minimum of 401 and one more than the number of distinct predicted probabilities found in the data, which is typically the number of distinct covariate patterns.
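The sketch below illustrates the general idea of evaluating true and false positive rates over such a grid of thresholds for hypothetical data; it is not the library's own algorithm:

```python
# Illustrative sketch (not the library's algorithm): ROC points built from a
# grid of probability thresholds, using hypothetical labels and probabilities.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p_hat = np.array([0.91, 0.40, 0.72, 0.55, 0.30, 0.62, 0.85, 0.10])

thresholds = np.linspace(0.0, 1.0, 401)   # 400 intervals of width 0.0025
points = []
for t in thresholds:
    pred = (p_hat >= t).astype(int)
    tpr = np.sum((pred == 1) & (y_true == 1)) / np.sum(y_true == 1)
    fpr = np.sum((pred == 1) & (y_true == 0)) / np.sum(y_true == 0)
    points.append((fpr, tpr, t))

# Plotting FPR (x) against TPR (y) for these points traces out the ROC curve,
# from (0,0) at threshold 1 to (1,1) at threshold 0.
```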

Hovering over a plotted point reveals a pop-up tooltip that shows the coordinate values for the false and true positive rates, as well as the probability threshold for classification. These values illustrate that the ROC curve summarizes, in graphical form, the results from a number of confusion matrices or classification tables, each based on a different probability threshold for classifying observations.

The area under the ROC curve (AUC) is a popular summary measure of classification performance for binary classifiers. The diagonal line representing random classification divides the ROC curve space in half and corresponds to an AUC of 0.50. A model that is able to perfectly classify responses and non-responses would have an AUC of 1.00, though this is seldom seen in practice and, in a logistic regression model, results in the non-existence of maximum likelihood estimates for one or more parameters. Typical ROC curves have AUC values between 0.50 and 1.00. Any ROC curve with an AUC value less than 0.50 could be transformed into a curve with an AUC value above 0.50 simply by reversing the decision or group assignment rule.

While no single measure can capture all aspects of the performance of a classifier, the area under the ROC curve has some attractive properties as a summary measure. One is that it summarizes the performance of the classifier over the whole range of possible thresholds, without requiring the analyst to choose a single classification cut point. Also, the AUC gives the probability that the classifier will rank a randomly selected positive observation higher than a randomly selected negative observation, which connects it to the Wilcoxon-Mann-Whitney sum of ranks test. It can also be rescaled so that the chance level of 0.50 becomes 0 and the maximum possible value remains 1.00 by calculating G = 2AUC - 1, which is known as the Gini coefficient and is equal to twice the area under the curve and above the chance classification reference line.
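The rank-based interpretation and the Gini transformation can be illustrated with a short sketch using hypothetical labels and scores:

```python
# Illustrative sketch: the rank-based (Wilcoxon-Mann-Whitney) form of the AUC
# and its Gini transformation, using hypothetical labels and scores.
import numpy as np
from scipy.stats import rankdata

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.91, 0.40, 0.72, 0.55, 0.30, 0.62, 0.85, 0.10])

n_pos, n_neg = np.sum(y_true == 1), np.sum(y_true == 0)
ranks = rankdata(scores)                          # ranks over all observations

# AUC as the Mann-Whitney U statistic divided by n_pos * n_neg
auc = (np.sum(ranks[y_true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

gini = 2 * auc - 1                                # chance level 0.50 maps to 0
print(auc, gini)
```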

One important advantage of the AUC measure over some other measures, such as overall classification accuracy, is that the ROC curve is based on two quantities (true and false positive rates) that are calculated from distinct parts of the observed data, actual positive and negative observations. This results in the ROC curve and the AUC measure being insensitive to changes in the relative proportions of positive and negative observations. Overall classification accuracy, on the other hand, in addition to requiring specification of a single cut point, can be highly dependent upon the relative proportions of positive and negative observations.

There are dangers, however, in using the AUC to compare the performance of different classifiers. For example, if the ROC curves cross, one classifier can produce a higher AUC value yet have inferior performance at the critical probability thresholds that would be most useful in practice. Also, although the ROC curve and the area under it are insensitive to the relative proportions of positive and negative observations, they do depend upon the distribution of the prediction scores. This dependence implies the use of different assumptions about misclassification costs for different classifiers, making comparisons potentially akin to comparing measurements in different units.

Next steps

Like your visualization? Why not deploy it? For more information, see Deploy a model.