Logistic Visualizations

The following tables and options are available for Logistic visualizations.

Model Information table

Displays the target, the type of model, and the number of features or predictors in the model.

Parameter Estimates table

Displays the maximum likelihood estimates for the model parameters, or their values in the last iteration for models where estimates fail to converge, along with exponentiated values of the estimates, also known as odds ratios. One set of estimates is presented for each logit in the model equation, where the number of logits is one fewer than the number of response categories. Each set of estimates involves comparing a response category to a designated reference category. The estimates are thus interpreted as changes in the log odds of responding in the specified category rather than the reference category for a one-unit increase in the predictor.

Redundant parameters associated with one or more categories of categorical predictors or interactions involving categorical predictors, or other predictors deemed linearly dependent on preceding terms in the model, are set to 0.
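
As an illustration of this interpretation (the coefficient below is made up, not taken from any particular model), exponentiating an estimate gives the corresponding odds ratio:

    import math

    # Hypothetical coefficient for a predictor in the logit comparing category "B"
    # to the reference category "A" (log-odds scale); the value is made up.
    b = 0.40
    odds_ratio = math.exp(b)   # approximately 1.49
    # A one-unit increase in the predictor multiplies the odds of responding in
    # category "B" rather than "A" by about 1.49, holding other predictors fixed.
    print(odds_ratio)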

Nominal Regression Option

Predictor Importance chart

This chart displays bars representing the predictors in descending order of relative importance for predicting the target, as determined by a variance-based sensitivity analysis algorithm. The values for each predictor are scaled so that they add to 1.

Case Processing Summary table

Shows the number of cases or records used in the analysis, labeled Valid, and the number excluded due to missing data, labeled Missing. Among the Valid cases, counts and percentages are shown for each category of the target field or dependent variable, and for each of the categorical predictor fields specified. The last line of the table shows the number of subpopulations, or distinct combinations of values of the predictor fields, in the data.

Step Summary table

Displayed for models built with stepwise methods, this table shows the effect(s) entered or removed at each step in the model-building process, along with a set of model fitting criteria and a chi-square test of the null hypothesis that the parameter(s) added or removed at that step are 0 in the population. The model fitting criteria include the -2 log-likelihood, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). All of these metrics are presented in smaller-is-better formats. While the -2 log-likelihood value cannot increase when adding an additional term to an existing model, the AIC and BIC measures can increase due to the penalties they add for the numbers of parameters fitted. These are designed to allow comparisons of models of different sizes on the same target and records.
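
As a minimal sketch of how these criteria relate to the -2 log-likelihood (the standard textbook form, not necessarily the product's internal code), assuming k estimated parameters and n records:

    import math

    def fit_criteria(neg2ll, k, n):
        """Standard smaller-is-better fitting criteria from the -2 log-likelihood."""
        aic = neg2ll + 2 * k            # penalty grows with the number of parameters
        bic = neg2ll + k * math.log(n)  # heavier penalty for larger samples
        return aic, bic

    # Made-up values for illustration only.
    print(fit_criteria(neg2ll=812.4, k=5, n=600))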

Iteration History table

Displays the values of all non-redundant parameter estimates and the -2 log-likelihood function at each iteration in the fitting of the final model. The values at Iteration 0 are the starting values. Also displays the number of step-halvings, if any, used at each iteration. Step-halving is used in the unusual situation where the standard update would move the -2 log-likelihood objective function in the wrong direction (that is, increase it) from one iteration to the next. The movement from the prior to the current iteration is cut in half until the change in the objective function is in the correct direction.
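
The following sketch illustrates the step-halving idea in simplified form; it is not the actual estimation code, and neg2ll stands for a function that evaluates the -2 log-likelihood at a given parameter vector:

    def damped_update(beta, step, neg2ll, max_halvings=10):
        """Halve a proposed parameter step until the -2 log-likelihood improves."""
        for halvings in range(max_halvings + 1):
            candidate = [b + s / (2 ** halvings) for b, s in zip(beta, step)]
            if neg2ll(candidate) < neg2ll(beta):   # objective moved in the correct direction
                return candidate, halvings         # halvings is the count reported in the table
        return beta, max_halvings                  # give up after the maximum number of halvings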

Model Fitting Information table

Displays model fitting criteria for the null and final models and a likelihood ratio chi-square test of the null hypothesis that all parameters added to the null model in the final model are 0 in the population. The model fitting criteria include the -2 log-likelihood, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC). All of these metrics are presented in smaller-is-better formats. While the -2 log-likelihood value cannot increase when adding an additional term to an existing model, the AIC and BIC measures can increase due to the penalties they add for the numbers of parameters fitted. These are designed to allow comparisons of models of different sizes on the same target and records.
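
For example, the likelihood ratio chi-square here is the drop in -2 log-likelihood from the null model to the final model, referred to a chi-square distribution on the number of added parameters (a sketch with made-up values, using SciPy for the p value):

    from scipy.stats import chi2

    neg2ll_null, neg2ll_final, df_added = 925.7, 812.4, 4   # made-up values
    lr_chisq = neg2ll_null - neg2ll_final                   # likelihood ratio chi-square
    p_value = chi2.sf(lr_chisq, df_added)                   # upper-tail probability
    print(lr_chisq, p_value)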

Goodness-of-Fit table

Provides Pearson and Deviance goodness-of-fit chi-square tests of the null hypothesis that the data were generated by the specified model. These tests are generally not useful if there are many subpopulations or covariate patterns with small numbers of observations.

Pseudo R-Square table

Displays three measures designed to provide R2-like assessments of how well the model predicts the outcome. Because the target in a logistic regression is not continuous and the model is estimated by maximum likelihood rather than by minimizing a sum of squared errors, there is no precise equivalent to the R2 from a linear model with an intercept. In that setting, R2 gives the squared correlation between observed and predicted values, the proportion of variation in the dependent variable associated with the predictions from the model, and the proportional reduction in error achieved by using the model predictions instead of predicting the mean value of the target. The widely used pseudo-R2 measures for logistic regression models generally approach things from the perspective of accounting for “variance” in some sense, or of the proportional reduction in error variation achieved by adding predictors to a model, rather than a squared correlation between observed and predicted values.

For a model fit to N observations, the Cox and Snell measure takes the Nth root of the squared ratio of the likelihood of the intercept-only model to the likelihood of the full model and subtracts it from 1. This statistic cannot attain a value of 1, so Nagelkerke proposed dividing it by its maximum possible value, which is 1 minus the Nth root of the squared likelihood of the intercept-only model. This adjustment results in a measure that ranges from 0 to 1 (and is also known as Cragg & Uhler’s measure). McFadden’s measure subtracts from 1 the ratio of the estimated log-likelihood of the full model to that of the intercept-only model. This measure tends to produce the lowest values among the three measures offered here.
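
A compact sketch of the three formulas described above, written in terms of the log-likelihoods of the intercept-only and full models (the numbers are made up for illustration):

    import math

    def pseudo_r2(ll_null, ll_full, n):
        """Cox and Snell, Nagelkerke, and McFadden pseudo R-square measures."""
        cox_snell = 1.0 - math.exp((2.0 / n) * (ll_null - ll_full))
        max_cs = 1.0 - math.exp((2.0 / n) * ll_null)   # maximum attainable Cox and Snell value
        nagelkerke = cox_snell / max_cs
        mcfadden = 1.0 - (ll_full / ll_null)
        return cox_snell, nagelkerke, mcfadden

    print(pseudo_r2(ll_null=-462.8, ll_full=-406.2, n=600))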

Measures of Monotone Association table

Available only with a binary target, this table displays four measures of monotone association between the target outcome and the predicted probabilities. All of these measures make use of the concept of concordance and discordance between the members of a pair of observations, one of which takes on the positive outcome value and the other of which takes on the negative outcome value. A concordant pair is one in which the member with the positive outcome has the higher predicted probability based on the model. A discordant pair is one in which the member with the positive outcome has the lower predicted probability based on the model. Pairs in which both members have the same predicted probability based on the model are tied. If there are N+ observations with positive outcomes and N- observations with negative outcomes, there are N+ × N- unique pairs.

The table first presents the numbers and percentages of concordant, discordant, tied, and total pairs. Three of the four measures shown use the number of concordant pairs minus the number of discordant pairs as their numerators. Somers’ D divides this difference by the total number of pairs. Goodman and Kruskal’s Gamma divides it by the sum of the concordant and discordant pairs, which is the same as dividing it by the total number of pairs that are not tied, so this measure is at least as large as Somers’ D. Kendall’s Tau-a divides the difference by N(N-1)/2, where N is the total number of observations, resulting in values that are generally substantially smaller than either of the first two measures. The Concordance Index C adds one-half the number of tied pairs to the number of concordant pairs, and divides this sum by the total number of pairs, giving essentially the proportion of concordant pairs. This value tends to be substantially higher than any of the other measures shown.
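
The following brute-force sketch computes all four measures from observed binary outcomes and predicted probabilities; it mirrors the definitions above but is not necessarily the algorithm used internally (the data are made up):

    def monotone_association(y, p):
        """Somers' D, Goodman and Kruskal's Gamma, Kendall's Tau-a, and the concordance index C."""
        pos = [pi for yi, pi in zip(y, p) if yi == 1]   # predicted probabilities, positive outcomes
        neg = [pi for yi, pi in zip(y, p) if yi == 0]   # predicted probabilities, negative outcomes
        conc = sum(pp > pn for pp in pos for pn in neg)
        disc = sum(pp < pn for pp in pos for pn in neg)
        pairs = len(pos) * len(neg)
        ties = pairs - conc - disc
        n = len(y)
        somers_d = (conc - disc) / pairs
        gamma = (conc - disc) / (conc + disc)
        tau_a = (conc - disc) / (n * (n - 1) / 2)
        c_index = (conc + 0.5 * ties) / pairs
        return somers_d, gamma, tau_a, c_index

    y = [1, 0, 1, 0, 1, 0]                      # made-up outcomes
    p = [0.8, 0.3, 0.6, 0.6, 0.9, 0.2]          # made-up predicted probabilities
    print(monotone_association(y, p))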

Likelihood Ratio Tests table

Displays model fitting criteria and a likelihood ratio chi-square test for each term in the model. The model fitting criteria (the -2 log-likelihood, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC), all of which are presented in smaller-is-better forms) reflect a model including all terms in the final model except the one listed on a given line. The likelihood ratio tests thus test the null hypothesis that all parameters included in a given effect are 0 in the population. While the -2 log-likelihood value cannot increase when adding an additional term to an existing model, the AIC and BIC measures can increase due to the penalties they add for the numbers of parameters fitted. These are designed to allow comparisons of models of different sizes on the same target and records.

Note that since categorical predictors are represented by indicator or dummy variables for each category and a generalized inverse is used in estimation instead of reparameterizing models to full rank, the intercept term is contained in any model involving at least one categorical predictor and the same overall model is fitted with or without the intercept term. The chi-square value for the intercept is thus 0 on 0 degrees of freedom, and the -2 log-likelihood of the reduced model without the intercept is the same as that for the final model. If interaction terms involving categorical predictors are included along with the main effects for the categorical predictors, the same type of containment relationship exists, and the reduced-model statistics for the contained effects are again the same as those for the full model (i.e., the -2 log-likelihood, AIC, and BIC are the same, and the chi-square statistic is 0 on 0 degrees of freedom).

Parameter Estimates table

Displays the maximum likelihood estimates for the model parameters, or their values in the last iteration for models where estimates fail to converge, along with estimated standard errors, Wald chi-square statistics (squared ratios of parameter estimates to their standard errors), degrees of freedom, significance values, exponentiated values of estimates, also known as odds ratios, and confidence interval bounds for the exponentiated estimates or odds ratios.

One set of estimates is presented for each logit in the model equation, where the number of logits is one fewer than the number of response categories. Each set of estimates involves comparing a response category to a designated reference category. The estimates are thus interpreted as changes in the log odds of responding in the specified category rather than the reference category for a one-unit increase in the predictor.

Redundant parameters associated with one or more categories of categorical predictors or interactions involving categorical predictors, or other predictors deemed linearly dependent on preceding terms in the model, are set to 0 and show 0 degrees of freedom and missing values for all other statistics.

If a user-defined value or the ratio of the Pearson or Deviance goodness-of-fit statistic to its degrees of freedom is used as a dispersion adjustment for the estimated asymptotic covariance matrix (typically this involves a value larger than 1, adjusting for overdispersion relative to the theoretical value of 1), the standard errors, Wald tests and confidence intervals will incorporate this adjustment.
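
A sketch of how the Wald chi-square and the odds-ratio confidence bounds in this table relate to an estimate and its standard error (the numbers are made up; any dispersion adjustment would already be reflected in the standard error used here):

    import math
    from scipy.stats import chi2, norm

    b, se = 0.40, 0.15                         # made-up estimate and standard error
    wald = (b / se) ** 2                       # Wald chi-square on 1 degree of freedom
    p_value = chi2.sf(wald, 1)
    z = norm.ppf(0.975)                        # critical value for a 95% confidence interval
    or_lower = math.exp(b - z * se)            # lower bound for the odds ratio
    or_upper = math.exp(b + z * se)            # upper bound for the odds ratio
    print(wald, p_value, or_lower, or_upper)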

Asymptotic Correlation Matrix

Displays correlations among estimated parameters of the final model. Values associated with pairs of estimates where at least one parameter in the pair is redundant are shown as dots for missing values, since no estimates can be derived. Values with very high magnitudes (i.e., close to -1 or 1) may be signs of harmful levels of collinearity among predictors. Since the correlation coefficients are scaled by the estimated variances of the parameters involved, dispersion adjustments to the asymptotic covariance matrix do not affect their values.
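
For reference, each correlation in this table is the corresponding covariance scaled by the standard errors of the two estimates, as in this small sketch with made-up entries:

    cov_ij = 0.006                              # made-up covariance between two estimates
    var_i, var_j = 0.0225, 0.0100               # made-up variances of the two estimates
    corr_ij = cov_ij / (var_i ** 0.5 * var_j ** 0.5)
    print(corr_ij)                              # 0.006 / (0.15 * 0.10) = 0.4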

Asymptotic Covariance Matrix

Displays covariances among estimated parameters of the final model. Values associated with pairs of estimates where at least one parameter in the pair is redundant are shown as dots for missing values, since no estimates can be derived. Dispersion adjustments based on a user-defined value or based on the ratio of the Pearson or Deviance goodness-of-fit statistic to its degrees of freedom are reflected in this matrix (the naïve matrix assuming the theoretical value of 1 is multiplied by the adjustment value). Typically this adjustment value is greater than 1, reflecting empirical overdispersion relative to theoretical assumptions.

Classification table

Displays a cross-classification of observed by predicted labels or groups, where predictions are based on predicted probabilities from the model. The numbers of correct predictions are shown in the two cells along the main diagonal. Correct percentages are shown for each row, column and overall (a computational sketch follows the list):

  • The percent correct for the target category row shows what percentage of the observed target category observations were correctly predicted by the model, which is commonly known as sensitivity, recall or true positive rate (TPR).
  • The percent correct for the reference or non-response category row shows the percentage of the non-response observations correctly predicted by the model, which is known as specificity or true negative rate (TNR).
  • The percent correct for the column with predicted positive responses gives the percentage of observations predicted by the model to be positive responses that are actually positive, known as precision or positive predictive value (PPV).
  • The percent correct for the column with predicted negative or non-responses gives the percentage of observations predicted to be non-responses that are actually negative, known as the negative predictive value (NPV).
  • The percent correct at the bottom right of the table gives the overall percentage of correctly classified observations, known as the overall accuracy.
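
A minimal sketch of these rates, computed from the four cells of a 2 x 2 classification table (the counts are made up):

    tp, fn, fp, tn = 180, 40, 30, 350               # made-up cell counts
    sensitivity = tp / (tp + fn)                    # row percent correct, target category (TPR)
    specificity = tn / (tn + fp)                    # row percent correct, non-response category (TNR)
    precision = tp / (tp + fp)                      # column percent correct, predicted positives (PPV)
    npv = tn / (tn + fn)                            # column percent correct, predicted negatives (NPV)
    accuracy = (tp + tn) / (tp + fn + fp + tn)      # overall percent correct
    print(sensitivity, specificity, precision, npv, accuracy)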

Logistic Regression Option

Predictor Importance chart

This chart displays bars representing the predictors in descending order of relative importance for predicting the target, as determined by a variance-based sensitivity analysis algorithm. The values for each predictor are scaled so that they add to 1.

Case Processing Summary table

Shows the numbers and percentages of unweighted cases or records used in the analysis, those excluded due to missing data, and in total.

Dependent Variable Encoding table

Displays the original values or labels of the target and the internal values (0 or 1) used in model estimation and scoring.

Categorical Variable Codings table

If any categorical features or predictor variables are specified, this table shows the number of observations at each level and how each level is represented in computations by one or more coded numeric features. The coded values are the values used when scoring observations with the regression coefficients from the logistic regression model.
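
As an illustration of the kind of coding documented here (the actual scheme depends on the contrast settings used), a three-level predictor might be represented by two indicator variables with the last level as the reference:

    # Hypothetical three-level categorical predictor coded with two indicators;
    # the reference level ("high") is coded (0, 0).
    coding = {"low": (1, 0), "medium": (0, 1), "high": (0, 0)}
    for level, codes in coding.items():
        print(level, codes)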

Iteration History table

Appearing twice if requested, this table displays the -2 log-likelihood objective function and the model parameter estimates at each iteration of the estimation process, first for fitting the null model (Step 0) and then for one or more models with predictors, depending on whether stepwise modeling is specified. Parameter estimates that grow to large magnitudes over iterations are signs of likely failure to converge.

Classification table

Displays a cross-classification of observed by predicted labels or groups, where predictions are based on predicted probabilities from the model. The numbers of correct predictions are shown in the two cells along the main diagonal. Correct percentages are shown for each row and overall:

  • The percentage correct for the target category row shows what percentage of the observed target category observations were correctly predicted by the model, which is commonly known as sensitivity, recall or true positive rate (TPR).
  • The percentage correct for the reference or non-response category row shows the percentage of the non-response observations correctly predicted by the model, which is known as specificity or true negative rate (TNR).

The percentage correct at the bottom right of the table gives the overall percentage of correctly classified observations, known as the overall accuracy.

This table is shown twice. The first instance is for the null model, and the second is for the model or models fitted with predictors. The second instance shows multiple 2 x 2 tables stacked in a larger table if stepwise model fitting has been used and results for each step are specified. By default, classification is based on a 0.5 cut point for predicted probabilities. If a different cut point has been specified, that value is used.
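
For example, the predicted group for each record is obtained by comparing its predicted probability with the cut point (a sketch with made-up probabilities):

    probabilities = [0.12, 0.48, 0.51, 0.87]    # made-up predicted probabilities
    cut_point = 0.5                             # default classification cut point
    predicted_group = [1 if p >= cut_point else 0 for p in probabilities]
    print(predicted_group)                      # [0, 0, 1, 1]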

Variables in the Equation table

Appears twice, once at Step 0 for the null model, and once for steps involving fitting models with predictors. Shows estimated regression coefficients (B), standard errors, Wald chi-square test statistics, degrees of freedom, significance or p values, odds ratios (Exp(B) column), and confidence interval values for odds ratios, for each predictor or coded categorical predictor in the model. If a stepwise model-building method has been used and output for each step has been specified, results are presented for all steps in the process. Large B coefficients accompanied by very large standard errors and small Wald statistics may indicate problems in model fitting warranting attention.

Variables not in the Equation table

Appears at Step 0, for the null model, and if a stepwise method is specified, appears a second time, summarizing any stepwise fitting of predictors. At Step 0, displays Score chi-square tests for each parameter and for each group of parameters for categorical predictors with multiple degrees of freedom, testing the null hypothesis that the parameter(s) would be 0 in the population when added to the current null model.

If shown a second time for stepwise models, at each step a set of Score chi-square tests for each parameter and for each group of parameters for categorical predictors with multiple degrees of freedom is shown, testing the null hypothesis that the parameter(s) would be 0 in the population when added to the current model. At steps beyond the first entry of predictors, an omnibus test for all predictors not yet included in the model is shown.

Omnibus Tests of Model Coefficients table

For direct entry models, this table summarizes only a single step, where all specified predictors have been added to the null model. It presents a likelihood-ratio chi-square statistic, degrees of freedom, and significance value for testing the null hypothesis that population values of all predictor parameters added to the null model are 0. If a stepwise method is used, the Step line at each step presents a test for the parameter(s) added or removed at that step, while the Model line presents a test for the full model fitted at that step. Chi-square values for Steps will be shown as negative when removing effects.

Model Summary table

Displays one line for each step involving the addition or removal of predictors, containing the current -2 log-likelihood value and two pseudo-R2 measures designed to provide R2-like assessments of how well the model predicts the outcome. Because the target in a logistic regression is not continuous and the model is estimated by maximum likelihood rather than by minimizing a sum of squared errors, there is no precise equivalent to the R2 from a linear model with an intercept. In that setting, R2 gives the squared correlation between observed and predicted values, the proportion of variation in the dependent variable associated with the predictions from the model, and the proportional reduction in error achieved by using the model predictions instead of predicting the mean value of the target. The widely used pseudo-R2 measures for logistic regression models generally approach things from the perspective of accounting for “variance” in some sense, or of the proportional reduction in error variation achieved by adding predictors to a model, rather than a squared correlation between observed and predicted values.

For a model fit to N observations, the Cox & Snell measure takes the Nth root of the squared ratio of the likelihood of the intercept-only model to the likelihood of the full model and subtracts it from 1. This statistic cannot attain a value of 1, so Nagelkerke proposed dividing it by its maximum possible value, which is 1 minus the Nth root of the squared likelihood of the intercept-only model. This adjustment results in a measure that ranges from 0 to 1 (and is also known as Cragg & Uhler’s measure).

Hosmer and Lemeshow Test table

This goodness-of-fit statistic is often more robust than the traditional Pearson and likelihood-ratio goodness-of-fit statistics used in binary logistic regression, particularly for models with continuous covariates and studies with small sample sizes. It groups cases into “deciles of risk” according to the predicted probabilities from the fitted model and compares the observed and expected counts in each target response category within each decile of risk. It is designed for situations in which each observation has a distinct predicted probability, and it is based on approximations even in large samples. When there are tied predicted probabilities, different algorithms for resolving ties when creating deciles of risk may yield different groupings and therefore different chi-square test statistics and associated significance values. If a stepwise method has been used and results at each step are specified, a set of tests is offered, one for the model at each step.
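
A rough sketch of the computation described above; the product's grouping and tie-handling rules may differ, and the data here are made up:

    from scipy.stats import chi2

    def hosmer_lemeshow(y, p, groups=10):
        """Group cases by predicted probability and compare observed with expected positives."""
        order = sorted(range(len(y)), key=lambda i: p[i])
        statistic = 0.0
        for g in range(groups):
            idx = order[g * len(y) // groups:(g + 1) * len(y) // groups]
            if not idx:
                continue
            observed = sum(y[i] for i in idx)        # observed positives in the group
            expected = sum(p[i] for i in idx)        # expected positives in the group
            mean_p = expected / len(idx)             # assumes 0 < mean_p < 1 in every group
            statistic += (observed - expected) ** 2 / (len(idx) * mean_p * (1 - mean_p))
        return statistic, chi2.sf(statistic, groups - 2)   # conventional df = groups - 2

    y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]                      # made-up outcomes
    p = [0.9, 0.2, 0.7, 0.6, 0.4, 0.3, 0.8, 0.1, 0.5, 0.35] # made-up predicted probabilities
    print(hosmer_lemeshow(y, p, groups=5))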

Contingency Table for Hosmer and Lemeshow Test

For each fitted model with at least one predictor, shows the contingency table produced by grouping cases or records into deciles of risk based on predicted probabilities from the fitted model. Each contingency table shows observed and expected counts for each target response category, as well as the total number of observed values grouped into that decile of risk. Note that unless there is at least one predictor with many distinct values, this table may not have the expected ten rows, one for each decile of risk, due to tied predicted probabilities. When there are tied predicted probabilities, different algorithms for resolving ties when creating deciles of risk may yield different groupings.

Correlation Matrix

Shows correlations among estimated regression coefficients at each step in the model fitting. Very high correlations among predictor coefficients indicate possible instability in estimation and may warrant attention.

Model if Term Removed table

At each stage in stepwise fitting of a model where results have been requested at each step, this table shows the log-likelihood for the model with the specified term removed and the change in -2 log-likelihood associated with removing that term, which gives a likelihood-ratio (LR) chi-square statistic, along with its degrees of freedom and significance value.
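
For instance, with made-up log-likelihoods, the LR chi-square for a term is the increase in -2 log-likelihood when that term is dropped:

    from scipy.stats import chi2

    ll_full, ll_reduced = -406.2, -409.8            # made-up log-likelihoods
    lr_chisq = (-2 * ll_reduced) - (-2 * ll_full)   # change in -2 log-likelihood = 7.2
    df = 1                                          # degrees of freedom for the removed term
    print(lr_chisq, chi2.sf(lr_chisq, df))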

Casewise List table

Provides a listing of cases or records with studentized deviance residuals greater than two in absolute value. For each identified case or record, displays the case or record ID number, the actual observed target group value, the predicted probability from the model, the predicted target group, the raw residual, and the standardized deviance residual. This table is particularly helpful in identifying observations that are not predicted well by the fitted model.
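
For background, the deviance residual underlying these values can be computed for a single binary observation as below; studentizing additionally adjusts for leverage, which this sketch omits, and the numbers are made up:

    import math

    y, p = 1, 0.18                              # made-up observed value and predicted probability
    sign = 1 if y > p else -1
    deviance_residual = sign * math.sqrt(-2 * (y * math.log(p) + (1 - y) * math.log(1 - p)))
    print(deviance_residual)                    # about 1.85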

Next steps

Like your visualization? Why not deploy it? For more information, see Deploy a model.