The TwoStep-AS node handles mixed field types and large datasets efficiently. It can also test several cluster solutions and choose the best, so you don’t need to know how many clusters to ask for at the outset. TwoStep-AS can be set to automatically exclude outliers, which are extremely unusual cases that can contaminate your results.
Model Information table
This table provides information about the type of model, its inputs, and various results from the model. It includes the number of features, the distance measure used in fitting the model, and the numbers and sizes of the clusters, including percentages of the sample and the ratio of the largest to the smallest cluster size. It also includes the Average Silhouette measure of overall clustering quality, which is expressed on a 0-1 scale, with larger values indicating better clustering solutions. Also shown are the total and average within-cluster sums of squares and the average between-cluster sum of squares.
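To make the size figures concrete, here is a minimal sketch of how cluster sizes, percentages of the sample, and the largest-to-smallest ratio could be derived from a list of cluster assignments. The `labels` data is hypothetical and the code is an illustration only, not the node's implementation.

```python
from collections import Counter

# Hypothetical cluster assignments for ten records (illustrative data only).
labels = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

sizes = Counter(labels)                      # number of records per cluster
total = sum(sizes.values())
percentages = {c: 100.0 * n / total for c, n in sizes.items()}

# Ratio of the largest cluster size to the smallest cluster size.
size_ratio = max(sizes.values()) / min(sizes.values())

for cluster in sorted(sizes):
    print(f"cluster {cluster}: n={sizes[cluster]} ({percentages[cluster]:.1f}%)")
print(f"largest-to-smallest size ratio: {size_ratio:.2f}")
```

A ratio close to 1 indicates clusters of similar size, while a large ratio points to one dominant cluster.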
Predictor Importance chart
This chart displays bars representing the predictors in descending order of relative importance for assigning instances to clusters, as determined by a variance-based sensitivity analysis algorithm. The values for each predictor are scaled so that they add to 1.
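The scaling described here can be sketched as dividing each raw score by the total. The predictor names and raw scores below are hypothetical, and the sensitivity analysis that would produce them is not shown.

```python
# Hypothetical raw importance scores (illustrative values, not model output).
raw_importance = {"income": 6.0, "age": 3.0, "region": 1.0}

# Divide by the total so that the scaled values add to 1.
total = sum(raw_importance.values())
scaled = {name: value / total for name, value in raw_importance.items()}

# Sort in descending order of importance, as the chart does.
ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
for name, share in ranked:
    print(f"{name:<8} {share:.2f}")
```

Because the values add to 1, each bar can be read as that predictor's share of the total importance.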
Cluster Sizes chart
A horizontal bar chart displaying the relative sizes of the clusters in descending order. Hovering over a bar shows the precise percentage of the total number of instances in that cluster based on the TwoStep clustering model.
Model Information table
Contains information about input settings and the final model. Input settings information includes whether the number of clusters was specified or chosen automatically, whether adaptive feature selection was used, the method used to measure feature importance, and the distance measure. Information about the final model includes the numbers of regular and outlier clusters and the continuous and categorical inputs used in the clustering model.
Records Summary table
This table shows you how many records were used to fit the model and whether any records were excluded due to missing data.
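As a rough illustration of the counts in this table, the sketch below tallies complete and incomplete records. It assumes a record is excluded when any field is missing; the node's actual missing-value handling may differ, and the `records` data is hypothetical.

```python
# Hypothetical records; None marks a missing value (illustrative data only).
records = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 45, "income": 48000.0},
    {"age": 29, "income": None},
]

def is_complete(record):
    """Return True when no field in the record is missing."""
    return all(value is not None for value in record.values())

# Assumption: a record with any missing field is excluded from model fitting.
used = [r for r in records if is_complete(r)]
excluded = len(records) - len(used)
print(f"records used: {len(used)}, excluded for missing data: {excluded}")
```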
Excluded Inputs table
Identifies any features that failed the adaptive feature selection criteria and were deemed to have little or no potential for improving the overall goodness of the final model, or indicates that all specified features were included.
Model Quality table
Displays for each cluster the number of records classified into that cluster, and the goodness and importance of each cluster. Goodness is a measure of cluster cohesion and separation, while importance is a measure of cluster cohesion. The overall model goodness (the Average Silhouette Coefficient) is also shown.
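To illustrate how a single measure can combine cohesion and separation, the following sketch computes the textbook average silhouette coefficient with Euclidean distance. This is the standard definition, which ranges from -1 to 1; the node's own computation and reporting scale may differ. The function name and sample points are hypothetical.

```python
def average_silhouette(points, labels):
    """Average of the textbook silhouette s = (b - a) / max(a, b) over all points."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, p in enumerate(points):
        own = labels[i]
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == own and j != i]
        if not same:
            scores.append(0.0)          # convention: a singleton cluster scores 0
            continue
        a = sum(same) / len(same)       # cohesion: mean distance within own cluster
        b = min(                        # separation: mean distance to nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in clusters if other != own
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical data: two tight groups far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
well_separated = average_silhouette(points, [0, 0, 1, 1])
mixed_up = average_silhouette(points, [0, 1, 0, 1])
print(f"well separated: {well_separated:.3f}, mixed up: {mixed_up:.3f}")
```

The labeling that matches the two groups scores close to 1, while the labeling that splits each group across clusters scores much lower.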
Feature Importance chart
This chart displays bars representing the features in descending order of relative importance for assigning group memberships, as determined by relative contributions to improvement in either the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), or based on effect-size measures of the relationships between features and clusters. The values for each feature are scaled so that the largest value is 1.
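The scaling used in this chart can be sketched as dividing every raw score by the largest one. The feature names and raw scores below are hypothetical; the AIC, BIC, or effect-size computation that would produce them is not shown.

```python
# Hypothetical raw feature importance scores (illustrative values only).
raw_importance = {"tenure": 8.0, "plan_type": 4.0, "usage": 2.0}

# Divide by the largest score so that the top feature has a value of 1.
largest = max(raw_importance.values())
scaled = {name: value / largest for name, value in raw_importance.items()}

# Sort in descending order of importance, as the chart does.
ranked = sorted(scaled.items(), key=lambda item: item[1], reverse=True)
for name, value in ranked:
    print(f"{name:<10} {value:.2f}")
```

Under this scaling the values do not add to 1; each value is a fraction of the most important feature's score.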
Outlier Clusters table & chart
For each identified outlier cluster, the table contains a line showing the number of records in that outlier cluster, its strength, which is a measure of how different it is from the regular clusters, and its relative similarity to each of the regular clusters, scaled so that the similarities for a given outlier cluster sum to 1 across regular clusters. The higher the outlier strength, the less useful the regular clusters are for interpreting the outlier cluster.
The chart shows the relative similarities of each outlier cluster to each regular cluster on a 0 to 1 scale.
Cluster Profiles using Across-Cluster Feature Importance table & chart
The table contains a line for each input feature showing its Across-Cluster Feature Importance value, or its importance to the overall clustering solution, along with its mean value for each cluster if the feature is continuous, or the mode for each cluster if the feature is categorical. The features are listed in descending order of importance. Importance values are scaled so that the highest value is 1.
The chart contains a set of boxplot-like displays, one for each cluster and one for the overall data, for the most important continuous feature. The boxes show means and standard deviations. Clicking on any other continuous feature changes the feature shown on the vertical axis to that selected feature.
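The numbers behind such a display can be sketched as a mean and standard deviation per cluster plus one for the overall data. The sketch assumes the sample standard deviation and a box spanning one standard deviation either side of the mean; the chart's exact geometry is not specified here, and the data is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical values of one continuous feature with their cluster labels.
values = [10.0, 12.0, 14.0, 30.0, 32.0, 34.0]
labels = ["A", "A", "A", "B", "B", "B"]

# One group for the overall data, then one group per cluster.
groups = {"Overall": list(values)}
for value, label in zip(values, labels):
    groups.setdefault(label, []).append(value)

# Mean and sample standard deviation for each group.
summary = {name: (mean(vals), stdev(vals)) for name, vals in groups.items()}

for name, (m, s) in summary.items():
    # Assumption: the box spans mean - sd to mean + sd.
    print(f"{name:<8} box: {m - s:.2f} to {m + s:.2f} (mean {m:.2f})")
```

Clusters whose boxes barely overlap on a feature are well distinguished by that feature.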
Within-Cluster Feature Importance tables
One table appears for each cluster, with the cluster center represented by the mean for each continuous feature and the mode for each categorical feature, along with the Within-Cluster Feature Importance value, or the importance of a given feature to that cluster. Importance values are scaled so that the largest value is 1.
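A cluster center of this kind can be sketched as the mean of each continuous feature and the mode of each categorical feature among the records in the cluster. The records, field names, and `cluster_center` helper below are hypothetical, and importance values are not computed.

```python
from statistics import mean, mode

# Hypothetical records that have already been assigned to clusters.
records = [
    {"cluster": 1, "age": 25.0, "plan": "basic"},
    {"cluster": 1, "age": 35.0, "plan": "basic"},
    {"cluster": 1, "age": 30.0, "plan": "premium"},
    {"cluster": 2, "age": 60.0, "plan": "premium"},
    {"cluster": 2, "age": 50.0, "plan": "premium"},
]
continuous = ["age"]
categorical = ["plan"]

def cluster_center(rows):
    """Mean for each continuous feature, mode for each categorical feature."""
    center = {f: mean(r[f] for r in rows) for f in continuous}
    center.update({f: mode(r[f] for r in rows) for f in categorical})
    return center

centers = {}
for c in sorted({r["cluster"] for r in records}):
    centers[c] = cluster_center([r for r in records if r["cluster"] == c])

for c, center in centers.items():
    print(f"cluster {c}: {center}")
```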
Cluster Distances charts
One chart is shown for each cluster, showing that focal cluster situated at the center of a network, with each other cluster placed a proportional distance from the focal cluster. Hovering over a node identifying a cluster in a chart shows the number and percentage of records in the cluster, as well as its Goodness and Importance measure values. Goodness is a measure of cohesion and separation, and Importance is a measure of cohesion.
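The layout idea can be sketched by computing the distance from a focal cluster's center to every other cluster's center. The sketch assumes Euclidean distance between centers of continuous features; the node may use a different distance measure, and the `centers` data and `distances_from` helper are hypothetical.

```python
import math

# Hypothetical cluster centers on two standardized continuous features.
centers = {"1": (0.0, 0.0), "2": (3.0, 4.0), "3": (6.0, 8.0)}

def distances_from(focal):
    """Euclidean distance from the focal cluster's center to each other center."""
    return {
        other: math.dist(centers[focal], center)
        for other, center in centers.items()
        if other != focal
    }

# One set of distances per focal cluster, mirroring one chart per cluster.
for focal in centers:
    print(f"focal cluster {focal}: {distances_from(focal)}")
```

In a chart built from these values, cluster 3 would sit twice as far from focal cluster 1 as cluster 2 does.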
Like your visualization? Why not deploy it? For more information, see Deploy a model.