The TwoStep node handles mixed field types and processes large datasets efficiently. It can also test several cluster solutions and choose the best one, so you don't need to know how many clusters to ask for at the outset. TwoStep can be set to automatically exclude outliers, or extremely unusual cases that can contaminate your results.
Model Information table
This table provides information about the type of model, its inputs, and various results from the model. It includes the number of features, the distance measure used in fitting the model, the number and sizes of the clusters (including percentages of the sample), the ratio of the largest to the smallest cluster size, and the Average Silhouette measure of overall clustering quality, which is expressed on a 0-1 scale, with larger values indicating better clustering solutions. Also shown are the total and average within-cluster sums of squares and the average between-cluster sum of squares.
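To make the quantities in this table more concrete, the following minimal Python sketch computes a few of them for a toy dataset. It is an illustration only, not the node's implementation: the function name `cluster_summary`, the sample points, and the labels are hypothetical, and it uses the textbook silhouette coefficient with Euclidean distance, which may differ from the distance measure and scaling that the node reports.

```python
from collections import defaultdict
from math import dist


def cluster_summary(points, labels):
    """Return cluster sizes, largest/smallest size ratio,
    within-cluster sums of squares, and the average silhouette."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)

    sizes = {lab: len(members) for lab, members in clusters.items()}
    ratio = max(sizes.values()) / min(sizes.values())

    # Within-cluster sum of squares: squared Euclidean distance
    # from each point to the centroid of its own cluster.
    within_ss = {}
    for lab, members in clusters.items():
        centroid = tuple(sum(col) / len(members) for col in zip(*members))
        within_ss[lab] = sum(dist(p, centroid) ** 2 for p in members)

    # Textbook silhouette: for each point, a = mean distance to the other
    # points in its own cluster, b = lowest mean distance to any other cluster.
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # conventional value when undefined
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)

    avg_silhouette = sum(scores) / len(scores)
    return sizes, ratio, within_ss, avg_silhouette


# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (10, 10), (10, 11), (10, 12)]
labels = [0, 0, 1, 1, 1]
sizes, ratio, within_ss, avg_silhouette = cluster_summary(points, labels)
```

For well-separated clusters like these, the average silhouette comes out close to 1, which matches the table's reading that larger values indicate a better clustering solution.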
Predictor Importance chart
This chart displays bars representing the predictors in descending order of relative importance for assigning instances to clusters, as determined by a variance-based sensitivity analysis algorithm. The importance values are scaled so that they sum to 1 across all predictors.
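The scaling and ordering used in this chart can be sketched in a few lines of Python. The raw scores and the function name `scale_importance` below are hypothetical, and the sketch does not reproduce the sensitivity analysis itself; it only shows how a set of non-negative raw scores could be normalized to sum to 1 and sorted in descending order.

```python
def scale_importance(raw_scores):
    """Scale non-negative raw scores so they sum to 1,
    then return (name, value) pairs in descending order."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("raw scores must have a positive sum")
    scaled = {name: score / total for name, score in raw_scores.items()}
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw importance scores for three predictors.
raw = {"age": 3.0, "income": 5.0, "region": 2.0}
ranked = scale_importance(raw)
```

Because the values are relative, a predictor's bar shows its share of the total importance rather than an absolute measure.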
Cluster Sizes chart
A horizontal bar chart displaying the relative sizes of the clusters in descending order. Hovering over a bar shows the precise percentage of the total number of instances that fall in that cluster, based on the TwoStep clustering model.
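As a rough illustration of the values behind this chart, the following Python sketch turns a list of cluster assignments into percentages sorted from largest to smallest. The label values and the function name `cluster_size_percentages` are hypothetical; the chart's own values come from the fitted TwoStep model.

```python
from collections import Counter


def cluster_size_percentages(labels):
    """Return (cluster, percentage of all instances) pairs,
    ordered from the largest cluster to the smallest."""
    counts = Counter(labels)
    total = len(labels)
    return [(lab, 100.0 * n / total) for lab, n in counts.most_common()]


# Hypothetical cluster assignments for ten instances.
labels = ["cluster-1"] * 5 + ["cluster-2"] * 3 + ["cluster-3"] * 2
percentages = cluster_size_percentages(labels)
```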
Like your visualization? Why not deploy it? For more information, see Deploy a model.