The Auto Cluster node estimates and compares clustering models that identify groups of
records with similar characteristics. The node works in the same manner as other automated modeling
nodes, enabling you to experiment with multiple combinations of options in a single modeling pass.
Models can be compared using basic measures that filter and rank the cluster models by
usefulness, and that provide a measure based on the importance of particular
fields.
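
As a rough illustration of the filter-and-rank idea outside the product, the sketch below fits a plain k-means model for several cluster counts and ranks the candidates by mean silhouette coefficient. The silhouette is an assumed ranking measure chosen for the example, not one this section names. The helper names (`kmeans`, `silhouette`, `rank_candidates`), the `min_silhouette` threshold, and the sample points are all hypothetical, and none of this is the node's implementation.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means with deterministic, evenly spaced starting centers."""
    step = len(points) // k
    centers = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient: higher means tighter, better-separated clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != l)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

def rank_candidates(points, ks, min_silhouette=0.0):
    """Fit one model per k, drop those below the threshold, rank the rest."""
    results = []
    for k in ks:
        score = silhouette(points, kmeans(points, k))
        if score >= min_silhouette:  # filter step
            results.append((k, score))
    return sorted(results, key=lambda r: r[1], reverse=True)  # rank step

# Hypothetical data: three visibly separate groups of records.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
          (9.0, 1.0), (9.2, 0.9), (8.9, 1.2)]
ranking = rank_candidates(points, ks=[2, 3, 4])
for k, score in ranking:
    print(f"k={k}: silhouette={score:.3f}")
```

On this toy data the three-cluster candidate ranks first, which matches how the points were constructed. Raising `min_silhouette` would drop weaker candidates from the list before ranking.
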
Clustering models are often used to identify groups that can be used as inputs in subsequent
analyses. For example, you may want to target groups of customers based on demographic
characteristics such as income, or based on the services they have bought in the past. You can do
this without prior knowledge about the groups and their characteristics -- you may not know how many
groups to look for, or what features to use in defining them. Clustering models are often referred
to as unsupervised learning models, since they do not use a target field, and do not return a
specific prediction that can be evaluated as true or false. The value of a clustering model is
determined by its ability to capture interesting groupings in the data and provide useful
descriptions of those groupings.
Requirements. One or more fields that define characteristics of interest.
Cluster models do not use target fields in the same manner as other models, because they do not make
specific predictions that can be assessed as true or false. Instead, they are used to identify
groups of cases that may be related. For example, you cannot use a cluster model to predict whether
a given customer will churn or respond to an offer. But you can use a cluster model to assign
customers to groups based on their tendency to do those things. Weight and frequency fields are not
used.
Evaluation fields. While no target is used, you can optionally specify one
or more evaluation fields to be used in comparing models. The usefulness of a cluster model may be
evaluated by measuring how well (or badly) the clusters differentiate these fields.
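
One way to picture "how well the clusters differentiate a field" is the share of a numeric field's variance that cluster membership explains (eta squared). The sketch below is an assumption made for illustration, not necessarily the measure the node computes. The function name `eta_squared`, the `income` values, and the two cluster assignments are hypothetical.

```python
def eta_squared(values, labels):
    """Share of a numeric field's variance explained by cluster membership (0 to 1)."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant field cannot be differentiated
    groups = {}
    for v, l in zip(values, labels):
        groups.setdefault(l, []).append(v)
    # Between-cluster sum of squares: cluster size times squared mean offset.
    between_ss = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    return between_ss / total_ss

# Hypothetical evaluation field and two competing cluster assignments.
income = [20, 22, 21, 60, 62, 61]
model_a = [0, 0, 0, 1, 1, 1]  # separates low and high incomes
model_b = [0, 1, 0, 1, 0, 1]  # mixes them

score_a = eta_squared(income, model_a)
score_b = eta_squared(income, model_b)
print(f"model A: {score_a:.3f}, model B: {score_b:.3f}")
```

Under this measure, model A differentiates the evaluation field far better than model B, so it would compare more favorably on that field.
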
Supported model types
Supported model types include TwoStep, K-Means, Kohonen, One-Class SVM, and K-Means-AS.