SPSS predictive analytics clustering algorithms in notebooks
Last updated: Jan 12, 2024
You can use the scalable Two-Step or the Cluster model evaluation algorithm to cluster data in notebooks.
Two-Step Cluster
Scalable Two-Step is based on the familiar two-step clustering algorithm, but extends both its functionality and performance in several directions.
First, it works effectively with large, distributed data because it runs on Spark, which provides the Map-Reduce computing paradigm.
Second, the algorithm provides mechanisms for selecting the most relevant features for clustering the given data and for detecting rare outlier points. It also provides an enhanced set of evaluation and diagnostic features to support insight into the clustering.
The two-step clustering algorithm first performs a pre-clustering step: it scans the entire dataset and stores the dense regions of data cases as summary statistics called cluster features. The cluster features are stored in memory in a data structure called the CF-tree. Finally, an agglomerative hierarchical clustering algorithm is applied to cluster the set of cluster features.
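The two steps can be illustrated with a minimal, self-contained Python sketch. It is not the SPSS implementation or its API. It makes several simplifying assumptions: cluster features are kept in a flat list rather than a CF-tree, distance is plain Euclidean distance between centroids, and the names `ClusterFeature`, `pre_cluster`, `agglomerate`, and the `threshold` and `k` parameters are hypothetical, chosen only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterFeature:
    """Summary statistics for one dense region: count, linear sum, sum of squares."""
    n: int
    linear_sum: list
    square_sum: float

    @classmethod
    def from_point(cls, point):
        return cls(1, list(point), sum(x * x for x in point))

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def merge(self, other):
        # Cluster features are additive, so merging never revisits raw data.
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )


def pre_cluster(points, threshold):
    """Step 1: single pass over the data. Each point is absorbed by the nearest
    cluster feature within `threshold`, or starts a new one.
    (Assumption: flat list instead of a CF-tree.)"""
    features = []
    for p in points:
        best, best_dist = None, threshold
        for i, cf in enumerate(features):
            d = math.dist(p, cf.centroid())
            if d <= best_dist:
                best, best_dist = i, d
        if best is None:
            features.append(ClusterFeature.from_point(p))
        else:
            features[best] = features[best].merge(ClusterFeature.from_point(p))
    return features


def agglomerate(features, k):
    """Step 2: merge the two cluster features with the closest centroids
    until only k remain."""
    features = list(features)
    while len(features) > k:
        pairs = (
            (math.dist(features[i].centroid(), features[j].centroid()), i, j)
            for i in range(len(features))
            for j in range(i + 1, len(features))
        )
        _, i, j = min(pairs)
        merged = features[i].merge(features[j])
        features = [cf for idx, cf in enumerate(features) if idx not in (i, j)]
        features.append(merged)
    return features


# Made-up sample data with three visibly dense regions.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9),
          (9.0, 0.0), (9.2, 0.1)]

features = pre_cluster(points, threshold=0.5)   # step 1: summarize dense regions
clusters = agglomerate(features, k=2)           # step 2: cluster the summaries

print([cf.n for cf in features])
print([cf.n for cf in clusters])
```

The point of the sketch is that the second step operates only on the compact summaries, not on the raw data cases, which is what lets the approach scale to large datasets.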