kmeansasnode properties
Last updated: Sep 10, 2024

K-means is one of the most commonly used clustering algorithms. It clusters data points into a predefined number of clusters. The K-Means-AS node in SPSS Modeler is implemented in Spark. For more information about k-means algorithms, see Clustering [1].
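
As a rough illustration of what clustering into a predefined number of clusters involves, the following is a minimal pure-Python sketch of Lloyd's k-means iteration with random initialization. It is not the Spark implementation that the node uses. The function name, the parameter names (borrowed from the properties in Table 1), and the sample data are illustrative assumptions only.

```python
import math
import random


def kmeans(points, clusters_num=5, max_iteration=20, tolerance=1.0e-4, random_seed=None):
    """Cluster points (tuples of floats) with plain Lloyd iterations.

    Illustrative only: random initialization and Euclidean distance.
    """
    rng = random.Random(random_seed)
    # Random initialization: pick distinct data points as starting centers.
    centers = rng.sample(points, clusters_num)
    for _ in range(max_iteration):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        new_centers = []
        for center, group in zip(centers, groups):
            if group:
                new_centers.append(tuple(sum(c) / len(group) for c in zip(*group)))
            else:
                new_centers.append(center)  # an empty cluster keeps its center
        # Stop early once no center moved farther than the tolerance.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift <= tolerance:
            break
    return centers


# Hypothetical sample data: two well-separated groups of three points each.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = sorted(kmeans(data, clusters_num=2, random_seed=1))
```

In this sketch the iteration limit and the tolerance are two independent stopping conditions: the loop ends when either one is reached.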

Note: The K-Means-AS node performs one-hot encoding automatically for categorical variables.
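
To illustrate what one-hot encoding does to a categorical field, here is a minimal pure-Python sketch. The function name and the sample values are hypothetical, and the node's internal encoding (for example, category order or handling of missing values) may differ.

```python
def one_hot(values):
    """Return (categories, rows) with one 0/1 indicator column per category."""
    categories = sorted(set(values))  # sorted so the column order is stable
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, rows = one_hot(["red", "green", "red", "blue"])
# categories is ["blue", "green", "red"]; the first row, for "red", is [0, 0, 1]
```

Each categorical value becomes a numeric vector with a single 1, which lets a distance-based algorithm such as k-means treat the field as numeric input.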
Table 1. kmeansasnode properties
kmeansasnode Properties  Values   Property description
roleUse                  string   Specify predefined to use predefined roles, or custom to use custom field assignments. Default is predefined.
autoModel                Boolean  Specify true to use the default name ($S-prediction) for the new generated scoring field, or false to use a custom name. Default is true.
features                 field    List of the field names for input when the roleUse property is set to custom.
name                     string   The name of the new generated scoring field when the autoModel property is set to false.
clustersNum              integer  The number of clusters to create. Default is 5.
initMode                 string   The initialization algorithm. Possible values are k-means|| or random. Default is k-means||.
initSteps                integer  The number of initialization steps when initMode is set to k-means||. Default is 2.
advancedSettings         Boolean  Specify true to make the following four properties available. Default is false.
maxIteration             integer  Maximum number of iterations for clustering. Default is 20.
tolerance                string   The tolerance to stop the iterations. Possible settings are 1.0E-1, 1.0E-2, ..., 1.0E-6. Default is 1.0E-4.
setSeed                  Boolean  Specify true to use a custom random seed. Default is false.
randomSeed               integer  The custom random seed when the setSeed property is true.
displayGraph             Boolean  Specify true to include a graph in the output.
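
The properties in Table 1 are typically set from a Python script through a node's setPropertyValue method. The following is a minimal sketch under stated assumptions: the helper names, the validation rules (derived from the property descriptions in Table 1), and the stand-in node class are hypothetical and exist only so that the example can run outside SPSS Modeler.

```python
# Documented defaults from Table 1 (properties without a stated default are absent).
DEFAULTS = {
    "roleUse": "predefined",
    "autoModel": True,
    "clustersNum": 5,
    "initMode": "k-means||",
    "initSteps": 2,
    "advancedSettings": False,
    "maxIteration": 20,
    "tolerance": "1.0E-4",
    "setSeed": False,
}

# Enumerated values from Table 1. The tolerance set assumes that the elided
# entries in the table are 1.0E-3, 1.0E-4, and 1.0E-5.
ALLOWED = {
    "roleUse": {"predefined", "custom"},
    "initMode": {"k-means||", "random"},
    "tolerance": {"1.0E-%d" % n for n in range(1, 7)},
}

NO_DEFAULT = {"features", "name", "randomSeed", "displayGraph"}
ADVANCED = {"maxIteration", "tolerance", "setSeed", "randomSeed"}


def check_properties(**overrides):
    """Validate overrides against Table 1 and return them as a dict."""
    effective = dict(DEFAULTS, **overrides)
    for key, value in overrides.items():
        if key not in DEFAULTS and key not in NO_DEFAULT:
            raise KeyError("unknown kmeansasnode property: " + key)
        if key in ALLOWED and value not in ALLOWED[key]:
            raise ValueError("invalid value for %s: %r" % (key, value))
    if ADVANCED & set(overrides) and not effective["advancedSettings"]:
        raise ValueError("advanced properties need advancedSettings=True")
    if effective["roleUse"] == "custom" and not effective.get("features"):
        raise ValueError("features is needed when roleUse is custom")
    if not effective["autoModel"] and not effective.get("name"):
        raise ValueError("name is needed when autoModel is false")
    if effective["setSeed"] and "randomSeed" not in effective:
        raise ValueError("randomSeed is needed when setSeed is true")
    return dict(overrides)


def apply_properties(node, props):
    """Copy each chosen property onto a node through setPropertyValue."""
    for key, value in props.items():
        node.setPropertyValue(key, value)


class FakeNode:
    """Hypothetical stand-in for a Modeler node, for use outside Modeler only."""

    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


node = FakeNode()
apply_properties(node, check_properties(clustersNum=3, advancedSettings=True,
                                        setSeed=True, randomSeed=42))
```

Only the explicitly chosen values are sent to the node, so every other property keeps its default. Inside SPSS Modeler, apply_properties could be pointed at a real node object instead of FakeNode; how that node is created (including its node type string) is not covered in this section and should be checked against your Modeler version.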

1 "Clustering - RDD-based API." Apache Spark. MLlib: Main Guide. Aug 2024.