Last updated: Jan 18, 2024
Hierarchical Density-Based Spatial Clustering (HDBSCAN)© uses unsupervised learning to find clusters, or dense regions, of a data set. The HDBSCAN node in SPSS Modeler exposes the core features and commonly used parameters of the HDBSCAN library. The node is implemented in Python, and you can use it to cluster your data set into distinct groups when you don't know in advance what those groups are.
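As a rough illustration, the defaults and allowed values documented in the properties table on this page can be collected in plain Python and checked before they are used in a script. This is a minimal sketch under stated assumptions: `resolve_properties` and `effective_min_samples` are hypothetical helpers written for this example and are not part of SPSS Modeler or the HDBSCAN library. Only properties with a documented default are included.

```python
# Hypothetical helpers for illustration only; they are not part of
# SPSS Modeler or the HDBSCAN library. Defaults and allowed values
# are copied from the hdbscannode properties table.

DEFAULTS = {
    "useHPO": False,
    "min_cluster_size": 5,
    "min_samples": 0,
    "algorithm": "best",
    "metric": "euclidean",
    "useStringLabel": False,
    "stringLabelPrefix": "cluster",
    "approx_min_span_tree": True,
    "cluster_selection_method": "eom",
    "allow_single_cluster": False,
    "p_value": 1.5,
    "leaf_size": 40,
}

ALLOWED = {
    "algorithm": {
        "best", "generic", "prims_kdtree", "prims_balltree",
        "boruvka_kdtree", "boruvka_balltree",
    },
    "metric": {
        "euclidean", "cityblock", "L1", "L2", "manhattan", "braycurtis",
        "canberra", "chebyshev", "correlation", "minkowski", "sqeuclidean",
    },
    "cluster_selection_method": {"eom", "leaf"},
}


def resolve_properties(overrides):
    """Merge overrides into the documented defaults and validate them."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("Unknown properties: %s" % sorted(unknown))
    props = dict(DEFAULTS)
    props.update(overrides)
    # Reject string values that the table does not list as valid.
    for name, allowed in ALLOWED.items():
        if props[name] not in allowed:
            raise ValueError("%s must be one of %s" % (name, sorted(allowed)))
    return props


def effective_min_samples(props):
    """Apply the documented rule: min_samples = 0 falls back to min_cluster_size."""
    if props["min_samples"] == 0:
        return props["min_cluster_size"]
    return props["min_samples"]


props = resolve_properties({"min_cluster_size": 10, "metric": "manhattan"})
print(props["algorithm"], effective_min_samples(props))
```

The resolved dictionary could then be applied to a node one property at a time, for example with `setPropertyValue` in a Modeler script, assuming the property names are used unchanged.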
hdbscannode properties | Data type | Property description |
---|---|---|
custom_fields | boolean | This option tells the node to use field information specified here instead of that given in any upstream Type node(s). After selecting this option, specify the following fields as required. |
inputs | field | Input fields for clustering. |
useHPO | boolean | Specify true or false to enable or disable Hyper-Parameter Optimization (HPO) based on RBFOpt, which automatically searches for the combination of parameters that lets the model reach an error rate at or below the expected level on the samples. Default is false. |
min_cluster_size | integer | The minimum size of clusters. Specify an integer. Default is 5. |
min_samples | integer | The number of samples in a neighborhood for a point to be considered a core point. Specify an integer. If set to 0, the value of min_cluster_size is used. Default is 0. |
algorithm | string | Specify which algorithm to use: best, generic, prims_kdtree, prims_balltree, boruvka_kdtree, or boruvka_balltree. Default is best. |
metric | string | Specify which metric to use when calculating distance between instances in a feature array: euclidean, cityblock, L1, L2, manhattan, braycurtis, canberra, chebyshev, correlation, minkowski, or sqeuclidean. Default is euclidean. |
useStringLabel | boolean | Specify true to use a string cluster label, or false to use a number cluster label. Default is false. |
stringLabelPrefix | string | If the useStringLabel parameter is set to true, specify a value for the string label prefix. Default prefix is cluster. |
approx_min_span_tree | boolean | Specify true to accept an approximate minimum spanning tree, or false if you are willing to sacrifice speed for correctness. Default is true. |
cluster_selection_method | string | Specify the method to use for selecting clusters from the condensed tree: eom or leaf. Default is eom (Excess of Mass algorithm). |
allow_single_cluster | boolean | Specify true if you want to allow results that consist of a single cluster. Default is false. |
p_value | double | Specify the p value to use if the metric is set to minkowski. Default is 1.5. |
leaf_size | integer | If using a space tree algorithm (boruvka_kdtree or boruvka_balltree), specify the number of points in a leaf node of the tree. Default is 40. |
outputValidity | boolean | Specify true or false to control whether the Validity Index chart is included in the model output. |
outputCondensed | boolean | Specify true or false to control whether the Condensed Tree chart is included in the model output. |
outputSingleLinkage | boolean | Specify true or false to control whether the Single Linkage Tree chart is included in the model output. |
outputMinSpan | boolean | Specify true or false to control whether the Min Span Tree chart is included in the model output. |
is_split |