hdbscannode properties

Hierarchical Density-Based Spatial Clustering (HDBSCAN)© uses unsupervised learning to find clusters, or dense regions, in a data set. The HDBSCAN node in SPSS Modeler exposes the core features and commonly used parameters of the HDBSCAN library. The node is implemented in Python, and you can use it to cluster a data set into distinct groups when you don't know in advance what those groups are.

Table 1. hdbscannode properties
hdbscannode properties Data type Property description
custom_fields boolean This option tells the node to use field information specified here instead of that given in any upstream Type node(s). After selecting this option, specify the fields below as required.
inputs field Input fields for clustering.
useHPO boolean Specify true to enable or false to disable Hyper-Parameter Optimization (HPO) based on Rbfopt, which automatically searches for the combination of parameters that lets the model reach the expected error rate, or a lower one, on the samples. Default is false.
min_cluster_size integer The minimum size of clusters. Specify an integer. Default is 5.
min_samples integer The number of samples in a neighborhood for a point to be considered a core point. Specify an integer. If set to 0, the value of min_cluster_size is used. Default is 0.
algorithm string Specify which algorithm to use: best, generic, prims_kdtree, prims_balltree, boruvka_kdtree, or boruvka_balltree. Default is best.
metric string Specify which metric to use when calculating distance between instances in a feature array: euclidean, cityblock, L1, L2, manhattan, braycurtis, canberra, chebyshev, correlation, minkowski, or sqeuclidean. Default is euclidean.
useStringLabel boolean Specify true to use a string cluster label, or false to use a numeric cluster label. Default is false.
stringLabelPrefix string If the useStringLabel parameter is set to true, specify a value for the string label prefix. Default prefix is cluster.
approx_min_span_tree boolean Specify true to accept an approximate minimum spanning tree, or false if you are willing to sacrifice speed for correctness. Default is true.
cluster_selection_method string Specify the method to use for selecting clusters from the condensed tree: eom or leaf. Default is eom (Excess of Mass algorithm).
allow_single_cluster boolean Specify true if you want to allow single cluster results. Default is false.
p_value double Specify the p value to use if you're using minkowski for the metric. Default is 1.5.
leaf_size integer If using a space tree algorithm (boruvka_kdtree or boruvka_balltree), specify the number of points in a leaf node of the tree. Default is 40.
outputValidity boolean Specify true or false to control whether the Validity Index chart is included in the model output.
outputCondensed boolean Specify true or false to control whether the Condensed Tree chart is included in the model output.
outputSingleLinkage boolean Specify true or false to control whether the Single Linkage Tree chart is included in the model output.
outputMinSpan boolean Specify true or false to control whether the Min Span Tree chart is included in the model output.
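
The properties in Table 1 can be set from a Modeler Python script. The following is a minimal sketch, not a definitive implementation. It assumes the standard scripting calls modeler.script.stream(), stream.create(), and node.setPropertyValue(). The field names Age and Income, the helper functions, and the RecordingNode class are hypothetical and exist only so that the helper can be tried outside Modeler. The allowed values and defaults are copied from the table.

```python
# Allowed values for the enumerated string properties, copied from Table 1.
ALLOWED = {
    "algorithm": ["best", "generic", "prims_kdtree", "prims_balltree",
                  "boruvka_kdtree", "boruvka_balltree"],
    "metric": ["euclidean", "cityblock", "L1", "L2", "manhattan",
               "braycurtis", "canberra", "chebyshev", "correlation",
               "minkowski", "sqeuclidean"],
    "cluster_selection_method": ["eom", "leaf"],
}

# Example settings; any property left out keeps its documented default.
SETTINGS = {
    "custom_fields": True,
    "inputs": ["Age", "Income"],      # hypothetical field names
    "min_cluster_size": 10,
    "min_samples": 0,                 # 0 means: fall back to min_cluster_size
    "algorithm": "boruvka_kdtree",
    "leaf_size": 40,
    "metric": "minkowski",
    "p_value": 1.5,                   # only relevant for the minkowski metric
    "useStringLabel": True,
    "stringLabelPrefix": "cluster",
    "outputCondensed": True,
}


def check_settings(settings):
    """Return a list of problems found in a dict of hdbscannode properties."""
    problems = []
    for name in sorted(ALLOWED):
        if name in settings and settings[name] not in ALLOWED[name]:
            problems.append("%s: unsupported value %r" % (name, settings[name]))
    return problems


def effective_min_samples(settings):
    """Apply the documented rule: min_samples of 0 uses min_cluster_size."""
    min_samples = settings.get("min_samples", 0)
    if min_samples == 0:
        return settings.get("min_cluster_size", 5)
    return min_samples


def apply_settings(node, settings):
    """Validate the settings, then pass each one to node.setPropertyValue()."""
    problems = check_settings(settings)
    if problems:
        raise ValueError("; ".join(problems))
    for name in sorted(settings):
        node.setPropertyValue(name, settings[name])


class RecordingNode(object):
    """Hypothetical stand-in for a Modeler node; it only records the values."""

    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


if __name__ == "__main__":
    # Inside SPSS Modeler, replace the stand-in with a real node, for example:
    #   stream = modeler.script.stream()
    #   node = stream.create("hdbscannode", "HDBSCAN")
    node = RecordingNode()
    apply_settings(node, SETTINGS)
    print(effective_min_samples(SETTINGS))
```

Checking the enumerated values before calling setPropertyValue() reports a misspelled algorithm or metric name as a single readable error in the script.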