hdbscannode properties | IBM Cloud Pak for Data as a Service

hdbscannode properties

Last updated: Jan 17, 2024

Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN)© uses unsupervised learning to find clusters, or dense regions, of a data set. The HDBSCAN node in SPSS Modeler exposes the core features and commonly used parameters of the HDBSCAN library. The node is implemented in Python, and you can use it to cluster your data set into distinct groups when you don't know in advance what those groups are.
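As a rough illustration of how the properties in Table 1 might be set from a script, the following sketch builds a property dictionary from the documented defaults, checks the enumerated string values, and applies the result to a node. It assumes that the node object exposes a `setPropertyValue(name, value)` method, as node objects in SPSS Modeler scripting do. The helper names (`build_hdbscan_properties`, `effective_min_samples`, `apply_properties`) and the `RecordingNode` stand-in class are hypothetical and are not part of SPSS Modeler. Properties with no documented default in Table 1 are left out.

```python
# Hypothetical helpers for illustration only; not part of SPSS Modeler.
# Defaults and allowed values are copied from Table 1.
DEFAULTS = {
    "useHPO": False,
    "min_cluster_size": 5,
    "min_samples": 0,
    "algorithm": "best",
    "metric": "euclidean",
    "useStringLabel": False,
    "stringLabelPrefix": "cluster",
    "approx_min_span_tree": True,
    "cluster_selection_method": "eom",
    "allow_single_cluster": False,
    "p_value": 1.5,
    "leaf_size": 40,
}

ALLOWED = {
    "algorithm": {"best", "generic", "prims_kdtree", "prims_balltree",
                  "boruvka_kdtree", "boruvka_balltree"},
    "metric": {"euclidean", "cityblock", "L1", "L2", "manhattan",
               "braycurtis", "canberra", "chebyshev", "correlation",
               "minkowski", "sqeuclidean"},
    "cluster_selection_method": {"eom", "leaf"},
}


def build_hdbscan_properties(**overrides):
    """Return the documented defaults with the given overrides applied."""
    props = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in DEFAULTS:
            raise KeyError("unknown property: %s" % name)
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError("invalid value %r for %s" % (value, name))
        props[name] = value
    return props


def effective_min_samples(props):
    """A min_samples of 0 means the min_cluster_size value is used."""
    return props["min_samples"] or props["min_cluster_size"]


def apply_properties(node, props):
    """Set each property on a node that exposes setPropertyValue()."""
    for name, value in props.items():
        node.setPropertyValue(name, value)


class RecordingNode(object):
    """Stand-in for a real node so the sketch runs outside SPSS Modeler."""

    def __init__(self):
        self.values = {}

    def setPropertyValue(self, name, value):
        self.values[name] = value


props = build_hdbscan_properties(min_cluster_size=10, metric="manhattan")
node = RecordingNode()
apply_properties(node, props)
print(effective_min_samples(props))  # 10, because min_samples is left at 0
```

Inside a stream script, a real hdbscannode object would take the place of `RecordingNode`. Validating the values before applying them is a design choice of this sketch, meant to catch a misspelled algorithm or metric name early.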

Table 1. hdbscannode properties
`hdbscannode` properties	Data type	Property description
`custom_fields`	boolean	This option tells the node to use the field information specified here instead of that given in any upstream Type nodes. After selecting this option, specify the following fields as required.
`inputs`	field	Input fields for clustering.
`useHPO`	boolean	Specify `true` or `false` to enable or disable Hyper-Parameter Optimization (HPO) based on Rbfopt, which automatically searches for the combination of parameters that lets the model achieve the expected error rate or lower on the samples. Default is `false`.
`min_cluster_size`	integer	The minimum size of clusters. Specify an integer. Default is `5`.
`min_samples`	integer	The number of samples in a neighborhood for a point to be considered a core point. Specify an integer. If set to `0`, the value of `min_cluster_size` is used. Default is `0`.
`algorithm`	string	Specify which algorithm to use: `best`, `generic`, `prims_kdtree`, `prims_balltree`, `boruvka_kdtree`, or `boruvka_balltree`. Default is `best`.
`metric`	string	Specify which metric to use when calculating distance between instances in a feature array: `euclidean`, `cityblock`, `L1`, `L2`, `manhattan`, `braycurtis`, `canberra`, `chebyshev`, `correlation`, `minkowski`, or `sqeuclidean`. Default is `euclidean`.
`useStringLabel`	boolean	Specify `true` to use a string cluster label, or `false` to use a number cluster label. Default is `false`.
`stringLabelPrefix`	string	If the `useStringLabel` parameter is set to `true`, specify a value for the string label prefix. Default prefix is `cluster`.
`approx_min_span_tree`	boolean	Specify `true` to accept an approximate minimum spanning tree, or `false` if you are willing to sacrifice speed for correctness. Default is `true`.
`cluster_selection_method`	string	Specify the method to use for selecting clusters from the condensed tree: `eom` or `leaf`. Default is `eom` (Excess of Mass algorithm).
`allow_single_cluster`	boolean	Specify `true` if you want to allow single cluster results. Default is `false`.
`p_value`	double	Specify the p value to use when `metric` is set to `minkowski`. Default is `1.5`.
`leaf_size`	integer	If you are using a space tree algorithm (`boruvka_kdtree` or `boruvka_balltree`), specify the number of points in a leaf node of the tree. Default is `40`.
`outputValidity`	boolean	Specify `true` or `false` to control whether the Validity Index chart is included in the model output.
`outputCondensed`	boolean	Specify `true` or `false` to control whether the Condensed Tree chart is included in the model output.
`outputSingleLinkage`	boolean	Specify `true` or `false` to control whether the Single Linkage Tree chart is included in the model output.
`outputMinSpan`	boolean	Specify `true` or `false` to control whether the Min Span Tree chart is included in the model output.
`is_split`