Last updated: Jan 18, 2024
The Auto Cluster node estimates and compares clustering models, which identify groups of records that have similar characteristics. The node works in the same manner as other automated modeling nodes, allowing you to experiment with multiple combinations of options in a single modeling pass. Models can be compared using basic measures that filter and rank the cluster models by usefulness, including a measure based on the importance of particular fields.
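Example
# Get the current stream, then create the Auto Cluster node in it
stream = modeler.script.stream()
node = stream.create("autocluster", "My node")
# Rank models by silhouette, measured on the training partition
node.setPropertyValue("ranking_measure", "Silhouette")
node.setPropertyValue("ranking_dataset", "Training")
# Turn on the silhouette limit and set its value
node.setPropertyValue("enable_silhouette_limit", True)
node.setPropertyValue("silhouette_limit", 5)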
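The same pattern extends to any of the properties listed for this node. The following is a minimal sketch rather than a documented recipe: `apply_properties` is a hypothetical helper, and `FakeNode` is a stand-in class that only records values so the snippet can run outside Modeler. The property names come from this page; the values chosen (50 records, 1.5 hours) are illustrative assumptions.

```python
class FakeNode(object):
    """Stand-in for a Modeler node (hypothetical); it only records property values."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


def apply_properties(node, settings):
    """Call setPropertyValue for each (name, value) pair, in the order given."""
    for name, value in settings:
        node.setPropertyValue(name, value)
    return node


# Illustrative values only; the property names are taken from this page.
settings = [
    ("ranking_measure", "Silhouette"),
    ("ranking_dataset", "Training"),
    ("enable_smallest_cluster_limit", True),
    ("smallest_cluster_units", "Counts"),
    ("smallest_cluster_limit_count", 50),
    ("enable_stop_after_time_limit", True),
    ("stop_after_time_limit", 1.5),
]

node = apply_properties(FakeNode(), settings)
```

In a Modeler script you would presumably pass the node returned by `stream.create("autocluster", ...)` in place of `FakeNode()`.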
autoclusternode Properties | Values | Property description |
---|---|---|
evaluation | field | Note: Auto Cluster node only. Identifies the field for which an importance value will be calculated. Alternatively, can be used to identify how well the cluster differentiates the value of this field and, therefore, how well the model will predict this field. |
ranking_measure | Silhouette, Num_clusters, Size_smallest_cluster, Size_largest_cluster, Smallest_to_largest, Importance | |
ranking_dataset | Training, Test | |
summary_limit | integer | Number of models to list in the report. Specify an integer between 1 and 100. |
enable_silhouette_limit | flag | |
silhouette_limit | integer | Integer between 0 and 100. |
enable_number_less_limit | flag | |
number_less_limit | number | Real number between 0.0 and 1.0. |
enable_number_greater_limit | flag | |
number_greater_limit | number | Integer greater than 0. |
enable_smallest_cluster_limit | flag | |
smallest_cluster_units | Percentage, Counts | |
smallest_cluster_limit_percentage | number | |
smallest_cluster_limit_count | integer | Integer greater than 0. |
enable_largest_cluster_limit | flag | |
largest_cluster_units | Percentage, Counts | |
largest_cluster_limit_percentage | number | |
largest_cluster_limit_count | integer | |
enable_smallest_largest_limit | flag | |
smallest_largest_limit | number | |
enable_importance_limit | flag | |
importance_limit_condition | Greater_than, Less_than | |
importance_limit_greater_than | number | Integer between 0 and 100. |
importance_limit_less_than | number | Integer between 0 and 100. |
<algorithm> | flag | Enables or disables the use of a specific algorithm. |
<algorithm>.<property> | string | Sets a property value for a specific algorithm. See Setting algorithm properties for more information. |
number_of_models | integer | |
enable_model_build_time_limit | boolean | (K-Means, Kohonen, TwoStep, SVM, KNN, Bayes Net and Decision List models only.) Sets a maximum time limit for any one model. For example, if a particular model requires an unexpectedly long time to train because of some complex interaction, you probably don't want it to hold up your entire modeling run. |
model_build_time_limit | integer | Time spent on model build. |
enable_stop_after_time_limit | boolean | (Neural Network, K-Means, Kohonen, TwoStep, SVM, KNN, Bayes Net and C&R Tree models only.) Stops a run after a specified number of hours. All models generated up to that point will be included in the model nugget, but no further models will be produced. |
stop_after_time_limit | double | Run time limit (hours). |
stop_if_valid_model | boolean | Stops a run when a model passes all criteria specified under the Discard settings. |
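Because several properties accept only listed values or bounded numbers, it can help to check a set of values before applying them. The sketch below is one possible approach, not part of the product: `check_settings`, `ENUMS`, `RANGES`, and `POSITIVE` are hypothetical names, the allowed values and bounds are transcribed from the table above, and treating "between" as inclusive is an assumption.

```python
# Allowed values transcribed from the table above.
ENUMS = {
    "ranking_measure": {
        "Silhouette", "Num_clusters", "Size_smallest_cluster",
        "Size_largest_cluster", "Smallest_to_largest", "Importance",
    },
    "ranking_dataset": {"Training", "Test"},
    "smallest_cluster_units": {"Percentage", "Counts"},
    "largest_cluster_units": {"Percentage", "Counts"},
    "importance_limit_condition": {"Greater_than", "Less_than"},
}

# (low, high) bounds; inclusive bounds are an assumption.
RANGES = {
    "summary_limit": (1, 100),
    "silhouette_limit": (0, 100),
    "number_less_limit": (0.0, 1.0),
    "importance_limit_greater_than": (0, 100),
    "importance_limit_less_than": (0, 100),
}

# Properties documented as "greater than 0".
POSITIVE = {"number_greater_limit", "smallest_cluster_limit_count"}


def check_settings(settings):
    """Return a list of problem messages; an empty list means no issue was found."""
    problems = []
    for name, value in settings.items():
        if name in ENUMS:
            if value not in ENUMS[name]:
                problems.append("%s: unexpected value %r" % (name, value))
        elif name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems.append("%s: %r is outside %r..%r" % (name, value, low, high))
        elif name in POSITIVE:
            if not value > 0:
                problems.append("%s: %r must be greater than 0" % (name, value))
    return problems


ok = check_settings({"ranking_measure": "Silhouette", "summary_limit": 20})
bad = check_settings({"summary_limit": 0, "ranking_dataset": "Validation"})
```

Here `ok` is an empty list, while `bad` holds one message per offending property. Properties that have no documented range in the table are passed through unchecked.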