Last updated: Jan 18, 2024
TwoStep Cluster is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. The algorithm used by this procedure has several features that differentiate it from traditional clustering techniques: it handles both categorical and continuous variables, selects the number of clusters automatically, and scales to large data sets.
twostepAS Properties | Values | Property description |
---|---|---|
inputs | [f1 ... fN] | TwoStepAS models use a list of input fields, but no target. Weight and frequency fields are not recognized. |
use_predefined_roles | Boolean | Default=True |
use_custom_field_assignments | Boolean | Default=False |
cluster_num_auto | Boolean | Default=True |
min_num_clusters | integer | Default=2 |
max_num_clusters | integer | Default=15 |
num_clusters | integer | Default=5 |
clustering_criterion | AIC BIC | |
automatic_clustering_method | use_clustering_criterion_setting Distance_jump Minimum Maximum | |
feature_importance_method | use_clustering_criterion_setting effect_size | |
use_random_seed | Boolean | |
random_seed | integer | |
distance_measure | Euclidean Loglikelihood | |
include_outlier_clusters | Boolean | Default=True |
num_cases_in_feature_tree_leaf_is_less_than | integer | Default=10 |
top_perc_outliers | integer | Default=5 |
initial_dist_change_threshold | integer | Default=0 |
leaf_node_maximum_branches | integer | Default=8 |
non_leaf_node_maximum_branches | integer | Default=8 |
max_tree_depth | integer | Default=3 |
adjustment_weight_on_measurement_level | integer | Default=6 |
memory_allocation_mb | number | Default=512 |
delayed_split | Boolean | Default=True |
fields_not_to_standardize | [f1 ... fN] | |
adaptive_feature_selection | Boolean | Default=True |
featureMisPercent | integer | Default=70 |
coefRange | number | Default=0.05 |
percCasesSingleCategory | integer | Default=95 |
numCases | integer | Default=24 |
include_model_specifications | Boolean | Default=True |
include_record_summary | Boolean | Default=True |
include_field_transformations | Boolean | Default=True |
excluded_inputs | Boolean | Default=True |
evaluate_model_quality | Boolean | Default=True |
show_feature_importance_bar_chart | Boolean | Default=True |
show_feature_importance_word_cloud | Boolean | Default=True |
show_outlier_clusters_interactive_table_and_chart | Boolean | Default=True |
show_outlier_clusters_pivot_table | Boolean | Default=True |
across_cluster_feature_importance | Boolean | Default=True |
across_cluster_profiles_pivot_table | Boolean | Default=True |
withinprofiles | Boolean | Default=True |
cluster_distances | Boolean | Default=True |
cluster_label | String Number | |
label_prefix | String | |
evaluation_maxNum | integer | The maximum number of outliers to display in the output. If there are more than twenty outlier clusters, a pivot table will be displayed instead. |
across_cluster_profiles_table_and_chart | Boolean | Table and charts of feature importance and cluster centers for each input (field) used in the cluster solution. Selecting different rows in the table displays a different chart. For categorical fields, a bar chart is displayed. For continuous fields, a chart of means and standard deviations is displayed. |
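
The properties in the table above are typically set from a script, one name and value at a time. The following is a minimal sketch of that pattern, not a definitive implementation. The property names and allowed values are taken from the table; the field names, the `check_settings` and `apply_settings` helpers, and the `RecordingNode` stand-in are hypothetical and exist only so the example runs outside SPSS Modeler. The sanity checks are assumptions about sensible values, not documented constraints.

```python
# Sketch only: property names and values come from the table above.
# Field names, helper functions, and RecordingNode are hypothetical.

SETTINGS = {
    "inputs": ["Age", "Income", "Region"],  # hypothetical input fields
    "cluster_num_auto": True,               # let the node pick the cluster count
    "min_num_clusters": 2,
    "max_num_clusters": 15,
    "clustering_criterion": "BIC",
    "distance_measure": "Loglikelihood",
    "include_outlier_clusters": True,
    "top_perc_outliers": 5,
}


def check_settings(settings):
    """Run basic sanity checks (assumed, not documented, constraints)."""
    if settings["min_num_clusters"] > settings["max_num_clusters"]:
        raise ValueError("min_num_clusters must not exceed max_num_clusters")
    if settings["clustering_criterion"] not in ("AIC", "BIC"):
        raise ValueError("clustering_criterion must be AIC or BIC")
    if settings["distance_measure"] not in ("Euclidean", "Loglikelihood"):
        raise ValueError("distance_measure must be Euclidean or Loglikelihood")


def apply_settings(node, settings):
    """Set each property on any object exposing setPropertyValue(name, value)."""
    check_settings(settings)
    for name, value in settings.items():
        node.setPropertyValue(name, value)


class RecordingNode:
    """Hypothetical stand-in for a node; it only records what was set."""

    def __init__(self):
        self.properties = {}

    def setPropertyValue(self, name, value):
        self.properties[name] = value


# Inside SPSS Modeler the node would presumably be created through the
# scripting API instead, for example
# modeler.script.stream().create("twostepAS", "My node") (assumption).
node = RecordingNode()
apply_settings(node, SETTINGS)
print(node.properties["clustering_criterion"])
```

Keeping the settings in a plain dictionary makes it easy to review them against the table and to reuse the same configuration across several nodes.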