TwoStep Cluster is an exploratory tool designed to reveal natural groupings (or clusters) within a data set that would otherwise not be apparent. The algorithm employed by this procedure has several desirable features that differentiate it from traditional clustering techniques: it handles both categorical and continuous variables, selects the number of clusters automatically, and scales to large data sets.
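The two-step idea (a fast pre-clustering pass, then hierarchical merging with automatic choice of the number of clusters) can be illustrated in plain Python. This is a simplified, hypothetical sketch, not IBM's implementation. It assumes continuous fields only and replaces the feature (CF) tree with a single-pass pre-clustering controlled by a made-up `threshold`. Closeness is measured with a Gaussian log-likelihood distance, the drop in log-likelihood caused by merging two clusters. Sub-clusters are then merged hierarchically, and the sketch keeps the cluster count with the lowest BIC between `min_k` and `max_k`. These two parameters are named after `min_num_clusters` and `max_num_clusters` and use the same defaults of 2 and 15. The node's actual rule for choosing the number of clusters (see `automatic_clustering_method`) may differ.

```python
import math

# A cluster feature (CF) is kept as (count, per-feature sums, per-feature sums of
# squares): enough to merge two clusters and recover their means and variances.

def _cf(row):
    return (1, list(row), [v * v for v in row])

def _merge(a, b):
    return (a[0] + b[0],
            [x + y for x, y in zip(a[1], b[1])],
            [x + y for x, y in zip(a[2], b[2])])

def _xi(cf, total_var):
    """Gaussian log-likelihood term of one cluster: -n * sum_k 0.5 * log(total_var_k + var_k)."""
    n, s, ss = cf
    acc = 0.0
    for k in range(len(s)):
        mean = s[k] / n
        var = max(ss[k] / n - mean * mean, 0.0)
        acc += 0.5 * math.log(total_var[k] + var)
    return -n * acc

def two_step_sketch(rows, threshold, min_k=2, max_k=15):
    """Hypothetical helper. Returns (bic, k, clusters) for the lowest-BIC
    solution with min_k <= k <= max_k, or None if no such k is reached."""
    n_total, dims = len(rows), len(rows[0])
    means = [sum(r[k] for r in rows) / n_total for k in range(dims)]
    # Overall variance per feature; keeps log() finite for one-record clusters.
    total_var = [max(sum((r[k] - means[k]) ** 2 for r in rows) / n_total, 1e-12)
                 for k in range(dims)]

    def dist(a, b):
        # Log-likelihood distance: loss in likelihood from merging a and b (>= 0).
        return _xi(a, total_var) + _xi(b, total_var) - _xi(_merge(a, b), total_var)

    # Step 1: pre-clustering. One pass over the records; each record joins its
    # nearest sub-cluster if the distance is below `threshold`, else starts a new one.
    subs = []
    for row in rows:
        cf = _cf(row)
        if subs:
            i = min(range(len(subs)), key=lambda j: dist(subs[j], cf))
            if dist(subs[i], cf) < threshold:
                subs[i] = _merge(subs[i], cf)
                continue
        subs.append(cf)

    # Step 2: agglomerative merging of the sub-clusters, scoring each k with BIC.
    clusters, best = subs, None
    while True:
        k = len(clusters)
        if min_k <= k <= max_k:
            # Two parameters (mean, variance) per continuous feature per cluster.
            bic = (-2 * sum(_xi(c, total_var) for c in clusters)
                   + k * 2 * dims * math.log(n_total))
            if best is None or bic < best[0]:
                best = (bic, k, list(clusters))
        if k <= max(min_k, 1):
            break
        # Merge the closest pair of clusters.
        _, i, j = min((dist(clusters[a], clusters[b]), a, b)
                      for a in range(k) for b in range(a + 1, k))
        merged = _merge(clusters[i], clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return best
```

For two tight, well-separated groups of four points each, the sketch should settle on two clusters of four records, whether the pre-clustering absorbs records (`threshold=0.5`) or leaves every record as its own sub-cluster (`threshold=0.0`). In this sketch, the BIC penalty is what stops it from preferring many tiny clusters.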
Table 1. twostepAS properties

| twostepAS Properties | Values | Property description |
|---|---|---|
| inputs | [f1 ... fN] | TwoStepAS models use a list of input fields, but no target. Weight and frequency fields are not recognized. |
| use_predefined_roles | Boolean | Default=True |
| use_custom_field_assignments | Boolean | Default=False |
| cluster_num_auto | Boolean | Default=True |
| min_num_clusters | integer | Default=2 |
| max_num_clusters | integer | Default=15 |
| num_clusters | integer | Default=5 |
| clustering_criterion | AIC, BIC | |
| automatic_clustering_method | use_clustering_criterion_setting, Distance_jump, Minimum, Maximum | |
| feature_importance_method | use_clustering_criterion_setting, effect_size | |
| use_random_seed | Boolean | |
| random_seed | integer | |
| distance_measure | Euclidean, Loglikelihood | |
| include_outlier_clusters | Boolean | Default=True |
| num_cases_in_feature_tree_leaf_is_less_than | integer | Default=10 |
| top_perc_outliers | integer | Default=5 |
| initial_dist_change_threshold | integer | Default=0 |
| leaf_node_maximum_branches | integer | Default=8 |
| non_leaf_node_maximum_branches | integer | Default=8 |
| max_tree_depth | integer | Default=3 |
| adjustment_weight_on_measurement_level | integer | Default=6 |
| memory_allocation_mb | number | Default=512 |
| delayed_split | Boolean | Default=True |
| fields_not_to_standardize | [f1 ... fN] | |
| adaptive_feature_selection | Boolean | Default=True |
| featureMisPercent | integer | Default=70 |
| coefRange | number | Default=0.05 |
| percCasesSingleCategory | integer | Default=95 |
| numCases | integer | Default=24 |
| include_model_specifications | Boolean | Default=True |
| include_record_summary | Boolean | Default=True |
| include_field_transformations | Boolean | Default=True |
| excluded_inputs | Boolean | Default=True |
| evaluate_model_quality | Boolean | Default=True |
| show_feature_importance_bar_chart | Boolean | Default=True |
| show_feature_importance_word_cloud | Boolean | Default=True |
| show_outlier_clusters_interactive_table_and_chart | Boolean | Default=True |
| show_outlier_clusters_pivot_table | Boolean | Default=True |
| across_cluster_feature_importance | Boolean | Default=True |
| across_cluster_profiles_pivot_table | Boolean | Default=True |
| withinprofiles | Boolean | Default=True |
| cluster_distances | Boolean | Default=True |
| cluster_label | String, Number | |
| label_prefix | String | |
| evaluation_maxNum | integer | The maximum number of outliers to display in the output. If there are more than twenty outlier clusters, a pivot table is displayed instead. |
| across_cluster_profiles_table_and_chart | Boolean | Table and charts of feature importance and cluster centers for each input (field) used in the cluster solution. Selecting different rows in the table displays a different chart. For categorical fields, a bar chart is displayed. For continuous fields, a chart of means and standard deviations is displayed. |
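Before applying settings from a script, it can help to check them against the table. The plain-Python sketch below uses a hypothetical helper, `build_twostepas_settings`. It merges caller overrides onto the defaults listed in Table 1, covering the model-building properties only; the output/display Booleans are omitted for brevity. It rejects values outside the listed enumerations and non-Boolean values for Boolean properties. Two behaviors are assumptions rather than documented rules: the check that `min_num_clusters` does not exceed `max_num_clusters`, and the exact, case-sensitive matching of enumeration strings as they are written in the table.

```python
# Defaults stated in Table 1 for model-building properties
# (properties without a stated default, and the output options, are omitted).
DEFAULTS = {
    "use_predefined_roles": True, "use_custom_field_assignments": False,
    "cluster_num_auto": True, "min_num_clusters": 2, "max_num_clusters": 15,
    "num_clusters": 5, "include_outlier_clusters": True,
    "num_cases_in_feature_tree_leaf_is_less_than": 10, "top_perc_outliers": 5,
    "initial_dist_change_threshold": 0, "leaf_node_maximum_branches": 8,
    "non_leaf_node_maximum_branches": 8, "max_tree_depth": 3,
    "adjustment_weight_on_measurement_level": 6, "memory_allocation_mb": 512,
    "delayed_split": True, "adaptive_feature_selection": True,
    "featureMisPercent": 70, "coefRange": 0.05,
    "percCasesSingleCategory": 95, "numCases": 24,
}

# Allowed values listed in Table 1 for enumerated properties
# (assumed to be matched exactly as written).
ALLOWED = {
    "clustering_criterion": {"AIC", "BIC"},
    "automatic_clustering_method": {"use_clustering_criterion_setting",
                                    "Distance_jump", "Minimum", "Maximum"},
    "feature_importance_method": {"use_clustering_criterion_setting", "effect_size"},
    "distance_measure": {"Euclidean", "Loglikelihood"},
    "cluster_label": {"String", "Number"},
}

def build_twostepas_settings(overrides):
    """Hypothetical helper: defaults + overrides, checked against Table 1."""
    settings = dict(DEFAULTS)
    settings.update(overrides)
    for name, value in settings.items():
        if name in ALLOWED and value not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}, got {value!r}")
        # A property whose documented default is a Boolean must stay Boolean.
        if isinstance(DEFAULTS.get(name), bool) and not isinstance(value, bool):
            raise TypeError(f"{name} must be a Boolean, got {value!r}")
    # Assumed consistency rule, not stated in the table.
    if settings["min_num_clusters"] > settings["max_num_clusters"]:
        raise ValueError("min_num_clusters must not exceed max_num_clusters")
    return settings
```

For example, `build_twostepas_settings({"clustering_criterion": "AIC", "max_num_clusters": 10})` keeps `min_num_clusters` at its default of 2 and sets the upper bound to 10. Inside an SPSS Modeler Python script, the resulting name/value pairs would then be applied to the node one at a time, for example with the node's `setPropertyValue` method. Check the scripting reference for your Modeler version before relying on that step.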