Association rules associate a particular conclusion (the
purchase of a particular product, for example) with a set of conditions (the purchase of several
other products, for example).
For example, the rule
beer <= cannedveg & frozenmeal (173, 17.0%, 0.84)
states that beer often occurs when cannedveg
and frozenmeal occur together. The rule is 84% reliable and applies to 17% of the
data, or 173 records. Association rule algorithms automatically find the associations that you could
find manually using visualization techniques, such as the Web node.
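To make these figures concrete, here is a minimal Python sketch (outside SPSS Modeler, using made-up transaction data) that reproduces the instance count, support, and confidence reported for a rule of this kind.

```python
# Minimal sketch (not part of SPSS Modeler): reproducing the figures reported
# for a rule such as "beer <= cannedveg & frozenmeal" from raw transactions.
# The transactions below are hypothetical illustration data.
transactions = [
    {"cannedveg", "frozenmeal", "beer"},
    {"cannedveg", "frozenmeal"},
    {"beer", "fish"},
    {"cannedveg", "frozenmeal", "beer", "fish"},
]

antecedents = {"cannedveg", "frozenmeal"}   # the rule's conditions
consequent = "beer"                         # the rule's conclusion

matching = [t for t in transactions if antecedents <= t]         # records the rule applies to
instances = len(matching)                                        # e.g. the "173" in the rule above
support = instances / len(transactions)                          # e.g. the "17.0%"
confidence = sum(consequent in t for t in matching) / instances  # e.g. the "0.84"

print(instances, f"{support:.1%}", f"{confidence:.2f}")          # here: 3 75.0% 0.67
```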
The advantage of association rule algorithms over the more standard decision
tree algorithms (C5.0 and C&R Trees) is that associations can exist between any of the
attributes. A decision tree algorithm will build rules with only a single conclusion, whereas
association algorithms attempt to find many rules, each of which may have a different
conclusion.
The disadvantage of association algorithms is that they search a potentially very large space of candidate rules and, hence, can require much more time to run than a decision tree algorithm. The algorithms use a generate-and-test method for finding rules: simple rules are generated initially and validated against the dataset. The good rules are stored, and all rules, subject to various constraints, are then specialized.
Specialization is the process of adding conditions to a rule. These new rules are then
validated against the data, and the process iteratively stores the best or most interesting rules
found. The user usually supplies a limit on the number of antecedents allowed in a rule, and various techniques based on information theory or efficient indexing schemes are used to reduce the potentially large search space.
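A deliberately simplified Python sketch of this generate-and-test loop is shown below; the records, thresholds, and pruning are all hypothetical and omit the optimizations a real implementation would use.

```python
# Toy generate-and-test sketch with hypothetical records and thresholds.
# Each record maps categorical field -> value.
records = [
    {"cannedveg": "T", "frozenmeal": "T", "beer": "T"},
    {"cannedveg": "T", "frozenmeal": "T", "beer": "F"},
    {"cannedveg": "F", "frozenmeal": "T", "beer": "F"},
    {"cannedveg": "T", "frozenmeal": "T", "beer": "T"},
]
MIN_SUPPORT, MIN_CONFIDENCE, MAX_ANTECEDENTS = 0.25, 0.75, 2

def evaluate(conditions, conclusion):
    """Return (support, confidence) of the rule: conditions -> conclusion."""
    covered = [r for r in records if all(r.get(f) == v for f, v in conditions)]
    if not covered:
        return 0.0, 0.0
    hits = sum(r.get(conclusion[0]) == conclusion[1] for r in covered)
    return len(covered) / len(records), hits / len(covered)

items = sorted({(f, v) for r in records for f, v in r.items()})
good_rules = []
# Generate simple one-condition rules first, then specialize the survivors.
frontier = [((cond,), concl) for cond in items for concl in items if cond[0] != concl[0]]
for _ in range(MAX_ANTECEDENTS):
    next_frontier = []
    for conditions, conclusion in frontier:
        support, confidence = evaluate(conditions, conclusion)
        if support < MIN_SUPPORT:
            continue                      # prune: specializing cannot regain support
        if confidence >= MIN_CONFIDENCE:
            good_rules.append((conditions, conclusion, support, confidence))
        used = {f for f, _ in conditions} | {conclusion[0]}
        # Specialization: add one more condition on a field not yet used in the rule.
        next_frontier += [(conditions + (extra,), conclusion) for extra in items if extra[0] not in used]
    frontier = next_frontier

for rule in sorted(good_rules, key=lambda r: -r[3]):
    print(rule)
```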
At the end of the processing, a table of the best rules is presented. Unlike a
decision tree, this set of association rules cannot be used directly to make predictions in the way
that a standard model (such as a decision tree or a neural network) can. This is due to the many
different possible conclusions for the rules. A further transformation step is required to
convert the association rules into a classification rule set. Hence, the association rules
produced by association algorithms are known as unrefined models. Although the user can
browse these unrefined models, they cannot be used explicitly as classification models unless the
user tells the system to generate a classification model from the unrefined model. This is done from
the browser through a Generate menu option.
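Conceptually, generating a classification model from an unrefined model amounts to keeping only the rules that conclude on a chosen target field and ordering them so that the first matching rule makes the prediction. The following Python sketch illustrates that idea only; it is not the actual Generate implementation, and the rules shown are hypothetical.

```python
# Rough sketch of the idea behind turning association rules into a
# classification rule set (NOT the actual Generate implementation).
# Each rule is (conditions, (target_field, target_value), support, confidence).
rules = [
    ((("cannedveg", "T"), ("frozenmeal", "T")), ("beer", "T"), 0.17, 0.84),
    ((("fish", "T"),), ("wine", "T"), 0.10, 0.75),
    ((("frozenmeal", "T"),), ("beer", "F"), 0.30, 0.60),
]

def to_ruleset(rules, target_field):
    """Keep rules that conclude on the target field, highest confidence first."""
    kept = [r for r in rules if r[1][0] == target_field]
    return sorted(kept, key=lambda r: r[3], reverse=True)

def predict(ruleset, record, default=None):
    """Predict with the first rule whose conditions all hold for the record."""
    for conditions, (field, value), _, _ in ruleset:
        if all(record.get(f) == v for f, v in conditions):
            return value
    return default

beer_rules = to_ruleset(rules, "beer")
print(predict(beer_rules, {"cannedveg": "T", "frozenmeal": "T"}, default="F"))  # -> "T"
```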
Two association rule algorithms are supported:
The Apriori node extracts a set of rules from the
data, pulling out the rules with the highest information content. Apriori offers five different
methods of selecting rules and uses a sophisticated indexing scheme to process large data sets
efficiently. For large problems, Apriori is generally faster to train; it has no arbitrary limit on
the number of rules that can be retained, and it can handle rules with up to 32 preconditions.
Apriori requires that input and output fields all be categorical but delivers better performance
because it is optimized for this type of data.
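The level-wise, subset-pruning idea that gives Apriori its name can be sketched in a few lines of Python; this is a toy illustration with hypothetical basket data, not the Modeler node's optimized implementation.

```python
# Toy illustration of Apriori's level-wise itemset search (hypothetical baskets,
# not the Modeler node's optimized implementation).
baskets = [
    {"cannedveg", "frozenmeal", "beer"},
    {"cannedveg", "frozenmeal"},
    {"frozenmeal", "fish"},
    {"cannedveg", "frozenmeal", "beer"},
]
MIN_SUPPORT = 0.5

def support(itemset):
    return sum(itemset <= basket for basket in baskets) / len(baskets)

# An itemset can only be frequent if every subset of it is frequent, so each
# level of candidates is built only from the frequent itemsets one size smaller.
level = {frozenset([item]) for basket in baskets for item in basket}
level = {s for s in level if support(s) >= MIN_SUPPORT}
frequent = set(level)
size = 1
while level:
    size += 1
    candidates = {a | b for a in level for b in level if len(a | b) == size}
    level = {c for c in candidates if support(c) >= MIN_SUPPORT}
    frequent |= level

for itemset in sorted(frequent, key=len):
    print(sorted(itemset), support(itemset))
```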
The Sequence node discovers association rules in
sequential or time-oriented data. A sequence is a list of item sets that tends to occur in a
predictable order. For example, a customer who purchases a razor and aftershave lotion may purchase
shaving cream the next time he shops. The Sequence node is based on the CARMA association rules
algorithm, which uses an efficient two-pass method for finding sequences.
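To make the notion of a sequential rule concrete, the following Python sketch (illustrative only, not the CARMA algorithm) checks how often one item set is followed by another across hypothetical per-customer purchase histories.

```python
# Illustrative sketch only (not the CARMA algorithm): how often one item set
# is followed by another in hypothetical per-customer purchase histories.
histories = {
    "cust1": [{"razor", "aftershave"}, {"shaving cream"}],
    "cust2": [{"razor", "aftershave"}, {"bread"}, {"shaving cream"}],
    "cust3": [{"razor"}, {"milk"}],
}

def follows(history, first, then):
    """True if an item set containing `first` is followed later by one containing `then`."""
    return any(
        first <= history[i] and any(then <= later for later in history[i + 1:])
        for i in range(len(history))
    )

first, then = {"razor", "aftershave"}, {"shaving cream"}
with_first = [h for h in histories.values() if any(first <= basket for basket in h)]
confidence = sum(follows(h, first, then) for h in with_first) / len(with_first)
print(f"{len(with_first)} customers bought the antecedent; sequence confidence = {confidence:.2f}")
```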