Anomaly detection models are used to identify outliers, or
unusual cases, in the data. Unlike other modeling methods that store rules about unusual cases,
anomaly detection models store information on what normal behavior looks like. This makes it
possible to identify outliers even if they do not conform to any known pattern, and it can be
particularly useful in applications, such as fraud detection, where new patterns may constantly be
emerging. Anomaly detection is an unsupervised method, which means that it does not require a
training dataset containing known cases of fraud to use as a starting point.
While traditional methods of identifying outliers generally look at one or two
variables at a time, anomaly detection can examine large numbers of fields to identify clusters or
peer groups into which similar records fall. Each record can then be compared to others in its peer
group to identify possible anomalies. The farther a case is from the center of its peer group, the
more likely it is to be unusual. For example, the algorithm might group records into three distinct
clusters and flag those that fall far from the center of any one cluster.
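The peer-group comparison can be illustrated with a minimal sketch. It assumes that records have already been assigned to clusters and uses plain Euclidean distance to the cluster mean as a stand-in for the product's own distance measure; the function name `distances_to_centroid` and the sample data are hypothetical.

```python
import math
from collections import defaultdict

def distances_to_centroid(records, labels):
    """Return each record's Euclidean distance to the mean of its own cluster.

    Assumption: cluster labels are given; this sketch does not perform
    the clustering itself.
    """
    groups = defaultdict(list)
    for rec, lab in zip(records, labels):
        groups[lab].append(rec)
    # Centroid = per-field mean over the records in each cluster.
    centroids = {
        lab: tuple(sum(col) / len(rows) for col in zip(*rows))
        for lab, rows in groups.items()
    }
    return [math.dist(rec, centroids[lab]) for rec, lab in zip(records, labels)]

# Hypothetical two-field records in two peer groups.
records = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0), (11.0, 11.0),
           (20.0, 20.0), (22.0, 20.0)]
labels = [0, 0, 0, 0, 0, 1, 1]

dists = distances_to_centroid(records, labels)
# Index of the record that lies farthest from its own cluster center.
farthest = max(range(len(dists)), key=dists.__getitem__)
```

In this sample, the record at index 4 lies much farther from the center of its cluster than its peers do, so it would be the first candidate for review.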
Each record is assigned an anomaly index, which is the ratio of the record's group
deviation index to the average of that index over the cluster that the case belongs to. The larger
the value of this index, the more the case deviates from the average. Under typical circumstances,
cases with anomaly index values less than 1 or even 1.5 would not be considered anomalies, because
the deviation is about the same as, or only slightly more than, the average. However, cases with an
index value greater than 2 could be good anomaly candidates because the deviation is at least twice the average.
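The ratio described above can be sketched as follows. This is an illustration only: it assumes a deviation value has already been computed for each record, and the function name `anomaly_indices`, the sample values, and the cutoff of 2 applied here are hypothetical choices rather than the product's implementation.

```python
from collections import defaultdict

def anomaly_indices(deviations, clusters):
    """Divide each record's deviation by the mean deviation of its cluster."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for dev, cluster in zip(deviations, clusters):
        totals[cluster] += dev
        counts[cluster] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    # Guard against a cluster whose records all have zero deviation.
    return [dev / means[c] if means[c] > 0 else 0.0
            for dev, c in zip(deviations, clusters)]

# Hypothetical deviation values and cluster memberships.
deviations = [1.0, 1.0, 1.0, 5.0, 2.0, 2.0]
clusters = ["A", "A", "A", "A", "B", "B"]

indices = anomaly_indices(deviations, clusters)
# Records whose deviation is more than twice the cluster average.
flagged = [i for i, value in enumerate(indices) if value > 2.0]
```

Cluster A has an average deviation of 2.0, so the record with deviation 5.0 gets an index of 2.5 and is the only one flagged. The records in cluster B all sit exactly at their cluster average and get an index of 1.0.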
Anomaly detection is an exploratory method designed for quick detection of
unusual cases or records that should be candidates for further analysis. These should be regarded as
suspected anomalies, which, on closer examination, may or may not turn out to be real. You
may find that a record is perfectly valid but choose to screen it from the data for purposes of
model building. Alternatively, if the algorithm repeatedly turns up false anomalies, this may point
to an error or artifact in the data collection process.
Note that anomaly detection identifies unusual records or cases through
cluster analysis based on the set of fields selected in the model without regard for any specific
target (dependent) field and regardless of whether those fields are relevant to the pattern you are
trying to predict. For this reason, you may want to use anomaly detection in combination with
feature selection or another technique for screening and ranking fields. For example, you can use
feature selection to identify the most important fields relative to a specific target and then use
anomaly detection to locate the records that are the most unusual with respect to those fields. (An
alternative approach would be to build a decision tree model and then examine any misclassified
records as potential anomalies. However, this method would be more difficult to replicate or
automate on a large scale.)
Example. In screening agricultural development grants
for possible cases of fraud, anomaly detection can be used to discover deviations from the norm,
highlighting those records that are abnormal and worthy of further investigation. You are
particularly interested in grant applications that seem to claim too much (or too little) money for
the type and size of farm.
Requirements. One or more input fields. Note that only
fields with a role set to Input using a source or Type node can be used as
inputs. Target fields (role set to Target or Both) are
ignored.
Strengths. By flagging cases that do not conform
to a known set of rules rather than those that do, anomaly detection models can identify unusual
cases even when they do not follow previously known patterns. When used in combination with feature
selection, anomaly detection makes it possible to screen large amounts of data to identify the
records of greatest interest relatively quickly.