Nearest Neighbor Analysis is a method for classifying cases
based on their similarity to other cases. In machine learning, it was developed as a way to
recognize patterns of data without requiring an exact match to any stored patterns, or cases.
Similar cases are near each other and dissimilar cases are distant from each other. Thus, the
distance between two cases is a measure of their dissimilarity.
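
As a minimal sketch of distance as a measure of dissimilarity, the snippet below computes the straight-line (Euclidean) distance between cases described by numeric features. The choice of Euclidean distance is an assumption made for illustration, since the text above does not name a specific metric. The function name and the sample cases are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two cases given as equal-length feature sequences."""
    if len(a) != len(b):
        raise ValueError("cases must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical cases with two numeric features each.
case_a = (1.0, 2.0)
case_b = (4.0, 6.0)
case_c = (1.5, 2.5)

print(euclidean_distance(case_a, case_b))  # 5.0
# case_c is closer to case_a than case_b is, so it is the more similar case.
print(euclidean_distance(case_a, case_c) < euclidean_distance(case_a, case_b))  # True
```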
Cases that are near each other are said to be "neighbors." When a new case
(holdout) is presented, its distance from each of the cases in the model is computed. The
classifications of the most similar cases – the nearest neighbors – are tallied and the new case is
placed into the category that contains the greatest number of nearest neighbors.
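
The tallying step described above can be sketched as follows. This is an illustrative outline, not the procedure's actual implementation: the function name `classify`, the use of Euclidean distance via `math.dist` (Python 3.8 or later), and the sample data are all assumptions. Ties between categories are resolved arbitrarily here, in favor of the category seen first among the nearest neighbors.

```python
import math
from collections import Counter

def classify(new_case, cases, labels, k):
    """Return the most common category among the k stored cases nearest to new_case."""
    # Distance from the new case to every stored case.
    distances = [math.dist(new_case, case) for case in cases]
    # Indices of the k stored cases with the smallest distances.
    nearest = sorted(range(len(cases)), key=lambda i: distances[i])[:k]
    # Tally the categories of those neighbors and pick the largest count.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical stored cases in two well-separated groups.
cases = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]

print(classify((2, 2), cases, labels, k=3))  # A
print(classify((6, 5), cases, labels, k=3))  # B
```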
You can specify the number of nearest neighbors to examine; this value is
called k. The pictures show how a new case would be classified using two different
values of k. When k = 5, the new case is placed in category
1 because a majority of the nearest neighbors belong to category
1. However, when k = 9, the new case is placed in category
0 because a majority of the nearest neighbors belong to category
0.
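
To make the effect of k concrete, here is a small sketch with made-up data, arranged so that the three closest cases belong to category 1 and the next six belong to category 0. The data are not taken from the pictures mentioned above. The function name `classify` and the Euclidean metric are assumptions for illustration.

```python
import math
from collections import Counter

def classify(new_case, cases, labels, k):
    """Majority vote over the k stored cases closest to new_case (Euclidean distance)."""
    nearest = sorted(range(len(cases)), key=lambda i: math.dist(new_case, cases[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]

# Hypothetical stored cases; the comments give each case's distance from new_case.
cases = [
    (1, 0), (0, 2), (-3, 0),      # category 1, distances 1, 2, 3
    (4, 0), (0, -5), (-6, 0),     # category 0, distances 4, 5, 6
    (0, 7), (8, 0), (0, -9),      # category 0, distances 7, 8, 9
    (20, 0), (0, 20),             # category 1, far away (distance 20)
]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
new_case = (0, 0)

# k = 5: three neighbors in category 1, two in category 0.
print(classify(new_case, cases, labels, k=5))  # 1
# k = 9: three neighbors in category 1, six in category 0.
print(classify(new_case, cases, labels, k=9))  # 0
```

The same new case lands in different categories only because a larger k pulls more distant cases into the vote.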
Nearest neighbor analysis can also be used to predict values for a continuous
target. In this situation, the average or median target value of the nearest neighbors is used as
the predicted value for the new case.
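
For a continuous target, the averaging step might look like the sketch below. The function name `predict_value`, its `summary` parameter, the Euclidean metric, and the sample data are hypothetical and serve only to illustrate the idea of taking the mean or median over the nearest neighbors.

```python
import math
from statistics import mean, median

def predict_value(new_case, cases, targets, k, summary=mean):
    """Summarize the target values of the k stored cases nearest to new_case."""
    nearest = sorted(range(len(cases)), key=lambda i: math.dist(new_case, cases[i]))[:k]
    return summary([targets[i] for i in nearest])

# Hypothetical stored cases with one feature and a continuous target.
cases = [(1.0,), (2.0,), (3.0,), (10.0,)]
targets = [10.0, 12.0, 20.0, 100.0]

# The three nearest neighbors of (2.0,) have targets 12.0, 10.0 and 20.0.
print(predict_value((2.0,), cases, targets, k=3))                  # 14.0
print(predict_value((2.0,), cases, targets, k=3, summary=median))  # 12.0
```

The median is less sensitive than the mean to a single neighbor with an unusual target value.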