Nearest Neighbor Analysis is a method for classifying cases based on their similarity to other cases. In machine learning, it was developed as a way to recognize patterns of data without requiring an exact match to any stored patterns, or cases. Similar cases are near each other and dissimilar cases are distant from each other. Thus, the distance between two cases is a measure of their dissimilarity.
Cases that are near each other are said to be "neighbors." When a new case (holdout) is presented, its distance from each of the cases in the model is computed. The classifications of the most similar cases – the nearest neighbors – are tallied and the new case is placed into the category that contains the greatest number of nearest neighbors.
You can specify the number of nearest neighbors to examine; this value is
called k
. The pictures show how a new case would be classified using two different
values of k
. When k
= 5, the new case is placed in category
1
because a majority of the nearest neighbors belong to category
1
. However, when k
= 9, the new case is placed in category
0
because a majority of the nearest neighbors belong to category
0
.
Nearest neighbor analysis can also be used to compute values for a continuous target. In this situation, the average or median target value of the nearest neighbors is used to obtain the predicted value for the new case.