Instances that are near each other are said to be "neighbors." You can specify the number of nearest neighbors to examine, known as k. When a new instance (the holdout) is presented, its distance from each instance in the model is computed. In predictive models with a target, the target values of the k most similar instances – the k nearest neighbors – are tallied, and the prediction for the new instance is based on those values. In classification models, the new instance is placed into the category that contains the greatest number of its nearest neighbors. In regression models, the prediction is the mean or median of the target values of the k nearest neighbors.
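The voting and averaging steps described above can be sketched in a few lines of Python. This is a toy illustration of the general k-nearest-neighbors prediction rule, not the KNN node itself; the function name `knn_predict` and the `mode` parameter are invented for this example.

```python
from collections import Counter
from math import dist          # Euclidean distance (Python 3.8+)
from statistics import mean

def knn_predict(train, holdout, k, mode="classify"):
    """Predict the target for `holdout` from its k nearest training instances.

    `train` is a list of (features, target) pairs. This is an illustrative
    sketch, not the KNN node's implementation.
    """
    # Rank training instances by their distance from the holdout instance
    # and keep the k closest ones.
    neighbors = sorted(train, key=lambda row: dist(row[0], holdout))[:k]
    targets = [target for _, target in neighbors]
    if mode == "classify":
        # Classification: the category with the most nearest neighbors wins.
        return Counter(targets).most_common(1)[0][0]
    # Regression: the mean of the neighbors' target values.
    return mean(targets)

train = [((1, 1), "a"), ((1, 2), "a"), ((5, 5), "b"), ((6, 5), "b")]
knn_predict(train, (1.5, 1.5), k=3)   # two of the three nearest are "a"
```

A median-based regression variant would simply swap `mean` for `statistics.median` in the final line of the function.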
The KNN node allows you to compute distances using either the Euclidean or the City Block (Manhattan) metric. For models without a target, you must specify a fixed value for the number k of nearest neighbors to identify. For models with a target, several other important options are available. You can:
- Compute distances with features weighted by importance.
- Specify either a fixed number k of nearest neighbors, or automatically select from a range of values for k based on the smallest classification or prediction error, using V-fold cross-validation.
- Perform feature selection.
- Combine feature selection with automatic selection of k (in this case, cross-validation is not used, for performance reasons).
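Two of the options above – the choice of distance metric and the automatic selection of k by V-fold cross-validation – can be illustrated with a minimal sketch. Everything here (the function names `city_block`, `knn_classify`, and `select_k`, and the fold-assignment scheme) is a hypothetical illustration under simplifying assumptions, not the node's actual procedure.

```python
from collections import Counter
from math import dist  # Euclidean distance

def city_block(a, b):
    # City Block (Manhattan) distance: sum of absolute coordinate differences.
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_classify(train, holdout, k, metric=dist):
    # Majority vote among the k training instances closest to `holdout`.
    neighbors = sorted(train, key=lambda row: metric(row[0], holdout))[:k]
    return Counter(target for _, target in neighbors).most_common(1)[0][0]

def select_k(data, candidates, folds=5, metric=dist):
    """Pick the candidate k with the smallest V-fold cross-validation error.

    Each fold in turn is held out, the remaining instances form the training
    set, and misclassifications on the held-out fold are counted.
    """
    def cv_error(k):
        errors = 0
        for v in range(folds):
            holdout = data[v::folds]  # every folds-th instance, offset v
            train = [row for i, row in enumerate(data) if i % folds != v]
            errors += sum(knn_classify(train, x, k, metric) != y
                          for x, y in holdout)
        return errors
    return min(candidates, key=cv_error)
```

For example, `select_k(data, candidates=[1, 3, 5], folds=5, metric=city_block)` evaluates each candidate k on five held-out folds and returns the one with the fewest misclassifications. A real implementation would shuffle or stratify the folds rather than slice by position.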
Like your model? Why not deploy it? For more information, see Deploy a model.