k-Nearest Neighbors

k-NN is an instance of a non-parametric learning algorithm: it does not distill the training data into a fixed set of parameters; instead, the training data itself is retained as part of the model.

1. Basics

When fed a new feature vector, the algorithm looks up the nearest training vectors under a closeness criterion and derives its prediction from them:

  • fetch the k nearest training feature vectors to the new vector
  • for classification, assign the new vector to the class holding the majority among those k neighbors
  • for regression, average the numerical labels of the k neighbors to tag the new vector
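The steps above can be sketched in plain Python (a minimal illustration for this note; the `knn_predict` helper and the toy data are made up, not from any library):

```python
import math
from collections import Counter

def knn_predict(train, query, k, task="classify"):
    # train: list of (feature_vector, label) pairs; query: a feature vector.
    # Rank training points by Euclidean (L2) distance to the query.
    ranked = sorted(train, key=lambda p: math.dist(p[0], query))
    labels = [label for _, label in ranked[:k]]
    if task == "classify":
        # majority vote among the k nearest neighbors
        return Counter(labels).most_common(1)[0][0]
    # regression: average the numerical labels of the k neighbors
    return sum(labels) / k

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b")]
print(knn_predict(train, (1, 0), k=3))  # → a
```

Note that prediction requires scanning the whole training set, which is why the data must be kept around: all the work happens at query time, not at training time.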

The closeness criterion most commonly used is the L2 norm (Euclidean distance). A popular alternative is cosine similarity, which captures the angle between two vectors rather than their magnitudes. Other criteria worth considering: Chebyshev distance, Mahalanobis distance, and Hamming distance.
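A few of these criteria are simple enough to write out directly (a sketch in plain Python; function names are my own):

```python
import math

def euclidean(u, v):
    # L2 norm of the difference vector
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    # cosine of the angle between u and v; 1 means same direction
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def chebyshev(u, v):
    # largest coordinate-wise absolute difference
    return max(abs(a - b) for a, b in zip(u, v))

def hamming(u, v):
    # number of positions where the entries differ
    return sum(a != b for a, b in zip(u, v))
```

Note that cosine similarity is a similarity (higher is closer), so the k "nearest" neighbors are those with the largest value, not the smallest; Mahalanobis distance additionally needs the covariance matrix of the data and is omitted here.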

The hyperparameters of the algorithm are then the choice of closeness criterion and the value of k.
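One common way to pick k is to score candidate values on held-out data; a minimal sketch using leave-one-out validation (the helper functions and toy data here are illustrative, not from any library):

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    # majority vote among the k nearest neighbors under L2 distance
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    # leave-one-out: predict each point from all the other points
    hits = sum(
        knn_classify(data[:i] + data[i + 1:], x, k) == y
        for i, (x, y) in enumerate(data)
    )
    return hits / len(data)

data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
# keep k odd to avoid ties in binary classification
best_k = max([1, 3, 5], key=lambda k: loo_accuracy(data, k))
```

On this toy set k = 5 fails because each held-out point then has only two same-class neighbors left against three of the other class, which illustrates why k should stay small relative to class sizes.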

Tags::ml:ai: