Approximate Nearest Neighbors (ANN) vs K-Nearest Neighbors (KNN) for Large Datasets
Sacrificing some quality in favour of speed
K-Nearest Neighbors (KNN) is a popular supervised learning algorithm in Machine Learning (ML). It can be used for both classification and regression tasks, although it is most commonly applied to classification.
The major challenge in KNN is choosing the right value of k, which is the number of neighbouring instances considered when classifying a new instance. Technically, k is a hyperparameter in KNN, and the user needs to define its value.
Let’s assume k=5. The algorithm considers the 5 data points nearest to the new instance, and the new instance is assigned to the majority class among those neighbours. Say there are three classes (red, green, blue) in the dataset and 3 of the 5 nearest neighbours are red points; the new instance will then be assigned to the red class (Source: Choosing the Right Number of Neighbors for the K-Nearest Neighbors Algorithm).
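As a minimal sketch of this majority-vote behaviour, here is KNN with k=5 using scikit-learn. The toy 2-D points and the red/green/blue labels are illustrative assumptions, not data from this article:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D training points with three classes
X_train = np.array([[1, 2], [2, 1], [2, 3], [3, 2], [6, 5],
                    [7, 7], [6, 6], [10, 2], [11, 3], [10, 4]])
y_train = np.array(["red", "red", "red", "red", "green",
                    "green", "green", "blue", "blue", "blue"])

# k=5: a new instance is assigned to the majority class
# among its 5 nearest neighbours
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

new_instance = np.array([[2, 2]])
print(knn.predict(new_instance))  # ['red'] — 4 of the 5 neighbours are red
```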
The KNN algorithm uses the Euclidean distance formula to calculate the distances between the new instance and all other existing data points in the training set.
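For reference, the Euclidean distance between two points a and b is the square root of the sum of squared coordinate differences. A small sketch with NumPy (the example points are made up for illustration):

```python
import numpy as np

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """d(a, b) = sqrt(sum_i (a_i - b_i)^2) for points of equal dimension."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Hypothetical example: distance between a new instance and one training point
new_instance = np.array([2.0, 2.0])
existing_point = np.array([6.0, 5.0])
print(euclidean_distance(new_instance, existing_point))  # 5.0
```

In a brute-force KNN, this computation is repeated against every point in the training set, which is exactly what makes exact KNN expensive on large datasets.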