Approximate Nearest Neighbors (ANN) vs K-Nearest Neighbors (KNN) for Large Datasets

Sacrificing some quality in favour of speed

Rukshan Pramoditha
5 min readNov 13, 2024

K-Nearest Neighbors (KNN) is a popular supervised learning algorithm in Machine Learning (ML). It can be used for both classification and regression tasks, although it is most widely used for classification.

The major challenge in KNN is choosing the right k value, which is the number of neighboring instances considered when classifying a new instance. Technically, k is a hyperparameter in KNN: the user needs to define its value before training.
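Since k is a hyperparameter, a common way to choose it is to compare several candidate values with cross-validation. Here is a minimal sketch using scikit-learn's `KNeighborsClassifier` on the built-in Iris dataset (the dataset and the candidate k values are illustrative assumptions, not from the article):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Evaluate several candidate k values with 5-fold cross-validation
# and pick the one with the best mean accuracy.
for k in (1, 3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"k={k}: mean accuracy {score:.3f}")
```

In practice, very small k values tend to overfit (noisy decision boundaries), while very large values oversmooth, so the cross-validated curve usually peaks somewhere in between.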

Let’s assume k=5. The algorithm finds the 5 data points nearest to the new instance, and the new instance is assigned to the majority class among those neighbors. Let’s say there are three classes (red, green, blue) in the dataset, and 3 of the 5 nearest neighbors are red. The new instance will then be assigned to the red class (Source: Choosing the Right Number of Neighbors for the K-Nearest Neighbors Algorithm).
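The majority-vote step above can be sketched in a few lines of NumPy. The toy 2-D points and class labels below are made up for illustration; any labeled dataset would work the same way:

```python
import numpy as np
from collections import Counter

# Toy dataset: hypothetical 2-D points belonging to three classes
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],   # red cluster
              [5.0, 5.0], [5.2, 4.8],               # green cluster
              [9.0, 1.0], [9.1, 0.9]])              # blue cluster
y = np.array(["red", "red", "red", "green", "green", "blue", "blue"])

def knn_predict(X, y, new_point, k=5):
    """Classify new_point by majority vote among its k nearest neighbors."""
    distances = np.linalg.norm(X - new_point, axis=1)  # Euclidean distances
    nearest = np.argsort(distances)[:k]                # indices of the k closest points
    votes = Counter(y[nearest])                        # count class labels among them
    return votes.most_common(1)[0][0]                  # majority class wins

print(knn_predict(X, y, np.array([1.1, 1.0]), k=5))    # red: 3 of the 5 neighbors are red
```

Note that this brute-force version compares the new instance against every training point, which is exactly the cost that approximate nearest neighbor methods trade accuracy to avoid on large datasets.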

KNN new instance classification (Image by author)

The KNN algorithm utilizes the Euclidean distance formula to calculate the distances between the new instance and all other existing…
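For two points p and q in n dimensions, the Euclidean distance mentioned above is d(p, q) = sqrt(Σᵢ (pᵢ − qᵢ)²). A minimal sketch:

```python
import numpy as np

def euclidean_distance(p, q):
    """d(p, q) = sqrt(sum_i (p_i - q_i)^2) for two points of equal dimension."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sqrt(np.sum((p - q) ** 2)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0 (the classic 3-4-5 triangle)
```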
