# KNNImputer for Filling Missing Data in Data Preprocessing

## K-Nearest Neighbors (KNN) Algorithm for Handling Missing Data

The K-Nearest Neighbors (hereafter, KNN) is a supervised machine-learning algorithm that uses a ** k** number of nearest (closest) neighbors to classify an instance into a relevant class.

Neighbors of an instance are found using the Euclidean distance. The Euclidean distance between two data points is calculated using the following formula.

**x = (x1, x2, …, xn)**

y = (y1, y2,…, yn)

** n** is the dimension of the space. For example, when n=2, the distance between

**x**and

**y**or

**d(x, y)**is calculated on the 2-dimensional space.

**can be any higher dimension.**

*n*In KNN, ** k** is a hyperparameter that we need to define during the execution of the algorithm. Depending on the value of

**, the same instance may be classified into different classes! So, we need to properly define a value for**

*k***.**

*k*# The intuition behind KNNImputer

The KNNImputer utilizes the KNN algorithm to impute the missing values in the dataset. The replacement values are…