Hands-On K-Means Clustering

With Python, NumPy, Scikit-learn and Yellowbrick

Rukshan Pramoditha
16 min read · Jul 20, 2020
Photo by Markus Winkler on Unsplash

Today, we discuss another exciting machine learning algorithm: K-Means Clustering. You will get hands-on experience in implementing the K-Means algorithm with Python, Scikit-learn and Yellowbrick.

What is K-Means Clustering?

Clustering is performed to identify distinct groups in the dataset such that the observations within a group are similar to each other but different from observations in other groups. The groups are known as clusters. Clustering is often used to find patterns in unlabeled data.

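The K-Means algorithm partitions the observations into k groups by assigning each observation to the group with the nearest centroid (the group mean), aiming to minimize the within-cluster sum of squared distances, also known as inertia. It works best when the groups have roughly equal variance. The number of groups, denoted by k, must be specified as a hyperparameter.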
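To make the idea concrete, here is a minimal NumPy sketch of the classic iterative procedure (often called Lloyd's algorithm): assign each observation to its nearest centroid, move each centroid to the mean of its observations, and repeat until nothing changes. The function name kmeans_sketch, its parameters and the synthetic two-blob data are assumptions made for illustration only; this is not how Scikit-learn implements K-Means internally.

```python
import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Illustrative Lloyd's-algorithm sketch (hypothetical helper, not scikit-learn's code)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every observation to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its observations;
        # keep the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment and inertia (within-cluster sum of squared distances).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    inertia = dists[np.arange(len(X)), labels].sum()
    return labels, centroids, inertia

# Synthetic data (made up for this sketch): two well-separated blobs of 50 points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

labels, centroids, inertia = kmeans_sketch(X, k=2)
print(centroids)
print(inertia)
```

Because the starting centroids are random, a different seed can give different cluster numbering, and on harder data a different final result. This is one reason library implementations typically run the algorithm several times and keep the best run.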

K-means clustering is an unsupervised machine learning algorithm. To learn about the difference between supervised and unsupervised learning, you can read my Getting Started with Machine Learning (ML) article.

In scikit-learn, k-means clustering is implemented by the KMeans class. When using this class, the user specifies the value of the hyperparameter k by setting the n_clusters parameter to an integer greater than 1 since the…
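As a quick illustration of the KMeans class and its n_clusters parameter, here is a minimal sketch on synthetic data. The two-blob dataset and the choice of n_clusters=2 are assumptions made for this example, and n_init and random_state are additional KMeans parameters, not discussed above, that are set here only to make the result reproducible.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (made up for this sketch): two well-separated blobs of 50 points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

# n_clusters is the hyperparameter k.
# n_init=10 runs the algorithm from 10 different starting points and keeps the best run;
# random_state fixes the random starting points so the result is repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X)  # cluster index (0 or 1) for each observation

print(model.cluster_centers_)  # one row of coordinates per cluster centroid
print(model.inertia_)          # within-cluster sum of squared distances
```

The fitted model can also assign new observations to the learned clusters with model.predict().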

