Hi again! Today, we discuss one of the most popular machine learning algorithms among data scientists: **Principal Component Analysis (PCA)**. I have previously written two articles on this topic. If you haven’t read them yet, you can find them here:

- **Principal Component Analysis (PCA) with Scikit-learn**
- **Statistical and Mathematical Concepts behind PCA**

In this article, more emphasis will be given to the two programming languages (R and Python) which we use to perform PCA. At the end of the article, you will see the difference between R and Python in terms of performing PCA.

The dataset that we use for PCA ships with Scikit-learn, but not in the format that we want. So, I have converted it into a CSV file (download here). This dataset contains breast cancer data for 569 females (observations). The dimensionality of the dataset is 30, which means that there are 30 attributes (characteristics) for each female (observation) in the dataset. …
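As a rough sketch of that conversion (my own reconstruction, not necessarily the exact manipulations used for the downloadable file), the Scikit-learn dataset can be wrapped in a pandas DataFrame and serialized as CSV. The variable names here are assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer

# Load the dataset bundled with Scikit-learn (a Bunch object, not a table).
cancer = load_breast_cancer()

# Wrap the feature matrix in a DataFrame so that every column has a name.
df = pd.DataFrame(cancer.data, columns=cancer.feature_names)

# Serialize to CSV text; pass a file path to to_csv() to write to disk instead.
csv_text = df.to_csv(index=False)

print(df.shape)  # (569, 30): 569 observations, 30 attributes
```

The target column is left out here because PCA only needs the 30 attributes.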

**k-fold cross-validation** is one of the most popular strategies used by data scientists. It is a

The **Random Forest** is one of the most powerful machine learning algorithms available today. It is a

First, we discuss some of the drawbacks of the Decision Tree algorithm. This will motivate you to use Random Forests.

- Small changes to training data can result in a significantly different tree structure.
- It may have the problem of overfitting (the model fits the training data very well but it fails to generalize for new input data) unless you tune the model hyperparameter of *max_depth*. …
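To make the overfitting point concrete, here is a minimal sketch that compares an unrestricted tree with one limited by *max_depth*. The synthetic data and the use of DecisionTreeRegressor are my assumptions for illustration; the same idea applies to classification trees.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration, not a real dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unrestricted tree keeps splitting until it memorizes the training set.
deep_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Capping max_depth stops the splitting early and regularizes the tree.
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

for name, tree in [("unrestricted", deep_tree), ("max_depth=3", shallow_tree)]:
    print(name, tree.get_depth(),
          tree.score(X_train, y_train), tree.score(X_test, y_test))
```

A large gap between the training score and the test score of the unrestricted tree is the usual sign of overfitting.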

**Decision Trees** are a non-parametric supervised learning method, capable of finding complex nonlinear relationships in the data. They can perform both classification and regression tasks. But in this article, we only focus on decision trees with a regression task. For this, the equivalent Scikit-learn class is **DecisionTreeRegressor**.

We will start by discussing how to train, visualize and make predictions with Decision Trees for a regression task. We will also discuss how to regularize decision trees through their hyperparameters, which helps to avoid the problem of overfitting. Finally, we will discuss some of the advantages and disadvantages of Decision Trees.
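The train, visualize and predict steps might look roughly like the sketch below. The toy dataset, the feature name `x` and the choice of `max_depth=2` are assumptions made for illustration, and the text rendering from `export_text` is just one simple way to inspect a tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Toy one-feature dataset (hypothetical): y is a noisy quadratic in x.
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(scale=0.05, size=200)

# Train: max_depth=2 limits the tree to at most 4 leaves.
tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg.fit(X, y)

# Visualize: a plain-text rendering of the learned splits.
tree_text = export_text(tree_reg, feature_names=["x"])
print(tree_text)

# Predict: each prediction is the mean target of the leaf the sample falls in.
y_pred = tree_reg.predict(np.array([[0.0], [0.9]]))
print(y_pred)
```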

We use the following code convention to import the necessary libraries and set the plot style. …

Welcome back! It’s very exciting to apply the knowledge that we already have to build machine learning models with some real data. **Polynomial Regression**, the topic that we discuss today, is one such model, and it may require a fairly involved workflow depending on the problem statement and the dataset.

Today, we discuss how to build a Polynomial Regression model, and how to preprocess the data before building it. Actually, we apply a series of steps in a particular order to build the complete model. All the necessary tools are available in the Scikit-learn machine learning library for Python.
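One common way to chain such steps in order is a Scikit-learn pipeline. The sketch below is only an assumption about what the steps could be (polynomial feature expansion, scaling, then linear regression), applied to hypothetical noise-free data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical noise-free data following y = 1 + 2x + 3x^2.
X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2

# The steps run in the order listed: expand features, scale them, fit the model.
poly_reg = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LinearRegression(),
)
poly_reg.fit(X, y)

# For x = 2 the true value is 1 + 2*2 + 3*2**2 = 17.
pred = poly_reg.predict(np.array([[2.0]]))
print(pred)
```

Keeping the preprocessing inside the pipeline means the same transformations are applied automatically when predicting on new data.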

If you’re not familiar with Python, numpy, pandas, machine learning and Scikit-learn, please read my previous articles that are prerequisites for this article. …

As I promised in the previous article, **Principal Component Analysis (PCA) with Scikit-learn**, today, I’ll discuss the mathematics behind the principal component analysis by manually executing the algorithm using the powerful numpy and pandas libraries. This will help you to understand how PCA really works behind the scenes.
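To give a flavor of what manually executing the algorithm can mean, here is a minimal numpy-only sketch. It assumes the eigendecomposition route on standardized data, and the random data and variable names are hypothetical, not taken from the article.

```python
import numpy as np

# Hypothetical data: 100 observations, 3 features (two of them correlated).
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))
X = np.hstack([
    latent + 0.1 * rng.normal(size=(100, 1)),
    2 * latent + 0.1 * rng.normal(size=(100, 1)),
    rng.normal(size=(100, 1)),
])

# 1. Standardize each feature (zero mean, unit sample variance).
X_std = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 2. Covariance matrix of the standardized features.
cov = np.cov(X_std, rowvar=False)

# 3. Eigendecomposition; eigh is meant for symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Sort the components by decreasing eigenvalue (explained variance).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 5. Project the data onto the first two principal components.
X_pca = X_std @ eigvecs[:, :2]
explained_ratio = eigvals / eigvals.sum()
print(explained_ratio)
```

Each eigenvalue equals the variance of the data along its component, which is why the ratios can be read as the share of variance explained.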

Before reading this one, I highly recommend that you read the following article:

This is because this article continues from the one above.

In this article, I first review some statistical and mathematical concepts which are required to execute the PCA calculations.

The **mean** (also called the

Hi everyone! This is the second unsupervised machine learning algorithm that I’m discussing here. This time, the topic is Principal Component Analysis (PCA). At the very beginning of the tutorial, I’ll explain the dimensionality of a dataset, what dimensionality reduction means, the main approaches to dimensionality reduction, the reasons for dimensionality reduction and what PCA means. Then, I will go deeper into PCA by implementing the algorithm with the Scikit-learn machine learning library. This will help you to easily apply PCA to a real-world dataset and get results very fast.
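For a first impression, applying PCA with Scikit-learn can be as short as the sketch below. The breast cancer dataset, the scaling step and `n_components=2` are my assumptions for illustration, not necessarily the choices made in the tutorial.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# 569 observations with 30 attributes each.
X = load_breast_cancer().data

# PCA is sensitive to feature scale, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the 30 dimensions to 2 principal components.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

print(X_pca.shape)                    # (569, 2)
print(pca.explained_variance_ratio_)  # share of variance kept per component
```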

In a separate article (not in this one), I will discuss the mathematics behind the principal component analysis by manually executing the algorithm using the powerful numpy and pandas libraries. This will help you to understand how PCA really works behind the scenes. …

Welcome, everyone, to another exciting ML topic: **K-Means Clustering**. To apply the algorithm to a real-world dataset, I’ll use the Scikit-learn machine learning library in Python.

**Clustering** is the task of partitioning a dataset into groups, called

**K-Means Algorithm** is one of the simplest and most commonly used clustering algorithms. In k-means clustering, the algorithm attempts to group observations into
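A minimal k-means run with Scikit-learn might look like the following sketch. The synthetic blobs and the choice of three clusters are assumptions made for illustration, not the real-world dataset used in the article.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data with three well-separated groups (an assumption).
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# n_init=10 reruns the algorithm from 10 random starts and keeps the best result.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print(kmeans.cluster_centers_)  # one centroid per cluster
print(kmeans.inertia_)          # within-cluster sum of squared distances
```

The number of clusters has to be chosen up front, which is usually the main decision when using this algorithm.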

Hello friends! This is the 14th article on the **Data Science 365** blog. So far, we’ve come a long way in Data Science and Machine Learning by discussing theory and applying it to real problems. If you haven’t read my previous articles published on **Data Science 365**, please read them to learn something new about Data Science and Machine Learning.

You are **ALL** welcome to another exciting tutorial at

It’s time to put into practice everything that I’ve discussed so far at **Data Science 365**. I highly recommend that you read my previous articles published there before reading this one. Today, in this tutorial, I will discuss the most fundamental Machine Learning algorithm, **Linear Regression**, by following the steps of the Predictive Analytics process. …
