# 6 Key differences between np.ndarray and np.matrix objects

## These two are technically different even if they seem to be the same in appearance

NumPy is the foundational Python library for numerical calculations and linear algebra. Two of its commonly used objects are ndarray and matrix: ndarray objects are instances of the NumPy ndarray class, and matrix objects are instances of the NumPy matrix class. If you’re new to NumPy, the two are easy to confuse. They are different things even though they look the same. Today, we’ll discuss 6 differences between them.
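As a quick illustration of how two objects that look alike can behave differently, here is a minimal sketch. The variable names are my own, and the three behaviours shown (type, the `*` operator, and row indexing) are examples I picked for illustration; they are not necessarily the same six differences covered in the rest of the post.

```python
import numpy as np

# Build the same 2x2 data as an ndarray and as a matrix.
a = np.array([[1, 2], [3, 4]])    # ndarray
m = np.matrix([[1, 2], [3, 4]])   # matrix (always 2-dimensional)

# The two look alike when displayed but are different types.
type_a = type(a).__name__
type_m = type(m).__name__

# The * operator is element-wise for ndarray objects
# and matrix multiplication for matrix objects.
elementwise = a * a
matmul = m * m

# Selecting one row drops a dimension for ndarray objects,
# while a matrix stays 2-dimensional.
row_a_shape = a[0].shape
row_m_shape = m[0].shape

print(type_a, type_m)
print(elementwise)
print(matmul)
print(row_a_shape, row_m_shape)
```

To get matrix multiplication with ndarray objects, the `@` operator (or `np.matmul`) can be used instead of `*`.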

I recommend reading the following articles that I have written.

# 4 Ways to Visualize Individual Decision Trees in a Random Forest

## Using sklearn, graphviz and dtreeviz Python packages for fancy visualization of decision trees

Data visualization plays a key role in data analysis and machine learning fields as it allows you to reveal the hidden patterns behind the data. Model visualization allows you to interpret the model. The visualization process is now easy with plenty of available Python packages today.

Tree-based models such as Decision Trees, Random Forests and XGBoost are popular for supervised learning (classification and regression) tasks. This is because those models fit non-linear data well, and such data is frequently encountered in real-world applications.

The baseline model for any tree-based model is the Decision Tree. Random Forests consist of multiple…

# 11 Dimensionality reduction techniques you should know in 2021

## Reduce the size of your dataset while keeping as much of the variation as possible

In both Statistics and Machine Learning, the number of attributes, features or input variables of a dataset is referred to as its dimensionality. For example, let’s take a very simple dataset containing 2 attributes called Height and Weight. This is a 2-dimensional dataset and any observation of this dataset can be plotted in a 2D plot.

# Plotting the Learning Curve with a Single Line of Code

## To see how much your model benefits from adding more training data

The Learning Curve is another great tool to have in any data scientist’s toolbox. It is a visualization technique that can be used to see how much our model benefits from adding more training data. It shows the relationship between the training score and the test score for a machine learning model with a varying number of training samples. Generally, cross-validation is used when plotting the learning curve.

A good ML model fits the training data very well and is generalizable to new input data as well. Sometimes, an ML model may require more training instances in…

# Image Compression Using Principal Component Analysis (PCA)

## Dimensionality Reduction in Action

Principal Component Analysis (PCA) is a linear dimensionality reduction technique (algorithm) that transforms a set of p correlated variables into a smaller number k (k<p) of uncorrelated variables called principal components while keeping as much of the variability in the original data as possible.

One of the use cases of PCA is that it can be used for image compression — a technique that minimizes the size in bytes of an image while keeping as much of the quality of the image as possible. In this post, we will discuss that technique by using the MNIST dataset of handwritten digits…

# 9 Guidelines to master Scikit-learn without giving up in the middle

## Learn the way that worked for me

Undoubtedly, Scikit-learn is one of the best machine learning libraries available today. There are several reasons for that. The consistency among Scikit-learn estimators is one reason. You cannot find such consistency in any other machine learning library. The .fit()/.predict() paradigm best describes the consistency. Another reason is that Scikit-learn has a variety of uses. It can be used for classification, regression, clustering, dimensionality reduction, anomaly detection.

Therefore, Scikit-learn is a must-have Python library in your data science toolkit. But learning to use Scikit-learn is not straightforward. It’s not as simple as you might imagine. You have to set up some background before…

# 4 Useful clustering methods you should know in 2021

## Form groups of similar observations based on distance

The main objective of cluster analysis is to form groups (called clusters) of similar observations, usually based on the Euclidean distance. In machine learning terminology, clustering is an unsupervised task. Today, we discuss 4 useful clustering methods which belong to two main categories: Hierarchical clustering and Non-hierarchical clustering.

Under hierarchical clustering, we will discuss 3 agglomerative hierarchical methods: Single Linkage, Complete Linkage and Average Linkage. Under non-hierarchical clustering, we will discuss K-Means Clustering.

# 4 Machine learning techniques for outlier detection in Python

## Machine learning-based outlier detection

Based on the feedback given by readers after publishing “Two outlier detection techniques you should know in 2021”, I have decided to make this post which includes four different machine learning techniques (algorithms) for outlier detection in Python. Here, I will use the I-I (Intuition-Implementation) approach for each technique. That will help you to understand how each algorithm works behind the scenes without going deeper into the algorithm mathematics (the Intuition part) and implement each algorithm with the Scikit-learn machine learning library (the Implementation part). I will also use some graphical techniques to describe each algorithm and its output. At…

# Top 10 Matrix Operations in Numpy with Examples

## Perform Linear Algebra with Python

About 30–40% of the mathematical knowledge required for Data Science and Machine Learning comes from linear algebra. Matrix operations play a significant role in linear algebra. Today, we discuss 10 such matrix operations with the help of the powerful numpy library. Numpy is generally used to perform numerical calculations in Python. It also has special classes and sub-packages for matrix operations. Vectorization allows numpy to perform matrix operations more efficiently by avoiding many for loops.

I will include the meaning, background description and code examples for each matrix operation discussed in this article. The “Key Takeaways”…

# Two outlier detection techniques you should know in 2021

## Elliptic Envelope and IQR-based detection 