An essential guide to writing unique content that adds value for readers, in both context and appearance

Photo by Mahbod Akhzami on Unsplash

I really like Medium's ad-free, simple and eye-pleasing interface. It easily draws in any reader who comes to the platform looking for something interesting to read. It also gives writers simple but useful tools for creating eye-pleasing content.

I have been writing on Medium for months. Since my main interests are Data Science and Machine Learning, I often write content related to those fields. As a writer, I always try to improve the quality of my content by considering various aspects.

In this post, I will share 10 best practices that I always follow when writing content on Medium. These best practices…


A guide to using the Scikit-learn GridSearchCV and RandomizedSearchCV classes for hyperparameter optimization

Photo by DiChatz on Unsplash

You cannot get the best out of your machine learning model without hyperparameter optimization (tuning). The default hyperparameter values do not necessarily produce the best model for your data. Scikit-learn, the Python machine learning library, provides two classes for hyperparameter optimization:

  • GridSearchCV — for Grid Search
  • RandomizedSearchCV — for Random Search

If you’re new to the Data Science and Machine Learning fields, you may not be familiar with these terms. In this post, I’ll focus on the Python implementation of Grid Search and Random Search and explain the difference between them. …
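To make the difference concrete, here is a minimal sketch of both searches on the same model. The Iris dataset, the random forest model and the parameter ranges are illustrative assumptions, not the settings used in the article. Grid Search evaluates every combination in the grid, while Random Search evaluates only `n_iter` candidates sampled from the given distributions.

```python
# A minimal sketch: tune a random forest with Grid Search and Random Search.
# Dataset, model and parameter ranges are illustrative assumptions.
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=42)

# Grid Search: tries every combination in the grid (3 x 3 = 9 candidates).
param_grid = {
    "n_estimators": [10, 30, 50],
    "max_depth": [2, 4, None],
}
grid_search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X, y)

# Random Search: samples n_iter candidates from distributions or lists.
param_distributions = {
    "n_estimators": randint(10, 51),  # integers from 10 to 50 inclusive
    "max_depth": [2, 4, None],
}
random_search = RandomizedSearchCV(
    model, param_distributions, n_iter=5, cv=5, scoring="accuracy", random_state=42
)
random_search.fit(X, y)

print("Grid Search best params:", grid_search.best_params_)
print("Grid Search best CV accuracy:", round(grid_search.best_score_, 3))
print("Random Search best params:", random_search.best_params_)
print("Random Search best CV accuracy:", round(random_search.best_score_, 3))
```

With larger search spaces, the fixed `n_iter` budget is what makes Random Search cheaper: its cost does not grow with the number of combinations.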


Let’s solve linear systems with a unique solution, no solution, or infinitely many solutions

Photo by Antoine Dautry on Unsplash

In linear algebra, a system of linear equations is a collection of two or more linear equations that share the same set of variables. All equations in the system are considered simultaneously. Systems of linear equations are used in many sectors, such as manufacturing, marketing, business and transportation.

Solving a system of linear equations becomes more complicated as the number of equations and variables increases. A solution must satisfy every equation in the system. In Python, the NumPy (Numerical Python), SciPy (Scientific Python) and SymPy (Symbolic Python) libraries can be used to solve systems of…
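The three cases in the title can be told apart by comparing the rank of the coefficient matrix A with the rank of the augmented matrix [A | b] (the Rouché–Capelli theorem). The sketch below uses only NumPy; the helper name `classify_system` and the three example systems are hypothetical illustrations, not code from the article. Note that `np.linalg.solve` handles only the square, non-singular (unique solution) case.

```python
import numpy as np

def classify_system(A, b):
    """Hypothetical helper: classify Ax = b by comparing rank(A) with rank([A | b])."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.hstack([A, b]))
    n_vars = A.shape[1]
    if rank_A < rank_Ab:
        return "no solution"            # b is not in the column space of A
    if rank_A == n_vars:
        return "unique solution"        # consistent and full column rank
    return "infinitely many solutions"  # consistent, but with free variables

# 2x + y = 5,  x - y = 1     -> unique solution (x = 2, y = 1)
A1, b1 = [[2, 1], [1, -1]], [5, 1]
# x + y = 2,   2x + 2y = 5   -> no solution (parallel lines)
A2, b2 = [[1, 1], [2, 2]], [2, 5]
# x + y = 2,   2x + 2y = 4   -> infinitely many solutions (same line)
A3, b3 = [[1, 1], [2, 2]], [2, 4]

for A, b in [(A1, b1), (A2, b2), (A3, b3)]:
    print(classify_system(A, b))

# np.linalg.solve requires a square, non-singular A (the unique-solution case).
x = np.linalg.solve(np.array(A1, dtype=float), np.array(b1, dtype=float))
print(x)  # → [2. 1.]
```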


Let’s walk through the steps of the machine learning process to find out “Why?”

Photo by Alex Knight on Unsplash

Many people, whether they are already data scientists or new to the field, are looking for an answer to the question “Will AutoML (Automated Machine Learning) replace data scientists?” This is a very reasonable question, because automation has already been introduced to machine learning and plays a key role in the modern world. In addition, people who want to become data scientists are thinking about how to secure a long-term place in the job market.

AutoML will NOT replace your data science profession. It’s just here to make…


Reduce the size of your dataset while keeping as much of the variation as possible

Photo by Nika Benedictova on Unsplash

In both Statistics and Machine Learning, the number of attributes, features or input variables of a dataset is referred to as its dimensionality. For example, let’s take a very simple dataset containing 2 attributes called Height and Weight. This is a 2-dimensional dataset and any observation of this dataset can be plotted in a 2D plot.
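As a sketch of what reducing dimensionality means for the Height and Weight example, the code below projects a 2-dimensional dataset onto one dimension with scikit-learn's PCA and reports how much of the variation survives. The data is synthetic (randomly generated, correlated heights and weights), and standardizing before PCA is an assumption made here because the two attributes are measured in different units.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): weight is generated to correlate with height.
rng = np.random.default_rng(0)
height = rng.normal(170, 10, size=200)                   # cm
weight = 0.9 * height - 90 + rng.normal(0, 5, size=200)  # kg
X = np.column_stack([height, weight])                    # shape (200, 2): 2-dimensional

# Standardize so that the cm and kg scales contribute equally.
X_scaled = StandardScaler().fit_transform(X)

# Project the 2D data onto the single direction of greatest variance.
pca = PCA(n_components=1)
X_reduced = pca.fit_transform(X_scaled)                 # shape (200, 1)

print(X.shape, "->", X_reduced.shape)                    # → (200, 2) -> (200, 1)
print("Share of variance kept:", round(pca.explained_variance_ratio_[0], 3))
```

Because the two attributes are strongly correlated, a single component keeps most of the variation, which is exactly the trade-off described in the subtitle above.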


PCA in action to remove multicollinearity

Photo by Gabriella Clare Marino on Unsplash

Multicollinearity occurs when features (input variables) are highly correlated with one or more of the other features in the dataset. It can hurt the performance of regression and classification models. PCA (Principal Component Analysis) takes advantage of multicollinearity by combining the highly correlated variables into a set of uncorrelated variables. Therefore, PCA can effectively eliminate multicollinearity between features.

In this post, we’ll build a logistic regression model on a classification dataset called breast_cancer. This initial model serves as the base model. Then, we’ll apply PCA to the breast_cancer data and build the logistic regression model again. After that, we’ll…
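The workflow described above can be sketched with scikit-learn pipelines. This is not the article's code: the 80/20 split, `max_iter=1000`, and `n_components=10` are illustrative assumptions, and `max_abs_corr` is a hypothetical helper. The last part checks the multicollinearity claim directly by comparing the largest absolute correlation among the original features with that among the principal components.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # 569 samples, 30 features
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Base model: scaling + logistic regression on all 30 original features.
base_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
base_model.fit(X_train, y_train)

# PCA model: same classifier on 10 principal components (assumed value).
pca_model = make_pipeline(
    StandardScaler(), PCA(n_components=10), LogisticRegression(max_iter=1000)
)
pca_model.fit(X_train, y_train)

print("Base model test accuracy:", round(base_model.score(X_test, y_test), 3))
print("PCA model test accuracy:", round(pca_model.score(X_test, y_test), 3))

def max_abs_corr(data):
    """Hypothetical helper: largest absolute off-diagonal correlation between columns."""
    corr = np.corrcoef(data, rowvar=False)
    np.fill_diagonal(corr, 0.0)
    return np.abs(corr).max()

# Apply only the scaler and PCA steps to get the component scores.
X_train_pcs = pca_model[:-1].transform(X_train)
print("Max |corr| among original features:", round(max_abs_corr(X_train), 3))
print("Max |corr| among principal components:", round(max_abs_corr(X_train_pcs), 3))
```

On the training data the components are uncorrelated up to floating-point error, while several original features (such as the radius and perimeter measurements) are almost perfectly correlated.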


In reality, it is far different from what you imagine

Photo by Nikita Tikhomirov on Unsplash

Data scientist was the number 1 job role in 2020. In 2021, however, machine learning engineer is the trending role. Many countries still have unfilled vacancies for data science positions.

Many people are studying data science and machine learning nowadays. Their ultimate goal is to land a dream data science job. However, most of them don’t know the reality of a data science job, because they don’t deal with real-world problems while learning the subject.

You may be familiar with different machine learning algorithms. You may also know what happens behind the scenes in each algorithm. You may…


Real-world machine learning is far different from what you imagine

Photo by Chor Tsang on Unsplash

I previously published an article called “10 Real Truths about Machine Learning”. Today, I’ll list 13 real-world insights in machine learning that were not in that article. This short (but useful) article focuses on real-world insights.

Let’s go through the list. Wherever possible, I’ll add links to my previous articles so that you can find more information on a specific point.

  • Data Scientist and Machine Learning Engineer are two completely different roles.
  • Machine learning engineer is trending and will be the number 1 job in…


Choose the Right Laptop for Data Science and Machine Learning

Photo by XPS on Unsplash

If you’re learning Data Science and Machine Learning, you definitely need a laptop, because you need to write and run your own code to get hands-on experience. When you also consider portability, a laptop is a better option than a desktop.

An ordinary laptop may not be ideal for data science and machine learning tasks, so you need to consider laptop specifications carefully to choose the right one. If you’re looking to buy a laptop for data science and machine learning tasks, this post is for you! …


With Selection, Slicing, Indexing and Filtering

Photo by Hans-Peter Gauster on Unsplash

In part 1 and part 2, we learned how to inspect, describe and summarize a Pandas DataFrame. Today, we’ll learn how to extract a subset of a Pandas DataFrame. This is very useful because we often want to perform operations on subsets of our data. There are many different ways of subsetting a Pandas DataFrame. You may need to select specific columns with all rows, select specific rows with all columns, or select rows and columns that meet a specific criterion.

All of these ways of subsetting can be divided into 4 categories: Selection, Slicing, Indexing…
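Here is a minimal sketch of the four kinds of subsetting named in the title (Selection, Slicing, Indexing and Filtering). The small DataFrame is made-up data, and how each operation is grouped under a category is an assumption, since the full article is not shown here.

```python
import pandas as pd

# Made-up data used only for illustration.
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol", "Dave", "Eve"],
        "age": [25, 32, 47, 51, 29],
        "city": ["London", "Paris", "Berlin", "London", "Paris"],
    },
    index=["a", "b", "c", "d", "e"],
)

# Selection: specific columns, all rows.
names = df["name"]              # one column -> Series
name_age = df[["name", "age"]]  # list of columns -> DataFrame

# Slicing: a range of rows, all columns.
first_three = df[:3]            # rows at positions 0, 1, 2

# Indexing: rows and columns by label (.loc) or by integer position (.iloc).
by_label = df.loc["b":"d", ["name", "city"]]  # label slices include the end label
by_position = df.iloc[1:4, [0, 2]]            # position slices exclude the end

# Filtering: rows that meet a condition (combine conditions with & and |).
over_30_in_london = df[(df["age"] > 30) & (df["city"] == "London")]
print(over_30_in_london)
```

Note the different slice semantics: `.loc["b":"d"]` and `.iloc[1:4]` return the same three rows here, because label-based slices include their end point while position-based slices do not.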

Rukshan Pramoditha

Data Analyst with Python || Bring data into actionable insights
