
Data Preprocessing

Introducing a New Series of Articles for Data Preprocessing

Make your data ready for analysis

Rukshan Pramoditha
2 min read · Jul 18, 2021


Data is the most valuable asset in any machine learning or deep learning project. The quality of data directly influences the performance of your models. Many of us focus heavily on hyperparameter tuning for model optimization. In most cases, hyperparameter tuning improves model performance by only 2–5%. Data quality issues, however, can derail your project right away.

Real-world data is rarely in the shape that you want. It is not ready for analysis. In most cases, data has missing values and outliers, which are the second-worst problem for data scientists after overfitting. Categorical variables contain non-numerical values that most machine learning algorithms cannot use directly, so we need to encode them. In some cases, data values are not measured on a similar scale. Sometimes, your data may contain a large number of features. All of these lead to a dedicated step in the model lifecycle: Data Preprocessing!
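To make these issues concrete, here is a minimal sketch of three of the steps mentioned above: filling missing values, encoding a categorical variable, and putting numeric features on a similar scale. It uses only the Python standard library so that each step is visible. The toy records, the column names, and the helper functions (`impute_median`, `one_hot`, `min_max_scale`, `preprocess`) are all hypothetical and chosen for illustration. Median imputation, one-hot encoding, and min-max scaling are just one reasonable choice for each step, not the only one.

```python
from statistics import median

# Hypothetical toy records: "age" has a missing value, "city" is categorical,
# and "age" and "income" are measured on very different scales.
rows = [
    {"age": 25, "income": 30000, "city": "Colombo"},
    {"age": None, "income": 52000, "city": "Kandy"},
    {"age": 41, "income": 88000, "city": "Colombo"},
    {"age": 33, "income": 61000, "city": "Galle"},
]


def impute_median(rows, column):
    """Replace missing (None) values in a column with the column median."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = median(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if r[column] == category else 0
        encoded.append(new_row)
    return encoded


def min_max_scale(rows, column):
    """Rescale a numeric column to the range [0, 1]."""
    values = [r[column] for r in rows]
    low, high = min(values), max(values)
    span = high - low
    # A constant column has no spread, so map it to 0.0 instead of dividing by zero.
    return [{**r, column: (r[column] - low) / span if span else 0.0} for r in rows]


def preprocess(rows):
    """Apply the three steps in order: impute, encode, scale."""
    rows = impute_median(rows, "age")
    rows = one_hot(rows, "city")
    for column in ("age", "income"):
        rows = min_max_scale(rows, column)
    return rows


clean = preprocess(rows)
for row in clean:
    print(row)
```

In practice, you would typically reach for libraries such as pandas and Scikit-learn for these steps rather than writing them by hand. The hand-written version is only meant to show what each step does to the data.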

Data Preprocessing is the process of getting your data ready for analysis. The same idea is also referred to by other terms, such as Data Wrangling and Data Munging. The major tasks of data preprocessing include:

  • Data cleaning


Written by Rukshan Pramoditha

3,000,000+ Views | BSc in Stats (University of Colombo, Sri Lanka) | Top 50 Data Science, AI/ML Technical Writer on Medium
