Data Preprocessing
Introducing a New Series of Articles for Data Preprocessing
Make your data ready for analysis
Data is the most valuable asset in any machine learning or deep learning project. The quality of the data directly influences the performance of your models. Most of us put a lot of effort into hyperparameter tuning for model optimization. In most cases, however, hyperparameter tuning improves the model’s performance by only 2–5%. Data quality issues, on the other hand, can derail your project right away.
Real-world data rarely arrives in the shape you want, and it is not ready for analysis. In most cases, it has missing values and outliers, which are the second-worst problem for data scientists after overfitting. Categorical variables contain non-numerical values that most machine learning algorithms cannot use directly, so we need to encode them. In some cases, features are not measured on a similar scale. Sometimes, your data may contain a very large number of features. All of these issues lead to a dedicated step in the model’s lifecycle: Data Preprocessing!
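To make these issues concrete, here is a minimal sketch of how you might spot them with pandas before doing any preprocessing. The dataset, its column names and its values are hypothetical, invented purely for illustration, and the 1.5 × IQR rule used for outliers is just one common convention, not the only way to detect them.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset (invented for illustration only).
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29],
    "income": [48000, 52000, 61000, 1000000, 55000],
    "city": ["Paris", "Lyon", "Paris", None, "Nice"],
})

# 1. Missing values: count them per column.
missing_counts = df.isna().sum()

# 2. Categorical variables: non-numerical columns that will need encoding.
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# 3. Different scales: compare the min and max of each numerical column.
numeric_ranges = df.select_dtypes(include="number").agg(["min", "max"])

# 4. Outliers: flag incomes outside 1.5 * IQR of the quartiles
#    (an assumed rule of thumb, chosen here for simplicity).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outlier_mask = (df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)

print(missing_counts)
print(categorical_cols)
print(numeric_ranges)
print(df.loc[outlier_mask, "income"])
```

A quick inspection like this does not fix anything yet. It only tells you which preprocessing steps the dataset is likely to need.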
Data Preprocessing is the process of getting your data ready for analysis. The same idea is also referred to as Data Wrangling or Data Munging. The major tasks of data preprocessing include:
- Data cleaning