How to Cross-Validate a Logistic Regression Model Trained on Class-Imbalanced Data

To mitigate overfitting and improve generalization capability

Rukshan Pramoditha
6 min read · Oct 10, 2023

A good machine learning model should generalize to new unseen data.

When a model fits the training data too closely, it tends to overfit and fails to generalize to new, unseen data.

One way to avoid this problem is to cross-validate our models.

Cross-validation refers to splitting the training set (or sometimes, the entire dataset) into multiple folds (subsets), using one fold as a validation set and the remaining folds as the training set. The validation fold changes at each iteration, and the evaluation scores from all iterations are averaged to get a more robust evaluation score for the model (image by the author).
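As a minimal sketch of this fold-rotation idea, here is how it can be done with scikit-learn's KFold and cross_val_score. The synthetic dataset (make_classification with a 90:10 class split), the 5-fold setting, and the ROC AUC metric are illustrative assumptions, not details taken from the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative binary dataset with a 90:10 class ratio (assumption)
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42
)

model = LogisticRegression(max_iter=1000)

# 5 folds: each iteration holds out one fold for validation and
# trains on the remaining four; the five scores are then averaged.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")

print("Per-fold ROC AUC:", scores)
print("Mean ROC AUC:", scores.mean())
```

The mean of the per-fold scores is the more robust evaluation score described in the caption above.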

There are many cross-validation (CV) techniques, but here we use only two of them.

k-fold CV is the most popular one. You will get consistent results with low variance, but it will not…
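The article is cut off at this point, but for class-imbalanced data a commonly used alternative is stratified k-fold CV, which preserves the class ratio in every fold so that no validation fold is starved of minority-class examples. The sketch below uses the same illustrative dataset and metric as above; it is not necessarily the exact technique the full article goes on to use.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Same illustrative imbalanced dataset as in the previous sketch
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=42
)
model = LogisticRegression(max_iter=1000)

# Stratified splitting keeps the 90:10 class ratio inside every fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf, scoring="roc_auc")

print("Per-fold ROC AUC (stratified):", scores)
print("Mean ROC AUC (stratified):", scores.mean())
```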

