Recursive Feature Elimination (RFE) in Regression and Classification Models

Using the RFECV() function in Scikit-learn and Yellowbrick

Rukshan Pramoditha

Not all features in a dataset contribute equally to a machine learning model.

We can remove unwanted features from the model by using feature selection techniques. Recursive feature elimination (RFE) is one of them. Removing unwanted features gives you the following benefits.

  • Reduce the complexity of the model: This speeds up training and makes the model easier to interpret.
  • Remove unnecessary noise generated by less important features: This has a regularizing effect on the model and helps to prevent overfitting.
  • Reduce dependencies and collinearity between the input features: This also has a regularizing effect and helps to prevent overfitting.

In short, recursive feature elimination (RFE) repeatedly removes the least important feature, or a small set of the least important features, and uses cross-validation (CV) to judge how many features to keep.

RFE is also a dimensionality reduction method as it reduces the number of features in the model by removing unwanted features.

RFE is often combined with cross-validation, which helps to find the best number of features to keep. In Scikit-learn, plain RFE is available as RFE, and the cross-validated version is available as RFECV.

In RFE, first, we train a regression or classification model with all features in the dataset. Then, the features are ranked by their importance scores, which are read from the model's coef_ attribute (in linear models) or feature_importances_ attribute (in tree-based models). Either attribute can appear in both regression and classification models.
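
As a minimal sketch of this ranking step, the snippet below fits one linear model and one tree-based model on synthetic data and orders the features from least to most important. The dataset, the choice of LinearRegression and RandomForestRegressor, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 6 features, only 3 of them informative
X, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                       noise=5.0, random_state=0)

# Standardize so that coefficient magnitudes are comparable across features
X = StandardScaler().fit_transform(X)

# Linear model: importance is taken as the absolute value of coef_
linear = LinearRegression().fit(X, y)
linear_importance = np.abs(linear.coef_)

# Tree-based model: importance is read from feature_importances_
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
forest_importance = forest.feature_importances_

# Feature indices ordered from least to most important
linear_order = np.argsort(linear_importance)
forest_order = np.argsort(forest_importance)

print("Least to most important (linear):", linear_order)
print("Least to most important (forest):", forest_order)
```

The first index in each ordering is the feature that RFE would remove first for that model.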

Then, we evaluate the model with all the features by using a suitable evaluation metric (score). Here, we use cross-validation (CV) for the evaluation instead of a single test set.

Then, we remove one feature or a small set of features at a time and re-train the model. After evaluating with cross-validation, if the score has not become worse, we continue to remove features from the model. If the score becomes worse, we put the removed feature(s) back into the model and keep that set of features as the best.
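
The loop described above can be sketched as follows. This is a simplified, hand-written illustration of the procedure as this article describes it, not Scikit-learn's internal implementation. The synthetic dataset and the helper names cv_score and greedy_rfe are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (assumption): 8 features, only 3 of them informative.
# make_regression draws features on a common scale, so |coef_| is comparable.
X, y = make_regression(n_samples=300, n_features=8, n_informative=3,
                       noise=10.0, random_state=42)


def cv_score(X, y, features):
    """Mean 5-fold cross-validated R^2 using only the given feature columns."""
    return cross_val_score(LinearRegression(), X[:, features], y, cv=5).mean()


def greedy_rfe(X, y):
    """Drop the weakest feature one at a time until the CV score gets worse."""
    kept = list(range(X.shape[1]))
    best = cv_score(X, y, kept)
    while len(kept) > 1:
        # Rank the remaining features by absolute coefficient size
        model = LinearRegression().fit(X[:, kept], y)
        weakest = int(np.argmin(np.abs(model.coef_)))
        candidate = kept[:weakest] + kept[weakest + 1:]
        score = cv_score(X, y, candidate)
        if score < best:
            break  # the removal hurt the score, so the feature stays
        kept, best = candidate, score
    return kept, best


selected, score = greedy_rfe(X, y)
print("Selected feature indices:", selected)
print("Mean CV R^2:", round(score, 3))
```

Stopping at the first drop in score keeps the sketch close to the description, at the cost of possibly missing a better, smaller subset further along.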

RFECV() function in Scikit-learn and Yellowbrick
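
A minimal sketch of Scikit-learn's RFECV on a classification problem is shown below. The synthetic dataset, the LogisticRegression estimator and the parameter values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Synthetic data (assumption): 10 features, 4 informative and 2 redundant
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           n_redundant=2, random_state=0)

selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),  # exposes coef_ for ranking
    step=1,                                       # remove one feature per round
    cv=StratifiedKFold(n_splits=5),               # 5-fold stratified CV
    scoring="accuracy",
    min_features_to_select=1,
)
selector.fit(X, y)

print("Number of features kept:", selector.n_features_)
print("Mask of kept features:", selector.support_)
print("Ranking (1 = kept):", selector.ranking_)

# Keep only the selected columns
X_reduced = selector.transform(X)
print("Reduced shape:", X_reduced.shape)
```

Note that Scikit-learn's RFECV does not stop at the first drop in score. It keeps eliminating features down to min_features_to_select and then picks the number of features with the best mean cross-validation score. Yellowbrick provides a visualizer with the same name, which is not shown here.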
