Data Science · Statistics · 2025-05-10

Backward Elimination: Building Simpler, Smarter Models

Master stepwise feature selection using p-values and Adjusted R². Learn the iterative process of removing insignificant predictors and avoiding common pitfalls with backward elimination.

Learn how to remove less useful features step-by-step using P-values and Adjusted R².

When building a Multiple Linear Regression model, we often start by including many potential input features. But are all of them truly useful? Sometimes, adding more features doesn’t improve the model and can even make it worse (overfitting). How do we find the best, simplest set of features?

Backward Elimination is a popular technique to help us with this. It’s a stepwise regression method that starts with all potential features and systematically removes the least useful ones one by one, until only significant features remain.

Main Technical Concept: Backward elimination is a feature selection technique used primarily with Multiple Linear Regression. It starts with a full model (all predictors) and iteratively removes the least statistically significant predictor (usually based on its p-value) until all remaining predictors meet a chosen significance level.

Why Simplify Your Model?

  • Improved Interpretability: Fewer features = easier to understand what actually matters
  • Reduced Overfitting: Removing irrelevant features can prevent the model from fitting noise
  • Lower Complexity: Simpler models train faster and use less memory
  • Addresses Multicollinearity: Dropping redundant features can reduce correlation among the remaining predictors, which makes coefficient estimates more stable

The Step-by-Step Process

  1. Select a Significance Level (SL): Choose a threshold (commonly SL = 0.05, which corresponds to a 95% confidence level)
  2. Fit the Full Model: Train a Multiple Linear Regression model using all potential features
  3. Check Predictor Significance: Look at the P-value of each predictor’s coefficient
    • Low p-value (≤ SL) = statistically significant (the effect is unlikely to be due to chance alone)
    • High p-value (> SL) = not statistically significant (the effect could plausibly be random chance)
  4. Identify Worst Predictor: Find the predictor with the highest p-value above SL
  5. Remove or Keep?:
    • If highest p-value > SL: Remove that predictor, refit the model, go back to Step 3
    • If all remaining p-values ≤ SL: Stop. This is your final feature set (a good candidate, though not guaranteed to be the best possible subset)

P-values vs. Adjusted R² in Backward Elimination

  • P-value: The Decision Maker. Used to decide which variable to remove (highest p-value above SL)
  • Adjusted R²: The Monitor. Shows overall model quality after each removal
    • Should stay relatively stable or increase as you remove useless variables
    • A sharp drop = the removed variable was actually useful; consider putting it back
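
To see why Adjusted R² can rise when a useless variable is removed, it helps to compute it by hand. The sketch below uses the standard formula, Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of samples and p the number of predictors. The helper name adjusted_r2 and the R² values are hypothetical, chosen only to illustrate the effect.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    dof = n_samples - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more samples than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / dof


# Hypothetical numbers: removing one weak predictor barely changes plain R^2
# (0.800 -> 0.799), but the model now pays for one fewer parameter.
full = adjusted_r2(0.800, n_samples=50, n_predictors=5)
reduced = adjusted_r2(0.799, n_samples=50, n_predictors=4)

print(f"full model:    {full:.4f}")
print(f"reduced model: {reduced:.4f}")
```

Plain R² can never go up when a predictor is removed, so it is a poor monitor here. Adjusted R² penalizes the number of predictors, which is why it can improve even as R² dips slightly.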

Benefits & Tips

Best Practices:

  • Common SL Values: 0.05 (most common), 0.10 (more lenient), 0.01 (stricter)
  • Alternative Methods: Forward Selection (add features), Stepwise Regression (both directions)
  • Domain Knowledge: Don’t blindly follow statistics; use your domain expertise
  • Cross-Validation: Rerun backward elimination inside each training fold, so the held-out score is not biased by features chosen using the test data
  • Use statsmodels: Its OLS results report coefficient p-values directly; scikit-learn’s LinearRegression does not

Backward Elimination: Key Takeaways

  • Stepwise feature selection starting with all features and removing the least significant
  • Significance determined primarily by P-value of coefficients
  • Variable with highest p-value above significance level is removed at each step
  • Stops when all remaining features have p-values ≤ significance level
  • Adjusted R² is monitored to ensure model quality isn’t drastically reduced
  • Aims for a simpler, more interpretable model with statistically significant predictors
  • statsmodels library provides excellent OLS summaries with p-values for this task
Nerchuko Academy · Free DS Interview Prep