Backward Elimination: Building Simpler, Smarter Models

Learn how to remove less useful features step by step using p-values and Adjusted R².


When building a Multiple Linear Regression model, we often start by including many potential input features. But are all of them truly useful? Sometimes, adding more features doesn’t improve the model and can even make it worse (overfitting). How do we find the best, simplest set of features?

Backward Elimination is a popular technique to help us with this. It’s a stepwise regression method that starts with all potential features and systematically removes the least useful ones one by one, until only significant features remain.

Main Technical Concept: Backward elimination is a feature selection technique used primarily with Multiple Linear Regression. It starts with a full model (all predictors) and iteratively removes the least statistically significant predictor (usually based on its p-value) until all remaining predictors meet a chosen significance level.

Why Simplify Your Model?

Improved Interpretability: Fewer features = easier to understand what actually matters
Reduced Overfitting: Removing irrelevant features can prevent the model from fitting noise
Lower Complexity: Simpler models train faster and use less memory
Addresses Multicollinearity: Removing redundant features can reduce the unstable coefficient estimates that highly correlated predictors cause

The Step-by-Step Process

1. Select a Significance Level (SL): Choose a threshold, commonly SL = 0.05 (which corresponds to a 95% confidence level)
2. Fit the Full Model: Train a Multiple Linear Regression model using all potential features
3. Check Predictor Significance: Look at the p-value of each predictor’s coefficient
- Low p-value (≤ SL) = statistically significant (likely a real effect)
- High p-value (> SL) = not statistically significant (effect might be random chance)
4. Identify Worst Predictor: Find the predictor with the highest p-value
5. Remove or Keep?:
- If highest p-value > SL: Remove that predictor, refit the model, go back to Step 3
- If all remaining p-values ≤ SL: Stop. The remaining predictors form your final feature set

P-values vs. Adjusted R² in Backward Elimination

P-value: The Decision Maker. Used to decide which variable to remove (highest p-value above SL)
Adjusted R²: The Monitor. Shows overall model quality after each removal
- Should stay relatively stable or increase as you remove useless variables
- A clear drop suggests the removed variable was actually useful, so consider putting it back
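
To see why Adjusted R² works as a monitor, it helps to compute it by hand. The standard formula is Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors. The sketch below uses made-up numbers (R² values of 0.900 and 0.899 on 50 observations) purely for illustration; adjusted_r2 is a hypothetical helper name.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R² from plain R², sample size, and predictor count."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Illustrative numbers only: dropping one weak predictor lowers R² slightly...
before = adjusted_r2(0.900, n_obs=50, n_predictors=5)
after = adjusted_r2(0.899, n_obs=50, n_predictors=4)

# ...but Adjusted R² goes up, because the penalty for model size shrinks.
print(round(before, 4))  # 0.8886
print(round(after, 4))   # 0.89
```

Plain R² can never increase when a predictor is removed, which is why it is a poor guide here. Adjusted R² can rise when the removed predictor explained almost nothing.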

Benefits & Tips

Best Practices:

Common SL Values: 0.05 (most common), 0.10 (more lenient), 0.01 (stricter)
Alternative Methods: Forward Selection (start empty and add features), Bidirectional Elimination (add and remove features at each step)
Domain Knowledge: Don’t blindly follow statistics; use your domain expertise
Cross-Validation: Perform backward elimination within cross-validation for robustness
Use statsmodels: Its OLS results report a p-value for every coefficient; scikit-learn’s LinearRegression does not

Backward Elimination: Key Takeaways

Stepwise feature selection starting with all features and removing the least significant
Significance is determined primarily by the p-value of each coefficient
Variable with highest p-value above significance level is removed at each step
Stops when all remaining features have p-values ≤ significance level
Adjusted R² is monitored to ensure model quality isn’t drastically reduced
Aims for a simpler, more interpretable model with statistically significant predictors
statsmodels library provides excellent OLS summaries with p-values for this task