📌 "R-squared always increases when you add more variables, even useless ones. Adjusted R-squared fixes this flaw." This single sentence captures the entire purpose of Adjusted R-squared. This article breaks down both metrics with clear examples.
In econometrics and quantitative research, regression models help us understand relationships between variables. Two common metrics to evaluate how well a model fits the data are R-squared and Adjusted R-squared. While they look similar, they serve different purposes. R-squared tells you the proportion of variance explained, but it has a critical weakness: it never decreases when you add more predictors. Adjusted R-squared corrects this by penalizing model complexity.
What is R-Squared?
R-squared, also known as the coefficient of determination, is a statistical measure of the proportion of the dependent variable's variation that is explained by the independent variables in the model. For OLS models fitted with an intercept, its value ranges from 0 to 1 (0% to 100%); a higher R-squared generally indicates a better in-sample fit.
Example 1 (simple regression)
Model: Salary = β₀ + β₁(Experience) + ε
Result: R-squared = 0.65, meaning experience explains 65% of the variation in salary.
Example 2 (multiple regression)
Model: GDP Growth = β₀ + β₁(Investment) + β₂(Labor Force) + ε
Result: R-squared = 0.82, meaning investment and labor force together explain 82% of the variation in GDP growth.
The Problem with R-Squared: It Always Increases
The formula for R-squared is: R² = 1 - (SSres / SStot). Adding a new variable, even pure noise, can never increase the residual sum of squares (SSres): at worst, OLS sets the new coefficient to zero, and in practice SSres almost always falls at least slightly. As a result, R-squared never decreases when you add predictors, which makes it unreliable for comparing models with different numbers of variables.
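You can see this behavior directly. Below is a minimal numpy-only sketch using made-up salary data (the variable names and numbers are illustrative, not from the article): R² is computed as 1 - SSres/SStot from an OLS fit, and appending a pure-noise predictor still does not lower it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative synthetic data (assumed): salary driven by experience plus noise.
n = 100
experience = rng.uniform(0, 20, n)
salary = 30_000 + 2_000 * experience + rng.normal(0, 5_000, n)

def r_squared(X, y):
    """R^2 = 1 - SSres / SStot for an OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    ss_res = np.sum((y - X @ beta) ** 2)           # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)           # total sum of squares
    return 1 - ss_res / ss_tot

r2_base = r_squared(experience.reshape(-1, 1), salary)

# Append a predictor of pure noise: R^2 still does not decrease.
noise = rng.normal(size=n)
r2_noisy = r_squared(np.column_stack([experience, noise]), salary)

print(r2_base, r2_noisy)
assert r2_noisy >= r2_base - 1e-12  # never falls (tolerance for float error)
```

The nested-model property guarantees `r2_noisy >= r2_base` mathematically; the tiny tolerance only guards against floating-point rounding.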
⚠️ The Key Pitfall of R-Squared
- Flaw: R-squared mechanically increases with every added variable, encouraging overfitting.
- Consequence: You could add irrelevant predictors (like "number of pets owned" to a salary model) and still see R-squared rise slightly, misleading you into thinking the model improved.
- Solution: Use Adjusted R-squared for model comparison.
What is Adjusted R-Squared?
Adjusted R-squared corrects R-squared for the number of predictors (k, excluding the intercept) and the sample size (n). Its formula is: Adjusted R² = 1 - [(1 - R²) * (n - 1) / (n - k - 1)]. Unlike R-squared, Adjusted R-squared can decrease if a new predictor doesn't improve the fit enough to justify the added complexity.
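The formula is a one-liner in code. This sketch plugs in the article's R² = 0.75 with an assumed sample size of n = 50 and k = 2 predictors (the article does not state n, so the exact adjusted value differs slightly from its figures):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 = 0.75, n = 50 observations (assumed), k = 2 predictors:
print(round(adjusted_r_squared(0.75, 50, 2), 3))  # → 0.739
```

Note how the penalty grows with k and shrinks as n grows: with a large sample, a couple of extra predictors barely move the adjusted value.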
Original Model (2 predictors): House Price = β₀ + β₁(Size) + β₂(Bedrooms) + ε
R-squared: 0.75
Adjusted R-squared: 0.743
New Model (3 predictors): House Price = β₀ + β₁(Size) + β₂(Bedrooms) + β₃(Random Noise) + ε
R-squared: 0.751 (increased slightly)
Adjusted R-squared: 0.741 (decreased!)
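The random-noise effect shown above can be reproduced by simulation. This numpy-only sketch uses hypothetical house-price data (all numbers assumed, not the article's): it repeatedly appends a pure-noise predictor and tallies how each metric reacts. R² never falls, while Adjusted R² usually does, because a noise variable rarely clears the complexity penalty.

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_metrics(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit with an intercept."""
    n, k = len(y), X.shape[1]
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    ss_res = np.sum((y - Xd @ beta) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

# Hypothetical data in the spirit of the house-price example.
n = 40
size = rng.uniform(50, 250, n)
bedrooms = rng.integers(1, 6, n).astype(float)
price = 50 + 2.0 * size + 10 * bedrooms + rng.normal(0, 40, n)
base = np.column_stack([size, bedrooms])
r2_base, adj_base = fit_metrics(base, price)

# Add a pure-noise predictor 200 times and tally what each metric does.
r2_up = adj_down = 0
trials = 200
for _ in range(trials):
    r2, adj = fit_metrics(np.column_stack([base, rng.normal(size=n)]), price)
    r2_up += r2 >= r2_base - 1e-12   # R^2 never falls
    adj_down += adj < adj_base       # adjusted R^2 usually falls

print(f"R^2 rose or held in {r2_up}/{trials} trials")
print(f"Adjusted R^2 fell in {adj_down}/{trials} trials")
```

Adjusted R² falls in roughly two-thirds of trials (it rises only when the noise variable's t-statistic happens to exceed 1 in absolute value), whereas R² holds or rises in every single one.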
Original Model (1 predictor): Car Fuel Efficiency = β₀ + β₁(Engine Size) + ε
R-squared: 0.60
Adjusted R-squared: 0.595
New Model (2 predictors): Car Fuel Efficiency = β₀ + β₁(Engine Size) + β₂(Weight) + ε
R-squared: 0.78 (increased)
Adjusted R-squared: 0.775 (also increased)
When to Use Which Metric?
| Scenario | Use R-Squared | Use Adjusted R-Squared |
|---|---|---|
| Describing a single model's fit | ✅ Yes. "This model explains 70% of the variance." | Optional, but Adjusted R-squared is more honest. |
| Comparing models with different # of predictors | ❌ No. It will mislead you. | ✅ Yes. This is its primary purpose. |
| Checking if a new variable improves the model | ❌ No. It will always say yes. | ✅ Yes. It will only increase if the variable adds sufficient explanatory power. |
| Reporting results in academic papers | Often reported alongside. | ✅ Almost always required as the primary fit statistic. |
Key Takeaways
1. R-squared measures fit, Adjusted R-squared measures fit per predictor. R-squared tells you how good the model is. Adjusted R-squared tells you how good it is given how many variables you used.
2. Adjusted R-squared is always lower than or equal to R-squared. The gap widens as you add more variables relative to your sample size, and for very poor fits Adjusted R-squared can even turn negative.
3. The rule for model selection is simple: When comparing models, choose the one with the highest Adjusted R-squared. This automatically balances explanatory power with model simplicity.
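The selection rule in takeaway 3 can be sketched as a simple loop over candidate feature sets: score each by Adjusted R². All data and variable names here are hypothetical, constructed so that x1 and x2 matter and x3 is noise.

```python
import numpy as np

def adj_r2(X, y):
    """Adjusted R^2 of an OLS fit with an intercept (numpy-only sketch)."""
    n, k = len(y), X.shape[1]
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r2 = 1 - np.sum((y - Xd @ beta) ** 2) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

rng = np.random.default_rng(7)
n = 60
x1, x2, x3 = rng.normal(size=(3, n))   # x3 will be irrelevant noise
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(0, 1.0, n)

candidates = {
    "x1 only":      np.column_stack([x1]),
    "x1 + x2":      np.column_stack([x1, x2]),
    "x1 + x2 + x3": np.column_stack([x1, x2, x3]),
}
scores = {name: adj_r2(X, y) for name, X in candidates.items()}
for name, s in scores.items():
    print(f"{name:14s} adjusted R^2 = {s:.4f}")
```

Because x2 carries real signal, adding it raises Adjusted R² decisively over the x1-only model; whether the noise variable x3 nudges the score up or down depends on the random draw, which is exactly why the metric, not intuition, should arbitrate.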
In practice, serious quantitative analysis always uses Adjusted R-squared for model comparison. R-squared alone is an incomplete and potentially misleading statistic.