This tutorial covers the interpretation of the most fundamental measures reported for regression models: R Squared and Adjusted R Squared. We will try to give clear guidelines for interpreting both.
Once we have fitted a model to data using regression, we need to find out how well the model fits the data. R reports several goodness-of-fit statistics out of the box when we create a model. In this tutorial we discuss an important one called R Squared (R²). We will also try to bust the myths that low R Squared values are always bad and high R Squared values are always good.
By the way, you should look at R Squared only once your model passes the residual analysis checks described in R Tutorial : Residual Analysis for Regression and R Tutorial : How to use Diagnostic Plots for Regression Models.
What is R Squared ?
R Squared is a measure which tells us how well our regression equation explains observed data values.
R Squared = (Explained Variation in Observed Values) / (Total Variation in Observed Values)
0% <= R Squared <= 100%
So R² = 67% means that your regression equation explains 67% of the variation of the observed values around their mean.
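As a small illustration, the sketch below computes R Squared by hand from the definition above and compares it with the value R reports. The built-in mtcars data set and the model mpg ~ wt are assumptions chosen only for this example.

```r
# Fit a simple linear regression on the built-in mtcars data set
fit <- lm(mpg ~ wt, data = mtcars)

# Total variation of the observed values around their mean
ss_total <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

# Variation left unexplained by the model (residual sum of squares)
ss_residual <- sum(residuals(fit)^2)

# Explained variation divided by total variation
r_squared_manual <- (ss_total - ss_residual) / ss_total

# The value R reports out of the box
r_squared_reported <- summary(fit)$r.squared

print(c(manual = r_squared_manual, reported = r_squared_reported))
```

The two numbers should agree up to floating-point rounding for a least-squares model that includes an intercept.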
When you add more predictor variables to a regression equation, R² never goes down, and it goes up whenever the new variables explain any extra variance. Does that mean that when we compare 2 models on the same data, the model with the higher R² is always better than the model with the lower R² ?
The answer is NO. Not always ! More predictor variables in a model mean more complexity, which can have the side effect of overfitting. So R² on its own is not a very reliable measure for comparing models. We need a measure which can tell us whether the variance explained by a new variable is worth the additional complexity.
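To see why R² alone can mislead, here is a minimal sketch that adds a column of pure random noise as a predictor. The noise column, the seed and the use of mtcars are all assumptions made for illustration. With a least-squares fit that includes an intercept, the larger model's R² cannot be lower than the smaller model's, even though the extra variable carries no real information.

```r
set.seed(42)  # arbitrary seed so the noise column is reproducible

cars <- mtcars
cars$noise <- rnorm(nrow(cars))  # hypothetical predictor unrelated to mpg

# Same response, with and without the meaningless predictor
fit_small <- lm(mpg ~ wt, data = cars)
fit_big   <- lm(mpg ~ wt + noise, data = cars)

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared

print(c(without_noise = r2_small, with_noise = r2_big))
```

Any increase seen here comes from the model fitting random fluctuations in the sample, which is exactly the overfitting risk described above.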
It is for this reason that we use Adjusted R².
What is Adjusted R² ?
Adjusted R² is a measure derived from R² which penalizes each added variable for the additional complexity it brings.
Adjusted R² = 1 - (1 - R²) × (N - 1) / (N - p - 1)
where
N = Sample Size
p = number of predictors
Please note that p is in the denominator, so an increased p means a decreased Adjusted R² if R² does not increase enough and everything else remains constant.
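The sketch below works through the arithmetic, assuming the usual formula Adjusted R² = 1 - (1 - R²) × (N - 1) / (N - p - 1), and checks the result against what R reports. The mtcars data set and the model mpg ~ wt + hp are assumptions chosen only for this example.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)            # N, the sample size
p <- length(coef(fit)) - 1   # number of predictors, excluding the intercept
r2 <- summary(fit)$r.squared

# Penalize R Squared for the number of predictors used
adj_r2_manual <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# The value R reports out of the box
adj_r2_reported <- summary(fit)$adj.r.squared

print(c(r_squared = r2, manual = adj_r2_manual, reported = adj_r2_reported))
```

Since (N - 1) / (N - p - 1) is greater than 1 whenever p is at least 1, Adjusted R² comes out below R² unless the fit is perfect.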
Is Low R² always bad ?
NO. The desirable range of R² is highly domain dependent. Models which attempt to predict human behavior are seldom very precise, so a lower R² is expected. For models in medicine and pharma, on the other hand, R² values above 90% are very common.
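As a rough illustration, the simulated example below has a real relationship between x and y that is buried under a lot of noise. The true slope of 2, the noise level, the sample size and the seed are all assumptions made up for this sketch.

```r
set.seed(1)  # arbitrary seed for reproducibility

n <- 1000
x <- rnorm(n)
# Assumed true relationship: slope of 2, hidden under heavy noise
y <- 2 * x + rnorm(n, sd = 5)

fit <- lm(y ~ x)
fit_summary <- summary(fit)

r2 <- fit_summary$r.squared
slope <- coef(fit)[["x"]]
slope_p_value <- coef(fit_summary)["x", "Pr(>|t|)"]

print(c(r_squared = r2, slope = slope, p_value = slope_p_value))
```

In a setup like this, R² stays low because individual observations are noisy, yet the model can still recover the underlying slope reasonably well.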
Is High R² always good ?
NO. As mentioned in R Tutorial : Residual Analysis for Regression and R Tutorial : How to use Diagnostic Plots for Regression Models, even a model with a high R² is not considered good enough if its residuals show an inherent pattern, are heteroscedastic, or are not normally distributed.
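Here is a minimal sketch of that situation, using simulated data in which the true relationship is assumed to be quadratic but we fit a straight line anyway. The data, the noise level and the seed are made up for illustration.

```r
set.seed(7)  # arbitrary seed for reproducibility

x <- 1:50
# Assumed true relationship is quadratic, but we fit a straight line
y <- x^2 + rnorm(length(x), sd = 20)

fit <- lm(y ~ x)
r2 <- summary(fit)$r.squared

# A well-specified model leaves residuals with no structure.
# Here they follow a U shape, which shows up as a strong correlation
# with the squared distance of x from its mean.
curvature <- cor(residuals(fit), (x - mean(x))^2)

print(c(r_squared = r2, residual_curvature = curvature))
```

R² is high here, yet the residuals are far from random. Plotting residuals against fitted values, for example with plot(fit, which = 1), should make the U shape visible.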
As a next step, you should look at the interpretation of the F statistic.
Enjoy Life and Keep Learning !