R Tutorial : Interpretation of R Squared and Adjusted R Squared in Regression

This tutorial covers the interpretation of two of the most fundamental measures reported for regression models: R Squared and Adjusted R Squared. We will try to give clear guidelines for interpreting both.

Once we have fitted a model to data using regression, we have to find out how well the model fits the data. R reports many goodness-of-fit statistics out of the box when we create a model. In this tutorial we will discuss an important statistic called R Squared (R²). We will also try to bust the myths that low R Squared values are always bad and high R Squared values are always good.

By the way, you should look at R Squared only once your model passes the residual analysis checks described in R Tutorial : Residual Analysis for Regression and R Tutorial : How to use Diagnostic Plots for Regression Models.

What is R Squared ?

R Squared is a measure which tells us how well our regression equation explains observed data values.

R Squared = (Explained Variation in Observed Values) / (Total Variation in Observed Values)

$$R^2 = \frac{SS_{explained}}{SS_{total}} = 1 - \frac{SS_{residual}}{SS_{total}}$$

0% <= R Squared <= 100%

So R² = 67% means that your regression equation explains 67% of the variation of the observed values around their mean.
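To make the ratio concrete, here is a minimal sketch that computes R² by hand and compares it with the value R reports. The built-in mtcars data and the predictors wt and hp are only illustrative choices, not part of the original tutorial.

```r
# Fit a linear model on the built-in mtcars data (illustrative choice).
fit <- lm(mpg ~ wt + hp, data = mtcars)

y <- mtcars$mpg
ss_total     <- sum((y - mean(y))^2)    # total variation around the mean
ss_residual  <- sum(residuals(fit)^2)   # variation the model leaves unexplained
ss_explained <- ss_total - ss_residual  # variation the model explains

r2_manual   <- ss_explained / ss_total
r2_reported <- summary(fit)$r.squared   # the value printed by summary(fit)

print(c(manual = r2_manual, reported = r2_reported))
```

The two numbers should agree, which shows that the R Squared in the summary output is exactly the explained share of the total variation.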

When you add more predictor variables to a regression equation, R² never goes down, and it goes up whenever the new variables explain any additional variance, even by chance. Does this mean that when we compare 2 models on the same data, the model with the higher R² is always better than the model with the lower R²?

The answer is NO. Not always! More predictor variables in a model means more complexity, which may have the side effect of overfitting. So R² on its own is not a very reliable measure for comparing models. We need a measure that can tell us whether the variance explained by a new variable is worth the additional complexity.
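A small experiment can illustrate the problem. The sketch below adds a column of pure random noise (a hypothetical predictor named noise, created only for this example) to a model on mtcars and checks that R² does not drop, even though the new variable carries no real information.

```r
set.seed(42)                      # make the random noise reproducible
dat <- mtcars
dat$noise <- rnorm(nrow(dat))     # pure noise, unrelated to mpg by construction

fit_small <- lm(mpg ~ wt, data = dat)          # one real predictor
fit_big   <- lm(mpg ~ wt + noise, data = dat)  # same model plus the noise column

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared

print(c(without_noise = r2_small, with_noise = r2_big))
```

The model with the noise column reports an R² at least as high as the smaller model, which is why a higher R² by itself does not prove a better model.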

It is for this reason that we use Adjusted R².

What is Adjusted R²  ?

Adjusted R² is a measure derived from R² which penalizes each added variable for the additional complexity it brings.

$$R^2_{adj} = 1 - \frac{(1 - R^2)(N - 1)}{N - p - 1}$$

N = Sample Size

p = number of predictors

Please note that p is subtracted in the denominator, so an increased p would mean a decreased Adj R² if R² does not increase enough and everything else remains constant.
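As a rough check, the sketch below assumes the usual formula Adjusted R² = 1 - (1 - R²)(N - 1)/(N - p - 1) for a model with an intercept, computes it by hand, and compares it with the value R reports. The helper adj_r2 and the choice of mtcars are hypothetical, introduced only for this example.

```r
# Hypothetical helper: usual adjusted R-squared formula for a model with an intercept.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

n <- nrow(mtcars)             # N = sample size
p <- length(coef(fit)) - 1    # p = number of predictors (intercept excluded)

adj_manual   <- adj_r2(s$r.squared, n, p)
adj_reported <- s$adj.r.squared

print(c(r2 = s$r.squared, adj_manual = adj_manual, adj_reported = adj_reported))

# Same R-squared with more predictors gives a lower adjusted value.
print(adj_r2(0.80, n = 32, p = 2:6))
```

The last line holds R² fixed at 0.80 and varies only p, so the printed values fall steadily as predictors are added without any gain in explained variance.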

Is Low R² always bad ?

NO. The desirable range of R² is highly domain dependent. Any model which attempts to predict human behavior is seldom very precise, and hence a lower R² is expected. Whereas for models in medicine and pharma, R² values above 90% are very common.

Is High R² always good ?

NO. As mentioned in R Tutorial : Residual Analysis for Regression and R Tutorial : How to use Diagnostic Plots for Regression Models, even if you have a high R², the model is not considered good enough if the residuals show some inherent pattern, are heteroscedastic, or are not normally distributed.
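Anscombe's quartet, which ships with R as the anscombe data frame, is one way to see why R² alone cannot certify a model. The sketch below fits the same straight line to each of the four data sets and collects R²; using this particular data set is a choice made for this example, not something taken from the linked tutorials.

```r
# Fit y_i ~ x_i for each of the four Anscombe data sets and collect R-squared.
r2 <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  summary(lm(y ~ x))$r.squared
})
names(r2) <- paste0("set", 1:4)

print(round(r2, 3))
```

All four values should come out close to 0.67, yet only the first data set is a reasonable candidate for a straight-line model: the second is curved, and the third and fourth are driven by a single unusual point. Plotting residuals against fitted values for each fit makes these differences visible, while R² hides them.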

As a next step you should look at interpretation of F Statistic.

Enjoy Life and Keep Learning !

 
