This tutorial covers how to interpret the most fundamental measures reported for regression models: R Squared and Adjusted R Squared. We will try to give clear guidelines for interpreting both.

Once we have fitted a model to data using regression, we have to find out how well the model fits the data. R reports many goodness-of-fit statistics out of the box when we create a model. In this tutorial we will discuss an important statistic called R-Squared (R²). We will also try to bust the myths that low R Squared values are always bad and high R Squared values are always good.

By the way, you should look at R Squared only once your model passes residual analysis, as described in R Tutorial : Residual Analysis for Regression and R Tutorial : How to use Diagnostic Plots for Regression Models.

### What is R Squared ?

R Squared is a measure which tells us how well our regression equation explains observed data values.

R Squared = (Explained Variation in Observed Values) / (Total Variation in Observed Values)

**0% <= R Squared <= 100%**

So **R² = 67%** implies that your regression equation explains 67% of the variation of the observed values around their mean.
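To make the ratio concrete, here is a minimal sketch in R that computes R² by hand and compares it with the value that `summary()` reports. The built-in `mtcars` data set and the model `mpg ~ wt` are assumptions chosen only for illustration.

```r
# Illustrative model (assumed for this sketch): miles per gallon explained by weight
fit <- lm(mpg ~ wt, data = mtcars)

y <- mtcars$mpg
ss_total     <- sum((y - mean(y))^2)    # total variation of observed values around their mean
ss_residual  <- sum(residuals(fit)^2)   # variation the model leaves unexplained
ss_explained <- ss_total - ss_residual  # variation explained by the regression

r2_manual   <- ss_explained / ss_total
r2_reported <- summary(fit)$r.squared   # the value R prints as "Multiple R-squared"

print(c(manual = r2_manual, reported = r2_reported))
```

The two numbers should agree, which shows that the statistic R prints is the explained share of the total variation.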

When you add more predictor variables to a regression equation, R² never goes down, and it goes up whenever the new variables explain any additional variance. Does this mean that when we compare two models on the same data, the model with the higher R² is always better than the model with the lower R²?

The answer is **NO**. Not always! More predictor variables in a model mean more complexity, which may have the side effect of overfitting. So R² on its own is not a very reliable measure for comparing models. We need a measure which tells us whether a new variable explains enough additional variance to be worth the additional complexity.

It is for this reason that we use Adjusted R².
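The effect is easy to reproduce. The sketch below assumes the built-in `mtcars` data and an invented `noise` column of random numbers. It adds a predictor that has nothing to do with the response and compares R² before and after.

```r
set.seed(42)  # arbitrary seed, only so the sketch is repeatable

dat <- mtcars
dat$noise <- rnorm(nrow(dat))  # hypothetical predictor: pure random noise

fit_small <- lm(mpg ~ wt, data = dat)
fit_big   <- lm(mpg ~ wt + noise, data = dat)

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared

# R-squared of the larger model is never lower, even though 'noise' is meaningless
print(c(small = r2_small, big = r2_big))
```

The larger model looks at least as good by R² alone, even though the extra variable carries no information about `mpg`.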

### What is Adjusted R² ?

Adjusted R² is a measure derived from R² which penalizes each additional variable for the complexity it adds.

Adjusted R² = 1 - ( (1 - R²) × (N - 1) ) / (N - p - 1)

N = Sample Size

p = number of predictors

Please note that p appears in the denominator, so an increased p means a decreased Adj R² if R² does not increase enough and everything else remains constant.
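As a sketch of how the penalty behaves, the R snippet below writes the commonly used adjusted R² formula as a small helper and checks it against what `summary()` reports. The helper name `adj_r2`, the `mtcars` model and the example numbers (R² = 0.67, N = 30) are assumptions made for illustration.

```r
# Hypothetical helper: adjusted R-squared from R-squared, sample size N and predictor count p
adj_r2 <- function(r2, N, p) {
  1 - (1 - r2) * (N - 1) / (N - p - 1)
}

# Hold R-squared fixed and keep adding predictors: the adjusted value keeps falling
adj_fixed <- adj_r2(r2 = 0.67, N = 30, p = 1:5)
print(adj_fixed)

# Cross-check against lm() on an illustrative model with two predictors
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)
adj_manual <- adj_r2(s$r.squared, N = nrow(mtcars), p = 2)

print(c(manual = adj_manual, reported = s$adj.r.squared))
```

With R² held constant, every extra predictor shrinks N - p - 1 and lowers the adjusted value, so a new variable has to raise R² by enough to offset that penalty.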

### Is Low R² always bad ?

NO. The desirable range of R² is highly domain dependent. Any model which attempts to predict human behavior is seldom very precise, and hence a lower R² is expected. In contrast, for models in medicine and pharma, R² values above 90% are very common.
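A small simulation can show why a low R² does not by itself make a model useless. Everything in this sketch is assumed for illustration: the data are simulated, the true slope of 0.3 is invented, and the noise is deliberately much larger than the signal.

```r
set.seed(1)  # arbitrary seed for repeatability

n <- 2000
x <- rnorm(n)
y <- 0.3 * x + rnorm(n)  # assumed true slope of 0.3; the noise dominates the signal

fit <- lm(y ~ x)
r2_low    <- summary(fit)$r.squared
slope_est <- unname(coef(fit)["x"])

# Low R-squared, yet the estimated slope is close to the true value
print(c(r_squared = r2_low, slope = slope_est))
```

The model explains only a small share of the variation, yet it still recovers the underlying relationship between `x` and `y` reasonably well.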

### Is High R² always good ?

NO. As mentioned in R Tutorial : Residual Analysis for Regression and R Tutorial : How to use Diagnostic Plots for Regression Models, even if you have a high R², the model is not considered good enough if the residuals show an inherent pattern, are heteroscedastic, or are not normally distributed.

As a next step, you should look at the interpretation of the F Statistic.
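The sketch below illustrates the residual-pattern case with made-up data: the response is exactly quadratic in `x`, but a straight line is fitted anyway. The data and variable names are assumptions for illustration only.

```r
# Hypothetical data: the true relationship is exactly quadratic
x <- 1:20
y <- x^2

fit_line <- lm(y ~ x)  # deliberately mis-specified straight-line model
r2_line  <- summary(fit_line)$r.squared
res      <- residuals(fit_line)

print(r2_line)  # high, despite the wrong functional form

# Residuals are positive at both ends and negative in the middle: a clear pattern
print(round(res[c(1, 10, 20)], 2))
```

Calling `plot(fitted(fit_line), res)` on this fit shows the U-shaped pattern directly, which is the kind of signal that residual analysis is meant to catch before R² is trusted.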

Enjoy Life and Keep Learning !