In this tutorial we will learn how to interpret another very important measure called the F-Statistic, which R reports in the summary of a regression model.
We have already seen R Tutorial : Multiple Linear Regression, followed by R Tutorial : Residual Analysis for Regression and R Tutorial : How to use Diagnostic Plots for Regression Models. Once our model passes the residual analysis we can go ahead and check R Squared and Adjusted R Squared. As a last step of model analysis we have to interpret and understand an important measure called the F-Statistic.
What is the F-Statistic in Regression Models?
We have already discussed in R Tutorial : Multiple Linear Regression how to interpret the p-values of the t-tests for individual predictor variables to check whether they are significant in the model or not.
Instead of judging the coefficient of each variable on its own for significance using a t-test, the F-statistic (aka the F-test for overall significance in regression) judges multiple coefficients taken together at the same time.
The model with zero predictor variables is also called the “Intercept Only Model”. The F-test for overall significance compares an intercept-only regression model with the current model, and then comments on whether the addition of these variables taken together is significant enough for them to be there or not.
The hypotheses for the F-test for significance can be constructed as –
H0 : The fit of the intercept-only model and the current model is the same, i.e. the additional variables do not provide value taken together.
Ha : The fit of the intercept-only model is significantly worse than that of our current model, i.e. the additional variables do make the model significantly better.
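This comparison can be carried out directly in R with anova() on two nested models. Here is a minimal sketch using R's built-in mtcars data (not the tutorial's original dataset) with three illustrative predictors –

```r
# Fit the intercept-only model and a model with three predictors
fit_null <- lm(mpg ~ 1, data = mtcars)              # intercept-only model
fit_full <- lm(mpg ~ wt + hp + disp, data = mtcars) # current model

# anova() performs the F-test comparing the two nested models;
# its F value matches the F-statistic reported by summary(fit_full)
anova(fit_null, fit_full)
```

When the smaller model is the intercept-only model, the F value that anova() reports is exactly the overall F-statistic shown at the bottom of summary().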
Without going into the actual derivation, here is the short formula for calculating the F-statistic of a model –

F = [R² / (k-1)] / [(1-R²) / (n-k)]

where n is the total number of observations and k is the number of model parameters (predictors plus one for the intercept). Consider the following summary output from R –
Multiple R-squared: 0.9848, Adjusted R-squared: 0.9834
F-statistic: 670.4 on 3 and 31 DF, p-value: < 2.2e-16
Here in this example we had –
n = 35 ( total number of observations )
k = 4 ( number of predictor variables + 1 for the intercept )
So the degrees of freedom that we get are –
DF Numerator = (k-1) = 3 – matches the DF reported in the R output
DF Denominator = (n-k) = (35-4) = 31 – matches the DF reported in the R output
The F-statistic that we get is –
F = [R² / (k-1)] / [(1-R²) / (n-k)]
F = [0.9848 / 3] / [0.0152 / 31]
F ≈ 670 – matches the F-statistic as provided by R (the small difference from 670.4 is due to rounding of R²)
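The arithmetic above can be reproduced in a couple of lines of R, plugging in the (rounded) values from the summary output –

```r
# Recompute the F-statistic by hand from R-squared
r2 <- 0.9848  # Multiple R-squared from the summary output
n  <- 35      # total number of observations
k  <- 4       # parameters: 3 predictors + 1 intercept

f_stat <- (r2 / (k - 1)) / ((1 - r2) / (n - k))
f_stat  # roughly 670, matching the F-statistic reported by R
```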
The p-value of an F-statistic of 670 with 3 and 31 degrees of freedom is extremely small, i.e. smaller than 0.001, so we can reject H0 and say that, overall, the addition of the variables significantly improves the model. In other words, by adding those extra variables we were able to improve the fit of our model significantly.
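If you want to see this p-value for yourself rather than reading it off the summary, the upper tail of the F distribution gives it directly –

```r
# p-value for F = 670.4 on 3 and 31 degrees of freedom:
# the probability of seeing an F this large under H0
p_value <- pf(670.4, df1 = 3, df2 = 31, lower.tail = FALSE)
p_value  # far below 0.001, so we reject H0
```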
How is the F-Statistic different from R Squared?
R squared provides a measure of the strength of the relationship between our predictors and our response variable, but it does not tell us whether that relationship is statistically significant. The F-statistic gives us the power to judge whether that relationship is statistically significant; in other words, it comments on whether R² itself is significant or not.
What should I do with the F-statistic in a Regression model?
- If my F-statistic is significant, that gives me extra confidence in the R² value that I have got.
- In case I get an insignificant F-statistic, or if the p-value for F is greater than the level of significance (say 0.05 or 0.01), then personally I would stay away from that model, since I will not be able to comment confidently on the R² value.
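This check can be automated with a small helper function (my own convenience wrapper, not part of base R) that pulls the overall F-test p-value out of a fitted model –

```r
# Extract the overall F-test p-value from a fitted lm model.
# summary(model)$fstatistic holds the F value and both DFs.
f_test_p_value <- function(model) {
  fs <- summary(model)$fstatistic
  pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)
}

# Illustrative use with R's built-in mtcars data
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
f_test_p_value(fit) < 0.05  # TRUE means the overall fit is significant
```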
Hope you have learnt a few intricacies of regression models by now. Next up I will be writing about logistic regression models.
Till then, Enjoy Life and Keep Learning!
Other previous articles that you may like –