In this tutorial we will try our hand at a very basic two-variable linear regression using R. We will also learn how to interpret the output given by R and try out the visualizations required for interpreting simple linear regression.

##### Please also read through the following tutorials to get more familiar with R and the background of linear regression.

R : Basic Data Analysis – Part 1

R Tutorial : Intermediate Data Analysis – Part 2

Tutorial : Concept of Linearity in Linear Regression

Tutorial : Linear Regression Construct

Technique: 2-variable linear regression

When to use: when our output variable is numeric

No. of variables: 2

Model readability: high

For this tutorial we will use a CSV version of the Excel file uploaded here: INCOME-SAVINGS. As always, please save the file and then convert it to .csv using Save As in Excel.

### Step 1 : Read File

    income = read.csv("INCOME-SAVINGS.csv")
    str(income)

    'data.frame':	22 obs. of  3 variables:
     $ YEAR   : int  1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 ...
     $ SAVINGS: int  12298 14196 17320 19995 23601 24213 26881 30896 33787 38091 ...
     $ INCOME : int  64968 69233 73824 85267 91507 99632 123067 142181 157291 185749 ...

So we have 3 variables: YEAR, SAVINGS and INCOME.

### Step 2 : Identify the output variable and input variable

Since savings = f(income):

**Output variable**: SAVINGS

**Input variable**: INCOME

### Step 3 : Scatter plot

    # Input variable (INCOME) on the x-axis, output variable (SAVINGS) on the y-axis
    plot(income$INCOME, income$SAVINGS, xlab = "Income", ylab = "Savings", main = "Savings vs Income", col = "red")

From the plot it is clear that there is a positive linear relationship between Income and Savings, so we can use linear regression to predict Savings from Income.
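The visual impression can be backed up with a number. The sketch below computes the Pearson correlation with `cor()`. To stay self-contained it uses only the first ten observations printed by `str()` in Step 1 (the data frame name `income_head` is made up for this illustration), so the result is not the same as it would be on the full 22-row file.

```r
# First ten observations only, copied from the str() output in Step 1.
# The full file has 22 rows, so numbers from the complete data will differ.
income_head <- data.frame(
  SAVINGS = c(12298, 14196, 17320, 19995, 23601, 24213, 26881, 30896, 33787, 38091),
  INCOME  = c(64968, 69233, 73824, 85267, 91507, 99632, 123067, 142181, 157291, 185749)
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r <- cor(income_head$INCOME, income_head$SAVINGS)
print(r)
```

With the full data loaded as in Step 1, the equivalent call would be `cor(income$INCOME, income$SAVINGS)`.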

### Step 4 : Construct a Linear model using R

    linearmodel = lm(SAVINGS ~ INCOME, data = income)
    summary(linearmodel)

Please note that the first argument to lm() is SAVINGS ~ INCOME. This argument is of type formula and is usually of the form

**Dependent_Variable ~ Independent_Variables**

Output of summary(linearmodel):

    Call:
    lm(formula = SAVINGS ~ INCOME, data = income)

    Residuals:
         Min       1Q   Median       3Q      Max
    -13036.3  -4958.9   -316.9   5368.1  16969.3

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -1.099e+04  2.459e+03  -4.469 0.000235 ***
    INCOME       2.970e-01  6.012e-03  49.402  < 2e-16 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    Residual standard error: 7327 on 20 degrees of freedom
    Multiple R-squared:  0.9919,	Adjusted R-squared:  0.9915
    F-statistic:  2441 on 1 and 20 DF,  p-value: < 2.2e-16

### Step 5 : Interpretation of output

The model generated for us is shown below. Its numbers are the coefficients from the Estimate column of the output: -1.099e+04 for the intercept and 2.970e-01 for INCOME.

**SAVINGS = -10990 + 0.297 * INCOME**
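Rather than copying the coefficients by hand, they can be read from the fitted model with `coef()`. This is a minimal sketch that refits the model on only the first ten observations shown by `str()` in Step 1 (the names `income_head` and `fit_head` are made up for this illustration), so its coefficients will not match the equation above, which comes from all 22 rows.

```r
# First ten observations only, copied from the str() output in Step 1.
income_head <- data.frame(
  SAVINGS = c(12298, 14196, 17320, 19995, 23601, 24213, 26881, 30896, 33787, 38091),
  INCOME  = c(64968, 69233, 73824, 85267, 91507, 99632, 123067, 142181, 157291, 185749)
)

# Same formula as in Step 4, fitted on the ten-row sample
fit_head <- lm(SAVINGS ~ INCOME, data = income_head)

# coef() returns a named vector: "(Intercept)" and "INCOME"
b <- coef(fit_head)

# Assemble the regression equation from the fitted coefficients
cat("SAVINGS =", round(b[["(Intercept)"]], 2), "+",
    round(b[["INCOME"]], 4), "* INCOME\n")
```

On the full data, `coef(linearmodel)` would return the two Estimate values from the summary output.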

Please notice that the p-value for INCOME, i.e. Pr(>|t|), is < 2e-16, far below 0.05, so the variable is significant in predicting SAVINGS. If a variable does not have a significant p-value, we may choose to ignore that variable.
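For readers curious where Pr(>|t|) comes from, here is a small sketch using only numbers copied from the Coefficients table. It assumes the standard two-sided t-test that `summary()` reports for `lm` models: the t value is the estimate divided by its standard error, and the p-value is the two-sided tail probability at 20 residual degrees of freedom. The variable names are chosen for this illustration.

```r
# Values copied from the Coefficients table in the summary() output
estimate  <- c(Intercept = -1.099e+04, INCOME = 2.970e-01)
std_error <- c(Intercept =  2.459e+03, INCOME = 6.012e-03)
df_resid  <- 20  # residual degrees of freedom reported in the output

# t value = Estimate / Std. Error
t_value <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

print(round(t_value, 3))
print(p_value)
print(p_value < 0.05)  # TRUE means significant at the 5% level
```

Because the inputs are the rounded values printed by R, the results agree with the table only up to rounding.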

The next number we have to be aware of is R-squared. In our case the adjusted R-squared is 0.9915, which implies that the model is able to explain about 99% of the variation in SAVINGS. The ideal R-squared value is domain specific, but as a rough rule of thumb anything above 70% is usually considered very good, and such a model is regarded as a good model for prediction.
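The output reports both a multiple and an adjusted R-squared. As a quick check, the sketch below recomputes the adjusted value from the multiple R-squared using the usual adjustment formula, with n = 22 observations and k = 1 input variable taken from this tutorial. The variable names are chosen for this illustration.

```r
r_squared <- 0.9919  # Multiple R-squared from the summary() output
n <- 22              # number of observations (from the str() output)
k <- 1               # number of input variables (INCOME)

# Adjusted R-squared penalizes the fit for the number of input variables
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
print(round(adj_r_squared, 4))
```

With a single input variable and 22 observations the adjustment is tiny, which is why the two values in the output are almost identical.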

We will delve into the details of R-squared, t value, residuals and the F-statistic in a subsequent tutorial. For this discussion we can safely ignore them.

The model can be interpreted as: "When Income rises by 1 unit, Savings rise by 0.297 units."

Now, whenever we have a value of INCOME, we can calculate SAVINGS using the equation:

**SAVINGS = -10990 + 0.297 * INCOME**
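As a worked example, the equation can be wrapped in a small helper function. `predict_savings` is a hypothetical name introduced here, and the income values are arbitrary illustrations, not rows from the data set.

```r
# Hypothetical helper that applies the fitted equation
# SAVINGS = -10990 + 0.297 * INCOME (rounded coefficients from the summary)
predict_savings <- function(income_value) {
  -10990 + 0.297 * income_value
}

# Predicted savings for a single, arbitrarily chosen income
print(predict_savings(100000))

# The function is vectorized, so several incomes can be passed at once
print(predict_savings(c(50000, 150000)))

# One extra unit of income adds the slope (0.297) to predicted savings
print(predict_savings(100001) - predict_savings(100000))
```

For an income of 100000 this gives -10990 + 29700 = 18710. With the fitted model from Step 4 available, `predict(linearmodel, newdata = data.frame(INCOME = 100000))` does the same job using the unrounded coefficients, so its result may differ slightly.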

I sincerely hope you enjoyed the tutorial. Please post your feedback and comments, and share other articles on the site.

As a next step in analyzing the model, you should also go through residual analysis, look at adjusted R-squared values and interpret the F-statistic.

Till next time, happy learning.

Next in the series :

R Tutorial : Multiple Linear Regression

R Tutorial : Residual Analysis for Regression

R Tutorial : How to use Diagnostic Plots for Regression Models