# R Tutorial : Basic 2 variable Linear Regression

In this tutorial we will try our hand at a very basic 2 variable linear regression using R. We will also learn how to interpret the output given by R and try out the visualizations needed to interpret a simple linear regression.

##### Please also read through the following tutorials for more background on R and linear regression.

R : Basic Data Analysis – Part 1

R Tutorial : Intermediate Data Analysis – Part 2

Tutorial : Concept of Linearity in Linear Regression

Tutorial : Linear Regression Construct

Technique : 2 variable Linear Regression

When to use : When our output variable is numeric

No of variables : 2

For this tutorial we will use a CSV version of the Excel file uploaded here: INCOME-SAVINGS. As always, save the file and then convert it to .csv using Save As in Excel.

### Step 1 : Read File

```
income = read.csv("INCOME-SAVINGS.csv")
str(income)
```
```
'data.frame':	22 obs. of  3 variables:
 $ YEAR   : int  1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 ...
 $ SAVINGS: int  12298 14196 17320 19995 23601 24213 26881 30896 33787 38091 ...
 $ INCOME : int  64968 69233 73824 85267 91507 99632 123067 142181 157291 185749 ...
```

So we have 3 variables: YEAR, SAVINGS and INCOME.
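For readers who do not have the file at hand, here is a minimal sketch of the same read-and-inspect step. It assumes only the first ten rows, which are the values visible in the `str()` output above; the remaining twelve rows of the real file are not shown in this tutorial and are not reproduced here. The names `csv_text` and `income_head` are made up for this illustration.

```r
# First ten rows only, copied from the str() output above.
# The real INCOME-SAVINGS.csv has 22 rows; the rest are not shown here.
csv_text <- "YEAR,SAVINGS,INCOME
1974,12298,64968
1975,14196,69233
1976,17320,73824
1977,19995,85267
1978,23601,91507
1979,24213,99632
1980,26881,123067
1981,30896,142181
1982,33787,157291
1983,38091,185749"

# read.csv() accepts inline text through its `text` argument
income_head <- read.csv(text = csv_text)
str(income_head)

# Quick sanity checks before modelling
stopifnot(
  identical(names(income_head), c("YEAR", "SAVINGS", "INCOME")),
  nrow(income_head) == 10,
  !anyNA(income_head)
)
```

The same `stopifnot()` checks can be pointed at the full `income` data frame, with the expected row count changed to 22.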

### Step 2 : Identify the output variable and input variable

Since savings = f(income) :

Output variable : SAVINGS

Input variable : INCOME

### Step 3 : Scatter plot

```
# x axis = input variable (INCOME), y axis = output variable (SAVINGS)
plot(income$INCOME, income$SAVINGS, xlab = "Income", ylab = "Savings", main = "Savings vs Income", col = 'red')
```

The plot shows a positive, roughly linear relationship between Income and Savings, so we can use linear regression to predict Savings from Income.
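The visual impression can be backed up with a number. The sketch below computes the Pearson correlation between the two variables with `cor()`. It assumes only the first ten observations visible in the `str()` output, so the value it prints describes those ten rows and not the full 22-row file; the vector names are made up for this illustration.

```r
# First ten observations only (from the str() output); not the full data set
income_values  <- c(64968, 69233, 73824, 85267, 91507,
                    99632, 123067, 142181, 157291, 185749)
savings_values <- c(12298, 14196, 17320, 19995, 23601,
                    24213, 26881, 30896, 33787, 38091)

# Pearson correlation: values near +1 indicate a strong positive linear relationship
r <- cor(income_values, savings_values)
r
```

With the full file loaded, the equivalent call would be `cor(income$INCOME, income$SAVINGS)`.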

### Step 4 : Construct a Linear model using R

```
linearmodel = lm(SAVINGS ~ INCOME, data = income)
summary(linearmodel)
```

Note that the first argument to lm() is SAVINGS ~ INCOME. This argument is of type formula and usually takes the form
Dependent_Variable ~ Independent_Variables
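A formula is an ordinary R object and can be inspected before it is handed to `lm()`. The short sketch below shows this; `model_formula` and `multi_formula` are names chosen for the illustration, and the second formula only demonstrates the syntax for more than one independent variable. It is not a model fitted in this tutorial.

```r
# The formula used in this tutorial, stored as an object
model_formula <- SAVINGS ~ INCOME

class(model_formula)      # "formula"
all.vars(model_formula)   # "SAVINGS" "INCOME"

# Syntax illustration only: several independent variables are joined with +
multi_formula <- SAVINGS ~ INCOME + YEAR
all.vars(multi_formula)   # "SAVINGS" "INCOME" "YEAR"
```

Nothing is evaluated against the data at this point; the variable names are only looked up when the formula is passed to a function such as `lm()` together with `data = income`.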

Output of summary(linearmodel) :

```
Call:
lm(formula = SAVINGS ~ INCOME, data = income)

Residuals:
     Min       1Q   Median       3Q      Max
-13036.3  -4958.9   -316.9   5368.1  16969.3

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.099e+04  2.459e+03  -4.469 0.000235 ***
INCOME       2.970e-01  6.012e-03  49.402  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7327 on 20 degrees of freedom
Multiple R-squared:  0.9919,	Adjusted R-squared:  0.9915
F-statistic:  2441 on 1 and 20 DF,  p-value: < 2.2e-16
```

### Step 5 : Interpretation of output

The model generated for us is shown below. The numbers in the Estimate column of the output are the coefficients in our linear regression equation.

SAVINGS = -10990 + 0.297 * INCOME

Notice that the p-value for INCOME, i.e. the Pr(>|t|) column, is less than 2e-16. This is far below the usual 0.05 threshold, so INCOME is a significant predictor of SAVINGS. If a variable does not have a significant p-value, we may choose to drop it from the model.
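To see where the t value and Pr(>|t|) columns come from, they can be approximately recomputed from the Estimate and Std. Error columns. The sketch below does this using only the rounded numbers printed in the output above, so the results agree with R's table only up to rounding. The variable names are made up for the illustration.

```r
# Values copied from the Coefficients table printed by summary(linearmodel)
estimate  <- c(Intercept = -1.099e+04, INCOME = 2.970e-01)
std_error <- c(Intercept =  2.459e+03, INCOME = 6.012e-03)
residual_df <- 20   # "on 20 degrees of freedom"

# t value = estimate divided by its standard error
t_value <- estimate / std_error
round(t_value, 3)

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_value <- 2 * pt(-abs(t_value), df = residual_df)
p_value < 0.05   # both TRUE
```

With the fitted object available, the unrounded table can be read directly with `summary(linearmodel)$coefficients`.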

The next number to be aware of is R-squared. In our case the Adjusted R-squared is 0.9915, which means the model explains about 99% of the variation in SAVINGS. What counts as a good R-squared is domain specific, but as a rule of thumb anything above 70% is usually taken as a sign of a good model for prediction.
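The Adjusted R-squared in the output can be reproduced from the Multiple R-squared with the standard adjustment formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of input variables. A small sketch using only the numbers printed above:

```r
r_squared <- 0.9919   # Multiple R-squared as printed by summary()
n <- 22               # observations, from the str() output
p <- 1                # input variables (INCOME only)

# Penalise R-squared for the number of input variables used
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
round(adj_r_squared, 4)
# 0.9915
```

Note that n - p - 1 equals 20, the residual degrees of freedom reported in the output.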

We will delve into the details of R-squared, t value, residuals and the F-statistic in a subsequent tutorial. For this discussion we can safely ignore them.

The model can be interpreted as: "When Income rises by 1 unit, Savings rise by 0.297 units."

Now, for any value of INCOME we can calculate SAVINGS using the equation:

SAVINGS = -10990 + 0.297 * INCOME
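As a worked example, the equation can be wrapped in a small function. This is only a sketch: `predict_savings` is a hypothetical helper written for this illustration, not a base R function, and it uses the rounded coefficients from the equation above. The income value of 100000 is an arbitrary example.

```r
# Coefficients as rounded in the equation above
intercept <- -10990
slope     <- 0.297

# Hypothetical helper: applies SAVINGS = intercept + slope * INCOME
predict_savings <- function(income_value) {
  intercept + slope * income_value
}

predict_savings(100000)
# 18710

# One extra unit of income changes the prediction by the slope (0.297)
predict_savings(100001) - predict_savings(100000)
```

With the fitted object available, `predict(linearmodel, newdata = data.frame(INCOME = 100000))` gives the corresponding prediction from the unrounded coefficients, so it can differ slightly from the hand calculation.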

I sincerely hope you enjoyed the tutorial. Please post your feedback and comments, and share other articles on the site.

As a next step in analyzing the model, you should also go through Residual Analysis, look at the Adjusted R-squared values and interpret the F-statistic.

Till next time Happy Learning.

Next in the series :

R Tutorial : Multiple Linear Regression

R Tutorial : Residual Analysis for Regression

R Tutorial : How to use Diagnostic Plots for Regression Models