# Tutorial: Linear Regression Construct

In this tutorial we discuss the structure of linear regression and how a linear regression equation is constructed for a two-variable model.

Please go through the tutorial on the Concept of Linearity first to understand linearity, the basic requirement of linear regression.

Let's consider a very simple data set where demand is modelled as a function of price:

Demand = f(Price)

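| Price | Demand | Price | Demand |
|-------|--------|-------|--------|
| 1 | 48 | 3 | 44 |
| 1 | 49 | 4 | 35 |
| 1 | 50 | 4 | 38 |
| 1 | 51 | 4 | 42 |
| 2 | 44 | 5 | 36 |
| 2 | 45 | 5 | 39 |
| 2 | 46 | 5 | 40 |
| 2 | 47 | 6 | 32 |
| 2 | 48 | 6 | 35 |
| 3 | 40 | 6 | 37 |
| 3 | 42 | 6 | 36 |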

Using an Excel scatter plot, we plot the points and then add a linear trendline, which is simply the line given by the best-fit linear regression equation.

All the points on the scatter plot are actual observed values of demand at a given price. When we created the linear regression line with the equation y = -2.852x + 51.59, we essentially created a prediction of demand at each price point, and all our predictions lie on the line represented by that equation. Note that our regression equation is of the form

Ŷ = b1 + b2X

where b1 = 51.59 is the intercept and b2 = -2.852 is the slope.

For example, our prediction for price 4 is about 40, whereas the three observations at price 4 have actual demands of 35, 38 and 42. This means that for every observed point, the prediction we generated carries some error.
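To make the price 4 example concrete, here is a minimal Python sketch that plugs the trendline coefficients quoted above into the regression equation and compares the prediction with the three observed demands. The names `B1`, `B2` and `predict` are illustrative choices for this sketch, not part of the original spreadsheet.

```python
# Coefficients of the trendline quoted in the text: y = -2.852x + 51.59
B1 = 51.59    # intercept
B2 = -2.852   # slope

def predict(price):
    """Predicted demand on the regression line for a given price."""
    return B1 + B2 * price

# The three demands actually observed at price 4 (from the data table)
observed_at_4 = [35, 38, 42]

predicted = predict(4)
# Error = actual demand minus predicted demand
errors = [actual - predicted for actual in observed_at_4]

print(f"predicted demand at price 4: {predicted:.2f}")
for actual, err in zip(observed_at_4, errors):
    print(f"actual {actual}, error {err:+.2f}")
```

The prediction comes out at about 40.18, so two of the observations fall below the line and one falls above it.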

For the sample, let us represent this error term by ei. Let Yi be the actual demand for each observation i and Ŷi the predicted demand for that observation. We can then write our actual values as

Yi = Ŷi + ei

which can also be written as

Yi = b1 + b2Xi + ei

or

ei = Yi - b1 - b2Xi    (I)

This is based on a limited, finite sample, so the key question is: can we find b1 and b2 such that the overall error is minimized? The technique for doing this is called Ordinary Least Squares (OLS).

Here is what we want to do:

Minimize ∑ei² = ∑(Yi - Ŷi)²    (II)

where

Yi = actual Y value for the ith item

Ŷi = predicted Y value for the ith item

From (I) we know that ei = Yi - b1 - b2Xi. Substituting this into (II) gives

∑ei² = ∑(Yi - b1 - b2Xi)²    (III)

Hence ∑ei² = f(b1, b2)

So for a given set of data, different values of b1 and b2 give rise to different ei values and thus to a different ∑ei².
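The dependence of ∑ei² on b1 and b2 can be seen numerically. The sketch below evaluates the sum of squared errors on the tabulated data for the trendline coefficients and for two alternative lines. The two alternatives are arbitrary values made up for comparison, and the function name `sum_squared_errors` is likewise just an illustrative choice.

```python
# Observations transcribed from the data table: price -> observed demands
observations = {
    1: [48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44],
    4: [35, 38, 42],
    5: [36, 39, 40],
    6: [32, 35, 37, 36],
}
# Flatten into parallel lists of X (price) and Y (demand)
prices = [p for p, ds in observations.items() for _ in ds]
demands = [d for ds in observations.values() for d in ds]

def sum_squared_errors(b1, b2):
    """Sum over all observations of (Yi - b1 - b2*Xi) squared."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(prices, demands))

# The trendline from the text plus two made-up alternatives
candidates = {
    "trendline": (51.59, -2.852),
    "flatter line": (50.0, -2.5),
    "steeper line": (52.0, -3.0),
}
sse = {name: sum_squared_errors(b1, b2) for name, (b1, b2) in candidates.items()}
for name, value in sse.items():
    print(f"{name}: sum of squared errors = {value:.2f}")
```

Each pair of coefficients produces a different total, and of the three lines tried here the trendline gives the smallest one.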

The OLS method chooses b1 and b2 so that ∑ei² is as small as possible. It does this with differential calculus, by setting the partial derivatives of ∑ei² with respect to b1 and b2 to zero. The values of b1 and b2 that minimize ∑ei² are obtained by solving the following two simultaneous equations:

∑Yi = nb1 + b2∑Xi

and

∑YiXi = b1∑Xi + b2∑Xi²

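These are called the least squares normal equations. Solving them for b1 and b2 we get:

b2 = (n∑XiYi - ∑Xi∑Yi) / (n∑Xi² - (∑Xi)²)

b1 = Ȳ - b2X̄

where X̄ and Ȳ are the sample means of X and Y.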
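As a quick numeric check, the sketch below builds the two normal equations from the tabulated data and solves them as a 2x2 linear system. Cramer's rule is used here only because it keeps the example short; it is one convenient way to solve the system, not necessarily how Excel computes its trendline. Variable names such as `sum_xy` are illustrative.

```python
# Observations transcribed from the data table: price -> observed demands
observations = {
    1: [48, 49, 50, 51],
    2: [44, 45, 46, 47, 48],
    3: [40, 42, 44],
    4: [35, 38, 42],
    5: [36, 39, 40],
    6: [32, 35, 37, 36],
}
prices = [p for p, ds in observations.items() for _ in ds]
demands = [d for ds in observations.values() for d in ds]

# The sums that appear in the normal equations
n = len(prices)
sum_x = sum(prices)
sum_y = sum(demands)
sum_xy = sum(x * y for x, y in zip(prices, demands))
sum_x2 = sum(x * x for x in prices)

# Normal equations as a 2x2 linear system in b1 and b2:
#   sum_y  = n * b1     + b2 * sum_x
#   sum_xy = b1 * sum_x + b2 * sum_x2
# Solved here with Cramer's rule.
det = n * sum_x2 - sum_x ** 2
b1 = (sum_y * sum_x2 - sum_x * sum_xy) / det
b2 = (n * sum_xy - sum_x * sum_y) / det

print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
```

For the data as transcribed, this gives an intercept of roughly 51.59 and a slope of roughly -2.85, in line with the Excel trendline quoted earlier.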

## Properties of OLS Estimators b1 and b2

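1. Linearity: the OLS estimators are linear functions of the observed values of the dependent variable Y, with weights that depend only on the X values.
2. Unbiasedness: the expected value (long-run average over repeated samples) of each estimator equals the true population parameter.
3. Minimum variance: each estimator has the smallest variance in the class of all linear unbiased estimators, which is why OLS is called the best linear unbiased estimator (BLUE).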
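The unbiasedness property can be illustrated with a small simulation. The sketch below assumes a hypothetical "true" population line and noise level (the values 50, -3 and 2 are invented for the demonstration), draws many samples from it, fits each with OLS and averages the estimates. It uses the mean-deviation form of the least squares solution, and the helper name `ols_fit` is illustrative.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical "true" population line and noise level, chosen only for this demo
TRUE_B1, TRUE_B2, NOISE_SD = 50.0, -3.0, 2.0
prices = [1, 2, 3, 4, 5, 6] * 4   # fixed X values reused in every sample

def ols_fit(xs, ys):
    """Return (b1, b2) from the closed-form least squares solution."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b2 = sxy / sxx
    return mean_y - b2 * mean_x, b2

estimates = []
for _ in range(2000):
    # Draw a fresh sample: same prices, new random errors
    demands = [TRUE_B1 + TRUE_B2 * x + random.gauss(0, NOISE_SD) for x in prices]
    estimates.append(ols_fit(prices, demands))

avg_b1 = sum(b1 for b1, _ in estimates) / len(estimates)
avg_b2 = sum(b2 for _, b2 in estimates) / len(estimates)
print(f"average b1 = {avg_b1:.2f} (true value {TRUE_B1})")
print(f"average b2 = {avg_b2:.2f} (true value {TRUE_B2})")
```

Any single sample gives estimates that miss the true values, but the averages over many samples settle close to them. This is only an illustration of unbiasedness under the simulated conditions, not a proof.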

## Assumptions for OLS

1. The regression model is linear in the parameters.
2. The error terms ei do not systematically affect Yi, i.e. there is no pattern to the ei and they are all random.
3. The variance of ei is the same for all observations, whatever the value of Xi.
4. There is no autocorrelation between the error terms.
5. There is no correlation between the error term e and the explanatory variable X.
6. The number of observations must be greater than the number of parameters to be estimated.
7. The X values must not all be the same (variation in X is required).
8. Input variables should not have linear relationships with each other (no multicollinearity).
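Most of these assumptions are judged by examining residuals, but assumptions 6 and 7 can be checked mechanically before fitting. The function below is a hypothetical helper written for this tutorial (the name `check_basic_conditions` and its messages are not from any library), and it covers only those two conditions plus a basic length check.

```python
def check_basic_conditions(xs, ys, n_params=2):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if len(xs) != len(ys):
        problems.append("X and Y must have the same number of observations")
    if len(xs) <= n_params:
        problems.append("need more observations than parameters to estimate")
    if len(set(xs)) < 2:
        problems.append("X values must not all be the same")
    return problems

# Four observations at four different prices: nothing to report
print(check_basic_conditions([1, 2, 3, 4], [48, 44, 40, 35]))

# Three observations all at price 4: no variation in X, so no slope can be fitted
print(check_basic_conditions([4, 4, 4], [35, 38, 42]))
```

The second call shows why variation in X matters: with every price equal to 4, the data say nothing about how demand changes with price.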

Next in the series:

R Tutorial : Basic 2 variable Linear Regression

R Tutorial : Multiple Linear Regression

R Tutorial : Residual Analysis for Regression

R Tutorial : How to use Diagnostic Plots for Regression Models

Reference: Based on lectures by Dr. Manish Sinha (Associate Prof., SCMHRD).