In this tutorial we discuss the structure of linear regression and how a linear regression equation is constructed for a two-variable model.
Please go through the tutorial on the Concept of Linearity first to understand the basic requirement of linear regression, viz. linearity.
Let's consider a very simple data set where
Demand = f(Price)
Price | Demand | Price | Demand
1 | 48 | 3 | 44
1 | 49 | 4 | 35
1 | 50 | 4 | 38
1 | 51 | 4 | 42
2 | 44 | 5 | 36
2 | 45 | 5 | 39
2 | 46 | 5 | 40
2 | 47 | 6 | 32
2 | 48 | 6 | 35
3 | 40 | 6 | 37
3 | 42 | 6 | 36
Using an Excel scatter plot, we plot the points and then add a linear trendline, which is simply the line given by the best-fit linear regression equation.
All the scatter plot points are actual observed values of demand at a given price. When we created the linear regression line with equation y = -2.851x + 51.59, we essentially created a prediction of demand at each price point, and all our predictions lie on the line represented by that equation. Note that our regression equation is of the form
$$\hat{Y} = b_1 + b_2 X$$
where b1 is the intercept and b2 is the slope of the line.
For example, our prediction for price 4 is about 40, whereas our 3 observations at price 4 have actual demand of 35, 38 and 42. So for every observed point, the prediction we generated incurs an error.
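Using the trendline coefficients rounded to 51.59 and -2.85 (the rounding is ours), the arithmetic for price 4 is:

$$\hat{Y} = 51.59 - 2.85 \times 4 = 40.19$$

and the errors, actual minus predicted demand, are

$$35 - 40.19 = -5.19, \qquad 38 - 40.19 = -2.19, \qquad 42 - 40.19 = 1.81$$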
Let us represent this error term by ei, the actual demand for observation i by Yi, and the predicted demand for observation i by Ŷi. We can then write each actual value as:
$$Y_i = \hat{Y}_i + e_i$$
which can also be written as
$$Y_i = b_1 + b_2 X_i + e_i$$
or
$$e_i = Y_i - b_1 - b_2 X_i \tag{I}$$
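This is based on a limited, finite sample, so the key question is: can we find b1 and b2 such that the overall error is minimized? The technique for doing this is called Ordinary Least Squares (OLS).
Because positive and negative errors would cancel each other out if we simply added them up, OLS works with squared errors. So here is what we want to do: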
$$\text{Minimize } \sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2 \tag{II}$$
where Yi is the actual Y value for the ith item and Ŷi is the predicted Y value for the ith item.
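The quantity in (II) is easy to compute for any candidate line. A minimal sketch, with `sse` as a hypothetical helper name, comparing the rounded trendline against an arbitrarily chosen alternative line Ŷ = 50 - 2.5X:

```python
# Price (X) and Demand (Y) from the table above
price = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
         3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6]
demand = [48, 49, 50, 51, 44, 45, 46, 47, 48, 40, 42,
          44, 35, 38, 42, 36, 39, 40, 32, 35, 37, 36]

def sse(b1, b2, xs, ys):
    """Sum of squared errors (II), with the prediction Y_hat_i = b1 + b2 * X_i."""
    return sum((y - (b1 + b2 * x)) ** 2 for x, y in zip(xs, ys))

# The trendline (rounded) versus an arbitrary alternative line
print(f"{sse(51.59, -2.851, price, demand):.2f}")  # 90.29
print(f"{sse(50.0, -2.5, price, demand):.2f}")     # 102.50
```

The alternative line looks plausible on the scatter plot, yet its total squared error is noticeably larger.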
Substituting (I) into (II) gives
$$\sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2 \tag{III}$$
Hence ∑ei² = f(b1, b2): for a given set of data, different values of b1 and b2 give rise to different ei values and thus a different ∑ei².
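The idea that ∑ei² depends only on b1 and b2 can be made concrete by brute force: evaluate (III) for many candidate pairs and keep the smallest. This grid scan is purely an illustration; the bounds and the 0.01 step are arbitrary choices that bracket the trendline values, and OLS itself does not search but obtains the exact minimum analytically.

```python
# Price (X) and Demand (Y) from the table above
price = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
         3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6]
demand = [48, 49, 50, 51, 44, 45, 46, 47, 48, 40, 42,
          44, 35, 38, 42, 36, 39, 40, 32, 35, 37, 36]

def sse(b1, b2):
    """Equation (III): sum of (Y_i - b1 - b2 * X_i)^2 over all observations."""
    return sum((y - b1 - b2 * x) ** 2 for x, y in zip(price, demand))

# Brute-force scan of candidate lines on a 0.01 grid:
# b1 from 51.00 to 52.20, b2 from -3.20 to -2.50
best_sse, best_b1, best_b2 = float("inf"), None, None
for i in range(121):
    b1 = 51.0 + i * 0.01
    for j in range(71):
        b2 = -3.2 + j * 0.01
        s = sse(b1, b2)
        if s < best_sse:
            best_sse, best_b1, best_b2 = s, b1, b2

print(f"smallest SSE on the grid: {best_sse:.2f} at b1 = {best_b1:.2f}, b2 = {best_b2:.2f}")
```

The best grid point lands next to the trendline coefficients (about 51.59 and -2.85) with an SSE of about 90.29; every other pair on the grid gives a larger sum.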
The OLS method chooses b1 and b2 so that ∑ei² is as small as possible. It uses differential calculus: setting the partial derivatives of ∑ei² with respect to b1 and b2 equal to zero gives two simultaneous equations, and the values of b1 and b2 that minimize ∑ei² are obtained by solving them:
$$\sum Y_i = n b_1 + b_2 \sum X_i$$
$$\sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2$$
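For completeness, here is where these two equations come from: differentiate (III) with respect to b1 and b2 and set each partial derivative to zero.

$$\frac{\partial \sum e_i^2}{\partial b_1} = -2 \sum (Y_i - b_1 - b_2 X_i) = 0 \;\Rightarrow\; \sum Y_i = n b_1 + b_2 \sum X_i$$

$$\frac{\partial \sum e_i^2}{\partial b_2} = -2 \sum X_i (Y_i - b_1 - b_2 X_i) = 0 \;\Rightarrow\; \sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2$$

Because ∑ei² is a convex quadratic in (b1, b2) whenever the Xi are not all equal, this stationary point is the minimum.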
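These are called the least squares normal equations. Solving them for b1 and b2 gives
$$b_2 = \frac{n \sum X_i Y_i - \sum X_i \sum Y_i}{n \sum X_i^2 - \left(\sum X_i\right)^2}, \qquad b_1 = \bar{Y} - b_2 \bar{X}$$
where X̄ and Ȳ are the sample means of X and Y.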
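To see the normal equations at work on the Price/Demand data, the sketch below builds the five sums they need and solves the resulting 2×2 linear system with Cramer's rule (our choice of solver; any linear solver would do).

```python
# Price (X) and Demand (Y) from the table above
price = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
         3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6]
demand = [48, 49, 50, 51, 44, 45, 46, 47, 48, 40, 42,
          44, 35, 38, 42, 36, 39, 40, 32, 35, 37, 36]

# Sums that appear in the normal equations
n = len(price)
sum_x = sum(price)
sum_y = sum(demand)
sum_xx = sum(x * x for x in price)
sum_xy = sum(x * y for x, y in zip(price, demand))

# Normal equations as a 2x2 linear system in (b1, b2):
#   n     * b1 + sum_x  * b2 = sum_y
#   sum_x * b1 + sum_xx * b2 = sum_xy
# Cramer's rule; det is zero only if all X values are equal.
det = n * sum_xx - sum_x ** 2
b2 = (n * sum_xy - sum_x * sum_y) / det
b1 = (sum_y * sum_xx - sum_x * sum_xy) / det

print(n, sum_x, sum_y, sum_xx, sum_xy)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The sums come out to n = 22, ∑Xi = 74, ∑Yi = 924, ∑Xi² = 318 and ∑XiYi = 2911, giving b1 ≈ 51.5908 and b2 ≈ -2.8513, the Excel trendline up to rounding. Equivalently, b1 = Ȳ - b2X̄ = 42 - (-2.8513)(74/22) ≈ 51.59.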
Properties of OLS Estimators b1 and b2
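- Linearity: OLS estimators are linear functions of the observed values of the dependent variable Y, with weights that depend only on the independent variable X.
- Unbiasedness: the expected value of each estimator (its average over repeated samples) equals the true population parameter.
- Minimum variance: among all linear unbiased estimators, the OLS estimators have the smallest variance. Together, these properties make OLS estimators Best Linear Unbiased Estimators (BLUE), a result known as the Gauss–Markov theorem.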
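Unbiasedness is a statement about repeated sampling, so a small simulation can illustrate it (though not prove it). This sketch assumes a hypothetical population line Y = 50 - 3X with normally distributed errors (σ = 2), keeps the 22 price values from the table fixed, draws 2000 fresh samples and averages the OLS estimates. The constants `BETA1`, `BETA2`, `SIGMA` and the seed are arbitrary illustrative choices; the helper `ols` uses the deviation form b2 = ∑(Xi - X̄)(Yi - Ȳ) / ∑(Xi - X̄)², which is algebraically equivalent to solving the normal equations.

```python
import random

def ols(xs, ys):
    """Closed-form OLS estimates (b1, b2) for Y = b1 + b2 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

# Hypothetical "true" population line and noise level, for illustration only
BETA1, BETA2, SIGMA = 50.0, -3.0, 2.0
price = [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3,
         3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6]

rng = random.Random(42)
estimates = []
for _ in range(2000):
    # Fresh sample: same X values, new random errors
    demand = [BETA1 + BETA2 * x + rng.gauss(0.0, SIGMA) for x in price]
    estimates.append(ols(price, demand))

mean_b1 = sum(b1 for b1, _ in estimates) / len(estimates)
mean_b2 = sum(b2 for _, b2 in estimates) / len(estimates)
print(f"average b1 = {mean_b1:.3f} (true {BETA1}), average b2 = {mean_b2:.3f} (true {BETA2})")
```

Individual estimates scatter around the true values, but their averages land close to 50 and -3; more replications tighten the agreement.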
Assumptions for OLS
- The regression model is linear in the parameters.
- The error terms ei have zero mean and do not systematically affect Yi, i.e. there is no pattern in the ei and they are purely random.
- The variance of ei is the same for all observations, whatever the value of Xi (homoscedasticity).
- There is no autocorrelation between error terms.
- There is no correlation between the error term e and the explanatory variable X.
- The number of observations is greater than the number of parameters to be estimated.
- The X values must not all be the same (the X values must vary).
- Input variables should not have exact linear relationships with each other (no perfect multicollinearity). This only becomes relevant when there is more than one input variable, as in multiple regression.
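Only the mechanical requirements in this list can be checked directly in code. A minimal sketch of such a guard, using a hypothetical helper name `check_preconditions`: it rejects data with too few observations or with no variation in X (in which case the slope's denominator n∑Xi² - (∑Xi)² is zero). The assumptions about the error terms cannot be confirmed this way; they are usually examined through residual analysis and diagnostic plots.

```python
def check_preconditions(xs, ys, n_params=2):
    """Raise ValueError if the data cannot identify the regression coefficients."""
    if len(xs) != len(ys):
        raise ValueError("X and Y must have the same length")
    if len(xs) <= n_params:
        raise ValueError("need more observations than parameters")
    if len(set(xs)) < 2:
        # All X equal: n*sum(X^2) - (sum X)^2 == 0, so b2 is undefined
        raise ValueError("X values must not all be the same")

# One observation per price level from the table: passes silently
check_preconditions([1, 2, 3, 4, 5, 6], [48, 44, 40, 35, 36, 32])

# Constant X: rejected
try:
    check_preconditions([3, 3, 3, 3], [40, 42, 44, 41])
except ValueError as err:
    print("rejected:", err)
```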
Next in the series:
R Tutorial : Basic 2 variable Linear Regression
R Tutorial : Multiple Linear Regression
R Tutorial : Residual Analysis for Regression
R Tutorial : How to use Diagnostic Plots for Regression Models
Reference: Based on lectures by Dr. Manish Sinha (Associate Professor, SCMHRD).