7/29/2019 best2013-classxiandxii
Business Econometrics using SAS Tools (BEST)
Class XI and XII: OLS, BLUE, and Assumption Errors
Multicollinearity
Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Examples of multicollinear predictors are height and weight of a person, years of education and income, and assessed value and square footage of a home.
Consequences of high multicollinearity:
1. Increased standard errors of the estimates of the βs (decreased reliability).
2. Often confusing and misleading results.
Example
Suppose that we fit a model with x1 = income and x2 = years of education as predictors and y = intake of fruits and vegetables as the response.
Years of education and income are correlated, and we expect a positive association of both with intake of fruits and vegetables.
t-tests for the individual β1, β2 might suggest that neither of the two predictors is significantly associated with y, while the F-test indicates that the model is useful for predicting y.
Contradiction? No! This has to do with the interpretation of regression coefficients.
F or t
β1 is the expected change in y due to x1, given that x2 is already in the model.
β2 is the expected change in y due to x2, given that x1 is already in the model.
Since both x1 and x2 contribute redundant information about y, once one of the predictors is in the model, the other one does not have much more to contribute.
This is why the F-test indicates that at least one of the predictors is important, yet the individual t-tests indicate that the contribution of each predictor, given that the other one has already been included, is not really important.
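The income/education story above can be reproduced in a small simulation. The course's computations would be done in SAS; the sketch below uses Python/numpy purely for illustration, and all data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
x1 = rng.normal(size=n)               # "income" (standardized, illustrative)
x2 = x1 + 0.1 * rng.normal(size=n)    # "education": nearly collinear with x1
y = 1.0 + 0.5 * (x1 + x2) + rng.normal(size=n)

# Fit y = b0 + b1*x1 + b2*x2 by OLS
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = n - 3
s2 = resid @ resid / df

# t statistics for the individual slopes
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t1, t2 = beta[1] / se[1], beta[2] / se[2]

# Overall F statistic for H0: beta1 = beta2 = 0
sst = np.sum((y - y.mean()) ** 2)
sse = resid @ resid
F = ((sst - sse) / 2) / (sse / df)

print(t1, t2, F)   # |t1|, |t2| typically small, while F is large
```

The F-test sees the joint contribution of the two predictors, while each t-test only sees what one predictor adds once the other is already in the model.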
Detecting Multicollinearity
Easy way: compute correlations between all pairs of predictors. If some r are close to 1 or -1, remove one of the two correlated predictors from the model.
Another way: calculate the variance inflation factor for each predictor xj:

VIFj = 1 / (1 − Rj²)

where Rj² is the coefficient of determination of the model that regresses the jth predictor on all of the other predictors.
If VIFj > 10, then there is a problem with multicollinearity.
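In SAS, VIFs are reported via the VIF option on PROC REG's MODEL statement. As a language-neutral sketch of the same computation, here is a minimal Python/numpy version; the helper name `vif` and the simulated data are illustrative.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing the
    j-th predictor on all the other predictors (plus an intercept).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Two nearly collinear predictors and one independent predictor
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # almost a copy of x1
x3 = rng.normal(size=200)
v = vif(np.column_stack([x1, x2, x3]))
print(v)   # VIFs for x1 and x2 far above 10; x3 near 1
```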
Solution for Multicollinearity
If interest is only in estimation and prediction, multicollinearity can be ignored, since it does not affect ŷ or its standard error (whether estimating the mean response or predicting a new observation).
The above is true only if the x values at which we want estimation or prediction are within the range of the data.
If the wish is to establish association patterns between y and the predictors, then the analyst can:
- Eliminate some predictors from the model.
- Design an experiment in which the pattern of correlation is broken.
Multicollinearity in Polynomial Models
Multicollinearity is a problem in polynomial regression (with terms of second and higher order): x and x² tend to be highly correlated.
A special solution in polynomial models is to use zi = xi − x̄ instead of just xi.
That is, first subtract the mean from each predictor and then use the deviations in the model.
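A quick numeric sketch (with illustrative simulated data) of how centering breaks the correlation between x and x²:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(5.0, 10.0, size=500)    # predictor far from zero

# Raw polynomial terms: x and x^2 are almost perfectly correlated
r_raw = np.corrcoef(x, x**2)[0, 1]

# Center first: z = x - mean(x), then use z and z^2 in the model
z = x - x.mean()
r_centered = np.corrcoef(z, z**2)[0, 1]

print(r_raw, r_centered)   # r_raw near 1; r_centered near 0
```

Centering works because, for a roughly symmetric predictor, the deviations z and their squares z² are nearly uncorrelated, while x and x² are almost collinear whenever x lies far from zero.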
Heteroscedasticity
When the variance of ei depends on the value of yi, we say that the variances are heteroscedastic (unequal).
Heteroscedasticity may occur for many reasons, but typically occurs when responses are not normally distributed or when the errors are not additive in the model.
Examples of non-normal responses are:
- Poisson counts: number of sick days taken per person/month, or number of traffic crashes per intersection/year.
- Binomial proportions: proportion of felons who are repeat offenders, or proportion of consumers who prefer low-carb foods.
Checking normality assumption
Even though we assume that e ~ N(0, σ²), moderate departures from the assumption of normality do not have a big effect on the reliability of hypothesis tests or the accuracy of confidence intervals.
There are several statistical tests for normality, but they are reliable only in very large samples.
The easiest type of check is graphical:
- Fit the model and estimate the residuals ei = yi − ŷi.
- Get a histogram of the residuals and see if they look normal.
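The graphical check can be supplemented numerically. As an illustrative sketch (simulated data, hypothetical names), the residual shape is summarized here by sample skewness and excess kurtosis, both of which should be near zero when the errors really are normal:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # normal errors

# Fit y = b0 + b1*x by OLS and form residuals e_i = y_i - yhat_i
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Numeric stand-ins for the histogram: standardize the residuals,
# then compute sample skewness and excess kurtosis
z = (resid - resid.mean()) / resid.std()
skew = np.mean(z**3)
ex_kurt = np.mean(z**4) - 3.0
print(skew, ex_kurt)   # both near zero for normal errors
```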
Solutions for Heteroscedasticity
Unequal variances can often be ameliorated by transforming the data. Appropriate transformations:
- Poisson counts: √y
- Binomial proportions: sin⁻¹(√y)
- Multiplicative errors: log(y)
Fit the model to the transformed response. Typically, the transformed data will satisfy the assumption of homoscedasticity.
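A small simulation (illustrative data) showing the square-root transform stabilizing Poisson variances: for raw counts the variance equals the mean, so it differs across groups, while √y has variance close to 1/4 regardless of the mean.

```python
import numpy as np

rng = np.random.default_rng(3)

# Three groups of Poisson counts with different means: the raw
# variances track the means, so the groups are heteroscedastic
means = [10.0, 25.0, 50.0]
groups = [rng.poisson(m, size=2000) for m in means]

raw_vars = [g.var() for g in groups]            # roughly 10, 25, 50
sqrt_vars = [np.sqrt(g).var() for g in groups]  # all close to 1/4

print(raw_vars)
print(sqrt_vars)
```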
Serial Correlation (or Autocorrelation)
Serial correlation occurs when one observation's error term (εi) is correlated with another observation's error term (εj):

Corr(εi, εj) ≠ 0

This usually happens because there is an important relationship (economic or otherwise) between the observations. It is especially problematic with time-series data and cluster sampling.
Problem with Autocorrelation
OLS estimates remain unbiased
But the OLS estimator is no longer the best (minimum-variance) linear unbiased estimator.
Serial correlation implies that errors are partly predictable.
For example, with positive serial correlation, a positive error today implies that tomorrow's error is likely to be positive also.
The OLS estimator ignores this information; more efficient estimators are available that do not.
Testing for Serial Correlation
There are a number of formal tests available. In addition to formal tests, always look at residual plots!
The most common test is the Durbin-Watson (DW) d test.
The model needs to have an intercept term and can't have a lagged dependent variable (i.e., Yt−1 as one of the independent variables).
Durbin-Watson Test

The test statistic is d = Σ(et − et−1)² / Σet², computed from the OLS residuals.

- If ρ = 1, then we expect et = et−1. When this is true, d ≈ 0.
- If ρ = −1, then we expect et = −et−1, so et − et−1 = −2et−1. When this is true, d ≈ 4.
- If ρ = 0, then we expect d ≈ 2.

Hence values of the test statistic far from 2 indicate that serial correlation is likely present.
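A minimal sketch of the d statistic on simulated AR(1) errors (helper names and ρ values are illustrative; in practice the statistic is computed from the fitted model's residuals):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(4)
n = 1000
u = rng.normal(size=n)

def ar1(rho):
    """Generate AR(1) errors e_t = rho * e_{t-1} + u_t."""
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + u[t]
    return e

print(durbin_watson(ar1(0.9)))    # near 0: strong positive autocorrelation
print(durbin_watson(ar1(0.0)))    # near 2: no autocorrelation
print(durbin_watson(ar1(-0.9)))   # near 4: strong negative autocorrelation
```

For large samples, d ≈ 2(1 − ρ̂), which is why the three cases land near 0, 2, and 4.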
Solution
Use OLS to estimate the regression and fix the standard errors.
A. We know OLS is unbiased; it's just that the usual formula for the standard errors is wrong (and hence tests can be misleading).
B. We can get consistent estimates of the standard errors (as the sample size goes to infinity, a consistent estimator gets arbitrarily close to the true value in a probabilistic sense), called Newey-West standard errors.
C. When specifying the regression, use the coefficient covariance matrix option and the HAC (Newey-West) method.
D. Most of the time, this approach is sufficient.
E. If not, we cannot use OLS and need to use Generalized Least Squares.
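Points A–C can be sketched from first principles. The hand-rolled Python/numpy version below (function name, lag choice, and simulated data are all illustrative, using the standard Bartlett-weight form of the Newey-West estimator) shows that with positively autocorrelated errors the OLS coefficients stay close to the truth, while the Newey-West standard error on the intercept is substantially larger than the naive OLS one.

```python
import numpy as np

def newey_west_se(X, resid, lags):
    """HAC (Newey-West) standard errors for OLS coefficients.

    S = sum_t e_t^2 x_t x_t'
        + sum_l w_l sum_t e_t e_{t-l} (x_t x_{t-l}' + x_{t-l} x_t'),
    with Bartlett weights w_l = 1 - l/(lags+1);
    Var(b) = (X'X)^{-1} S (X'X)^{-1}.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    S = (X * resid[:, None]).T @ (X * resid[:, None])   # lag-0 term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)
        G = (X[l:] * (resid[l:] * resid[:-l])[:, None]).T @ X[:-l]
        S += w * (G + G.T)
    V = XtX_inv @ S @ XtX_inv
    return np.sqrt(np.diag(V))

# Regression with AR(1) errors e_t = 0.7 e_{t-1} + u_t
rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + u[t]
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

naive_se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * resid.var())
nw_se = newey_west_se(X, resid, lags=10)
print(beta, naive_se, nw_se)   # slope near 2; NW intercept SE > naive
```

This is exactly point A in action: the estimates are fine, but the naive standard errors understate the true uncertainty when errors are serially correlated.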