7/29/2019 best2013-classxiandxii
Business Econometrics using SAS Tools (BEST)
Class XI and XII: OLS, BLUE, and Assumption Errors
Multicollinearity
Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.
Examples of multicollinear predictors are height and weight of a person, years of education and income, and assessed value and square footage of a home.
Consequences of high multicollinearity:
1. Increased standard errors of the estimates of the βs (decreased reliability).
2. Often confusing and misleading results.
Example
Suppose that we fit a model with x1 = income and x2 = years of education as predictors and y = intake of fruits and vegetables as the response.
Years of education and income are correlated, and we expect a positive association of both with intake of fruits and vegetables.
t-tests for the individual β1, β2 might suggest that neither of the two predictors is significantly associated with y, while the F-test indicates that the model is useful for predicting y.
Contradiction? No! This has to do with the interpretation of regression coefficients.
F or t
β1 is the expected change in y due to x1, given that x2 is already in the model.
β2 is the expected change in y due to x2, given that x1 is already in the model.
Since both x1 and x2 contribute redundant information about y, once one of the predictors is in the model, the other one does not have much more to contribute.
This is why the F-test indicates that at least one of the predictors is important, yet the individual t-tests indicate that the contribution of each predictor, given that the other one has already been included, is not really important.
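The income/education story above can be reproduced in a small simulation. The course's computations would be done in SAS; the sketch below uses Python/numpy purely for illustration, and all data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 60
x1 = rng.normal(size=n)               # "income" (standardized, illustrative)
x2 = x1 + 0.1 * rng.normal(size=n)    # "education": nearly collinear with x1
y = 1.0 + 0.5 * (x1 + x2) + rng.normal(size=n)

# Fit y = b0 + b1*x1 + b2*x2 by OLS
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
df = n - 3
s2 = resid @ resid / df

# t statistics for the individual slopes
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t1, t2 = beta[1] / se[1], beta[2] / se[2]

# Overall F statistic for H0: beta1 = beta2 = 0
sst = np.sum((y - y.mean()) ** 2)
sse = resid @ resid
F = ((sst - sse) / 2) / (sse / df)

print(t1, t2, F)   # |t1|, |t2| typically small, while F is large
```

The F-test sees the joint contribution of the two predictors, while each t-test only sees what one predictor adds once the other is already in the model.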
Detecting Multicollinearity
Easy way: compute correlations between all pairs of predictors. If some r are close to 1 or -1, remove one of the two correlated predictors from the model.
Another way: calculate the variance inflation factor for each predictor xj:

VIFj = 1 / (1 − Rj²)

where Rj² is the coefficient of determination of the model that regresses the jth predictor on all of the other predictors.
If VIFj > 10, then there is a problem with multicollinearity.
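In SAS, VIFs are reported via the VIF option on PROC REG's MODEL statement. As a language-neutral sketch of the same computation, here is a minimal Python/numpy version; the helper name `vif` and the simulated data are illustrative.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of predictor matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing the
    j-th predictor on all the other predictors (plus an intercept).
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # add intercept
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Two nearly collinear predictors and one independent predictor
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # almost a copy of x1
x3 = rng.normal(size=200)
v = vif(np.column_stack([x1, x2, x3]))
print(v)   # VIFs for x1 and x2 far above 10; x3 near 1
```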
Solution for Multicollinearity
If interest is only in estimation and prediction, multicollinearity can be ignored, since it does not affect ŷ or its standard error (whether estimating the mean response or predicting a new observation).
The above is true only if the x values at which we want estimation or prediction are within the range of the data.
If the wish is to establish association patterns between y and the predictors, then the analyst can:
- Eliminate some predictors from the model.
- Design an experiment in which the pattern of correlation is broken.
Multicollinearity in Polynomial Models
Multicollinearity is a problem in polynomial regression (with terms of second and higher order): x and x² tend to be highly correlated.
A special solution in polynomial models is to use zi = xi − x̄ instead of just xi.
That is, first subtract the mean from each predictor and then use the deviations in the model.
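A quick numeric sketch (with illustrative simulated data) of how centering breaks the correlation between x and x²:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(5.0, 10.0, size=500)    # predictor far from zero

# Raw polynomial terms: x and x^2 are almost perfectly correlated
r_raw = np.corrcoef(x, x**2)[0, 1]

# Center first: z = x - mean(x), then use z and z^2 in the model
z = x - x.mean()
r_centered = np.corrcoef(z, z**2)[0, 1]

print(r_raw, r_centered)   # r_raw near 1; r_centered near 0
```

Centering works because, for a roughly symmetric predictor, the deviations z and their squares z² are nearly uncorrelated, while x and x² are almost collinear whenever x lies far from zero.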
Heteroscedasticity
When the variance of ei depends on the value of yi, we say that the variances are heteroscedastic (unequal).
Heteroscedasticity may occur for many reasons, but typically occurs when responses are not normally distributed or when the errors are not additive in the model.
Examples of non-normal responses are:
- Poisson counts: number of sick days taken per person/month, or number of traffic crashes per intersection/year.
- Binomial proportions: proportion of felons who are repeat offenders, or proportion of consumers who prefer low-carb foods.
Checking normality assumption
Even though we assume that e ~ N(0, σ²), moderate departures from the assumption of normality do not have a big effect on the reliability of hypothesis tests or the accuracy of confidence intervals.
There are several statistical tests for normality, but they are reliable only in very large samples.
The easiest type of check is graphical:
- Fit the model and estimate the residuals ei = yi − ŷi.
- Get a histogram of the residuals and see if they look normal.
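The graphical check can be supplemented numerically. As an illustrative sketch (simulated data, hypothetical names), the residual shape is summarized here by sample skewness and excess kurtosis, both of which should be near zero when the errors really are normal:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # normal errors

# Fit y = b0 + b1*x by OLS and form residuals e_i = y_i - yhat_i
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

# Numeric stand-ins for the histogram: standardize the residuals,
# then compute sample skewness and excess kurtosis
z = (resid - resid.mean()) / resid.std()
skew = np.mean(z**3)
ex_kurt = np.mean(z**4) - 3.0
print(skew, ex_kurt)   # both near zero for normal errors
```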
Solutions for Heteroscedasticity
Unequal variances can often be ameliorated by transforming the data. Appropriate transformations:
- Poisson counts: √y
- Binomial proportions: sin⁻¹(√y)
- Multiplicative errors: log(y)
Fit the model to the transformed response. Typically, the transformed data will satisfy the assumption of homoscedasticity.
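A small simulation (illustrative data) showing the square-root transform stabilizing Poisson variances: for raw counts the variance equals the mean, so it differs across groups, while √y has variance close to 1/4 regardless of the mean.

```python
import numpy as np

rng = np.random.default_rng(3)

# Three groups of Poisson counts with different means: the raw
# variances track the means, so the groups are heteroscedastic
means = [10.0, 25.0, 50.0]
groups = [rng.poisson(m, size=2000) for m in means]

raw_vars = [g.var() for g in groups]            # roughly 10, 25, 50
sqrt_vars = [np.sqrt(g).var() for g in groups]  # all close to 1/4

print(raw_vars)
print(sqrt_vars)
```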
Serial Correlation (or Autocorrelation)
Serial correlation occurs when one observation's error term (εi) is correlated with another observation's error term (εj):

Corr(εi, εj) ≠ 0

This usually happens because there is an important relationship (economic or otherwise) between the observations. It is especially problematic with time-series data and cluster sampling.
Problem with Autocorrelation
OLS estimates remain unbiased
But the OLS estimator is no longer the best (minimum-variance) linear unbiased estimator.
Serial correlation implies that errors are partly predictable.
For example, with positive serial correlation, a positive error today implies that tomorrow's error is likely to be positive also.
The OLS estimator ignores this information; more efficient estimators are available that do not.
Testing for Serial Correlation
There are a number of formal tests available. In addition to formal tests, always look at residual plots!
The most common test is the Durbin-Watson (DW) d test.
The model needs to have an intercept term and can't have a lagged dependent variable (i.e., Yt−1 as one of the independent variables).
Durbin-Watson Test

The test statistic is d = Σ(et − et−1)² / Σet², computed from the OLS residuals.

- If ρ = 1, then we expect et = et−1. When this is true, d ≈ 0.
- If ρ = −1, then we expect et = −et−1, so et − et−1 = −2et−1. When this is true, d ≈ 4.
- If ρ = 0, then we expect d ≈ 2.

Hence values of the test statistic far from 2 indicate that serial correlation is likely present.
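A minimal sketch of the d statistic on simulated AR(1) errors (helper names and ρ values are illustrative; in practice the statistic is computed from the fitted model's residuals):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson d = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(4)
n = 1000
u = rng.normal(size=n)

def ar1(rho):
    """Generate AR(1) errors e_t = rho * e_{t-1} + u_t."""
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + u[t]
    return e

print(durbin_watson(ar1(0.9)))    # near 0: strong positive autocorrelation
print(durbin_watson(ar1(0.0)))    # near 2: no autocorrelation
print(durbin_watson(ar1(-0.9)))   # near 4: strong negative autocorrelation
```

For large samples, d ≈ 2(1 − ρ̂), which is why the three cases land near 0, 2, and 4.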
Solution
Use OLS to estimate the regression and fix the standard errors.
A. We know OLS is unbiased; it's just that the usual formula for the standard errors is wrong (and hence tests can be misleading).
B. We can get consistent estimates of the standard errors (as the sample size goes to infinity, a consistent estimator gets arbitrarily close to the true value in a probabilistic sense), called Newey-West standard errors.
C. When specifying the regression, use the coefficient covariance matrix option and the HAC (Newey-West) method.
D. Most of the time, this approach is sufficient.
E. If not, we cannot use OLS and need to use Generalized Least Squares.
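Points A–C can be sketched from first principles. The hand-rolled Python/numpy version below (function name, lag choice, and simulated data are all illustrative, using the standard Bartlett-weight form of the Newey-West estimator) shows that with positively autocorrelated errors the OLS coefficients stay close to the truth, while the Newey-West standard error on the intercept is substantially larger than the naive OLS one.

```python
import numpy as np

def newey_west_se(X, resid, lags):
    """HAC (Newey-West) standard errors for OLS coefficients.

    S = sum_t e_t^2 x_t x_t'
        + sum_l w_l sum_t e_t e_{t-l} (x_t x_{t-l}' + x_{t-l} x_t'),
    with Bartlett weights w_l = 1 - l/(lags+1);
    Var(b) = (X'X)^{-1} S (X'X)^{-1}.
    """
    XtX_inv = np.linalg.inv(X.T @ X)
    S = (X * resid[:, None]).T @ (X * resid[:, None])   # lag-0 term
    for l in range(1, lags + 1):
        w = 1.0 - l / (lags + 1.0)
        G = (X[l:] * (resid[l:] * resid[:-l])[:, None]).T @ X[:-l]
        S += w * (G + G.T)
    V = XtX_inv @ S @ XtX_inv
    return np.sqrt(np.diag(V))

# Regression with AR(1) errors e_t = 0.7 e_{t-1} + u_t
rng = np.random.default_rng(5)
n = 2000
x = rng.normal(size=n)
u = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):
    e[t] = 0.7 * e[t - 1] + u[t]
y = 1.0 + 2.0 * x + e

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

naive_se = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * resid.var())
nw_se = newey_west_se(X, resid, lags=10)
print(beta, naive_se, nw_se)   # slope near 2; NW intercept SE > naive
```

This is exactly point A in action: the estimates are fine, but the naive standard errors understate the true uncertainty when errors are serially correlated.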