
    Business Econometrics using SAS Tools (BEST)

    Class XI and XII: OLS, BLUE, and Assumption Errors


    Multicollinearity

    Multicollinearity occurs when two or more predictors in the model are correlated and provide redundant information about the response.

    Examples of multicollinear predictors are height and weight of a person, years of education and income, and assessed value and square footage of a home.

    Consequences of high multicollinearity:

    1. Increased standard errors of the estimated β's (decreased reliability).

    2. Often confusing and misleading results.


    Example

    Suppose that we fit a model with x1 = income and x2 = years of education as predictors, and y = intake of fruits and vegetables as the response.

    Years of education and income are correlated, and we expect a positive association of both with intake of fruits and vegetables.

    t-tests for the individual coefficients β1 and β2 might suggest that neither predictor is significantly associated with y, while the F-test indicates that the model is useful for predicting y.

    Contradiction? No! This has to do with the interpretation of regression coefficients.
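    As a minimal sketch of how this example might be run in SAS (the dataset name diet and variable names intake, income, and educ are hypothetical), PROC REG prints the individual t-tests in the parameter-estimates table and the overall F-test in the analysis-of-variance table:

      proc reg data=diet;
        /* y = intake of fruits and vegetables; x1 = income; x2 = years of education */
        model intake = income educ;
      run;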


    F or t

    β1 is the expected change in y due to x1, given that x2 is already in the model.

    β2 is the expected change in y due to x2, given that x1 is already in the model.

    Since both x1 and x2 contribute redundant information about y, once one of the predictors is in the model, the other one does not have much more to contribute.

    This is why the F-test indicates that at least one of the predictors is important, yet the individual t-tests indicate that the contribution of each predictor, given that the other one has already been included, is not really important.


    Detecting Multicollinearity

    Easy way: compute correlations between all pairs of predictors. If some r are close to 1 or −1, remove one of the two correlated predictors from the model.
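    A minimal SAS sketch of this check (dataset and variable names hypothetical):

      proc corr data=mydata;
        var x1 x2 x3;   /* prints the matrix of pairwise correlations r */
      run;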

    Another way: calculate the variance inflation factor (VIF) for each predictor x_j:

    VIF_j = 1 / (1 − R²_j)

    where R²_j is the coefficient of determination of the model that regresses the j-th predictor on all of the other predictors.

    If VIF_j > 10, then there is a problem with multicollinearity.
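    In SAS, PROC REG can report the VIFs directly through the VIF option on the MODEL statement (dataset and variable names hypothetical):

      proc reg data=mydata;
        model y = x1 x2 x3 / vif;  /* adds a variance-inflation column to the parameter estimates */
      run;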


    Solutions for Multicollinearity

    If interest is only in estimation and prediction, multicollinearity can be ignored, since it does not affect ŷ or its standard error (either the standard error of ŷ or of the prediction error y − ŷ).

    The above is true only if the point x_p at which we want estimation or prediction is within the range of the data.

    If the wish is to establish association patterns between y and the predictors, then the analyst can:

    Eliminate some predictors from the model.

    Design an experiment in which the pattern of correlation is broken.


    Multicollinearity in Polynomial Models

    Multicollinearity is a problem in polynomial regression (with terms of second and higher order): x and x² tend to be highly correlated.

    A special solution in polynomial models is to use z_i = x_i − x̄ instead of just x_i.

    That is, first subtract the mean from each predictor and then use the deviations (centered values) in the model.
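    A minimal sketch of this centering in SAS, assuming a hypothetical dataset mydata with predictor x: PROC STANDARD with MEAN=0 replaces x with its deviation from the mean, and the squared term is then built from the centered variable:

      proc standard data=mydata mean=0 out=centered;
        var x;                /* x is replaced by z = x - xbar */
      run;

      data centered;
        set centered;
        x2 = x*x;             /* squared term based on the centered x */
      run;

      proc reg data=centered;
        model y = x x2;       /* polynomial model in the centered predictor */
      run;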


    Heteroscedasticity

    When the variance of ε_i depends on the value of y_i, we say that the variances are heteroscedastic (unequal).

    Heteroscedasticity may occur for many reasons, but it typically occurs when responses are not normally distributed or when the errors are not additive in the model.

    Examples of non-normal responses are:

    Poisson counts: number of sick days taken per person/month, or number of traffic crashes per intersection/year.

    Binomial proportions: proportion of felons who are repeat offenders, or proportion of consumers who prefer low-carb foods.


    Checking the Normality Assumption

    Even though we assume that ε ~ N(0, σ²), moderate departures from the assumption of normality do not have a big effect on the reliability of hypothesis tests or the accuracy of confidence intervals.

    There are several statistical tests to check normality, but they are reliable only in very large samples.

    The easiest type of check is graphical:

    Fit the model and estimate the residuals e_i = y_i − ŷ_i.

    Get a histogram of the residuals and see if they look normal.
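    A sketch of this check in SAS (dataset and variable names hypothetical): the OUTPUT statement saves the residuals, and PROC UNIVARIATE draws the histogram (the NORMAL option also prints formal normality tests):

      proc reg data=mydata;
        model y = x1 x2;
        output out=resids r=e;   /* e holds the residuals y - yhat */
      run;

      proc univariate data=resids normal;
        var e;
        histogram e / normal;    /* histogram with a fitted normal curve overlaid */
      run;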


    Solutions for Heteroscedasticity

    Unequal variances can often be ameliorated by transforming the data:

    Appropriate transformations:

    - Poisson counts: √y
    - Binomial proportions: sin⁻¹(√y)
    - Multiplicative errors: log(y)

    Fit the model to the transformed response.

    Typically, the transformed data will satisfy the assumption of homoscedasticity.
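    A minimal DATA-step sketch of these transformations in SAS (names hypothetical); the model is then refit with the transformed variable as the response:

      data transformed;
        set mydata;
        y_sqrt = sqrt(y);         /* Poisson counts */
        p_asin = arsin(sqrt(p));  /* binomial proportions p (0 <= p <= 1) */
        y_log  = log(y);          /* multiplicative errors (requires y > 0) */
      run;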


    Serial Correlation (or Autocorrelation)

    Serial correlation occurs when one observation's error term (ε_i) is correlated with another observation's error term (ε_j):

    Corr(ε_i, ε_j) ≠ 0

    This usually happens because there is an important relationship (economic or otherwise) between the observations.

    It is especially problematic with time series data and cluster sampling.


    Problem with Autocorrelation

    OLS estimates remain unbiased.

    But the OLS estimator is no longer the best (minimum variance) linear unbiased estimator.

    Serial correlation implies that errors are partly predictable.

    For example, with positive serial correlation, a positive error today implies that tomorrow's error is also likely to be positive.

    The OLS estimator ignores this information; more efficient estimators are available that do not.


    Testing for Serial Correlation

    There are a number of formal tests available.

    In addition to formal tests, always look at residual plots!

    The most common test is the Durbin-Watson (DW) d test.

    For the DW test, the model needs to have an intercept term and cannot have a lagged dependent variable (i.e., Y_{t-1} as one of the independent variables).
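    A minimal sketch of the DW test in SAS (time-series dataset and variable names hypothetical); the DW option on the MODEL statement prints the d statistic:

      proc reg data=ts;
        model y = x1 x2 / dw;   /* prints the Durbin-Watson d statistic */
      run;

    In SAS/ETS, PROC AUTOREG's DWPROB option also reports p-values for d.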


    Durbin-Watson Test

    The DW statistic is d = Σ(e_t − e_{t-1})² / Σe_t² ≈ 2(1 − ρ̂), where ρ̂ estimates the first-order autocorrelation ρ of the errors.

    If ρ = 1, then we expect e_t = e_{t-1}, so e_t − e_{t-1} = 0. When this is true, d ≈ 0.

    If ρ = −1, then we expect e_t = −e_{t-1}, so e_t − e_{t-1} = −2e_{t-1}. When this is true, d ≈ 4.

    If ρ = 0, then we expect d ≈ 2.

    Hence values of the test statistic far from 2 indicate that serial correlation is likely present.


    Solution

    Use OLS to estimate the regression and fix the standard errors:

    A. We know OLS is unbiased; it's just that the usual formula for the standard errors is wrong (and hence tests can be misleading).

    B. We can get consistent estimates of the standard errors (as the sample size goes to infinity, a consistent estimator gets arbitrarily close to the true value in a probabilistic sense), called Newey-West standard errors.

    C. When specifying the regression, use the coefficient covariance matrix option with the HAC (Newey-West) method (see the sketch after this list).

    D. Most of the time, this approach is sufficient.

    E. If not, we cannot use OLS and need to use Generalized Least Squares.
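    A hedged sketch of step C in SAS: this assumes a SAS/ETS release in which PROC AUTOREG supports Newey-West covariance estimation through its COVEST= option (the exact keyword should be checked against the installed version; dataset and variable names are hypothetical):

      proc autoreg data=ts;
        model y = x1 x2 / covest=neweywest;  /* HAC (Newey-West) standard errors */
      run;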