Intermediate Econometrics Nguyễn Ngọc Anh Nguyễn Hà Trang DEPOCEN Hanoi, 25 May 2012


Page 1: Intermediate Econometrics

Intermediate Econometrics

Nguyễn Ngọc Anh, Nguyễn Hà Trang

DEPOCEN, Hanoi, 25 May 2012

Page 2: Intermediate Econometrics

Planned Course Content

• 5 lectures and 5 practical sessions
– Simple and multiple regression model review
– IV models
– Discrete choice model 1
  • Random utility model
  • Logit/probit
  • Multinomial logit
– Discrete choice model 2
  • Ordinal choice model
  • Poisson model
– Panel data

Page 3: Intermediate Econometrics

Statistical Review

• Populations, Parameters and Random Sampling
– Use statistical inference to learn something about a population
– Population: complete group of agents
– Typically only observe a sample of data
– Random sampling: drawing random samples from a population
– Know everything about the distribution of the population except for one parameter
– Use statistical tools to say something about the unknown parameter

• Estimation and hypothesis testing

Page 4: Intermediate Econometrics

Statistical Review
Estimators and Estimates:

– Given a random sample $\{Y_1, \ldots, Y_n\}$ drawn from a population distribution that depends on an unknown parameter $\theta$, an estimator of $\theta$ is a rule that assigns each possible outcome of the sample a value of $\theta$
– Examples:
  • Estimator for the population mean
  • Estimator for the variance of the population distribution
– An estimator is given by some function of the r.v.s, so it is itself a r.v.
– Applying it to the observed sample yields a (point) estimate
– The distribution of the estimator is the sampling distribution
– Criteria for selecting estimators

Page 5: Intermediate Econometrics

Statistical Review
Finite sample properties of estimators:

– Unbiasedness
An estimator $\hat{\theta}$ of $\theta$ is unbiased if $E(\hat{\theta}) = \theta$ for all values of $\theta$, i.e., on average the estimator is correct.

If it is not unbiased, the extent of the bias is measured as $\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.

The extent of the bias depends on the underlying distribution of the population and on the estimator that is used. Choose the estimator to minimise the bias.

Page 6: Intermediate Econometrics

Statistical Review
Finite sample properties of estimators:

– Efficiency
What about the dispersion of the distribution of the estimator? That is, how likely is it that the estimate is close to the true parameter?
A useful summary measure of the dispersion in the distribution is the sampling variance.
An efficient estimator is one which has the least amount of dispersion about the mean, i.e. the one that has the smallest sampling variance.

If $\hat{\theta}_1$ and $\hat{\theta}_2$ are two unbiased estimators of $\theta$, then $\hat{\theta}_1$ is efficient relative to $\hat{\theta}_2$ when $V(\hat{\theta}_1) \le V(\hat{\theta}_2)$ for all $\theta$, with strict inequality for at least one value of $\theta$.

Page 7: Intermediate Econometrics

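Statistical Review
Finite sample properties of estimators:

– Efficiency
What if the estimators are not unbiased?
The estimator with the lowest Mean Square Error (MSE) is more efficient:

$MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2]$

$MSE(\hat{\theta}) = V(\hat{\theta}) + [\mathrm{Bias}(\hat{\theta})]^2$

Example: Compare the small sample properties of the following estimators of the population mean:

$\hat{\mu}_1 = \frac{1}{n} \sum_{i=1}^{n} Y_i$

$\hat{\mu}_2$: [formula garbled in the source; the surviving fragments are $\sum_{i=1}^{n} Y_i$, $n$ and the constant 4]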
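The decomposition MSE = V + Bias² can be checked numerically. The following is a minimal sketch, assuming normally distributed data and the sample mean as the estimator; the parameter values, the seed and the helper name `mse_decomposition` are illustrative choices, not part of the slides.

```python
import random
import statistics

def mse_decomposition(estimates, theta):
    """Return (mse, variance, bias) computed from simulated estimates."""
    mean_est = statistics.fmean(estimates)
    bias = mean_est - theta
    variance = statistics.fmean((e - mean_est) ** 2 for e in estimates)
    mse = statistics.fmean((e - theta) ** 2 for e in estimates)
    return mse, variance, bias

random.seed(0)
mu, sigma, n, reps = 5.0, 2.0, 20, 5000  # assumed values for illustration

# Sample mean of n normal draws, repeated `reps` times.
sample_means = [
    statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(reps)
]
mse, variance, bias = mse_decomposition(sample_means, mu)
print(f"MSE={mse:.4f}  V={variance:.4f}  Bias={bias:.4f}")
```

Because the sample mean is unbiased, the simulated bias should be close to zero and the MSE close to the sampling variance σ²/n.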

Page 8: Intermediate Econometrics

Statistical Review

Asymptotic Properties of Estimators
– How do estimators behave if we have very large samples, as n increases to infinity?
– Consistency
How far is the estimator likely to be from the parameter it is estimating as the sample size increases indefinitely?

$\hat{\theta}_n$ is a consistent estimator of $\theta$ if for every $\varepsilon > 0$:

$P(|\hat{\theta}_n - \theta| > \varepsilon) \to 0$ as $n \to \infty$

This is known as convergence in probability. The above can also be written as:

$\mathrm{plim}(\hat{\theta}_n) = \theta$

Page 9: Intermediate Econometrics

Statistical Review
Asymptotic Properties of Estimators

– Consistency (continued)
Sufficient condition for consistency: the bias and the variance both tend to zero as the sample size increases indefinitely. That is:

$MSE(\hat{\theta}_n) \to 0$ as $n \to \infty$

Law of Large Numbers: an important result in statistics

$\mathrm{plim}(\bar{Y}_n) = \mu$

When estimating a population average, the larger n is, the closer the estimate will be to the true population average.
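The Law of Large Numbers can be illustrated by simulation. This sketch assumes normal draws with mean 3 and standard deviation 2 (arbitrary illustrative values) and prints how far the sample average is from the population mean for increasing n.

```python
import random
import statistics

random.seed(1)
mu, sigma = 3.0, 2.0  # assumed population mean and standard deviation

def sample_mean(n):
    """Average of n independent N(mu, sigma^2) draws."""
    return statistics.fmean(random.gauss(mu, sigma) for _ in range(n))

# Absolute estimation error for three sample sizes.
errors = {n: abs(sample_mean(n) - mu) for n in (10, 1_000, 100_000)}
for n, err in errors.items():
    print(f"n={n:>6}  |Ybar - mu| = {err:.4f}")
```

A single run is not a proof, but the error at the largest n is typically far smaller than at n = 10.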

Page 10: Intermediate Econometrics

Statistical Review

Asymptotic Properties of Estimators
– Asymptotic Efficiency
Compares the variance of the asymptotic distribution of two estimators.
A consistent estimator of $\theta$ is asymptotically efficient if its asymptotic variance is smaller than the asymptotic variance of all other consistent estimators of $\theta$.

Page 11: Intermediate Econometrics

Statistical Review

Asymptotic Properties of Estimators
– Asymptotic Normality
An estimator is said to be asymptotically normally distributed if its sampling distribution tends to approach the normal distribution as the sample size increases indefinitely.

The Central Limit Theorem: the average from a random sample for any population with finite variance, when standardized, has an asymptotic normal distribution:

$Z = \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}} \sim \text{Asy. } N(0, 1)$
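The Central Limit Theorem can be checked informally by standardising sample means drawn from a clearly non-normal population. This sketch assumes an exponential population with rate 1 (so μ = σ = 1); the sample size and number of replications are arbitrary.

```python
import math
import random
import statistics

random.seed(2)
n, reps = 200, 2000
mu = sigma = 1.0  # mean and s.d. of an exponential distribution with rate 1

def standardised_mean():
    """Z = (Ybar - mu) / (sigma / sqrt(n)) for one sample of exponential draws."""
    ybar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    return (ybar - mu) / (sigma / math.sqrt(n))

z_values = [standardised_mean() for _ in range(reps)]
# Under N(0, 1), about 95% of the values should fall inside (-1.96, 1.96).
share_inside = sum(abs(z) < 1.96 for z in z_values) / reps
print(f"share of |Z| < 1.96: {share_inside:.3f}")
```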

Page 12: Intermediate Econometrics

Statistical Review

Approaches to parameter estimation
– Method of Moments (MM)
Moment: summary statistic of a population distribution (e.g. mean, variance)
MM: replaces population moments with sample counterparts

Examples:
Estimate the population mean μ with (unbiased and consistent):

$\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i$

Estimate the population variance σ² with (consistent but biased):

$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$
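The two method-of-moments estimators are easy to compute directly. The sketch below uses a small made-up data set and also shows the n − 1 correction that removes the bias of the variance estimator; the function names are illustrative.

```python
def mm_mean(y):
    """Sample analogue of the first moment E(Y)."""
    return sum(y) / len(y)

def mm_variance(y):
    """Sample analogue of E[(Y - mu)^2]; divides by n, so it is biased."""
    ybar = mm_mean(y)
    return sum((yi - ybar) ** 2 for yi in y) / len(y)

def unbiased_variance(y):
    """Same sum of squares divided by n - 1 instead of n."""
    return mm_variance(y) * len(y) / (len(y) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
print(mm_mean(data), mm_variance(data), unbiased_variance(data))
```

The two variance estimators differ by the factor n/(n − 1), which tends to 1 as n grows; this is why the MM estimator is biased in finite samples yet still consistent.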

Page 13: Intermediate Econometrics

Statistical Review

Approaches to parameter estimation
– Maximum Likelihood Estimation (MLE)
Let {Y1, Y2, ..., Yn} be a random sample from a population distribution defined by the density function f(Y|θ).
The likelihood function is the joint density of the n independently and identically distributed observations, given by:

$f(Y_1, Y_2, \ldots, Y_n \mid \theta) = \prod_{i=1}^{n} f(Y_i \mid \theta) = L(\theta; Y)$

The log likelihood is given by:

$\ln L(\theta \mid Y) = \sum_{i=1}^{n} \ln f(Y_i \mid \theta)$

The likelihood principle: choose the estimator of θ that maximises the likelihood of observing the actual sample.

MLE is asymptotically the most efficient estimator, but correct specification of the distribution is required for consistency.
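As a worked illustration of the likelihood principle, consider the special case (an assumption made here, not stated on the slide) of a normal population $Y_i \sim N(\mu, \sigma^2)$ with $\sigma^2$ known. Maximising the log likelihood with respect to $\mu$ gives the sample average:

```latex
\begin{align*}
\ln L(\mu \mid Y) &= -\frac{n}{2}\ln(2\pi\sigma^2)
                    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - \mu)^2 \\
\frac{\partial \ln L(\mu \mid Y)}{\partial \mu}
                  &= \frac{1}{\sigma^2}\sum_{i=1}^{n}(Y_i - \mu) = 0 \\
\Rightarrow\quad \hat{\mu}_{ML} &= \frac{1}{n}\sum_{i=1}^{n} Y_i = \bar{Y}
\end{align*}
```

The second derivative, $-n/\sigma^2$, is negative, so the solution is a maximum.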

Page 14: Intermediate Econometrics

Statistical Review

Approaches to parameter estimation
– Least Squares Estimation
Minimise the sum of the squared deviations between the actual and the fitted values.
Example: Find the least squares estimator of the population mean:

$\hat{\mu} = \arg\min_{\mu} \sum_{i=1}^{n} (Y_i - \mu)^2 = \arg\min_{\mu} S$

The least squares, ML (under normality) and MM estimators of the population mean are all the sample average.
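A quick numerical check of the least squares result: evaluating the criterion S on a grid of candidate values shows that it is smallest at the sample average. The data and the grid are made up for illustration.

```python
def sum_of_squares(y, m):
    """S(m) = sum of squared deviations of the data from candidate value m."""
    return sum((yi - m) ** 2 for yi in y)

y = [1.0, 2.0, 4.0, 7.0]  # made-up sample
ybar = sum(y) / len(y)

# Evaluate S on a grid of candidates 0.00, 0.01, ..., 7.00.
grid = [i / 100 for i in range(701)]
best = min(grid, key=lambda m: sum_of_squares(y, m))
print(best, ybar)
```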

Page 15: Intermediate Econometrics

Statistical Review

Interval Estimation and Confidence Intervals
– How do we know how accurate an estimate is?
– A confidence interval estimates a population parameter within a range of possible values at a specified probability, called the level of confidence, using information from a known distribution (here the standard normal distribution)
– Let {Y1, Y2, ..., Yn} be a random sample from a population with a normal distribution with mean μ and variance σ²: $Y_i \sim N(\mu, \sigma^2)$
The distribution of the sample average will be:

$\bar{Y} \sim N(\mu, \sigma^2 / n)$

Standardising:

$\frac{\bar{Y} - \mu}{\sigma / \sqrt{n}} \sim N(0, 1)$

– Using what we know about the standard normal distribution we can construct a 95% confidence interval:

$\Pr\left(-1.96 < \frac{\bar{Y} - \mu}{\sigma / \sqrt{n}} < 1.96\right) = 0.95$

Page 16: Intermediate Econometrics

Statistical Review

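Interval Estimation and Confidence Intervals
– Re-arranging:

$\Pr\left(\bar{Y} - 1.96 \frac{\sigma}{\sqrt{n}} < \mu < \bar{Y} + 1.96 \frac{\sigma}{\sqrt{n}}\right) = 0.95$

– What if σ is unknown? Estimate σ with the sample standard deviation s (s² is an unbiased estimator of σ²):

$s = \left[\frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2\right]^{1/2}$

$\frac{\bar{Y} - \mu}{s / \sqrt{n}} \sim t_{n-1}$

– 95% confidence interval (α = 0.05) given by:

$\left[\bar{Y} - t_{n-1, \alpha/2} \frac{s}{\sqrt{n}},\; \bar{Y} + t_{n-1, \alpha/2} \frac{s}{\sqrt{n}}\right]$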
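The t-based interval can be sketched in a few lines. The data below are made up, and the critical value is passed in by hand because the Python standard library has no t quantile function; 2.262 is the 97.5th percentile of the t distribution with 9 degrees of freedom.

```python
import math
import statistics

def t_confidence_interval(y, t_crit):
    """Return (lower, upper) = Ybar -/+ t_crit * s / sqrt(n)."""
    n = len(y)
    ybar = statistics.fmean(y)
    s = statistics.stdev(y)  # sample standard deviation, divides by n - 1
    half_width = t_crit * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width

y = [4.1, 5.2, 6.3, 5.0, 4.4, 5.9, 5.5, 4.8, 5.1, 5.7]  # made-up sample, n = 10
# 2.262 is the 97.5th percentile of the t distribution with 9 degrees of freedom.
lower, upper = t_confidence_interval(y, t_crit=2.262)
print(f"95% CI: [{lower:.3f}, {upper:.3f}]")
```

For a different sample size the critical value must be looked up again, since it depends on n − 1.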

Page 17: Intermediate Econometrics

Statistical Review
Hypothesis Testing

– Hypothesis: a statement about a population, developed for the purpose of testing
– Hypothesis testing: a procedure based on sample evidence and probability theory to determine whether the hypothesis is a reasonable statement
– Steps:
1. State the null (H0) and alternative (HA) hypotheses
   Note the distinction between one- and two-tailed tests
2. State the level of significance
   Probability of rejecting H0 when it is true (Type I error)
   Note: Type II error is failing to reject H0 when it is false
   Power of the test: 1 − Pr(Type II error)
3. Select a test statistic
   Based on sample information; follows a known distribution
4. Formulate the decision rule
   Conditions under which the null hypothesis is rejected, based on a critical value from a known probability distribution
5. Compute the value of the test statistic, make a decision, interpret the results

Page 18: Intermediate Econometrics

Statistical Review
Hypothesis Testing

– P-value: an alternative means of evaluating the decision rule
Probability of observing a sample value as extreme as, or more extreme than, the value observed when the null hypothesis is true
• If the p-value is greater than the significance level, H0 is not rejected
• If the p-value is less than the significance level, H0 is rejected

If the p-value is less than:
• 0.10, we have some evidence that H0 is not true
• 0.05, we have strong evidence that H0 is not true
• 0.01, we have very strong evidence that H0 is not true
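The p-value decision rule can be written out as a short sketch. It assumes the test statistic is standard normal under H0 (for example, a large-sample z test); the function names and the example statistics are illustrative.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

for z in (1.0, 2.5):  # made-up test statistics
    p = two_sided_p_value(z)
    print(f"z={z}: p={p:.4f} -> {decide(p)}")
```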

Page 19: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
The population model
Assume a linear functional form:

E(Y|Xi) = β0 + β1Xi

β0: intercept term or constant
β1: slope coefficient, which quantifies the linear relationship between X and Y
Both are fixed parameters known as regression coefficients

For each Xi, individual observations will vary around E(Y|Xi).
Consider the deviation of any individual observation from the conditional mean:
ui = Yi − E(Y|Xi)
ui: stochastic disturbance/error term, the unobservable random deviation of an observation from its conditional mean

Page 20: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
The linear regression model
Re-arrange the previous equation to get:

Yi = E(Y|Xi) + ui

Each individual observation on Y can be explained in terms of:
• E(Y|Xi): mean Y of all individuals with the same level of X; the systematic or deterministic component of the model, i.e. the part of Y explained by X
• ui: random or non-systematic component; includes all omitted variables that can affect Y

Assuming a linear functional form:
Yi = β0 + β1Xi + ui

Page 21: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
A note on linearity: linear in parameters vs. linear in variables
The following is linear in parameters but not in variables:

$Y_i = \beta_0 + \beta_1 X_i^2 + u_i$

In some cases transformations are required to make a model linear in parameters.

Page 22: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
The linear regression model

Yi = β0 + β1Xi + ui

Represents the relationship between Y and X in the population.
Using appropriate estimation techniques we use sample data to estimate values for β0 and β1.

β1 measures the ceteris paribus effect of X on Y only if all other factors are fixed and do not change.
Assume ui is fixed so that Δui = 0; then

ΔYi = β1 ΔXi

ΔYi / ΔXi = β1

ui is unknown, so assumptions about ui are required to estimate the ceteris paribus relationship.

Page 23: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
The linear regression model: assumptions about the error term

Assume E(ui) = 0: on average, the unobservable factors that make an individual observation deviate from the mean are zero.

Assume E(ui|Xi) = 0: the mean of ui conditional on Xi is zero; regardless of what values Xi takes, the unobservables are on average zero.

Zero Conditional Mean Assumption:
E(ui|Xi) = E(ui) = 0

Page 24: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
The linear regression model: notes on the error term

Reasons why an error term will always be required:
– Vagueness of theory
– Unavailability of data
– Measurement error
– Incorrect functional form
– Principle of parsimony

Page 25: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
Statistical relationship vs. deterministic relationship

Regression analysis is concerned with statistical relationships, since it deals with random or stochastic variables and their probability distributions. The variation in such variables can never be completely explained using other variables, so there will always be some form of error.

Page 26: Intermediate Econometrics

The Simple Regression Model

Definition of the Simple Regression Model
Regression vs. correlation

Correlation analysis: measures the strength or degree of linear association between two random variables.

Regression analysis: estimates the average value of one variable on the basis of the fixed values of the other variables, for the purpose of prediction.
Explanatory variables are fixed; dependent variables are random or stochastic.

Page 27: Intermediate Econometrics

The Simple Regression Model

Ordinary Least Squares (OLS) Estimation
Estimate the population relationship given by

$Y_i = \beta_0 + \beta_1 X_i + u_i$

using a random sample of data, i = 1, ..., n.

Least Squares Principle: minimise the sum of the squared deviations between the actual and the fitted values.

Define the fitted values as

$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$

OLS minimises

$\sum_{i=1}^{n} \hat{u}_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$

Page 28: Intermediate Econometrics

The Simple Regression Model

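Ordinary Least Squares Estimation

$\min_{\beta_0, \beta_1} Q(\beta_0, \beta_1) = \min_{\beta_0, \beta_1} \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2$

First Order Conditions (Normal Equations):

$\frac{\partial Q(\beta_0, \beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$

$\frac{\partial Q(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$

Solve to find:

$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$

Assumptions?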
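The closed-form OLS solution can be coded directly from the two formulas. This is a minimal sketch with made-up data; `ols_simple` is an illustrative name. It also verifies that the residuals satisfy both normal equations.

```python
def ols_simple(x, y):
    """Return (b0, b1) from the closed-form OLS solution."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = ols_simple(x, y)
residuals = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

# Both normal equations should hold (up to rounding error).
print(sum(residuals), sum(xi * ui for xi, ui in zip(x, residuals)))
```

Note that the slope formula requires variation in X: if all Xi are equal, the denominator is zero.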

Page 29: Intermediate Econometrics

The Simple Regression Model

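Ordinary Least Squares Estimation
Method of Moments Estimator: replace population moment conditions with sample counterparts:

$E(u_i) = E(Y_i - \beta_0 - \beta_1 X_i) = 0$

$\frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$

$E(X_i u_i) = E[X_i (Y_i - \beta_0 - \beta_1 X_i)] = 0$

$\frac{1}{n} \sum_{i=1}^{n} X_i (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = 0$

Assumptions?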

Page 30: Intermediate Econometrics

The Simple Regression Model

Properties of OLS Estimator

Gauss-Markov Theorem
Under the assumptions of the Classical Linear Regression Model, the OLS estimator will be the Best Linear Unbiased Estimator.

Linear: the estimator is a linear function of a random variable
Unbiased: $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$
Best: the estimator is the most efficient estimator, i.e., it has the minimum variance of all linear unbiased estimators

Page 31: Intermediate Econometrics

The Simple Regression Model

Goodness of Fit
How well does the regression line fit the observations?

R² (coefficient of determination) measures the proportion of the sample variation of Yi explained by the model, where variation is measured as squared deviation from the sample mean:

$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{SSE}{SST}$

Recall: SST = SSE + SSR (total = explained + residual sum of squares), so SSE ≤ SST; and SSE ≥ 0, so 0 ≤ SSE/SST ≤ 1.

If the model perfectly fits the data, SSE = SST and R² = 1.
If the model explains none of the variation in Yi, then SSE = 0, since $\hat{Y}_i = \hat{\beta}_0 = \bar{Y}$, and R² = 0.
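The two boundary cases of R² can be reproduced with a short sketch. It assumes the fitted values come from a regression that includes an intercept, so that SST = SSE + SSR holds; the data and the function name are made up.

```python
def r_squared(y, y_hat):
    """R^2 = SSE / SST, with SSE the explained sum of squares."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((fi - ybar) ** 2 for fi in y_hat)
    return sse / sst

y = [2.0, 4.0, 6.0, 8.0]                 # made-up data
perfect_fit = list(y)                    # fitted values equal the data
mean_only = [sum(y) / len(y)] * len(y)   # fitted values equal the sample mean
print(r_squared(y, perfect_fit), r_squared(y, mean_only))
```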

Page 32: Intermediate Econometrics

The Multiple Regression Model

The model with two independent variables
Say we have information on more variables that theory tells us may influence Y:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + u_i$

β0: measures the average value of Y when X1 and X2 are zero
β1 and β2 are the partial regression coefficients (slope coefficients), which measure the ceteris paribus effect of X1 and X2 on Y, respectively

Key assumption:

$E(u_i \mid X_{1i}, X_{2i}) = 0$

For k independent variables:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$

$E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = 0$

$Cov(u_i, X_{1i}) = Cov(u_i, X_{2i}) = \ldots = Cov(u_i, X_{ki}) = 0$

Page 33: Intermediate Econometrics

The Multiple Regression Model

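Goodness-of-Fit in the Multiple Regression Model
How well does the regression line fit the observations?
As in the simple regression model, define:

SST = Total Sum of Squares: $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$
SSE = Explained Sum of Squares: $\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$
SSR = Residual Sum of Squares: $\sum_{i=1}^{n} \hat{u}_i^2$

$R^2 = \frac{\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} = \frac{SSE}{SST} = 1 - \frac{SSR}{SST}$

Recall: SST = SSE + SSR, so SSE ≤ SST; and SSE ≥ 0, so 0 ≤ SSE/SST ≤ 1.

R² never decreases as more independent variables are added, so use the adjusted R²:

$\bar{R}^2 = 1 - \frac{SSR / (n - k - 1)}{SST / (n - 1)}$

The adjusted R² includes a penalty for adding more variables to the model.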
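The penalty in the adjusted R² can be seen with hypothetical numbers: adding a regressor that lowers SSR only slightly raises R² but can lower the adjusted R². All values below are invented for illustration.

```python
def adjusted_r_squared(ssr, sst, n, k):
    """1 - [SSR / (n - k - 1)] / [SST / (n - 1)] for k regressors plus intercept."""
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))

# Hypothetical numbers: adding a third regressor lowers SSR from 40.0 to 39.5.
sst, n = 100.0, 30
small = adjusted_r_squared(ssr=40.0, sst=sst, n=n, k=2)
large = adjusted_r_squared(ssr=39.5, sst=sst, n=n, k=3)
print(round(small, 4), round(large, 4))
```

Here the unadjusted R² rises from 0.600 to 0.605, while the adjusted version falls.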

Page 34: Intermediate Econometrics

The Multiple Regression Model

Properties of OLS Estimator of Multiple Regression Model
Gauss-Markov Theorem
Under certain assumptions, known as the Gauss-Markov assumptions, the OLS estimator will be the Best Linear Unbiased Estimator.

Linear: the estimator is a linear function of the data
Unbiased: $E(\hat{\beta}_0) = \beta_0, \ldots, E(\hat{\beta}_k) = \beta_k$
Best: the estimator is the most efficient estimator, i.e., it has the minimum variance of all linear unbiased estimators

Page 35: Intermediate Econometrics

The Multiple Regression Model

Properties of OLS Estimator of Multiple Regression Model
Assumptions required to prove unbiasedness:
A1: Regression model is linear in parameters
A2: X are non-stochastic or fixed in repeated sampling
A3: Zero conditional mean
A4: Sample is random
A5: Variability in the Xs and no perfect collinearity in the Xs

Assumptions required to prove efficiency:
A6: Homoscedasticity and no autocorrelation

$V(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = \sigma^2$

$Cov(u_i, u_j) = 0$ for $i \neq j$

Page 36: Intermediate Econometrics

Topic 3: The Multiple Regression Model

Estimating the variance of the OLS estimators
Need to know the dispersion (variance) of the sampling distribution of the OLS estimator in order to show that it is efficient (also required for inference).

In the multiple regression model:

$V(\hat{\beta}_k) = \frac{\sigma^2}{SST_k (1 - R_k^2)}$

Depends on:
a) $\sigma^2$: the error variance (a larger value reduces the accuracy of the estimates)
b) $SST_k$: the variation in $X_k$ (more variation increases the accuracy of the estimates)
c) $R_k^2$: the coefficient of determination from a regression of $X_k$ on all other independent variables (a higher degree of multicollinearity reduces the accuracy of the estimates)

What about the variance of the error terms, $\sigma^2$? It is estimated by:

$\hat{\sigma}^2 = \frac{1}{n - k - 1} \sum_{i=1}^{n} \hat{u}_i^2$

Page 37: Intermediate Econometrics

The Multiple Regression Model

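Model specification
Inclusion of irrelevant variables:
OLS estimator is unbiased but has higher variance if the X's are correlated

Exclusion of relevant variables:
Omitted variable bias if the omitted variables are correlated with variables included in the estimated model

True Model: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$

Estimated Model: $\tilde{Y}_i = \tilde{\beta}_0 + \tilde{\beta}_1 X_{1i}$

OLS estimator: $\tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1$, where $\tilde{\delta}_1$ is the slope from a regression of $X_2$ on $X_1$

Biased: $E(\tilde{\beta}_1) = \beta_1 + \beta_2 \tilde{\delta}_1$

Omitted Variable Bias: $\mathrm{Bias}(\tilde{\beta}_1) = \beta_2 \tilde{\delta}_1$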
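The relationship between the short-regression slope and the long-regression slopes holds exactly in any sample, which can be verified numerically. The sketch below simulates data with assumed coefficients (β1 = 2, β2 = 3, and X2 correlated with X1); all helper names and parameter values are illustrative.

```python
import random

def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    xd, yd = demean(x), demean(y)
    return dot(xd, yd) / dot(xd, xd)

def slopes_two_regressors(x1, x2, y):
    """Slopes (b1, b2) from regressing y on x1 and x2 (with intercept)."""
    a, b, c = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(a, a), dot(b, b), dot(a, b)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * dot(a, c) - s12 * dot(b, c)) / det
    b2 = (s11 * dot(b, c) - s12 * dot(a, c)) / det
    return b1, b2

random.seed(3)
n = 500
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.6 * v + random.gauss(0, 1) for v in x1]  # correlated with x1
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

b1_hat, b2_hat = slopes_two_regressors(x1, x2, y)
b1_tilde = slope(x1, y)   # x2 omitted from the regression
delta1 = slope(x1, x2)    # regression of x2 on x1
print(b1_tilde, b1_hat + b2_hat * delta1)
```

Since β2 and the correlation between X1 and X2 are both positive in this setup, the short-regression slope overstates the effect of X1.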

Page 38: Intermediate Econometrics

Inference in the Multiple Regression Model

The Classical Linear Model
Since the $\hat{\beta}$'s can be written as a linear function of u, making assumptions about the sampling distribution of u allows us to say something about the sampling distribution of the $\hat{\beta}$'s.

Assume u is normally distributed:

$u_i \sim N(0, \sigma^2)$

Page 39: Intermediate Econometrics

Inference in the Multiple Regression Model

Hypothesis testing about a single population parameter
Assume the following population model satisfies all CLM assumptions:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$

OLS produces unbiased estimates, but how accurate are they?
Test by constructing hypotheses about population parameters and using sample estimates and statistical theory to test whether the hypotheses are true.
In particular, we are interested in testing whether population parameters are significantly different from zero:

$H_0: \beta_k = 0$

Page 40: Intermediate Econometrics

Inference in the Multiple Regression Model

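Hypothesis testing about a single population parameter
Two-sided alternative hypothesis:

$H_A: \beta_k \neq 0$

Large positive and negative values of the computed test statistic are inconsistent with the null.

Reject the null if $|t_{\hat{\beta}_k}| > c$

Example: $H_0: \beta_k = 0$, $H_A: \beta_k \neq 0$, $df = 25$, significance level 0.05
The critical value is the 97.5th percentile of the $t_{25}$ distribution, $c = 2.06$; the rejection region lies in either tail of the distribution, i.e. anywhere above 2.06 or below $-2.06$.

Note: If the null is rejected, the variable is said to be statistically significant at the chosen significance level.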

Page 41: Intermediate Econometrics

Inference in the Multiple Regression Model

Hypothesis testing about a single population parameter
P-value approach:
Given the computed t-statistic, what is the smallest significance level at which the null hypothesis would be rejected?
P-values below 0.05 provide strong evidence against the null.
For a two-sided alternative the p-value is given by:

$P(|T| > |t_{\hat{\beta}_k}|) = 2 P(T > |t_{\hat{\beta}_k}|)$

Example: $H_0: \beta_k = 0$, $H_A: \beta_k \neq 0$, $df = 40$, $t_{\hat{\beta}_k} = 1.85$

$P(|T| > 1.85) = 2 P(T > 1.85) = 2 \times 0.0359 = 0.0718$

Note the distinction between economic and statistical significance.

Page 42: Intermediate Econometrics

Inference in the Multiple Regression Model

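Testing hypotheses about a single linear combination of parameters
Consider the following model:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$

We wish to test whether X1 and X2 have the same effect on Y:

$H_0: \beta_1 = \beta_2$, or equivalently $H_0: \beta_1 - \beta_2 = 0$

$H_A: \beta_1 \neq \beta_2$

Construct the statistic as before but standardize the difference between the parameters:

$t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)} \sim t_{n-k-1}$

Estimate:

$Var(\hat{\beta}_1 - \hat{\beta}_2) = Var(\hat{\beta}_1) + Var(\hat{\beta}_2) - 2 Cov(\hat{\beta}_1, \hat{\beta}_2)$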
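The test statistic for equal coefficients needs the covariance between the two estimates, not just their standard errors. A minimal sketch with hypothetical estimates (none of these numbers come from the slides):

```python
import math

def t_stat_difference(b1, b2, var_b1, var_b2, cov_b1_b2):
    """t statistic for H0: beta1 - beta2 = 0."""
    se_diff = math.sqrt(var_b1 + var_b2 - 2 * cov_b1_b2)
    return (b1 - b2) / se_diff

# Hypothetical estimates, variances and covariance.
t = t_stat_difference(b1=0.50, b2=0.20, var_b1=0.010, var_b2=0.020, cov_b1_b2=0.003)
print(round(t, 3))
```

The statistic is then compared with a critical value from the t distribution with n − k − 1 degrees of freedom.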

Page 43: Intermediate Econometrics

Topic 4: Inference in the Multiple Regression Model

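Testing hypotheses about multiple linear restrictions
General model:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$

We wish to test J exclusion restrictions:

$H_0: \beta_{k-J+1} = 0, \beta_{k-J+2} = 0, \ldots, \beta_k = 0$

Restricted model:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_{k-J} X_{k-J,i} + u_i$

Estimate both models and compute either:

$F = \frac{(SSR_r - SSR_{ur}) / J}{SSR_{ur} / (n - k - 1)} \sim F_{J, n-k-1}$

Or:

$F = \frac{(R_{ur}^2 - R_r^2) / J}{(1 - R_{ur}^2) / (n - k - 1)} \sim F_{J, n-k-1}$

Note: degrees of freedom for numerator and denominator!
Large values are inconsistent with the null.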
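The two forms of the F statistic are algebraically equivalent when both models share the same dependent variable (and therefore the same SST). The sketch below checks this with hypothetical sums of squares; the function names and numbers are illustrative.

```python
def f_stat_ssr(ssr_r, ssr_ur, J, n, k):
    """F = [(SSR_r - SSR_ur) / J] / [SSR_ur / (n - k - 1)]."""
    return ((ssr_r - ssr_ur) / J) / (ssr_ur / (n - k - 1))

def f_stat_r2(r2_r, r2_ur, J, n, k):
    """F = [(R2_ur - R2_r) / J] / [(1 - R2_ur) / (n - k - 1)]."""
    return ((r2_ur - r2_r) / J) / ((1 - r2_ur) / (n - k - 1))

# Hypothetical sums of squares sharing the same SST.
sst, ssr_r, ssr_ur = 200.0, 120.0, 100.0
J, n, k = 2, 50, 5
f1 = f_stat_ssr(ssr_r, ssr_ur, J, n, k)
f2 = f_stat_r2(1 - ssr_r / sst, 1 - ssr_ur / sst, J, n, k)
print(f1, f2)
```

Setting `r2_r = 0` and `J = k` in the R² form gives the statistic for the overall significance of the regression.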

Page 44: Intermediate Econometrics

Topic 4: Inference in the Multiple Regression Model

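Overall test for significance of the regression
General model:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i$

Test of the null hypothesis that all variables except the intercept are insignificant:

$H_0: \beta_1 = 0, \beta_2 = 0, \ldots, \beta_k = 0$

Under the null the restricted model contains only the intercept, so $R_r^2 = 0$ and $R_{ur}^2 = R^2$.

Test statistic:

$F = \frac{R^2 / k}{(1 - R^2) / (n - k - 1)} \sim F_{k, n-k-1}$

Large values are inconsistent with the null.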