25
9 Binary Regression Models (二分 二分 二分 二分类变量模型 模型 模型 模型) 2.1 Binary Outcomes binary dependent variables have two values, typically coded as 0 for a negative outcome (i.e., the event did not occur) and 1 as a positive outcome (i.e., the event did occur) 2.2 Binary Regression Models: explore how each independent variable affects the probability of the event occurring binary logit model(二分胜算对数模型) binary probit model(二分概率单元模型) 2.3 The Statistical Model 2.3.1 A Latent Variable Model (隐性变量模型):assume a latent or unobserved variable y* ranging from -to For example (女性参与就业市场) y is observed in two states: a woman is in the labor force she is not in the labor force The idea of a latent y* is that: There is an underlying propensity that generates the observed event We cannot directly observe y*, but at some point a change in y* results in a change in what we observe As the number of young children in the family increases, a woman’s propensity to “work” would decrease. At some point the propensity crosses a threshold that results in a decision to leave the labor force -0 = τ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 y y*

Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

  • Upload
    others

  • View
    22

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

9

Binary Regression Models ((((二分二分二分二分类类类类变变变变量量量量模型模型模型模型)))) 2.1 Binary Outcomes

– binary dependent variables have two values, typically coded as 0 for a negative outcome (i.e., the event did not occur) and 1 as a positive outcome (i.e., the event did occur)

2.2 Binary Regression Models: explore how each independent variable affects the

probability of the event occurring – binary logit model(二分胜算对数模型) – binary probit model(二分概率单元模型)

2.3 The Statistical Model 2.3.1 A Latent Variable Model (隐性变量模型):assume a latent or unobserved variable y* ranging from -∞ to ∞

– For example (女性参与就业市场) • y is observed in two states:

a woman is in the labor force she is not in the labor force

• The idea of a latent y* is that:

There is an underlying propensity that generates the observed event We cannot directly observe y*, but at some point a change in y* results in

a change in what we observe

– As the number of young children in the family increases, a woman’s propensity to “work” would decrease. At some point the propensity crosses a threshold that results in a decision to leave the labor force

-∞ ∞

0=τ

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 y

y*

Page 2: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

10

– Structural model

– Measurement model: the link between the observed binary y and the latent y* is made with a simple measurement equation:

• τ is called the threshold (临界点)or cutpoint(分界点) • the distribution of y* is shown by the bell-shaped curves which should be

thought of as coming out of the figure into a third dimension • when y* is larger than τ, indicated by the shaded area, we observe y=1. • Since y* is continuous, the model avoids the problems encountered with the

LPM. However, since the dependent variable is unobserved, the model cannot be estimated with OLS. Instead, we use ML estimation, which requires assumptions about the distribution of the errors.

• In general, the probit model and the logit model provide similar results except that the probit model assumes that error terms have a normal distribution and the logit model assumes that error terms have a logistic distribution.

• the size of the coefficients of the logit model is 1.7 to 1.8 larger than those of

the probit model

iii xy εβ +=*

≤>

=0 if 0

0 if 1*

*

i

i

i y

yy

Page 3: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

11

2.3.2 A Nonlinear Probability Model(非线性概率模型): The BRM can also be derived without appealing to latent variable through specifying a nonlinear model relating the xs to the probability of an event

( )( )

( )( )x

x

x

x

1Pr1

1Pr

0Pr

1Pr)(

=−=

===

=y

y

y

yxΩΩΩΩ

since the probability is transformed into odds(胜算,或概率比), it indicates how often something happens (y=1) relative to how often it does not happen (y=0), and the values range from 0 when ( ) 0x|1yPr == to ∞ when ( ) 1x|1yPr == . The log (自然对

数)of the odds, or logit(胜算对数), ranges from -∞ to ∞. This suggests a model that is linear in the logit:

( )( )

kk xxx

y

yx

ββββ ++++==

=−=

=

...

1Pr1

1Prln)(ln

22110xβ

x

xΩΩΩΩ

2.3.2.1什么是胜算(odds)?

胜算是某事件发生的概率与该事件不发生的概率的比率

胜算( ΩΩΩΩ )=

概率(p)=

胜算=4,发生的概率为 0.8,不发生的概率为 0.2;发生的概率为不发生

的概率的 4倍 中文:五五波,六四开

– 胜算与概率相似处:

− 39

13.,.

1ge

p

p

i

i

5213

.,.geN

f

Page 4: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

12

都以 0为最低值,无负数 当胜算与概率的值增大时,都表示某事件愈可能发生;概率愈大,胜算愈

大;概率愈小,胜算愈小

− 胜算与概率不同处: 胜算以该事件不发生的概率为分母,概率以所有发生与不发生的事件为分

母 概率最大值为 1 ,胜算可达无穷大,消除上界为 1的问题 概率趋近于 1时,概率少许的变动会对胜算有很大的影响

ip 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.99 ( )ip−1

0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.01 胜算 0.01 0.11 0.25 0.43 0.67 1.00 1.50 2.33 4.00 9.00 99.0

odds >1,发生的概率大于不发生的概率 0< odds <1,发生的概率小于不发生的概率 odds =1,发生的概率等于不发生的概率

– 胜算与概率的关系:

( )

( )

i

ii

iii

iiii

iiii

iii

i

ii

p

p

pp

pp

pp

p

p

ΩΩΩΩΩΩΩΩ

ΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩΩ

ΩΩΩΩ

ΩΩΩΩ

+=

+=+=−=

−×=−

=

1

1

1

1

2.3.2.2胜算比(odds ratio): 一个胜算可与另一个胜算来比较

22

11

2

1

1

1

pp

pp

−−==

ΩΩΩΩΩΩΩΩθ

Page 5: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

13

如果胜算比大于 1,表示第一个事件组的胜算大于第二个事件组的胜算。胜算比

愈大,第一个事件组的胜算愈大。如果胜算比小于 1,第一个事件组的胜算小于

第二个事件组的胜算。 2.3.2.3什么是自然胜算对数(logit, the natural log of odds)? 自然胜算对数就是将胜算( iii pp −= 1ΩΩΩΩ )取自然对数,即

( )iii pp −= 1lnln ΩΩΩΩ

ip 0.01 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.99 ( )ip−1

0.99 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.01 胜算 0.01 0.11 0.25 0.43 0.67 1.00 1.50 2.33 4.00 9.00 99.0 自然

胜算

对数 -4.60 -2.20 -1.39 -0.85 -0.41 0.00 0.41 0.85 1.39 2.20 4.60

从正无穷大到负无穷大,消除概率上下限为 0的问题

以概率为 0.5为界,成对称型

等量的概率变化会造成自然胜算对数不等量的改变

– 总之,

胜算大于 1,自然胜算对数为正值

胜算等于 1,自然胜算对数为 0

胜算大于 0但小于 1,自然胜算对数为负值

自然胜算对数以 5.0=ip 为界,成对称型

概率为 1或 0,相对的自然胜算对数不存在

Page 6: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

14

2.3.2.4二分胜算对数模型

( )( )

kk xxx

y

yx

ββββ ++++==

=−=

=

...

1Pr1

1Prln)(ln

22110xβ

x

xΩΩΩΩ

• logit transformation linearizes the nonlinear relations • the model is linear in logit, the same change in x have constant effects • the logit implies a nonlinear relations between x and the probabilities

上式可在两边取指数函数(exponent),左侧以胜算表示依变量,右侧为自变

量。等式右侧呈现相乘关系,表示依变量与自变量是非线性关系。所以,二分

胜算对数模型将变量间的非线性关系,以自然胜算对数将之线性化(linear in logit)。

( )( )

kk

k

xxx

xxx

eeee

ey

y

ββββ

ββββ

...

)exp(1Pr1

1Pr

22110

22110 ...

=

===−

= ++++xβx

x

– Example: Labor force participation of women (file name: binlfp2 ) The sample consists of 753 white, married women between the ages of 30 and 60. The dependent variable lfp equals 1 if a woman is employed and 0 otherwise. The hypothesis is that the probability of a married woman in the paid labor force is affected by a series of factors, including the number of younger children (K5), the number of older children (K618), the wife’s age (age), the wife’s education (wc), the husband’s education (hc), log of wife’s estimated wage rate (lwg) and family income excluding wife’s wages (inc). The model to be estimated is

inclwghc

wcageKKFlfp

inclwghc

wcagekk

ββββββββ

+++++++== 6185()1Pr( 61850

− Estimation with STATA (a do-file)

Page 7: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

15

*logit command, suppressing iteration log and stori ng the results logit lfp k5 k618 age wc hc lwg inc, nolog

(or statistics/binary outcomes/logistic regression)

estimates store logit (or statistics/general post-estimation/manage estimation results/store estimation results)

*probit command, suppressing iteration history by t he nolog option and storing the results probit lfp k5 k618 age wc hc lwg inc, nolog

(or statistics/binary outcomes/probit regression) estimates store probit *creating a table that combines the results estimates table logit probit, b(%9.3f) se label var width(30)

(or statistics/general post-estimation/manage estimation results/table of estimation results)

Output page . logit lfp k5 k618 age wc hc lwg inc, nolog Logit estimates N umber of obs = 753 L R chi2(7) = 124.48 P rob > chi2 = 0.0000 Log likelihood = -452.63296 P seudo R2 = 0.1209 --------------------------------------------------- -------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- -------------------------- k5 | -1.462913 .1970006 -7.43 0.0 00 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.3 42 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.0 00 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.0 00 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.5 88 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.0 00 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.0 00 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.0 00 1.919188 4.445092 --------------------------------------------------- --------------------------- . estimates store logit . . *probit command, suppressing iteration history by the nolog option and storing > the results . probit lfp k5 k618 age wc hc lwg inc, nolog Probit estimates N umber of obs = 753 L R chi2(7) = 124.36 P rob > chi2 = 0.0000 Log likelihood = -452.69496 P seudo R2 = 0.1208

Page 8: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

16

--------------------------------------------------- -------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- -------------------------- k5 | -.8747112 .1135583 -7.70 0.0 00 -1.097281 -.6521411 k618 | -.0385945 .0404893 -0.95 0.3 40 -.117952 .0407631 age | -.0378235 .0076093 -4.97 0.0 00 -.0527375 -.0229095 wc | .4883144 .1354873 3.60 0.0 00 .2227642 .7538645 hc | .0571704 .1240052 0.46 0.6 45 -.1858754 .3002161 lwg | .3656287 .0877792 4.17 0.0 00 .1935847 .5376727 inc | -.020525 .0047769 -4.30 0.0 00 -.0298875 -.0111626 _cons | 1.918422 .3806536 5.04 0.0 00 1.172355 2.66449 --------------------------------------------------- --------------------------- . estimates store probit . *creating a table that combines the results . estimates table logit probit, b(%9.3f) t label va rwidth(30) --------------------------------------------------- ----- Variable | logit probi t -------------------------------+------------------- ----- # kids < 6 | -1.463 -0. 875 | 0.197 0. 114 # kids 6-18 | -0.065 -0. 039 | 0.068 0. 040 Wife's age in years | -0.063 -0. 038 | 0.013 0. 008 Wife College: 1=yes 0=no | 0.807 0. 488 | 0.230 0. 135 Husband College: 1=yes 0=no | 0.112 0. 057 | 0.206 0. 124 Log of wife's estimated wages | 0.605 0. 366 | 0.151 0. 088 Family income excluding wife's | -0.034 -0. 021 | 0.008 0. 005 Constant | 3.182 1. 918 | 0.644 0. 381 --------------------------------------------------- ----- legend: b/se

2.3.4 Hypothesis Testing with test and lrtest – If the assumptions of the model hold, the ML estimators are distributed

asymptotically normally and thus can be tested with the corresponding z-statistics in the output.

– The t-test and z-test are the same when the N is large. – The z-test included in the output of estimation command is a Wald test, which

can be computed using test . For example, to test 0: 50 =kH β , *significant test of a single coefficient test k5

(or statistics/general post-estimation/tests/tests parameters)

Page 9: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

17

Output page .test k5 ( 1) k5 = 0 chi2( 1) = 55.14 Prob > chi2 = 0.0000

The effect of having young children on the probability of entering the labor force is significant at the .01 level

– test can be used to test multiple coefficients. For example, to test

0:0 == hcwcH ββ ,

*significant test of multiple coefficients test hc wc

Output page .test hc wc ( 1) hc = 0 ( 2) wc = 0 chi2( 2) = 17.66 Prob > chi2 = 0.0001

The hypothesis that the effects of the husband’s and the wife’s education are simultaneously equal to zero can be rejected at the .01 level

– Comparing competitive (nested) models using LR test: the LR test assesses a hypothesis by comparing the log likelihood from the full model (MU) (i.e., the model that does not include the constraints implied by H0) and a restricted model (MC) that imposes the constraints. If the constraints significantly reduce the log likelihood, the H0 (that the imposed constraints are true) is rejected.

• Consider the following models:

M1: )()|1Pr( 443322110 xxxxxy βββββ ++++Λ==

M2: )()|1Pr( 22110 xxxy βββ ++Λ==

M3: )()|1Pr( 44220 xxxy βββ ++Λ==

Page 10: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

18

• The LR test can be defined as follows: 1. Mc with parameters βc is nested in Mu with βu. 2. H0 is that the constraints imposed to create Mc are true. 3. L(M u) is the value of the likelihood function at the unconstrained

estimates. 4. L(M c) is the value at the constrained estimated. 5. The likelihood ratio statistic equals

)(ln2)(ln2)|(2CUUC MLMLMMG −=

6. Under very general conditions, if H0 is true, then G2 is asymptotically

distributed as chi-square with degrees of freedom equal to the number of independent constraints

• Assume that a restricted model (without k5 and k618 , or

0: 61850 ==kk

H ββ ) is tested against a full model *comparing nested models using an LR test logit lfp k5 k618 age wc hc lwg inc, nolog estimates store fmodel logit lfp age wc hc lwg inc, nolog estimates store rmodel lrtest fmodel rmodel

(or statistics/general post-estimation/tests/likelihood-ratio test)

Output page . *comparing nested models using an LR test . logit lfp k5 k618 age wc hc lwg inc, nolog ( output omitted) . estimates store fmodel . . logit lfp age wc hc lwg inc, nolog ( output omitted) . estimates store rmodel . . lrtest fmodel rmodel likelihood-ratio test LR chi2(2) = 66.49 (Assumption: rmodel nested in fmodel) Prob > chi2 = 0.0000

Page 11: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

19

Since the LR chi-square is statistically significant, we reject the null hypothesis that the coefficients of k5 and k618 are simultaneously equal to

zero (or, reject 0: 61850 ==kk

H ββ ) When conducting LR test: 1) the two models must be nested (i.e., a nested model is created by imposing constraints on the coefficients in the prior model), and 2) the two models must be fitted on exactly the same sample.

2.4 Interpretation: In general, the estimated parameters from the BRM do not provide directly useful information for understanding the relationship between the dependent and independent variables. Substantively meaningful interpretations are thus based on predicted probabilities and functions of those probabilities (e.g., ratios, differences). There are four approaches:

- Predictions can be computed for each observation in the sample using predict

- Predicted values for substantively meaningful “profiles” of the independent variables can be compared using prvalue or prtab

- The marginal or discrete change in the outcome can be computed at representative value of the independent variables using prchange

- The nonlinear model can be transformed to a model that is linear in some other outcome and listcoef can be used

2.4.1 Interpretation with Predicted Probability using predict : *logit command, suppressing iteration log logit lfp k5 k618 age wc hc lwg inc, nolog predict prlogit

(or statistics/general post-estimation/obtain predictions, residuals)

probit lfp k5 k618 age wc hc lwg inc, nolog predict prprobit summarize pr* *Correlating prlogit and prprobit pwcorr prlogit prprobit

Output page

. *logit command, suppressing iteration log . logit lfp k5 k618 age wc hc lwg inc, nolog

Page 12: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

20

Logit estimates N umber of obs = 753 L R chi2(7) = 124.48 P rob > chi2 = 0.0000 Log likelihood = -452.63296 P seudo R2 = 0.1209 --------------------------------------------------- --------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- --------------------------- k5 | -1.462913 .1970006 -7.43 0.0 00 -1.849027 -1.076799 k618 | -.0645707 .0680008 -0.95 0.3 42 -.1978499 .0687085 age | -.0628706 .0127831 -4.92 0.0 00 -.0879249 -.0378162 wc | .8072738 .2299799 3.51 0.0 00 .3565215 1.258026 hc | .1117336 .2060397 0.54 0.5 88 -.2920969 .515564 lwg | .6046931 .1508176 4.01 0.0 00 .3090961 .9002901 inc | -.0344464 .0082084 -4.20 0.0 00 -.0505346 -.0183583 _cons | 3.18214 .6443751 4.94 0.0 00 1.919188 4.445092 --------------------------------------------------- --------------------------- . predict prlogit (option p assumed; Pr(lfp)) . probit lfp k5 k618 age wc hc lwg inc, nolog Probit estimates N umber of obs = 753 L R chi2(7) = 124.36 P rob > chi2 = 0.0000 Log likelihood = -452.69496 P seudo R2 = 0.1208 --------------------------------------------------- --------------------------- lfp | Coef. Std. Err. z P>| z| [95% Conf. Interval] -------------+------------------------------------- --------------------------- k5 | -.8747112 .1135583 -7.70 0.0 00 -1.097281 -.6521411 k618 | -.0385945 .0404893 -0.95 0.3 40 -.117952 .0407631 age | -.0378235 .0076093 -4.97 0.0 00 -.0527375 -.0229095 wc | .4883144 .1354873 3.60 0.0 00 .2227642 .7538645 hc | .0571704 .1240052 0.46 0.6 45 -.1858754 .3002161 lwg | .3656287 .0877792 4.17 0.0 00 .1935847 .5376727 inc | -.020525 .0047769 -4.30 0.0 00 -.0298875 -.0111626 _cons | 1.918422 .3806536 5.04 0.0 00 1.172355 2.66449 --------------------------------------------------- --------------------------- . predict prprobit (option p assumed; Pr(lfp)) . summarize pr* Variable | Obs Mean Std. Dev. Min Max -------------+------------------------------------- ------------------- prlogit | 753 .5683931 .1944213 .0139875 .9621198 prprobit | 753 .5705144 .1928416 .005691 .9744748

Page 13: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

21

predict can be used to examine the range of predicted probabilities from the model. For instance, after storing the predicted probabilities in the corresponding variables, prlogit and prprobit , summarize shows the range of the predicted probabilities are from .014 to .962 and .006 to .974 respectively.

predict can also be used to demonstrate that the predictions from logit and probit models are essentially identical. As the following output shows,

Output page . *Correlating prlogit and prprobit . pwcorr prlogit prprobit | prlogit prprobit -------------+------------------ prlogit | 1.0000 prprobit | 0.9998 1.0000

2.4.2 Individual Predicted Probabilities with prvalue : a table of probabilities for ideal types of people (or countries, cows, etc.) can quickly summarize the effects of key variables. E.g., in the example of labor force participation Young, low income and low education families with young children Highly educated, middle aged couples with no children An “average family” defined as having the mean on all variables

*logit command, suppressing iteration log logit lfp k5 k618 age wc hc lwg inc, nolog *Individual Predicted Probabilities with prvalue *Young, low income and low education families with young children prvalue, x(age=35 k5=2 wc=0 hc=0 inc=15) rest(mean)

(or statistics/general post-estimation/table of adjusted means and proportions)

*Highly educated, middle aged couples with no child ren prvalue, x(age=50 k5=0 k618=0 wc=1 hc=1) rest(mean) *An “average family” defined as having the mean on all variables prvalue, rest(mean)

Output page *Young, low income and low education families with young children . prvalue, x(age=35 k5=2 wc=0 hc=0 inc=15) rest(mea n)

Page 14: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

22

logit: Predictions for lfp Pr(y=inLF|x): 0.1318 95% ci: (0.0723,0.22 82) Pr(y=NotInLF|x): 0.8682 95% ci: (0.7718,0.92 77) k5 k618 age wc hc lwg inc x= 2 1.3532537 35 0 0 1.0971148 15 . . *Highly educated, middle aged couples with no chi ldren . prvalue, x(age=50 k5=0 k618=0 wc=1 hc=1) rest(mea n) logit: Predictions for lfp Pr(y=inLF|x): 0.7166 95% ci: (0.6266,0.79 21) Pr(y=NotInLF|x): 0.2834 95% ci: (0.2079,0.37 34) k5 k618 age wc hc lwg inc x= 0 0 50 1 1 1.0971148 20.128965 . . *An "average family" defined as having the mean o n all variables . prvalue, rest(mean) logit: Predictions for lfp Pr(y=inLF|x): 0.5778 95% ci: (0.5388,0.61 59) Pr(y=NotInLF|x): 0.4222 95% ci: (0.3841,0.46 12) k5 k618 age wc hc lwg inc x= .2377158 1.3532537 42.537849 .2815405 .39 176627 1.0971148 20.128965

With these predictions, we can summarize the results on factors affecting a wife’s labor force participation

Table of Predicted Probabilities at Selected Values Ideal Types Probabilities of LFP Young, low income and low education families with young children

.013

Highly educated, middle aged couples with no children

.72

An “average family” defined as having the mean on all variables

.58

Page 15: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

23

2.4.3 Individual Predicted Probabilities with prtab : if the focus is one two or three categorical independent variables. Predictions for all combinations of the categories of these variables could be presented as

Table of Predicted Probabilities at Selected Values

The above table is created by prtab k5 wc, rest(mean)

Output page . *Individual Predicted Probabilities with prtab . prtab k5 wc, rest (mean) logit: Predicted probabilities of positive outcome for lfp ---------------------------- | Wife College: # kids < | 1=yes 0=no 6 | NoCol College ----------+----------------- 0 | 0.6069 0.7758 1 | 0.2633 0.4449 2 | 0.0764 0.1565 3 | 0.0188 0.0412 ---------------------------- k5 k618 age wc hc lwg inc x= .2377158 1.3532537 42.537849 .2815405 .39 176627 1.0971148 20.128965 2.4.4 Changes in Predicted Probabilities with prchange : this command computes

the marginal/discrete change at the values of the independent variables specified with x() or rest(). For instance,

*computing marginal change with all variables at th eir mean prchange, rest (mean)help *the above command can be supplemented by fromto op tion

Number of Young Children

Predicted Probability

Did not Attend Attended College Difference

0 .61 .78 0.17 1 .26 .44 0.18 2 .08 .16 0.08 3 .02 .04 0.02

Page 16: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

24

prchange, fromto *(output omitted)

Output page prchange, rest (mean) help logit: Changes in Predicted Probabilities for lfp min->max 0->1 -+1/2 -+sd/2 MargE fct k5 -0.6361 -0.3499 -0.3428 -0.1849 -0.3 569 k618 -0.1278 -0.0156 -0.0158 -0.0208 -0.0 158 age -0.4372 -0.0030 -0.0153 -0.1232 -0.0 153 wc 0.1881 0.1881 0.1945 0.0884 0.1 969 hc 0.0272 0.0272 0.0273 0.0133 0.0 273 lwg 0.6624 0.1499 0.1465 0.0865 0.1 475 inc -0.6415 -0.0068 -0.0084 -0.0975 -0.0 084 NotInLF inLF Pr(y|x) 0.4222 0.5778 k5 k618 age wc hc lwg inc x= .237716 1.35325 42.5378 .281541 .391766 1.09711 20.129 sd(x)= .523959 1.31987 8.07257 .450049 .488469 .587556 11.6348 Pr(y|x): probability of observing each y for speci fied x values Avg|Chg|: average of absolute value of the change a cross categories Min->Max: change in predicted probability as x chan ges from its minimum to its maximum 0->1: change in predicted probability as x chan ges from 0 to 1 -+1/2: change in predicted probability as x chan ges from 1/2 unit below base value to 1/2 unit above -+sd/2: change in predicted probability as x chan ges from 1/2 standard dev below base to 1/2 standard dev above MargEfct: the partial derivative of the predicted p robability/rate with respect to a given independent variable

varying age (age) from its minimum of 30 to its maximum of 60 decreases the predicted probability by .44

changing family income (inc) from its minimum to its maximum decreases the probability of a woman being in the labor force by .64

for a woman who is average on all characteristics, an additional young child decreases the probability of employment by .3428

if a woman attends college, her probability of being in the labor force is .1881 greater than a woman who does not attend college, holding all other variables at their mean.

2.4.5 Interpretation using Odds Ratios (胜算比) with listcoef :

- For binary outcomes, we typically consider the odds of observing a positive outcome versus a negative one:

Page 17: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

25

)1Pr(1

)1Pr(

)0Pr(

)1Pr(

=−==

===Ω

y

y

y

y

The log of odds is logit and the model is linear in the logit

kk xxxy

yx ββββ ++++=

=−==Ω ...

)1Pr(1)1Pr(

ln)(ln 22110

- Odds Ratio

• Taking the exponential (指数)of the logit equation:

[ ]kk

k

xxx

xxx

eeee

exxxββββ

βββββ...

)exp()(lnexp)(22110

22110 ...

===Ω=Ω ++++

• Let x2 change by 1:

kk

k

xxx

xxx

eeeee

exxxβββββ

βββββ...

)exp()1,(222110

22110 ...)1(

2

===+Ω +++++

• To compare the odds before and after the change of x2, we take odds ratio:

2

22110

222110

...

...

),(

)1,(

2

2 βββββ

βββββ

eeeee

eeeee

xx

xxkk

kk

xxx

xxx

==Ω

• For a unit change in xk, the odds are expected to change by a factor of

exp(βk), holding all other variables constant For exp(βk) >1, the odds are “exp(βk) times larger” For exp(βk) <1, the odds are “exp(βk) times smaller” The effect does not depend on the level of xk or the level of any other variable.

Factor change (倍数改变)in odds ratios for both a unit change and a

standard deviation change of the independent variables can be obtained with listcoef :

* obtaining factor change in odds ratios listcoef, help

Page 18: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

26

Output page * obtaining factor change in odds ratios listcoef, help logit (N=753): Factor Change in Odds Odds of: inLF vs NotInLF --------------------------------------------------- ------------------- lfp | b z P>|z| e^b e^bStdX SDofX -------------+------------------------------------- ------------------- k5 | -1.46291 -7.426 0.000 0.2316 0.4646 0.5240 k618 | -0.06457 -0.950 0.342 0.9375 0.9183 1.3199 age | -0.06287 -4.918 0.000 0.9391 0.6020 8.0726 wc | 0.80727 3.510 0.000 2.2418 1.4381 0.4500 hc | 0.11173 0.542 0.588 1.1182 1.0561 0.4885 lwg | 0.60469 4.009 0.000 1.8307 1.4266 0.5876 inc | -0.03445 -4.196 0.000 0.9661 0.6698 11.6348 --------------------------------------------------- ------------------- b = raw coefficient z = z-score for test of b=0 P>|z| = p-value for z-test e^b = exp(b) = factor change in odds for unit increase in X e^bStdX = exp(b*SD of X) = change in odds for SD i ncrease in X SDofX = standard deviation of X

For each additional young child, the odds of being employed decrease by a factor of 0.23, holding all other variables constant (在其它变项不变的情

形下,每增加一个小孩,已婚妇女参与就业市场的胜算(或概率比)就

会减少 0.23倍). For a standard deviation increase in the log of the wife’s expected wages, the odds of being employed are 1.43 times greater, holding all other variables constant. The interpretation of the odds ratio assumes that the other variables are held constant, but it does not require that they be held at any specific values. It is important to know that a constant factor change in the odds does not correspond to a constant change or constant factor change in the probability. For example, if the odds are 1/1 (probability=.5) and double to 2/1 (probability=0.333), the probability increases by 0.167

– Percent change in odds ratio can also be obtained with listcoef :

( ) 1exp100 −×δβk which is listed by lisfcoef with the percent option

Page 19: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

27

Output page listcoef, percent logit (N=753): Percentage Change in Odds Odds of: inLF vs NotInLF --------------------------------------------------- ------------------- lfp | b z P>|z| % %StdX SDofX -------------+------------------------------------- ------------------- k5 | -1.46291 -7.426 0.000 -76.8 -53.5 0.5240 k618 | -0.06457 -0.950 0.342 -6.3 -8.2 1.3199 age | -0.06287 -4.918 0.000 -6.1 -39.8 8.0726 wc | 0.80727 3.510 0.000 124.2 43.8 0.4500 hc | 0.11173 0.542 0.588 11.8 5.6 0.4885 lwg | 0.60469 4.009 0.000 83.1 42.7 0.5876 inc | -0.03445 -4.196 0.000 -3.4 -33.0 11.6348 --------------------------------------------------- -------------------

For each additional young child, the odds of being employed decrease by 76.8%, holding all other variables constant (在其它变项不变的情形下,每

增加一个小孩,已婚妇女参与就业市场的胜算(或概率比)就会减少76.8%) For a standard deviation increase in the log of the wife’s expected wages, the odds of being employed increased 42.7%, holding all other variables constant

2.5 Residuals and Influence Using Predict 2.5.1 Detecting Residuals (残差值) − Residuals are the difference between a model’s predicted and observed outcome for

each observation in the sample. When the differences are large, the observations are considered outliers(离群值). For a binary model, define

)|1Pr()|( 1xyxyE iiii ===π

− The deviation iiy π− are heteroscedastistic, with

)1()|()|( iiiiiii xyVarxyVar πππ −==− − Which suggests the Pearson residual

)ˆ1(ˆ ii

iii

yr

πππ−

−=

Page 20: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

28

− Large values of ir suggest a failure of the model to fit a given observation − Pregibon proposed the standardized Pearson residual:

)( i

ii

std

rVar

rr =

− An index plot is a useful way to examine residuals by simply plotting residuals against the observation number.

− There is no hard-and-fast rule for what counts as a “large” residual. One way to search for problematic residuals is to sort the residuals by the value of a variable that you think may be a problem for the model. If this variable is primarily responsible for the lack of fit of some observations, the plot would show a disproportionate number of cases with large residuals. The example below sorts the data by income before plotting.

*compute standardized residuals and plotting them a gainst index logit lfp k5 k618 age wc hc lwg inc, nolog predict rstd, rs

(or statistics/general post-estimation/obtain predictions, residuals)

sort inc generate index=_n graph twoway scatter rstd index, msymbol(none) mlab el(index) list rstd index if rstd>2.5|rstd<-2.5 Output page . //Residuals and influence using predict . logit lfp k5 k618 age wc hc lwg inc, nolog (output omitted) . predict rstd, rs . sort inc . generate index=_n . graph twoway scatter rstd index, msymbol(none) ml abel(index)

Page 21: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

29

1

2

3

4567

8

9

10

11

121314

15

16

17

18

19

20

21

22

23

242526

27

282930

31

323334

35

36

3738

39

4041

42

43

44454647

48

49

50515253

54

5556

5758

59

6061

626364

65

6667

686970

71

727374

75

7677

78

79

80

8182

8384

85

86

8788

89

9091

92

93949596

97

9899100101

102

103104

105106107

108

109110111112113

114

115116117118

119

120

121122

123

124

125

126127128

129130

131132133134

135

136

137

138

139140141

142

143

144

145146

147

148149

150151152

153

154155156

157

158159

160161

162

163164

165

166

167

168169

170171172173

174

175

176

177178179180181

182

183

184

185

186

187

188

189

190191

192

193

194195196197198199

200201

202

203204

205206

207

208209

210

211

212

213214

215

216

217

218

219220221222223

224

225226

227228

229230231

232

233234235236237

238

239240

241

242243244

245

246247

248

249

250

251

252

253

254

255256

257

258

259

260

261262

263

264

265

266

267

268269270

271

272

273274

275

276

277

278279

280281

282

283

284

285

286

287

288

289290291

292

293

294295

296

297

298

299

300

301

302303304305

306307

308

309

310

311

312

313

314

315

316

317318319

320321

322323

324

325

326

327

328

329

330

331

332

333334

335

336337338339340

341

342

343

344

345

346347348

349

350

351

352

353

354355356

357

358359360

361

362363364365

366367368

369

370

371

372373

374375376

377

378

379380

381

382

383

384

385

386

387

388

389

390

391392393

394

395

396

397

398

399

400

401

402403

404405

406

407

408

409

410411

412

413

414415

416

417418419

420

421

422

423

424425

426

427

428

429

430

431432

433

434435

436

437

438

439

440441442443

444

445446447

448

449

450451

452

453

454

455

456

457

458

459460461

462

463

464

465

466467468469470

471472

473

474

475

476

477

478

479

480

481482

483

484485

486487488

489490491

492493

494

495

496

497

498499500501

502

503504

505

506

507

508

509510511

512

513

514

515

516517

518

519520

521522

523

524

525

526

527

528

529

530531

532

533534

535536

537538

539540541

542543

544

545546

547

548

549

550

551

552

553

554

555

556557

558

559

560561

562

563

564

565

566

567

568

569

570571

572573574

575

576577

578579

580

581

582

583

584

585

586

587

588589590

591

592

593

594595596

597

598599

600601602

603

604

605

606

607608

609610611

612

613

614

615

616617

618

619

620

621622

623

624

625

626

627

628

629

630

631632

633634635

636

637

638

639

640

641

642

643

644

645

646

647648

649

650651652

653

654655

656

657658

659

660661

662

663

664

665666

667

668669

670

671672

673

674

675

676677

678679680

681

682

683

684

685

686

687

688

689690691

692

693

694695

696697

698

699

700

701

702

703704705

706

707

708709710

711712

713714715

716

717

718719720

721

722

723

724

725

726

727

728

729730

731

732

733

734

735

736

737738

739

740741742

743

744745746

747

748749

750

751

752

753

-4-2

02

4st

anda

rdiz

ed P

ears

on r

esid

ual

0 200 400 600 800index

The above cases can be listed using the command list .

. list rstd index if rstd>2.5|rstd<-2.5 +-------------------+ | rstd index | |-------------------| 142. | 3.191524 142 | 345. | 2.873378 345 | 514. | -2.677243 514 | 554. | -2.871972 554 | 752. | 3.192648 752 | +-------------------+

2.5.2 Detecting Influential Cases(重要观察值)

− Influential cases are also called high-leverage points, which have a strong influence on the estimated parameters. They can be determined by examining

the change in the estimated β that occurs when the ith observation is deleted. This is the counterpart to Cook’s distance for the linear regression model, which is the dbeta in Stata and is defined as

2

2

)1( ii

iiii h

hrC

−=

* Detecting influential cases predict cook, dbeta

(or statistics/general post-estimation/obtain predictions, residuals)

Page 22: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

30

generate index=_n graph twoway scatter cook index, msymbol(none) mlab el(index) The commands produce the following plot, which shows that cases 338, 534, and 635 merit further examination:

373340

108669

597540539114

466

13 229

459 513 708291 723362

6086394

159 302

44 566542421168704473 519

29 380543 7507 689

290

87 277706367

502

34

9073 29342

191

48420 538371 5818 369 699268217 732

14817679647564

729233214 429 474 676200 531524 571

406

379 431299

747

480576363123 176

52070073044650

632583147 440 7484144323

500451 59534561

4 29221694 207

680

734717522378 548

419

494667549 69147 66626996 678

334240 505407388

27652 653631 746

753536313 462

184

557 600450

534

13078 716523220 357 626

356 69043616

683611418

83751673

163115

22 681

672

448119

69430 132 389 688472 588 69856138 593336 658546335

70

744317

62

491213 423510

88 183 585310 580343 478 594437 56395 139481

188224

267281149 535498260721410

664

675

447

402

182 610545425374 467 578 69723613518

104 400 58450865 737735709718484232

643

64235590

442333193 370 630361

298

702

37756 715254 740341

166

434245

638

243103

489

348259 507174 3125866741145

488 692218725

331

138303318105

62398

150686

713415309 443 541 724

272

40964 316 736

172

604257 326197 607

727613562

288

187 246

338

625304

63740 486137 171

701

38751647738222563 438256

314705279

514215 719134722

372296

417 614616 726602238 649567199

662

720565

396

376226 475401189 745622582258 511 68745372 441 731587 714

30519521366

73843914168 412 645671204 636454 551

628

416212 560 684

20

555308

677

106 483 547330

222

86

235

392275 663

537496

127 227 359 624280

354

211

463156140 295 364180

283 460244237

196

449307265 347

247

36

37

509131 521517

530 621178 577452160 695

194 250

612 752659

742297 381 464 596

55

360177129

710

67749234

164126395

146 282285550

311 589365 527499 556 655615

422 558 696618

251646

294 66880

651

353 648242

346

53353 325170 707591 685569 620476 573165 458154 206 25311

328321

231

192

320

255

6933

118102 712122

60

629125202

544

27142

428 634

57601

52828 711

155

116329 733

323143

45

169161 44435018610 151 398 619

324424 572

121

627

641

266332

657228

252

99 327 552

46

209 504 74343405100

101263 445

239 512 739670173162 306 3492 273 506393

342

181 570644

485

351261

497

579 652532

92

3919 456654411

24433 617 656337 529

274

205 6821128579 455

471479

81

609

592 703728603300 59851 606403

221

45719

153 526

76

41

51 219

693492413284

262

660

52531

48291179

518157 344203230 48759 175 315

661

8449 38612413343097 210

31914 559

82 208190 501397

117 339 515111

674

554198 278599553384

287 426120

286136

568

35875 465390

633

383

493

20154110

241

399322

113

301

40418510977

89575

468355

503

495605

27112 289

408

490248

167 249

427

639

375

461

574

586223

15665640

435

32

264

15271

39 385

469

470

93144

128

352

7423

158

650

27026

368

25

635

1070.1

.2.3

Pre

gibo

n's

dbet

a

0 200 400 600 800index

2.6 Scalar Measures of Fit (适合度)using fitstat − A scalar measure of fit can be used to summarize the overall fit of a model. − A scalar measure of fit can be useful in comparing competing models. Within a

substantive area, measures of fit provide a rough index of whether a model is adequate. However, there is no convincing evidence that selecting a model that maximizes the value of a given measure of fit results in a model that is optimal in any sense other than the model having a larger value of that measure. Measures of fits thus only provide partial information that must be assessed within the context of the theory motivating the analysis.

2.6.1 Pseudo-R2 Based on R2 in the LRM − Several pseudo-R2’s for models with CDVs have been defined by analogy to the

formula for R2 in the LRM: McFadden’s R2, also know as the “likelihood-ratio index”, compares a model with just the intercept to a model with all parameters. It is defined as

Page 23: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

31

)(ˆln

)(ˆln12

Intercept

FullMcF

ML

MLR −=

Maximum Likelihood R 2

N

Full

InterceptML

ML

MLR

2

2

)(

)(1

−=

Cragg & Uhler’s R2

N

Intercept

NFullIntercept

UC

ML

MLMLR 2

2

&2

)(1

)()(1

−=

Efron’s R2

∑∑

−−

−=2

22

)(

)ˆ(1

yy

yR

i

iiEfron

π

2.6.2 Information Measures

AIC: Akaike’s information criterion is defined as NPMLAIC k 2(ˆln2 +−= . Other things being equal, the model with smaller AIC is considered the better-fitting model. BIC: The Bayesian information criterion is defined as NdfMDBIC kkk ln)( −= . The more negative the BIC, the better the fit. When comparing two models, if BIC1 - BIC2 <0, then the first model is preferred. If BIC1 - BIC2 >0, then the second model is preferred. − Example: Consider two models,

• M1 has the original specification of independent variables • M2 adds a squared age term AGE2 and drops the variables K618, HC and

LWG

Page 24: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

32

* Scalar Measures of Fit using fitstat gen age2=age*age logit lfp k5 k618 age wc hc lwg inc, nolog fitstat, save logit lfp k5 age age2 wc inc, nolog fitstat, dif

Output page . * Scalar Measures of Fit using fitstat . quietly logit lfp k5 k618 age wc hc lwg inc, nolo g . estimates store model1 . fitstat, save Measures of Fit for logit of lfp Log-Lik Intercept Only: -514.873 Log-Lik Fu ll Model: -452.633 D(745): 905.266 LR(7): 124.480 Prob > LR: 0.000 McFadden's R2: 0.121 McFadden's Adj R2: 0.105 Maximum Likelihood R2: 0.152 Cragg & Uh ler's R2: 0.204 McKelvey and Zavoina's R2: 0.217 Efron's R2 : 0.155 Variance of y*: 4.203 Variance o f error: 3.290 Count R2: 0.693 Adj Count R2: 0.289 AIC: 1.223 AIC*n: 921.266 BIC: -4029.663 BIC': -78.112 (Indices saved in matrix fs_0) . gen age2=age*age . quietly logit lfp k5 age age2 wc inc, nolog . estimates store model2 . estimates table model1 model2, b(%9.3f) t -------------------------------------- Variable | model1 model2 -------------+------------------------ k5 | -1.463 -1.380 | -7.43 -7.06 k618 | -0.065 | -0.95 age | -0.063 0.057 | -4.92 0.50 wc | 0.807 1.094 | 3.51 5.50 hc | 0.112 | 0.54 lwg | 0.605 | 4.01 inc | -0.034 -0.032 | -4.20 -4.18 age2 | -0.001 | -1.00 _cons | 3.182 0.979 | 4.94 0.40 -------------------------------------- legend: b/t

Page 25: Binary Regression Models ((((二分二分类类类类变变变变量 …sites.duke.edu/niou/files/2011/06/2-Discrete_Dep... · 6/2/2011  · 2.3.4 Hypothesis Testing with

33

. fitstat, dif Measures of Fit for logit of lfp Current Sav ed Difference Model: logit log it N: 753 7 53 0 Log-Lik Intercept Only: -514.873 -514.8 73 0.000 Log-Lik Full Model: -461.653 -452.6 33 -9.020 D: 923.306(747) 905.2 66(745) 18.040(2) LR: 106.441(5) 124.4 80(7) 18.040(2) Prob > LR: 0.000 0.0 00 0.000 McFadden's R2: 0.103 0.1 21 -0.018 McFadden's Adj R2: 0.092 0.1 05 -0.014 Maximum Likelihood R2: 0.132 0.1 52 -0.021 Cragg & Uhler's R2: 0.177 0.2 04 -0.028 McKelvey and Zavoina's R2: 0.182 0.2 17 -0.035 Efron's R2: 0.135 0.1 55 -0.020 Variance of y*: 4.023 4.2 03 -0.180 Variance of error: 3.290 3.2 90 0.000 Count R2: 0.677 0.6 93 -0.016 Adj Count R2: 0.252 0.2 89 -0.037 AIC: 1.242 1.2 23 0.019 AIC*n: 935.306 921.2 66 14.040 BIC: -4024.871 -4029.6 63 4.791 BIC': -73.321 -78.1 12 4.791 Difference of 4.791 in BIC' provides positive su pport for saved model. Note: p-value for difference in LR is only valid if models are nested.

All pseudo-R2’s are slightly larger for M1 Both AIC and BIC statistics are smaller for M1, which provides support for that

model