Models for Count Outcomes


Models for Count Outcomes: count variables indicate how many times something has happened.

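Examples: the number of bills vetoed by a U.S. president; the number of papers published by a professor; the number of coups in African countries.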


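• Applying the linear regression model (LRM) to count outcomes can yield inefficient, inconsistent, and biased estimates

– Functional form: a linear relationship does not fit a nonnegative, discrete outcome

– Nonsensical predictions: the LRM can predict negative counts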


– A frequently adopted remedy for the linear regression model is to take the natural logarithm of the dependent variable, which yields a log-linear model

– Because zero is one of the observed values and ln(0) is undefined, a constant c is often added to the dependent variable Y_i, i.e., ln(Y_i + c)
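The transformation above can be sketched in Python as follows. The default c = 1 is an assumption for illustration, since the slide does not fix a value, and the counts are hypothetical. The sketch also shows why the constant is needed: without it, a zero count makes the logarithm undefined.

```python
import math


def log_transform(counts, c=1.0):
    """Return ln(y + c) for each count y; a positive c keeps zero counts defined."""
    if c <= 0:
        raise ValueError("c must be positive so that ln(0 + c) is defined")
    return [math.log(y + c) for y in counts]


counts = [0, 1, 3, 7]            # hypothetical article counts
print(log_transform(counts))     # ln(1), ln(2), ln(4), ln(8)

try:
    math.log(0)                  # without the constant, a zero count breaks the transform
except ValueError as err:
    print("ln(0) is undefined:", err)
```

The results depend on the arbitrary choice of c. The coefficients also describe ln(Y + c) rather than the count itself.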


• Example: article counts (file name: couart2). The data record the number of articles published by Ph.D. biochemists


• Count models

– Poisson regression model (PRM)

– Negative binomial regression models (NBRM)

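Poisson distribution: suppose the dependent variable y is a count, i.e., the number of times an event of interest occurs during a given period. Its values are nonnegative integers including 0, with no theoretical upper bound. The distribution of such a variable is commonly modeled with the Poisson distribution.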


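$$\Pr(y \mid \mu) = \frac{e^{-\mu}\mu^{y}}{y!}, \qquad y = 0, 1, 2, \ldots$$

– A key feature of the Poisson distribution: if the expected value is μ, the variance is also μ:

$$E(Y) = \mu, \qquad V(Y) = \mu$$

– The link function for the Poisson distribution is the log link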

– The variance of the Poisson distribution depends on the size of the mean. This property is called equidispersion (the variance equals the expected value)
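Equidispersion can be checked numerically with a short Python sketch. It evaluates the Poisson probability function on the log scale (to avoid overflow in y!) and sums the first two moments over a long but finite range of y. The cutoff `y_max` is an assumption that is adequate for small μ. The chosen μ values are arbitrary.

```python
import math


def poisson_pmf(y, mu):
    """Pr(Y = y | mu) = exp(-mu) * mu**y / y!, computed on the log scale (mu > 0)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def poisson_moments(mu, y_max=200):
    """Mean and variance of a Poisson(mu) variable, summed over y = 0..y_max."""
    probs = [poisson_pmf(y, mu) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


for mu in (0.5, 1.7, 4.0):
    mean, var = poisson_moments(mu)
    print(f"mu={mu}: mean={mean:.4f}, variance={var:.4f}")   # both equal mu
```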


Poisson Regression Model (PRM): set the systematic component of the GLM to a linear combination of the independent variables and substitute it into the link function:

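$$\mu_i = E(y_i \mid \mathbf{x}_i) = V(y_i \mid \mathbf{x}_i) = \exp(\mathbf{x}_i\boldsymbol{\beta}) = \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_k x_{ki})$$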
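The PRM mean function is easy to evaluate directly. The following minimal Python sketch uses hypothetical coefficients, not estimates from couart2. Because the mean is exponential in xβ, it is always positive. Raising one regressor x_k by one unit multiplies the expected count by exp(β_k) rather than adding a constant.

```python
import math


def prm_mean(x, beta):
    """E(y | x) = exp(b0 + b1*x1 + ... + bk*xk); x excludes the constant term."""
    b0, *slopes = beta
    if len(slopes) != len(x):
        raise ValueError("need one coefficient per regressor plus an intercept")
    xb = b0 + sum(b * xk for b, xk in zip(slopes, x))
    return math.exp(xb)


beta = [0.3, -0.2, 0.025]   # hypothetical b0, b1, b2 (not couart2 estimates)
x = [1.0, 8.0]              # hypothetical values of x1, x2

mu = prm_mean(x, beta)
mu_plus = prm_mean([x[0] + 1.0, x[1]], beta)   # raise x1 by one unit
print(mu, mu_plus / mu, math.exp(beta[1]))      # the ratio equals exp(b1)
```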

• Interpretation of PRM

– the expected value of the count variable (rate of occurrence): listcoef, prchange

– the probability of counts: prvalue

– predicted count: prtab


• Interpretation of PRM

1. Change in E(y|x) for changes in the independent variables

– Factor (or percent) change in the expected count using listcoef

– Holding all other variables constant, the mean number of articles for female scientists is 0.8 times that for male scientists (i.e., 20% lower)


– Holding all other variables constant, for a one-standard-deviation increase in the mentor's productivity (number of articles), a scientist's mean productivity increases by 27 percent
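The factor and percent changes above follow from the exponential mean. Increasing x_k by δ multiplies E(y|x) by exp(β_k δ), so the percent change is 100[exp(β_k δ) − 1]. For a standard-deviation change, δ = s_k. This Python sketch reproduces that arithmetic only, not the output of listcoef. The mentor coefficient and standard deviation are hypothetical values.

```python
import math


def factor_change(b, delta=1.0):
    """Factor change in E(y|x) when x_k increases by delta: exp(b * delta)."""
    return math.exp(b * delta)


def percent_change(b, delta=1.0):
    """Percent change in E(y|x) when x_k increases by delta: 100 * (exp(b * delta) - 1)."""
    return 100.0 * (factor_change(b, delta) - 1.0)


# A factor of 0.8 (as for the female dummy above) is a 20% decrease.
b_female = math.log(0.8)                        # coefficient implied by that factor
print(round(percent_change(b_female), 1))       # -20.0

# Hypothetical mentor coefficient and standard deviation (assumed, not couart2 values):
b_mentor, sd_mentor = 0.025, 9.5
print(percent_change(b_mentor, sd_mentor))      # percent change per one-SD increase
```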


– Marginal and discrete change in E(y|x) (the predicted rate) using prchange

– At typical values (all other variables held at their means), female scientists publish on average 0.36 fewer articles than male scientists
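Both quantities can be computed from the linear predictor with a short Python sketch, which reproduces only the arithmetic behind prchange. For a continuous x_k, the marginal change is the slope ∂E(y|x)/∂x_k = β_k exp(xβ). For a dummy such as female, the discrete change compares E(y|x) with the dummy at 1 and at 0, other variables fixed. The coefficient and the linear-predictor value are hypothetical.

```python
import math


def expected_count(xb):
    """E(y | x) = exp(x * beta) for a given linear predictor xb."""
    return math.exp(xb)


def marginal_change(b_k, xb):
    """dE(y|x)/dx_k = b_k * exp(x * beta), evaluated at a given linear predictor."""
    return b_k * expected_count(xb)


def discrete_change_dummy(b_d, xb_without_d):
    """E(y|x, d=1) - E(y|x, d=0); xb_without_d is the linear predictor with d = 0."""
    return expected_count(xb_without_d + b_d) - expected_count(xb_without_d)


# Hypothetical inputs (not couart2 estimates): the female coefficient and the
# linear predictor with the other variables at their means and female = 0.
b_female = -0.22
xb_rest = 0.55
print(discrete_change_dummy(b_female, xb_rest))   # change in predicted articles
print(marginal_change(b_female, xb_rest))         # slope; not the right measure for a dummy
```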


2. Creating ideal types with prvalue and prtab
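An "ideal type" is a profile of chosen regressor values at which the predicted rate and the predicted probabilities of each count are reported. The following Python sketch mimics that calculation for a Poisson model. It does not mimic the layout of the prvalue or prtab output. The coefficients and profiles are hypothetical.

```python
import math


def poisson_pmf(y, mu):
    """Pr(Y = y | mu) for a Poisson variable, computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))


def ideal_type(x, beta, max_count=3):
    """Predicted rate and Pr(y = 0..max_count) for one profile of x values."""
    b0, *slopes = beta
    rate = math.exp(b0 + sum(b * v for b, v in zip(slopes, x)))
    return rate, [poisson_pmf(y, rate) for y in range(max_count + 1)]


beta = [0.3, -0.2, 0.025]      # hypothetical b0, b_female, b_mentor
profiles = {                    # hypothetical profiles: [female, mentor]
    "male, mentor=8": [0, 8],
    "female, mentor=8": [1, 8],
}
for name, x in profiles.items():
    rate, probs = ideal_type(x, beta)
    print(name, round(rate, 3), [round(p, 3) for p in probs])
```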


Negative Binomial Model

• The problem of overdispersion

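– The Poisson regression model assumes that the variance equals the expected value:

$$E(y_i \mid \mathbf{x}_i) = V(y_i \mid \mathbf{x}_i) = \mu_i = \exp(\mathbf{x}_i\boldsymbol{\beta})$$

– In practice, the variance in empirical data is often larger than theory predicts, i.e.,

$$V(y_i \mid \mathbf{x}_i) > E(y_i \mid \mathbf{x}_i)$$

This is called overdispersion

– If overdispersion is not corrected, the standard errors of the coefficients are underestimated. Tests then reach statistical significance more easily than they should, which leads to incorrect inferences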

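– One of many causes of overdispersion is unobserved heterogeneity: the event rate depends not only on the observed independent variables but also on factors the researcher has not observed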
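A quick descriptive check compares the sample variance with the sample mean of the counts, as in this Python sketch. It is only a rough hint: the Poisson assumption concerns the variance conditional on x, and some of the raw spread may be explained by the regressors. The data are hypothetical.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Sample variance (n - 1 denominator) divided by the sample mean."""
    return variance(counts) / mean(counts)


counts = [0, 0, 0, 1, 1, 2, 2, 3, 5, 9, 12]   # hypothetical article counts
print(f"mean={mean(counts):.2f}, variance={variance(counts):.2f}, "
      f"ratio={dispersion_ratio(counts):.2f}")
# A ratio well above 1 suggests overdispersion; a Poisson variable has a ratio near 1.
```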


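There are two remedies:

– Do not use the Poisson model's own standard errors. Instead, compute a variance-covariance matrix of the estimator (VCE) that does not understate the variance, and use it to obtain robust standard errors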


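– Assume that the event rate itself is a random variable with a gamma distribution. Substituting it into the Poisson distribution combines the two into a new probability model, the negative binomial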


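Re-estimating robust standard errors for Poisson regression: in Stata, add the vce(robust) option to the poisson command to estimate robust standard errors for the coefficients: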

poisson y x1 x2 x3, vce(robust)
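The idea behind a robust (sandwich) standard error shows up clearly in the simplest case, a Poisson model with only a constant. This Python sketch is an illustration under that assumption, not Stata's implementation. Stata's vce(robust) handles general regressors and may apply a finite-sample scaling that is omitted here. The MLE is b0 = ln(ȳ), and the model-based variance is 1/(nȳ). The sandwich variance replaces the assumed variance nȳ with the observed Σ(y − ȳ)², so it grows when the data are overdispersed. The data are hypothetical.

```python
import math


def intercept_only_poisson_se(counts):
    """b0, model-based SE, and robust (sandwich) SE for a constant-only Poisson model.

    Score for b0: sum(y - mu); information: n * mu, with mu = ybar at the MLE.
    Model-based variance = 1 / (n * ybar).
    Sandwich variance    = sum((y - ybar)**2) / (n * ybar)**2  (no small-sample factor).
    """
    n = len(counts)
    ybar = sum(counts) / n
    b0 = math.log(ybar)
    se_model = 1.0 / math.sqrt(n * ybar)
    se_robust = math.sqrt(sum((y - ybar) ** 2 for y in counts)) / (n * ybar)
    return b0, se_model, se_robust


counts = [0, 0, 0, 1, 1, 2, 2, 3, 5, 9, 12]   # hypothetical, overdispersed counts
b0, se_model, se_robust = intercept_only_poisson_se(counts)
print(f"b0={b0:.3f}  model SE={se_model:.3f}  robust SE={se_robust:.3f}")
```

With overdispersed data the robust SE exceeds the model-based one, which is the understatement the remedy corrects.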


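• Two negative binomial regression models

– NB2 (Negbin 2)

$$E(y_i \mid \mathbf{x}_i) = \mu_i = \exp(\mathbf{x}_i\boldsymbol{\beta}), \qquad V(y_i \mid \mathbf{x}_i) = \mu_i(1 + \alpha\mu_i)$$

The equations above show that the negative binomial model has the same conditional mean as the Poisson regression model, but a different conditional variance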
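The NB2 moments can be verified numerically. This Python sketch uses the standard gamma-Poisson mixture form of the negative binomial probability function with r = 1/α. It sums the mean and variance over a long finite range (the cutoff `y_max` is an assumption) and compares them with μ and μ(1 + αμ). The values μ = 2 and α = 0.5 are arbitrary.

```python
import math


def nb2_pmf(y, mu, alpha):
    """NB2 probability Pr(y | mu, alpha); mean mu, variance mu * (1 + alpha * mu)."""
    r = 1.0 / alpha                                   # inverse of the dispersion parameter
    log_p = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def nb2_moments(mu, alpha, y_max=500):
    """Mean and variance of the NB2 distribution, summed over y = 0..y_max."""
    probs = [nb2_pmf(y, mu, alpha) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var


mean, var = nb2_moments(mu=2.0, alpha=0.5)
print(mean, var)   # close to 2.0 and 2.0 * (1 + 0.5 * 2.0) = 4.0
```

As α approaches 0, the NB2 probabilities approach the Poisson probabilities with the same μ.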

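– NB1 (Negbin 1)

$$E(y_i \mid \mathbf{x}_i) = \mu_i = \exp(\mathbf{x}_i\boldsymbol{\beta}), \qquad V(y_i \mid \mathbf{x}_i) = \mu_i(1 + \alpha)$$

Again, the conditional mean is the same as in the Poisson regression model, but the conditional variance differs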
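The three variance functions can be compared side by side with the following Python sketch. The dispersion value α = 0.5 is hypothetical. In NB1 the variance is a constant multiple of the mean, while in NB2 it grows quadratically with the mean. With α = 0, both reduce to the Poisson variance.

```python
def poisson_var(mu):
    """Poisson: V(y|x) = mu (equidispersion)."""
    return mu


def nb1_var(mu, alpha):
    """NB1: V(y|x) = mu * (1 + alpha), a constant multiple of the mean."""
    return mu * (1.0 + alpha)


def nb2_var(mu, alpha):
    """NB2: V(y|x) = mu * (1 + alpha * mu), growing quadratically with the mean."""
    return mu * (1.0 + alpha * mu)


alpha = 0.5                                   # hypothetical dispersion parameter
for mu in (0.5, 2.0, 8.0):
    print(mu, poisson_var(mu), nb1_var(mu, alpha), nb2_var(mu, alpha))
```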

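– Test: when α = 0, the variance of the negative binomial distribution equals the Poisson variance, and the Poisson model is appropriate

– Whenever α > 0, the negative binomial variance exceeds the Poisson variance (overdispersion), and the negative binomial model is appropriate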

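• Stata's built-in command for negative binomial regression:

nbreg y x1 x2 x3

• The bottom of the output reports the estimate of the dispersion parameter (alpha) and a likelihood-ratio (LR) test of H0: alpha = 0. Rejecting H0 means that the variance is significantly greater than the expected value, so negative binomial regression should be used.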
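Under H0 the value α = 0 lies on the boundary of the parameter space. The LR statistic is therefore usually referred to a 50:50 mixture of χ²(0) and χ²(1), which Stata labels chibar2(01). This Python sketch computes that p-value from two log likelihoods. The log-likelihood values are hypothetical.

```python
import math


def lr_stat(ll_poisson, ll_nbreg):
    """LR = 2 * (lnL of the negative binomial model - lnL of the Poisson model)."""
    return 2.0 * (ll_nbreg - ll_poisson)


def chibar2_pvalue(lr):
    """p-value for H0: alpha = 0 vs alpha > 0 under a 50:50 chi2(0)/chi2(1) mixture.

    For lr > 0, p = 0.5 * Pr(chi2(1) > lr), with Pr(chi2(1) > lr) = erfc(sqrt(lr / 2)).
    """
    if lr <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))


# Hypothetical log likelihoods (not couart2 results):
lr = lr_stat(ll_poisson=-1650.0, ll_nbreg=-1560.0)
print(lr, chibar2_pvalue(lr))
```

Halving the χ²(1) tail probability reflects that α cannot be negative. A conventional χ²(1) p-value would therefore be conservative.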


• Stata's nbreg command fits the NB2 model by default. To estimate the NB1 model, add the dispersion(constant) option

• Interpretation of NBM

– the expected value of the count variable (rate of occurrence): listcoef, prchange

– the probability of counts: prvalue

– predicted count: prtab


• Interpretation of NBR

1. Change in E(y|x) for changes in the independent variables

– Factor (or percent) change in the expected count using listcoef

– Holding all other variables constant, the mean number of articles for female scientists is 0.8 times that for male scientists (i.e., 20% lower)


– Marginal and discrete change in E(y|x) (the predicted rate) using prchange

– At typical values (all other variables held at their means), female scientists publish on average 0.34 fewer articles than male scientists
