Tobit models
Econ 60303
Bill Evans
Example: Bias in censored models
• Bivariate regression
• xi and ε are drawn from N(0,1)
• yi = α + xi β + εi
• Let α=0 and β=1 (45° line) and construct y
• Estimate yi = α + xi β + εi
• Consider three LHS variables
• y1 is as reported (no censoring)
• y2 = min(1, y1), i.e. y1 censored from above at 1
– censored 23.9% of the time
• y3 = min(0.25, y1), i.e. y1 censored from above at 0.25
– censored 41.8% of the time
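The setup above can be simulated in a few lines. This is a minimal sketch in Python (standard library only), not the code used to produce the slides: the sample size, the seed, and the helper names `ols` and `simulate` are assumptions made for illustration, so the numbers it prints will not match the slides' exactly. It draws x and ε from N(0,1), builds y1 with α=0 and β=1, censors from above at 1 and at 0.25, and runs OLS on each version.

```python
import random

def ols(xs, ys):
    """Return (intercept, slope) from a bivariate least-squares fit."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate(n=20000, seed=60303):
    """Simulate y1 = x + e and its censored versions; n and seed are arbitrary."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [x + rng.gauss(0, 1) for x in xs]  # alpha = 0, beta = 1
    results = {}
    for name, cap in (("y1", None), ("y2", 1.0), ("y3", 0.25)):
        ys = y1 if cap is None else [min(cap, y) for y in y1]
        share_cens = 0.0 if cap is None else sum(y >= cap for y in y1) / n
        a, b = ols(xs, ys)
        results[name] = (a, b, share_cens)
    return results

if __name__ == "__main__":
    for name, (a, b, share) in simulate().items():
        print(f"{name}: alpha={a:.3f} beta={b:.3f} censored={share:.3f}")
```

With censoring from above, the fitted slope should fall below 1 and the intercept below 0, and the bias should grow as the censoring point moves down.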
Figure 1: Plot of X and Y1 (horizontal axis X, vertical axis Y; line shown: True)
Figure 1: Plot of X and Y2 (horizontal axis X, vertical axis Y; lines shown: True, OLS)
Figure 1: Plot of X and Y3 (horizontal axis X, vertical axis Y; lines shown: True, OLS)
OLS Estimate of α and β

                     Dependent variable
                     Y1        Y2        Y3
α                    0.027    -0.189    -0.432
β                    1.023     0.755     0.565
Ratio, βYj/βY1                 0.738     0.553
% cen.               0         0.239     0.418
(1-%cen)                       0.761     0.582
                 OLS using Y1   Tobit using Y2   Tobit using Y3
β                1.0229         1.0078           0.9960
                 (0.027)        (0.036)          (0.041)
α                0.027          0.0133           -0.0001
                 (0.031)        (0.033)          (0.004)
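Tobit recovers β because it maximizes a likelihood that treats censored and uncensored observations differently: uncensored observations contribute the normal density of the residual, and observations at the cap contribute the probability that the latent y lies at or above the cap. The sketch below writes that log-likelihood for the bivariate model with censoring from above. It is an illustration under stated assumptions, not Stata's implementation: the function names, the sample size, and the seed are made up for this example. The demo compares the log-likelihood at the true values (α=0, β=1) with the value at the OLS estimates reported earlier for Y2 (α=-0.189, β=0.755), holding σ=1 in both cases.

```python
import math
import random

def norm_logpdf(z):
    """Log density of a standard normal at z."""
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def norm_logsf(z):
    """Log of 1 - Phi(z), the upper-tail probability of a standard normal."""
    return math.log(0.5 * math.erfc(z / math.sqrt(2)))

def tobit_loglik(alpha, beta, sigma, xs, ys, cap):
    """Log-likelihood for y = min(cap, alpha + beta*x + e), e ~ N(0, sigma^2)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = (min(y, cap) - alpha - beta * x) / sigma
        if y >= cap:
            total += norm_logsf(z)  # censored: P(latent y >= cap)
        else:
            total += norm_logpdf(z) - math.log(sigma)  # uncensored: density
    return total

def demo(n=5000, seed=1, cap=1.0):
    """Compare the log-likelihood at the true values and at attenuated values."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [min(cap, x + rng.gauss(0, 1)) for x in xs]
    ll_true = tobit_loglik(0.0, 1.0, 1.0, xs, ys, cap)
    ll_ols = tobit_loglik(-0.189, 0.755, 1.0, xs, ys, cap)  # OLS-on-Y2 values
    return ll_true, ll_ols

if __name__ == "__main__":
    ll_true, ll_ols = demo()
    print(f"log-likelihood at truth: {ll_true:.1f}, at OLS values: {ll_ols:.1f}")
```

The censored likelihood should prefer the true parameters to the attenuated OLS values, which is why maximizing it undoes the bias.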
Example from CPS
• Data from the 1987 CPS outgoing rotation group
• Households are in the CPS for the same four months in a two-year period (April-July 1987 and 1988)
• ¼ leave the sample temporarily or permanently each month
• In these months, respondents answer detailed questions about current employment
• Union status
• Usual hours, hours of overtime
• Usual weekly earnings
• In each survey, weekly earnings are ‘topcoded’
• In the data we use (1987), topcoded at $999
• Sample: 25% random sample of full-time, full-year male workers, ages 21-64
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
age             byte   %9.0g                  age in years
race            byte   %9.0g                  1=white, non-hisp, 2=black, n.h, 3=hisp
educ            byte   %9.0g                  years of education
unionm          byte   %9.0g                  1=union member, 2=otherwise
smsa            byte   %9.0g                  1=live in 19 largest smsa, 2=other smsa, 3=non smsa
region          byte   %9.0g                  1=east, 2=midwest, 3=south, 4=west
earnwke         int    %9.0g                  usual weekly earnings
-------------------------------------------------------------------------------
. gen earnwkl=ln(earnwke);
. gen union=unionm==1;
. gen topcode=earnwke==999;
. gen black=race==2;
. gen hispanic=race==3;
. * get frequencies of topcode;
. tabulate topcode;

      =1 if |
 earnwkl is |
   topcoded |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     18,474       92.81       92.81
          1 |      1,432        7.19      100.00
------------+-----------------------------------
      Total |     19,906      100.00

• Need a variable that identifies which obs are censored
• Fraction of obs topcoded: 7.19%
• . * run simple regression on topcoded data;
• . reg earnwkl age age2 educ black hispanic union;
• [delete results]
• . * run tobit model;
• . * here, ul specifies that the dependent variable is;
• . * topcoded above (upper censoring);
• . tobit earnwkl age age2 educ black hispanic union, ul;
Tobit regression                                  Number of obs   =      19906
                                                  LR chi2(6)      =    7309.06
                                                  Prob > chi2     =     0.0000
Log likelihood = -13207.534                       Pseudo R2       =     0.2167

------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0703864     .00214    32.89   0.000     .0661919     .074581
        age2 |  -.0006948   .0000262   -26.55   0.000    -.0007461   -.0006435
        educ |   .0757658   .0012172    62.25   0.000       .07338    .0781515
       black |  -.2200147    .011795   -18.65   0.000    -.2431339   -.1968954
    hispanic |  -.1058161   .0141638    -7.47   0.000    -.1335783   -.0780539
       union |   .1191111   .0077791    15.31   0.000     .1038634    .1343588
       _cons |   3.499009   .0421806    82.95   0.000     3.416332    3.581686
-------------+----------------------------------------------------------------
      /sigma |   .4530426   .0023983                      .4483418    .4577434
------------------------------------------------------------------------------
  Obs. summary:          0  left-censored observations
                     18474     uncensored observations
                      1432 right-censored observations at earnwkl>=6.906755

• /sigma is similar to the RMSE in an OLS regression
. egen q=mean(topcode) if earnwke>=750;
. gen alpha=ln(q)/(ln(750) - ln(999));
. gen ey_y999=999*alpha/(alpha-1);
. sum q alpha ey_y999;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           q |      3277     .436985           0    .436985    .436985
       alpha |      3277    2.887721           0   2.887721   2.887721
     ey_y999 |      3277     1528.21           0    1528.21    1528.21
E[Y | Y>c] = αc/(α-1)
α = 2.89
E[Y | Y>999] = (2.89)(999)/(1.89) = 1528
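The arithmetic above can be reproduced directly. The formula E[Y | Y>c] = αc/(α-1) is the mean of a Pareto upper tail, where α here is the Pareto shape parameter (not the regression intercept). Under that assumption, the share q of workers earning at least $750 who are also topcoded at $999 satisfies q = (750/999)^α, which gives the expression for α used in the Stata code. The sketch below is illustrative only: the function name `pareto_tail_mean` and its argument names are made up for this example, and the inputs are the values reported in the Stata output.

```python
import math

def pareto_tail_mean(q, lower, cap):
    """Return (alpha, E[Y | Y > cap]) for a Pareto upper tail.

    q is the share of observations at or above `lower` that are also
    at or above `cap`, so that q = (lower / cap) ** alpha.
    """
    alpha = math.log(q) / (math.log(lower) - math.log(cap))
    if alpha <= 1:
        raise ValueError("mean above the cap is infinite when alpha <= 1")
    return alpha, cap * alpha / (alpha - 1)

# Values taken from the summary output: q = .436985, thresholds 750 and 999.
alpha, mean_above = pareto_tail_mean(q=0.436985, lower=750, cap=999)
print(round(alpha, 2), round(mean_above))  # prints 2.89 1528
```

The guard on α ≤ 1 matters in practice: a Pareto tail that heavy has no finite mean, so the formula would return a meaningless number.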
OLS/Tobit when Income is Topcoded at $999

            OLS        Tobit      QF         OLS/Tobit
Age         0.0679     0.0704     0.0723     0.964
Age2        -6.8E-4    -6.9E-4    -7.1E-4    0.985
Educ        0.0701     0.0757     0.0796     0.926
Black       -0.2130    -0.2200    -0.2252    0.968
Hispanic    -0.1096    -0.1058    -0.1049    1.036
Union       0.1316     0.1191     0.1078     1.105
• . * artificially topcode wages at 750;
• . gen top750=earnwke>=750;
• . gen earnwkl3=top750*ln(750) + (1-top750)*ln(earnwke);
• . * run regression on model with artificially topcoded wages;
• . reg earnwkl3 age age2 educ black hispanic union;
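The artificial topcoding step amounts to replacing log earnings with ln(750) whenever earnings are at or above $750. A minimal Python sketch of the same transformation follows. The function names and the sample earnings values are hypothetical, chosen only to illustrate the step, and are not drawn from the CPS data.

```python
import math

def topcoded_log_earnings(earnings, cap=750):
    """Return ln(earnings), recording earnings at or above `cap` as ln(cap)."""
    return math.log(cap) if earnings >= cap else math.log(earnings)

def share_topcoded(earnings_list, cap=750):
    """Fraction of observations at or above the cap."""
    return sum(e >= cap for e in earnings_list) / len(earnings_list)

# Hypothetical weekly earnings, for illustration only.
sample = [320, 480, 750, 999, 610]
logged = [topcoded_log_earnings(e) for e in sample]
print(share_topcoded(sample))  # prints 0.4
```

Lowering the cap from $999 to $750 raises the censored share, which is what makes the comparison with the $999 results informative about how OLS bias grows with censoring.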
OLS/Tobit when Income is Topcoded at $750

            OLS        Tobit      QF         OLS/Tobit
Age         0.06350    0.0704     0.0750     0.902
Age2        -6.4E-4    -6.9E-4    -7.4E-4    0.927
Educ        0.0614     0.0755     0.0817     0.813
Black       -0.2013    -0.2211    -0.2326    0.910
Hispanic    -0.1151    -0.1054    -0.1053    1.092
Union       0.1493     0.1318     0.1161     1.132