Tobit models

Econ 60303

Bill Evans

Example: Bias in censored models

• Bivariate regression

• xi and εi are drawn from N(0,1)

• yi = α + xi β + εi

• Let α=0 and β=1 (45° line) and construct y

• Estimate yi = α + xi β + εi

• Consider three LHS variables

• y1 is as reported (no censoring)

• y2=min(1,y1)

– Censored 23.9% of the time

• y3=min(0.25,y1)

– Censored 41.8% of the time
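The setup above can be sketched as a small simulation. This is a minimal illustration in Python rather than Stata, using only the standard library. The sample size, seed, and the helper names `ols` and `simulate` are assumptions made for the sketch and are not taken from the slides, so the numbers it prints will not match the slides exactly.

```python
import random


def ols(x, y):
    """Return (intercept, slope) from a bivariate least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n=20000, seed=12345):
    """Draw x and eps from N(0,1), build y1 = x + eps, then censor from above."""
    rng = random.Random(seed)  # n and seed are assumptions for this sketch
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [xi + ei for xi, ei in zip(x, eps)]  # alpha = 0, beta = 1
    y2 = [min(1.0, v) for v in y1]            # censored above at 1
    y3 = [min(0.25, v) for v in y1]           # censored above at 0.25

    results = {}
    for name, y, cap in (("y1", y1, None), ("y2", y2, 1.0), ("y3", y3, 0.25)):
        alpha_hat, beta_hat = ols(x, y)
        # A censored observation sits exactly at the cap.
        share = 0.0 if cap is None else sum(v >= cap for v in y) / n
        results[name] = {"alpha": alpha_hat, "beta": beta_hat, "censored": share}
    return results


results = simulate()
for name, r in results.items():
    print(f"{name}: alpha={r['alpha']:.3f} beta={r['beta']:.3f} "
          f"censored={r['censored']:.3f}")
```

With the uncensored y1 the slope should be close to 1, and it should shrink toward zero as the share of censored observations rises.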

Figure 1: Plot of X and Y1 (scatter plot; horizontal axis runs from -4 to 4)




Figure 2: Plot of X and Y2 (scatter plot; horizontal axis runs from -4 to 4)





Figure 3: Plot of X and Y3 (scatter plot; horizontal axis runs from -4 to 4)





OLS Estimate of α and β

            Dependent variable              Ratio, βYj/βY1
            Y1        Y2        Y3          Y2        Y3
α           0.027     -0.189    -0.432
β           1.023     0.755     0.565       0.738     0.553
% cen.      0         0.239     0.418




            OLS using Y1    Tobit using Y2    Tobit using Y3
α           0.027
β           1.0229
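For reference, the objective that a Tobit model with upper censoring maximizes can be written as below. This is a standard textbook form, not a formula taken from the slides. The symbols c (the censoring point, 1 for Y2 and 0.25 for Y3), σ (the standard deviation of ε), φ (the standard normal density) and Φ (the standard normal CDF) are notation introduced here.

```latex
\ln L(\alpha,\beta,\sigma)
  = \sum_{i:\, y_i < c} \ln\!\left[\frac{1}{\sigma}\,
      \phi\!\left(\frac{y_i - \alpha - x_i\beta}{\sigma}\right)\right]
  + \sum_{i:\, y_i = c} \ln\!\left[1 -
      \Phi\!\left(\frac{c - \alpha - x_i\beta}{\sigma}\right)\right]
```

Uncensored observations contribute the usual normal density, while censored observations contribute only the probability that the latent outcome lies at or above c.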






Example from CPS

• Data from the 1987 CPS out-going rotation group

• Households are in the CPS for the same four months in a two-year period (April-July 1987 and 1988)

• ¼ leave the sample temporarily or permanently each month

• In these months, answer detailed questions about current employment

• Union status

• Usual hours, hours of overtime

• Usual weekly earnings

• In each survey, weekly earnings are ‘topcoded’

• In the data we use (1987), topcoded at $999

• Sample: 25% random sample of full-time/full-year male workers, ages 21-64

-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
age             byte   %9.0g                  age in years
race            byte   %9.0g                  1=white, non-hisp, 2=black, n.h, 3=hisp
educ            byte   %9.0g                  years of education
unionm          byte   %9.0g                  1=union member, 2=otherwise
smsa            byte   %9.0g                  1=live in 19 largest smsa, 2=other smsa, 3=non smsa
region          byte   %9.0g                  1=east, 2=midwest, 3=south, 4=west
earnwke         int    %9.0g                  usual weekly earnings
-------------------------------------------------------------------------------

. gen earnwkl=ln(earnwke);

. gen union=unionm==1;

. gen topcode=earnwke==999;

. gen black=race==2;

. gen hispanic=race==3;

. * get frequencies of topcode;

. tabulate topcode;

      =1 if |
 earnwkl is |
   topcoded |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |     18,474       92.81       92.81
          1 |      1,432        7.19      100.00
------------+-----------------------------------
      Total |     19,906      100.00

Note: topcode is the variable that identifies which observations are censored.

Note: the Percent column gives the fraction of observations that are topcoded (7.19%).

• . *run simple regression on topcoded data;

• . reg earnwkl age age2 educ black hispanic union;

• [results omitted]

• . * run tobit model;

• . * here, ul specifies that the dependent variable is;

• . * topcoded above (upper censoring);

• . tobit earnwkl age age2 educ black hispanic union, ul;

Tobit regression                                  Number of obs   =      19906
                                                  LR chi2(6)      =    7309.06
                                                  Prob > chi2     =     0.0000
Log likelihood = -13207.534                       Pseudo R2       =     0.2167

------------------------------------------------------------------------------
     earnwkl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0703864     .00214    32.89   0.000     .0661919     .074581
        age2 |  -.0006948   .0000262   -26.55   0.000    -.0007461   -.0006435
        educ |   .0757658   .0012172    62.25   0.000       .07338    .0781515
       black |  -.2200147    .011795   -18.65   0.000    -.2431339   -.1968954
    hispanic |  -.1058161   .0141638    -7.47   0.000    -.1335783   -.0780539
       union |   .1191111   .0077791    15.31   0.000     .1038634    .1343588
       _cons |   3.499009   .0421806    82.95   0.000     3.416332    3.581686
-------------+----------------------------------------------------------------
      /sigma |   .4530426   .0023983                      .4483418    .4577434
------------------------------------------------------------------------------
  Obs. summary:          0  left-censored observations
                     18474     uncensored observations
                      1432 right-censored observations at earnwkl>=6.906755

Note: /sigma is similar to the RMSE reported by an OLS regression.

egen q=mean(topcode) if earnwke>=750;
gen alpha=ln(q)/(ln(750) - ln(999));
gen ey_y999=999*alpha/(alpha-1);
sum q alpha ey_y999;

. sum q alpha ey_y999;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           q |      3277     .436985           0    .436985    .436985
       alpha |      3277    2.887721           0   2.887721   2.887721
     ey_y999 |      3277     1528.21           0    1528.21    1528.21

If the upper tail of Y follows a Pareto distribution with shape parameter α (α > 1): E[Y | Y>c] = αc/(α-1)

α = 2.89

E[Y | Y>999] = (2.89)(999)/(1.89) = 1528
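The arithmetic above can be checked with a short script. This is a sketch in Python rather than Stata. It assumes, as the formula E[Y | Y>c] = αc/(α-1) implies, that earnings above $750 follow a Pareto distribution. The function name `pareto_topcode_mean` is hypothetical, and the counts 1,432 (topcoded) and 3,277 (earning at least $750) are read off the Stata output above.

```python
import math


def pareto_topcode_mean(q, lower, topcode):
    """Back out the Pareto shape alpha and E[Y | Y > topcode].

    q is the share of observations at or above `lower` that are also at or
    above `topcode`. Under a Pareto tail, q = (lower / topcode) ** alpha.
    """
    alpha = math.log(q) / (math.log(lower) - math.log(topcode))
    if alpha <= 1.0:
        raise ValueError("the conditional mean is finite only when alpha > 1")
    return alpha, topcode * alpha / (alpha - 1.0)


# 1,432 topcoded observations out of 3,277 with weekly earnings >= 750.
q = 1432 / 3277
alpha, mean_above = pareto_topcode_mean(q, lower=750.0, topcode=999.0)
print(f"q={q:.6f} alpha={alpha:.4f} E[Y|Y>999]={mean_above:.2f}")
```

The implied mean of roughly $1,528 is what a topcoded observation would be worth on average if the Pareto assumption holds.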

OLS/Tobit when Income is Topcoded at $999

            OLS        Tobit      QF         OLS/Tobit
Age         0.0679     0.0704     0.0723     0.964
Age2        -6.8E-4    -6.9E-4    -7.1E-4    0.985
Educ        0.0701     0.0757     0.0796     0.926
Black       -0.2130    -0.2200    -0.2252    0.968
Hispanic    -0.1096    -0.1058    -0.1049    1.036
Union       0.1316     0.1191     0.1078     1.105

• . * artificially topcode wages at 750;

• . gen top750=earnwke>=750;

• . gen earnwkl3=top750*ln(750) + (1-top750)*ln(earnwke);

• . * run regression on model with artificially topcoded wages;

• . reg earnwkl3 age age2 educ black hispanic union;
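The artificial topcoding step can be sketched outside Stata as follows. This Python illustration mirrors the two gen commands above. The function name `topcode_log_earnings` and the four earnings values are hypothetical and are not drawn from the CPS sample.

```python
import math


def topcode_log_earnings(earnings, cap=750.0):
    """Return (flags, logs): a topcode dummy and the log of capped earnings.

    Mirrors top750 = (earnwke >= 750) and
    earnwkl3 = top750*ln(750) + (1 - top750)*ln(earnwke).
    """
    flags = [1 if e >= cap else 0 for e in earnings]
    logs = [f * math.log(cap) + (1 - f) * math.log(e)
            for f, e in zip(flags, earnings)]
    return flags, logs


# Hypothetical weekly earnings, for illustration only.
sample = [300.0, 749.0, 750.0, 999.0]
flags, logs = topcode_log_earnings(sample)
for e, f, l in zip(sample, flags, logs):
    print(f"earnwke={e:7.1f} top750={f} earnwkl3={l:.4f}")
```

The construction is equivalent to taking the log of min(earnwke, 750), so every observation at or above the cap receives the same value, ln(750).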

OLS/Tobit when Income is Topcoded at $750

            OLS        Tobit      QF         OLS/Tobit
Age         0.0635     0.0704     0.0750     0.902
Age2        -6.4E-4    -6.9E-4    -7.4E-4    0.927
Educ        0.0614     0.0755     0.0817     0.813
Black       -0.2013    -0.2211    -0.2326    0.910
Hispanic    -0.1151    -0.1054    -0.1053    1.092
Union       0.1493     0.1318     0.1161     1.132
