Discrete Choice Modeling

William Greene, Stern School of Business

IFS at UCL, February 11-13, 2004

http://cemmap.ifs.org.uk/resources/files/resources_greene_discrete.shtml

Part 3

Modeling Binary Choice

A Model for Binary Choice

Yes or No decision (Buy/Not buy)

Example: choose to fly or not to fly to a destination when there are alternatives.

Model: Net utility of flying, $U_{\text{fly}} = \alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income} + \varepsilon$

Choose to fly if net utility is positive.

Data: X = [1, cost, terminal time], Z = [income]

y = 1 if choose fly ($U_{\text{fly}} > 0$), 0 if not.

What Can Be Learned from the Data? (A Sample of Consumers, i = 1,…,N)

• Are the attributes “relevant?”

• Predicting behavior

- Individual

- Aggregate

• Analyze changes in behavior when attributes change

Application

210 Commuters Between Sydney and Melbourne

Available modes = Air, Train, Bus, Car

Observed:

- Choice

- Attributes: Cost, terminal time, other

- Characteristics: Household income

First application: Fly or other

Binary Choice Data

Choose Air   Gen.Cost   Term Time   Income
    1.0000     86.000      25.000   70.000
    .00000     67.000      69.000   60.000
    .00000     77.000      64.000   20.000
    .00000     69.000      69.000   15.000
    .00000     77.000      64.000   30.000
    .00000     71.000      64.000   26.000
    .00000     58.000      64.000   35.000
    .00000     71.000      69.000   12.000
    .00000     100.00      64.000   70.000
    1.0000     158.00      30.000   50.000
    1.0000     136.00      45.000   40.000
    1.0000     103.00      30.000   70.000
    .00000     77.000      69.000   10.000
    1.0000     197.00      45.000   26.000
    .00000     129.00      64.000   50.000
    .00000     123.00      64.000   70.000

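An Econometric Model

Choose to fly iff $U_{\text{fly}} > 0$

$U_{\text{fly}} = \alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income} + \varepsilon$

$U_{\text{fly}} > 0 \Rightarrow \varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income})$

Probability model: For any person observed by the analyst,

$\text{Prob(fly)} = \text{Prob}[\varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income})]$

Note the relationship between the unobserved $\varepsilon$ and the outcome.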
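The step from this inequality to a usable probability can be written out once a distribution function F for the unobserved term is assumed. The sketch below uses v as shorthand for the index (v is not a symbol from the slides); the last line holds only for distributions whose density is symmetric about zero, such as the normal and the logistic.

```latex
\begin{aligned}
v &= \alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income} \\
\text{Prob(fly)} &= \text{Prob}[\varepsilon > -v] = 1 - F(-v) \\
                 &= F(v) \quad \text{if the density of } \varepsilon \text{ is symmetric about zero}
\end{aligned}
```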

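A Regression-Like Model

[Figure: Pr[Fly], from 0 to 1.0 on the vertical axis, plotted against the index $\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income}$, from -3.0 to 3.0 on the horizontal axis.]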

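Econometrics

How to estimate $\alpha, \beta_1, \beta_2, \gamma$?

It’s not regression.

The technique of maximum likelihood:

$L = \prod_{y=0} \text{Prob}[y=0] \times \prod_{y=1} \text{Prob}[y=1]$

$\text{Prob}[y=1] = \text{Prob}[\varepsilon > -(\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma \text{Income})]$

Prob[y=0] = 1 - Prob[y=1]

Requires a model for the probability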
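As a rough illustration of the likelihood, the sketch below evaluates its logarithm for the simplest possible case, a model with a constant only, in which every commuter gets the same Prob[y=1]. The function name and variable names are illustrative. The counts (58 of the 210 commuters chose air) are taken from the fit-measures output in these slides. The result should land close to the restricted log likelihood of -123.757 reported with the estimates.

```python
import math

def log_likelihood(y, prob_one):
    """Add log Prob[y=1] for each observed 1 and log Prob[y=0] for each observed 0."""
    total = 0.0
    for yi, pi in zip(y, prob_one):
        total += math.log(pi) if yi == 1 else math.log(1.0 - pi)
    return total

# Constant-only model: one common probability for everyone.
# Its maximum likelihood estimate is the sample share of ones.
n_one, n_zero = 58, 152                  # counts from the fit-measures output
y = [1] * n_one + [0] * n_zero
share = n_one / (n_one + n_zero)

restricted = log_likelihood(y, [share] * len(y))
print(round(restricted, 3))
```

A full model replaces the common share with F(index) for each person and searches over the coefficients for the largest value of this function.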

Completing the Model: F($\varepsilon$)

The distribution

- Normal: PROBIT, natural for behavior

- Logistic: LOGIT, allows “thicker tails”

- Gompertz: EXTREME VALUE, asymmetric, underlies the basic logit model for multiple choice

Does it matter?

- Yes, large difference in estimates

- Not much, quantities of interest are more stable.

Estimated Binary Choice Model

+---------------------------------------------+
| Binomial Probit Model                       |
| Maximum Likelihood Estimates                |
| Model estimated: Jan 20, 2004 at 04:08:11PM.|
| Dependent variable                 MODE     |
| Weighting variable                 None     |
| Number of observations              210     |
| Iterations completed                  6     |
| Log likelihood function       -84.09172     |
| Restricted log likelihood     -123.7570     |
| Chi squared                    79.33066     |
| Degrees of freedom                    3     |
| Prob[ChiSqd > value] =         .0000000     |
| Hosmer-Lemeshow chi-squared =  46.96547     |
| P-value=  .00000 with deg.fr. =       8     |
+---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
 Index function for probability
 Constant     .43877183       .62467004       .702     .4824
 GC           .01256304       .00368079      3.413     .0006  102.647619
 TTME        -.04778261       .00718440     -6.651     .0000  61.0095238
 HINC         .01442242       .00573994      2.513     .0120  34.5476190
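Several numbers in this output can be recomputed from the others, which is a useful check when reading it. The sketch below assumes the chi-squared statistic is the likelihood ratio statistic 2(logL - logL0) and that P[|Z|>z] is the two-sided standard normal tail probability. Both are inferences from the printed values, not statements from the output itself. Variable names are illustrative.

```python
import math

log_l = -84.09172       # Log likelihood function
log_l0 = -123.7570      # Restricted log likelihood (constant only)

# Likelihood ratio statistic for the three slope coefficients.
chi_squared = 2.0 * (log_l - log_l0)

# (coefficient, standard error) pairs copied from the output.
estimates = {
    "Constant": (0.43877183, 0.62467004),
    "GC": (0.01256304, 0.00368079),
    "TTME": (-0.04778261, 0.00718440),
    "HINC": (0.01442242, 0.00573994),
}

def two_sided_p(z):
    """P[|Z| > |z|] for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2.0))

z_ratios = {name: b / se for name, (b, se) in estimates.items()}
p_values = {name: two_sided_p(z) for name, z in z_ratios.items()}

print(f"Chi squared {chi_squared:.4f}")
for name in estimates:
    print(f"{name:<9s} b/St.Er. {z_ratios[name]:7.3f}   P[|Z|>z] {p_values[name]:.4f}")
```

The recomputed chi-squared differs from the printed 79.33066 only in the fourth decimal, presumably because the restricted log likelihood is printed to fewer digits.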

Estimated Binary Choice Models

            LOGIT                   PROBIT                  EXTREME VALUE
Variable    Estimate    t-ratio     Estimate    t-ratio     Estimate    t-ratio
Constant    1.78458     1.40591     0.438772    0.702406    1.45189     1.34775
GC          0.0214688   3.15342     0.012563    3.41314     0.0177719   3.14153
TTME       -0.098467   -5.9612     -0.0477826  -6.65089    -0.0868632  -5.91658
HINC        0.0223234   2.16781     0.0144224   2.51264     0.0176815   2.02876
Log-L      -80.9658                -84.0917                -76.5422
Log-L(0)   -123.757                -123.757                -123.757
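The point that estimates differ across distributions while quantities of interest are more stable can be checked with the logit and probit columns. The sketch below evaluates the predicted probability of flying at the sample means of GC, TTME and HINC (the "Mean of X" column of the probit output). Evaluating at the means is an illustrative choice, not something the slides prescribe. The extreme value column is left out because its probability formula is not stated in the slides.

```python
import math
from statistics import NormalDist

# Sample means from the "Mean of X" column of the probit output.
means = {"GC": 102.647619, "TTME": 61.0095238, "HINC": 34.5476190}

logit = {"Constant": 1.78458, "GC": 0.0214688, "TTME": -0.098467, "HINC": 0.0223234}
probit = {"Constant": 0.438772, "GC": 0.012563, "TTME": -0.0477826, "HINC": 0.0144224}

def index_at_means(coef):
    """Constant plus coefficient times mean for each variable."""
    return coef["Constant"] + sum(coef[name] * means[name] for name in means)

p_logit = 1.0 / (1.0 + math.exp(-index_at_means(logit)))    # logistic CDF
p_probit = NormalDist().cdf(index_at_means(probit))         # standard normal CDF

# Ratio of each logit slope to the corresponding probit slope.
ratios = {name: logit[name] / probit[name] for name in means}

print({name: round(r, 2) for name, r in ratios.items()})
print(round(p_logit, 3), round(p_probit, 3))
```

The slope ratios come out well above 1, while the two predicted probabilities differ by only a few percentage points.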

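A Regression-Like Model

[Figure: Pr[Fly], from 0 to 1.0 on the vertical axis, plotted against the index $\alpha + \beta_1 \text{Cost} + \beta_2 \text{Time} + \gamma (\text{Income}+1)$, from -3.0 to 3.0 on the horizontal axis.]

Effect on predicted probability of an increase in income ($\gamma$ is positive)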

How Well Does the Model Fit?

There is no R squared.

“Fit measures” computed from log L

“Pseudo R squared” = 1 - logL/logL0

Others… - these do not measure fit.

Direct assessment of the effectiveness of the model at predicting the outcome

Fit Measures for Binary Choice

Likelihood Ratio Index

- Bounded by 0 and 1

- Rises when the model is expanded

Cramer (and others)

$\hat{\lambda} = (\text{Mean } \hat{F} \mid y = 1) - (\text{Mean } \hat{F} \mid y = 0)$

= reward for correct predictions minus penalty for incorrect predictions

Fit Measures for the Probit Model

+----------------------------------------+
| Fit Measures for Binomial Choice Model |
| Probit model for variable MODE         |
+----------------------------------------+
| Proportions P0= .723810   P1= .276190  |
| N =     210 N0=     152   N1=      58  |
| LogL =  -84.09172 LogL0 =  -123.7570   |
| Estrella = 1-(L/L0)^(-2L0/n) = .36583  |
+----------------------------------------+
|     Efron |  McFadden  |  Ben./Lerman  |
|    .45620 |    .32051  |       .75897  |
|    Cramer | Veall/Zim. |     Rsqrd_ML  |
|    .40834 |    .50682  |       .31461  |
+----------------------------------------+
| Information  Akaike I.C. Schwarz I.C.  |
| Criteria        .83897     189.57187   |
+----------------------------------------+

Pseudo – R-squared
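The measures in this box that depend only on the two log likelihoods and the sample size can be recomputed directly. In the sketch below the Estrella formula is the one printed in the box. The other formulas, and the parameter count k = 4 (constant plus three slopes), are assumptions chosen because they reproduce the printed values. Efron, Ben./Lerman and Cramer need the individual fitted probabilities and are not attempted here.

```python
import math

log_l, log_l0 = -84.09172, -123.7570    # LogL and LogL0 from the box
n = 210                                 # number of observations
k = 4                                   # assumed: constant plus three slopes

lr = 2.0 * (log_l - log_l0)             # likelihood ratio statistic

mcfadden = 1.0 - log_l / log_l0
estrella = 1.0 - (log_l / log_l0) ** (-2.0 * log_l0 / n)
rsqrd_ml = 1.0 - math.exp(-lr / n)
veall_zimmermann = (lr / (lr + n)) * ((n - 2.0 * log_l0) / (-2.0 * log_l0))
akaike = (-2.0 * log_l + 2.0 * k) / n
schwarz = -2.0 * log_l + k * math.log(n)

for name, value in [("McFadden", mcfadden), ("Estrella", estrella),
                    ("Rsqrd_ML", rsqrd_ml), ("Veall/Zim.", veall_zimmermann),
                    ("Akaike I.C.", akaike), ("Schwarz I.C.", schwarz)]:
    print(f"{name:<13s}{value:10.5f}")
```

The McFadden value is the pseudo R squared, 1 - logL/logL0. Note that the two information criteria appear to be on different scales in the box: the Akaike value matches a per-observation figure, while the Schwarz value matches a total.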

Predicting the Outcome

Predicted probabilities

- P = F(a + b1Cost + b2Time + cIncome)

Predicting outcomes

- Predict y=1 if P is large

- Use 0.5 for “large” (more likely than not)

Count successes and failures

Individual Predictions from a Logit Model

Observation   Observed Y   Predicted Y   Residual     x(i)b   Pr[Y=1]
         81       .00000        .00000      .0000   -3.3944     .0325
         85       .00000        .00000      .0000   -2.1901     .1006
         89       1.0000        .00000     1.0000   -2.6766     .0644
         93       1.0000        1.0000      .0000     .8113     .6924
         97       1.0000        1.0000      .0000    2.6845     .9361
        101       1.0000        1.0000      .0000    2.4457     .9202
        105       1.0000        .00000     1.0000   -3.2204     .0384
        109       1.0000        1.0000      .0000     .0311     .5078
        113       .00000        .00000      .0000   -2.1704     .1024
        117       .00000        .00000      .0000   -3.3729     .0332
        445       .00000        1.0000    -1.0000     .0295     .5074

Note two types of errors and two types of successes.
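The listing can be reproduced from the x(i)b column alone. The sketch below applies the logistic function to each index value, predicts y = 1 when the probability exceeds 0.5, and tallies the two kinds of success and the two kinds of error. Only the 11 listed observations are used, and the tally labels are illustrative.

```python
import math

# (observation, observed Y, x(i)b) copied from the listing.
rows = [
    (81, 0, -3.3944), (85, 0, -2.1901), (89, 1, -2.6766), (93, 1, 0.8113),
    (97, 1, 2.6845), (101, 1, 2.4457), (105, 1, -3.2204), (109, 1, 0.0311),
    (113, 0, -2.1704), (117, 0, -3.3729), (445, 0, 0.0295),
]

def logistic(v):
    """Logit probability Pr[Y=1] for index value v."""
    return 1.0 / (1.0 + math.exp(-v))

threshold = 0.5
probabilities, predictions = [], []
tally = {"correct 1": 0, "correct 0": 0, "false 1": 0, "false 0": 0}

for obs, actual, index in rows:
    p = logistic(index)
    predicted = 1 if p > threshold else 0
    probabilities.append(p)
    predictions.append(predicted)
    if predicted == actual:
        tally["correct 1" if actual == 1 else "correct 0"] += 1
    else:
        tally["false 1" if predicted == 1 else "false 0"] += 1
    print(f"{obs:4d}  y={actual}  predicted={predicted}  Pr[Y=1]={p:.4f}")

print(tally)
```

Observations 109 and 445 have probabilities barely above 0.5, so one is a success and the other an error by a very small margin.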

Predictions in Binary Choice

Predict y = 1 if P > P*

Success depends on the assumed P*

ROC Curve

Plot % of actual 1s correctly predicted (sensitivity) vs. % of actual 0s incorrectly predicted as 1 (100 - specificity), as P* varies.

The 45° line is no fit. Curvature implies fit.

Area under the curve compares models.
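A minimal sketch of how such a curve and its area could be computed, using the 11 observations from the individual predictions listing as a stand-in for the full sample. The function names are illustrative. The sweep lowers P* one observation at a time and does not handle tied probabilities specially (there are none in these rows).

```python
# (observed Y, Pr[Y=1]) for the 11 observations in the individual predictions listing.
data = [
    (0, 0.0325), (0, 0.1006), (1, 0.0644), (1, 0.6924), (1, 0.9361), (1, 0.9202),
    (1, 0.0384), (1, 0.5078), (0, 0.1024), (0, 0.0332), (0, 0.5074),
]

def roc_points(data):
    """(share of 0s predicted as 1, share of 1s predicted as 1) as P* falls."""
    n_one = sum(y for y, _ in data)
    n_zero = len(data) - n_one
    hits = false_alarms = 0
    points = [(0.0, 0.0)]               # P* above every fitted probability
    for y, _ in sorted(data, key=lambda row: row[1], reverse=True):
        if y == 1:
            hits += 1
        else:
            false_alarms += 1
        points.append((false_alarms / n_zero, hits / n_one))
    return points

def area_under(points):
    """Trapezoid rule over consecutive points of the curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

points = roc_points(data)
auc = area_under(points)
print(round(auc, 3))
```

On these 11 rows the area works out to 0.8, against 0.5 for the 45° line.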

Aggregate Predictions

Frequencies of actual & predicted outcomes

Predicted outcome has maximum probability.

Threshold value for predicting Y=1 = .5000

            Predicted
------  ----------  +  -----
Actual      0    1  |  Total
------  ----------  +  -----
  0       151    1  |    152
  1        20   38  |     58
------  ----------  +  -----
Total     171   39  |    210

Analyzing Predictions

Frequencies of actual & predicted outcomes

Predicted outcome has maximum probability.

Threshold value for predicting Y=1 is P* = .5000.

(This table can be computed with any P*.)

                 Predicted
------  ----------------------  +  -----
Actual          0          1    |  Total
------  ----------------------  +  -----
  0      N(a0,p0)   N(a0,p1)    |  N(a0)
  1      N(a1,p0)   N(a1,p1)    |  N(a1)
------  ----------------------  +  -----
Total       N(p0)      N(p1)    |    N

Analyzing Predictions - Success

Sensitivity = % actual 1s correctly predicted = 100N(a1,p1)/N(a1) % [100(38/58)=65.5%]

Specificity = % actual 0s correctly predicted = 100N(a0,p0)/N(a0) % [100(151/152)=99.3%]

Positive predictive value = % predicted 1s that were actual 1s = 100N(a1,p1)/N(p1) % [100(38/39)=97.4%]

Negative predictive value = % predicted 0s that were actual 0s = 100N(a0,p0)/N(p0) % [100(151/171)=88.3%]

Correct prediction = %actual 1s and 0s correctly predicted = 100[N(a1,p1)+N(a0,p0)]/N [100(151+38)/210=90.0%]

Analyzing Predictions - Failures

False positive for true negative = % actual 0s predicted as 1s = 100N(a0,p1)/N(a0) % [100(1/152)=0.658%]

False negative for true positive = % actual 1s predicted as 0s = 100N(a1,p0)/N(a1) % [100(20/58)=34.5%]

False positive for predicted positive = % predicted 1s that were actual 0s = 100N(a0,p1)/N(p1) % [100(1/39)=2.56%]

False negative for predicted negative = % predicted 0s that were actual 1s = 100N(a1,p0)/N(p0) % [100(20/171)=11.7%]

False predictions = % actual 1s and 0s incorrectly predicted = 100[N(a0,p1)+N(a1,p0)]/N [100(1+20)/210=10.0%]
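All of the success and failure measures follow from the four cell counts of the prediction table. The sketch below recomputes them from the counts 151, 1, 20 and 38. The variable names mirror the N(a,p) notation, and the dictionary keys are illustrative labels.

```python
# Cell counts N(actual, predicted) from the aggregate prediction table.
n_a0_p0, n_a0_p1 = 151, 1
n_a1_p0, n_a1_p1 = 20, 38

n_a0 = n_a0_p0 + n_a0_p1      # actual 0s
n_a1 = n_a1_p0 + n_a1_p1      # actual 1s
n_p0 = n_a0_p0 + n_a1_p0      # predicted 0s
n_p1 = n_a0_p1 + n_a1_p1      # predicted 1s
n = n_a0 + n_a1

def pct(part, whole):
    return 100.0 * part / whole

measures = {
    "sensitivity": pct(n_a1_p1, n_a1),
    "specificity": pct(n_a0_p0, n_a0),
    "positive predictive value": pct(n_a1_p1, n_p1),
    "negative predictive value": pct(n_a0_p0, n_p0),
    "correct prediction": pct(n_a1_p1 + n_a0_p0, n),
    "false positive for true negative": pct(n_a0_p1, n_a0),
    "false negative for true positive": pct(n_a1_p0, n_a1),
    "false positive for predicted positive": pct(n_a0_p1, n_p1),
    "false negative for predicted negative": pct(n_a1_p0, n_p0),
    "false predictions": pct(n_a0_p1 + n_a1_p0, n),
}

for name, value in measures.items():
    print(f"{name:<40s}{value:7.3f}%")
```

Each failure rate is 100 minus the matching success rate, which gives a quick consistency check on the table.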

Aggregate Prediction is a Useful Way to Assess the Importance of a Variable

Frequencies of actual & predicted outcomes. Predicted outcome has maximum probability. Threshold value for predicting Y=1 = .5000

Model fit without TTME

            Predicted
------  ----------  +  -----
Actual      0    1  |  Total
------  ----------  +  -----
  0       145    7  |    152
  1        48   10  |     58
------  ----------  +  -----
Total     193   17  |    210

Model fit with TTME

            Predicted
------  ----------  +  -----
Actual      0    1  |  Total
------  ----------  +  -----
  0       151    1  |    152
  1        20   38  |     58
------  ----------  +  -----
Total     171   39  |    210
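The comparison can be summarized in a few numbers. The sketch below takes the first table as the fit without TTME and the second as the fit with TTME, following the order of the labels, and adds a naive benchmark (always predict 0, the more common outcome), which is not part of the slides.

```python
# ((actual 0: predicted 0, predicted 1), (actual 1: predicted 0, predicted 1))
tables = {
    "without TTME": ((145, 7), (48, 10)),
    "with TTME": ((151, 1), (20, 38)),
}

summary = {}
for label, ((n00, n01), (n10, n11)) in tables.items():
    n = n00 + n01 + n10 + n11
    summary[label] = {
        "correct": 100.0 * (n00 + n11) / n,          # % of all outcomes predicted correctly
        "sensitivity": 100.0 * n11 / (n10 + n11),    # % of actual 1s predicted as 1
    }

# Illustrative benchmark: predict 0 for everyone.
naive_correct = 100.0 * 152 / 210

for label, stats in summary.items():
    print(f"{label:<13s} correct {stats['correct']:5.1f}%   sensitivity {stats['sensitivity']:5.1f}%")
print(f"always predict 0: correct {naive_correct:5.1f}%")
```

Without TTME the model barely improves on the naive benchmark in overall correct predictions, and the gain from adding TTME shows up mainly in the share of actual fliers that are identified.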
