douglas-fraser
7/27/2019 SYS7002_HW
SYS 7002 Financial Engineering Homework
Problem Description
Situation
Financial institutions hold historical information on the outcomes of previous customer loans.
This information includes customers' demographic factors and their credit performance (good or bad). It
can be used to establish credit scores and risk profiles. The resulting risk models can be used
to predict the performance of potential customers based on their relevant characteristics.
Goal
The objective of credit scoring is to predict the probability that a borrower will default or not
default on a credit line. The objective of this specific assignment is to answer the following:
1) Calculate: (1) the log-odds score and (2) the naive Bayes score for the data set in the file "OWNAGERECDES.xls".
2) Plot the ROC curve for each score.
3) Which score is more discriminating?
Approach
Data
The provided data is an Excel file containing seven categorical variables (six predictor variables and one
response) and two hundred observations. The response variable indicates whether a default occurred (bad) or not
(good). There are one hundred good observations and one hundred bad observations. The six
categorical predictor variables are binary values (0 or 1) indicating the housing status and age
range. Table 1 summarizes the variable labels.
Table 1 Categorical Variable Labels
Analysis and Evidence
Using conditional counts of the data, establish a value for the probability of good or bad conditional on
the data for each of the six predictor variables.
Pr(Good|data) = p(G|x); Pr(Bad|data) = p(B|x)
G=1/B=0 Own Rent Age:
Use the above probabilities to determine the posterior odds of a good.
o(G|x) = p(G|x) / p(B|x)
In this case, since there are equal numbers of good and bad observations, the population odds are:
p(G)/p(B) = 1
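As a minimal sketch of this step, the snippet below turns conditional counts for one characteristic into p(G|x), p(B|x), and the posterior odds o(G|x). The function name and the counts (60 good and 25 bad) are hypothetical, chosen only for illustration; the actual counts are in the spreadsheet.

```python
def posterior_odds(n_good, n_bad):
    """Return (p(G|x), p(B|x), o(G|x)) from the conditional counts for one characteristic."""
    total = n_good + n_bad
    p_good = n_good / total
    p_bad = n_bad / total
    return p_good, p_bad, p_good / p_bad

# Hypothetical counts, not taken from the data set.
p_g, p_b, odds = posterior_odds(60, 25)
print(round(p_g, 3), round(p_b, 3), round(odds, 2))
```

Because the two probabilities share the same denominator, the odds reduce to the ratio of the good count to the bad count.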
From slide 28 of the lecture:
o(G|x) = (p(G)/p(B)) * I(x)   [1]
For this data set, the information odds, I(x), are therefore equal to the posterior odds of a good, o(G|x).
The weight of evidence in favor of good, wi(xi), is used to weight each characteristic, xi, and establish a
numerical value for the credit score, s(x).
s(x) = w1*x1 + w2*x2 + ... + w6*x6
This summation of weighted characteristics is the naive Bayes score.
The first ten rows of the attached spreadsheet illustrate the data and the calculated naive Bayes scores.
Opop x I(x)
O(own) 2.40
O(rent) 0.53
O(
The score is calculated as the sum of the products of the predictor values and their log
odds values (shown here rounded to two decimals):
0(0.88) + 1(-0.63) + 1(-1.39) + 0(-0.69) + 0(1.10) + 0(1.39) = -2.014903
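The same weighted sum can be sketched in a few lines of Python. The weights and indicator values are the ones printed in the example above; the function and variable names are illustrative. Because the printed weights are rounded to two decimals, the sketch gives about -2.02 rather than -2.014903, which presumably comes from unrounded weights in the spreadsheet.

```python
# Log odds weights as printed in the worked example (rounded to two decimals).
weights = [0.88, -0.63, -1.39, -0.69, 1.10, 1.39]
# Binary predictor values for the example observation.
x = [0, 1, 1, 0, 0, 0]

def naive_bayes_score(x, weights):
    """Sum of each predictor value times its weight of evidence."""
    return sum(w * xi for w, xi in zip(weights, x))

score = naive_bayes_score(x, weights)
print(round(score, 2))
```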
The log odds score is developed by using the counts (occurrences) for the conditionally independent
characteristics age and residential status.
These counts are used to generate a corresponding matrix of probabilities (fractions).
The probabilities are used to generate a log odds score by taking the natural log of the posterior odds of
a good.
Log odds score = ln(o(G|x)) = ln(p(G|x) / p(B|x))
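A minimal sketch of this calculation, assuming the counts are held per (residential status, age band) cell. The cell labels and counts below are hypothetical placeholders, not values from the data set, and the function name is illustrative.

```python
import math

def log_odds_table(good_counts, bad_counts):
    """Map each cell to ln(p(G|x) / p(B|x)) using the good and bad counts in that cell."""
    table = {}
    for cell, g in good_counts.items():
        b = bad_counts[cell]
        p_good = g / (g + b)   # fraction of goods in the cell
        p_bad = b / (g + b)    # fraction of bads in the cell
        table[cell] = math.log(p_good / p_bad)
    return table

# Hypothetical counts for two cells only.
good = {("own", "band_1"): 10, ("rent", "band_1"): 5}
bad = {("own", "band_1"): 5, ("rent", "band_1"): 20}
scores = log_odds_table(good, bad)
print({cell: round(s, 2) for cell, s in scores.items()})
```

A cell with no good or no bad observations would make the ratio zero or undefined, so a real calculation would need to handle such cells separately.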
G=1/B=0 Own Rent Age:
A summary of both credit scores versus each demographic category:
To create a ROC curve for each credit score, the true positive and false positive rates
were calculated. This process involved:
- Examining the range of values for each credit score
- Selecting thresholds that ensured a change in rate
- Counting the false positive, true positive, false negative, and true negative occurrences for
each threshold
- Calculating the false positive and true positive rates
- Plotting the TP and FP rates for each credit score
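The steps above can be sketched as follows. This is an illustration under stated assumptions, not the spreadsheet procedure itself: an observation is treated as positive (classified good) when its score is strictly above the threshold, and the toy scores and outcomes are hypothetical.

```python
def roc_points(scores, is_good, thresholds):
    """Return one (fp_rate, tp_rate) pair per threshold.

    Assumption: an observation is classified good when score > threshold.
    """
    n_good = sum(is_good)
    n_bad = len(is_good) - n_good
    points = []
    for t in thresholds:
        tp = sum(1 for s, g in zip(scores, is_good) if s > t and g)
        fp = sum(1 for s, g in zip(scores, is_good) if s > t and not g)
        points.append((fp / n_bad, tp / n_good))
    return points

# Hypothetical toy data: four scores with good (1) or bad (0) outcomes.
toy_scores = [1.2, 0.4, -0.3, -1.5]
toy_is_good = [1, 1, 0, 0]
toy_points = roc_points(toy_scores, toy_is_good, thresholds=[2.0, 0.0, -2.0])
print(toy_points)
```

Thresholds above the highest score and below the lowest score give the (0, 0) and (1, 1) end points of the curve.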
The completed ROC curve is illustrated below along with the data tables. This is also in the attached
spreadsheet.
p(G|x)
own/age
Recommendation
Based on the Summary of Scores table, the naive Bayes score offers higher discrimination for the demographic
categories considered.
threshold  LO_FP_rate  LO_TP_rate  FP   TP   FN   TN
 1.7       0           0           0    0    100  100
 1.1       0.05        0.25        5    25   75   95
 0         0.25        0.85        25   85   15   75
-1.7       0.35        0.9         35   90   10   65
-1.9       0.65        0.95        65   95   5    35
-2         1           1           100  100  0    0
log odds scores
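As a quick check on the table, the rates can be recomputed from the counts. The sketch below uses the FP, TP, FN, and TN values from the log odds table; the function names are illustrative. The trapezoidal area under the curve at the end is an addition that the report itself does not calculate, included only as one common summary of a ROC curve.

```python
# (threshold, FP, TP, FN, TN) rows from the log odds table.
rows = [
    (1.7, 0, 0, 100, 100),
    (1.1, 5, 25, 75, 95),
    (0.0, 25, 85, 15, 75),
    (-1.7, 35, 90, 10, 65),
    (-1.9, 65, 95, 5, 35),
    (-2.0, 100, 100, 0, 0),
]

def rates(fp, tp, fn, tn):
    """Return (false positive rate, true positive rate) from the four counts."""
    return fp / (fp + tn), tp / (tp + fn)

points = [rates(fp, tp, fn, tn) for _, fp, tp, fn, tn in rows]

def trapezoid_auc(points):
    """Area under the piecewise-linear curve through the (fp_rate, tp_rate) points."""
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

auc = trapezoid_auc(points)
print(points)
print(round(auc, 4))
```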
Bonus:
naive
        bins                    cum
score   score|good  score|bad   score|good  score|bad
-2.1    0           0           0           0
-2      5           30          5           30
-1.3    5           35          10          65
-0.5    5           10          15          75
 0.2    15          5           30          80
 0.5    15          5           45          85
 0.8    15          5           60          90
 2      15          5           75          95
 2.5    25          5           100         100
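The cumulative columns are running totals of the bin counts, which can be confirmed with a short sketch. The conversion to ROC points at the end rests on an assumption not stated in the table: an applicant is classified good when the naive Bayes score is above the bin edge, so the rates are the shares of goods and bads not yet accumulated. Variable names are illustrative.

```python
from itertools import accumulate

# Bin counts from the table, ordered from the lowest score edge to the highest.
bin_good = [0, 5, 5, 5, 15, 15, 15, 15, 25]
bin_bad = [0, 30, 35, 10, 5, 5, 5, 5, 5]

# Running totals reproduce the "cum" columns.
cum_good = list(accumulate(bin_good))
cum_bad = list(accumulate(bin_bad))

n_good, n_bad = cum_good[-1], cum_bad[-1]

# Assumed convention: classified good when score is above the bin edge.
roc = [((n_bad - cb) / n_bad, (n_good - cg) / n_good)
       for cg, cb in zip(cum_good, cum_bad)]
print(roc)
```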
[Figure: ROC Curves. True Positive Rate (vertical axis) versus False Positive Rate (horizontal axis) for the log odds and naive Bayes scores.]