SYS7002_HW

7/27/2019 SYS7002_HW


SYS 7002 Financial Engineering Homework

Problem Description

Situation

Financial institutions have historical information regarding the outcome of previous customer loans. This information includes demographic factors and credit performance (good or bad), and it can be used to establish credit scores and risk profiles. The resulting risk models can be used to predict the performance of potential customers based on their relevant characteristics.

Goal

The objective of credit scoring is to predict the probability that a borrower will or will not default on a credit line. The objective of this specific assignment is to answer the following:

1) Calculate (a) the log-odds score and (b) the naive Bayes score for the data set in the file "OWNAGERECDES.xls".

2) Plot the ROC curve for each score.

3) Which score is more discriminating?

Approach

Data

The provided data is an Excel file containing seven categorical variables (six predictor variables and one response variable) and two hundred observations. The response variable indicates whether a default occurred (bad) or not (good); there are one hundred good observations and one hundred bad observations. The six predictor variables are binary (0 or 1) indicators of housing status and age range. Table 1 is a summary of the variable labels.

Table 1 Categorical Variable Labels

Analysis and Evidence

Using conditional counts of the data, a value was established for the probability of good or bad conditional on each of the six predictor variables:

Pr(Good|data) = p(G|x); Pr(Bad|data) = p(B|x)

(Table: good (G = 1) and bad (B = 0) outcomes by Own, Rent, and Age)
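The conditional-count step can be sketched in Python. This is a minimal illustration under stated assumptions, not the spreadsheet calculation used in this report: the function name `conditional_probs`, the column names `own` and `good`, and the toy rows are hypothetical, and the outcome is assumed to be coded 1 = good, 0 = bad (matching G=1/B=0).

```python
from collections import Counter

def conditional_probs(rows, var):
    """Return {value: (p(G|x), p(B|x))} for one binary predictor.

    Hypothetical layout: each row is a dict holding the predictor columns
    plus an outcome column 'good' coded 1 = good, 0 = bad.
    """
    goods = Counter()   # number of goods observed for each predictor value
    totals = Counter()  # number of observations for each predictor value
    for row in rows:
        totals[row[var]] += 1
        goods[row[var]] += row["good"]
    return {v: (goods[v] / totals[v], 1 - goods[v] / totals[v]) for v in totals}

# Hypothetical toy rows, NOT the OWNAGERECDES.xls data
rows = [
    {"own": 1, "good": 1},
    {"own": 1, "good": 1},
    {"own": 1, "good": 0},
    {"own": 0, "good": 0},
]
probs = conditional_probs(rows, "own")
print(probs[1])  # (p(G|own=1), p(B|own=1))
```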


Use the above probabilities to determine the posterior odds of a good.

o(G|x) = p(G|x) / p(B|x)

Because the numbers of good and bad observations are equal, the population odds are:

p(G)/p(B) = 1

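From slide 28 of the lecture, the posterior odds of a good are the product of the population odds and the information odds:

o(G|x) = [p(G)/p(B)] * [p(x|G)/p(x|B)] = o_Pop * I(x)   [1]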

Because the population odds equal 1, the information odds, I(x), are equal to the posterior odds of a good, o(G|x), for this data set.
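A small sketch of these odds calculations, assuming the conditional probabilities have already been estimated. `posterior_odds` is a hypothetical helper; the 100/100 split comes from the Data section, while the probabilities 0.6 and 0.4 are made-up values for illustration. The information odds are recovered by dividing out the population odds, following o(G|x) = o_Pop * I(x).

```python
def posterior_odds(p_good, p_bad):
    """Posterior odds of a good: o(G|x) = p(G|x) / p(B|x)."""
    return p_good / p_bad

# Population odds p(G)/p(B) from the 100 good and 100 bad observations
n_good, n_bad = 100, 100
pop_odds = n_good / n_bad

# Hypothetical conditional probabilities for one characteristic value
p_g, p_b = 0.6, 0.4
o_post = posterior_odds(p_g, p_b)

# Information odds I(x) = o(G|x) / o_Pop; identical to o(G|x) when o_Pop = 1
info_odds = o_post / pop_odds
print(pop_odds, info_odds == o_post)  # 1.0 True
```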

The weight of evidence in favor of good, wi(xi) = ln(p(xi|G)/p(xi|B)), is used to weight the characteristics, x, and establish a numerical value for the credit score, s(x). Because the population odds are 1, this weight equals ln(o(G|xi)).

s(x) = w1*x1 + w2*x2 + ... + wn*xn

This summation of weighted characteristics is the naive Bayes score.
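The weighting and summation can be sketched as follows, assuming the per-characteristic odds are already known. The helper names and the dictionary layout are hypothetical; the two odds values are those reported in this section for own (2.40) and rent (0.53), and the remaining characteristics are omitted for brevity. Because the population odds are 1 here, ln(o(G|xi)) coincides with the weight of evidence ln(p(xi|G)/p(xi|B)).

```python
import math

def weights_of_evidence(odds_by_char):
    """w_i = ln(o(G|x_i)); equals ln I(x_i) here because o_Pop = 1."""
    return {name: math.log(o) for name, o in odds_by_char.items()}

def naive_bayes_score(x, weights):
    """s(x) = sum_i w_i * x_i over binary indicators (missing ones count as 0)."""
    return sum(weights[name] * x.get(name, 0) for name in weights)

# Odds reported for own and rent; other characteristics left out of this sketch
odds = {"own": 2.40, "rent": 0.53}
w = weights_of_evidence(odds)
print(round(w["own"], 2), round(w["rent"], 2))  # 0.88 -0.63

# An applicant who rents (own = 0, rent = 1) scores ln(0.53)
print(naive_bayes_score({"own": 0, "rent": 1}, w))
```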

The first ten rows of the attached spreadsheet illustrate the data and the calculated naive Bayes scores.

Opop x I(x)

O(own) 2.40

O(rent) 0.53

O(


Each score is calculated by summing the products of the predictor variables and the corresponding log odds values (weights shown rounded to two decimal places). For example:

0(0.88) + 1(-0.63) + 1(-1.39) + 0(-0.69) + 0(1.10) + 0(1.39) = -2.014903
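As a quick arithmetic check of the worked example, the rounded weights reproduce the score to within rounding: they sum to about -2.02, versus -2.014903 obtained from the unrounded weights. The list order simply follows the equation; the variable names are illustrative.

```python
# Rounded weights of evidence and 0/1 indicators, in the order of the worked example
weights = [0.88, -0.63, -1.39, -0.69, 1.10, 1.39]
x = [0, 1, 1, 0, 0, 0]

# Naive Bayes score: sum of weight * indicator
score = sum(w_i * x_i for w_i, x_i in zip(weights, x))
print(round(score, 2))  # -2.02
```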

The log odds score is developed using the counts (occurrences) for each combination of the characteristics age and residential status; unlike the naive Bayes score, it does not assume that these characteristics are conditionally independent.

These counts are used to generate a corresponding matrix of probabilities (fractions).

The probabilities are used to generate a log odds score by taking the natural log of the posterior odds of a good:

Log odds score = ln(o(G|x)) = ln(p(G|x) / p(B|x))

(Table: good (G = 1) and bad (B = 0) outcomes by Own, Rent, and Age)
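A sketch of the joint-count log odds score, assuming each observation has been reduced to a (housing, age band, outcome) triple. The function name, the category labels, and the rows are hypothetical. For a cell with g goods and b bads out of n observations, p(G|x)/p(B|x) = (g/n)/(b/n) = g/b, so the log odds reduce to ln(g/b); cells with no goods or no bads are skipped because their log odds are undefined.

```python
import math
from collections import defaultdict

def log_odds_table(rows):
    """Map each (housing, age) cell to ln(p(G|x) / p(B|x)) from joint counts.

    No independence assumption: every combination gets its own estimate.
    Cells with zero goods or zero bads are skipped (log odds undefined).
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [goods, bads]
    for housing, age, good in rows:
        counts[(housing, age)][0 if good else 1] += 1
    table = {}
    for cell, (g, b) in counts.items():
        if g and b:
            table[cell] = math.log(g / b)  # (g/n) / (b/n) = g/b
    return table

# Hypothetical rows (housing, age band, 1 = good / 0 = bad); not the course data
rows = [
    ("own", "band_2", 1), ("own", "band_2", 1), ("own", "band_2", 0),
    ("rent", "band_1", 1), ("rent", "band_1", 0), ("rent", "band_1", 0),
    ("rent", "band_2", 1),
]
table = log_odds_table(rows)
print(table)
```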


A summary of both credit scores versus each demographic category:

In order to create a ROC curve for each credit score, the true positive and false positive rates were calculated. This process involved:

- Examining the range of values for each credit score
- Selecting thresholds that ensured a change in rate
- Counting the false positive, true positive, false negative, and true negative occurrences for each threshold
- Calculating the false positive and true positive rates
- Plotting the TP and FP rates for each credit score
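The threshold-and-count procedure in the list above can be sketched as follows. It assumes that a "positive" is an applicant whose score is at or above the threshold (classified good), so a true positive is a good scoring at or above the threshold; this convention is an assumption, chosen because it matches the pattern of the log odds table (the highest threshold yields no positives). The function name and sample scores are hypothetical, and plotting is left out.

```python
def roc_points(scores, is_good, thresholds):
    """Return (threshold, FP rate, TP rate, FP, TP, FN, TN) for each threshold.

    Assumed convention: score >= threshold means "classified good" (positive).
    """
    n_pos = sum(is_good)             # number of goods
    n_neg = len(is_good) - n_pos     # number of bads
    out = []
    for t in thresholds:
        tp = sum(1 for s, g in zip(scores, is_good) if s >= t and g)
        fp = sum(1 for s, g in zip(scores, is_good) if s >= t and not g)
        fn = n_pos - tp
        tn = n_neg - fp
        out.append((t, fp / n_neg, tp / n_pos, fp, tp, fn, tn))
    return out

# Hypothetical scores and outcomes (1 = good, 0 = bad)
scores = [1.5, 0.2, -0.5, -1.8]
is_good = [1, 1, 0, 0]
pts = roc_points(scores, is_good, thresholds=[2.0, 0.0, -2.0])
for row in pts:
    print(row)
```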

The completed ROC curves are illustrated below, along with the data tables; both are also included in the attached spreadsheet.

(Table: p(G|x) by own/age)


Recommendation

Based on the Summary of Scores table, the naive Bayes score offers higher discrimination for the demographic categories considered.

Log odds scores

threshold   LO_FP_rate   LO_TP_rate   FP    TP    FN    TN
1.7         0            0            0     0     100   100
1.1         0.05         0.25         5     25    75    95
0           0.25         0.85         25    85    15    75
-1.7        0.35         0.9          35    90    10    65
-1.9        0.65         0.95         65    95    5     35
-2          1            1            100   100   0     0
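One common single-number summary of how discriminating a score is, is the area under the ROC curve (AUC). A minimal trapezoidal-rule sketch, applied to the (LO_FP_rate, LO_TP_rate) pairs in the log odds table; the helper name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(points):
    """Area under an ROC curve from (FP rate, TP rate) points via the trapezoidal rule."""
    pts = sorted(points)  # order by FP rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# (LO_FP_rate, LO_TP_rate) pairs from the log odds scores table
log_odds_roc = [(0, 0), (0.05, 0.25), (0.25, 0.85), (0.35, 0.9), (0.65, 0.95), (1, 1)]
auc = auc_trapezoid(log_odds_roc)
print(round(auc, 4))  # 0.8225
```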


Bonus: naive Bayes scores

bin     score|good   score|bad   cum score|good   cum score|bad
-2.1    0            0           0                0
-2      5            30          5                30
-1.3    5            35          10               65
-0.5    5            10          15               75
0.2     15           5           30               80
0.5     15           5           45               85
0.8     15           5           60               90
2       15           5           75               95
2.5     25           5           100              100
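The cumulative columns of the bonus table can be turned into ROC points for the naive Bayes score. This sketch assumes that the cumulative counts give the number of goods and bads scoring at or below each bin value, and that an applicant is classified good when the score lies above the bin; both that interpretation and the function name `roc_from_cumulative` are assumptions.

```python
def roc_from_cumulative(cum_good, cum_bad):
    """Convert cumulative counts (scores <= bin) into (FP rate, TP rate) points.

    Assumed convention: an applicant scoring above the bin is classified good,
    so TP rate = goods above the bin / all goods, FP rate = bads above / all bads.
    """
    n_good, n_bad = cum_good[-1], cum_bad[-1]
    return [((n_bad - b) / n_bad, (n_good - g) / n_good)
            for g, b in zip(cum_good, cum_bad)]

# Cumulative columns of the naive Bayes bonus table
cum_good = [0, 5, 10, 15, 30, 45, 60, 75, 100]
cum_bad = [0, 30, 65, 75, 80, 85, 90, 95, 100]
naive_roc = roc_from_cumulative(cum_good, cum_bad)
for fp_rate, tp_rate in naive_roc:
    print(fp_rate, tp_rate)
```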


Figure: ROC Curves, True Positive Rate vs. False Positive Rate (series: log odds, naive Bayes)
