CS 589 Information Risk Management 6 February 2007

Page 1

CS 589 Information Risk Management

6 February 2007

Page 2

Today

• More Bayesian ideas – Empirical Bayes

• Your presentations

• Prior Distributions for selected distribution parameters

• Updating Priors → Posterior Distributions → Updated Parameter Estimates

Page 3

References

• A. R. Solow, “An Empirical Bayes Analysis of Volcanic Eruptions”, Mathematical Geology, Vol. 33, No. 1, 2001.

• J. Geweke, Contemporary Bayesian Economics and Statistics. Wiley, 2005.

• S. L. Scott, “A Bayesian Paradigm for Designing Intrusion Detection Systems”, Computational Statistics and Data Analysis, Vol. 45, No. 1, 2003.

Page 4

Why are we doing this?

• Model risks

• Model outcomes

• Use the models in a model of the decision situation to help us rank alternatives

• Gain deeper understanding of the problem and the context of the problem

Page 5

Basic Relation

f(θ | x) = P(x | θ) f(θ) / ∫ P(x | θ) f(θ) dθ

The prior distribution in the numerator should be selected with some care. The distribution in the denominator is known as the predictive distribution.

Page 6

Recall: Why Bayesian Approach?

• Incorporate prior knowledge into the analysis

• From Scott – synthesize probabilistic information from many sources

• Consider the following exercise:

• P(I) = .01; P(D | I) = .9; P(not D | not I) = .95.

• An intrusion alarm goes off. What is the probability that it’s really an intrusion?
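Worked out in Python with the slide’s numbers, the answer is surprisingly small:

```python
# Bayes' rule for the intrusion-alarm exercise above.
# I = "intrusion", D = "detector/alarm fires".
p_i = 0.01            # prior P(I)
p_d_given_i = 0.90    # detection rate P(D|I)
p_nd_given_ni = 0.95  # true-negative rate, so the false-positive rate is 0.05

# Total probability of an alarm: true detections plus false alarms
p_d = p_d_given_i * p_i + (1 - p_nd_given_ni) * (1 - p_i)
p_i_given_d = p_d_given_i * p_i / p_d

print(round(p_i_given_d, 3))  # 0.154
```

Even with a 90% detection rate, the low prior P(I) means roughly 85% of alarms are false, which is exactly what the next slide’s curves illustrate.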

Page 7

Posterior Probabilities as a Function of Priors and Conditional Probabilities

[Plot: posterior P(I|D) on the y-axis (0 to 1) as a function of the prior P(I) on the x-axis (0 to 1), for P(D|I) = .95, .9, and .7.]

Page 8

Priors

• Prior for a Poisson parameter is Gamma

f(x) = x^(α−1) e^(−x/β) / (Γ(α) β^α),   x > 0, α > 0, β > 0

Γ(α) = ∫₀^∞ t^(α−1) e^(−t) dt

E(x) = αβ

Page 9

Gamma(16, .125)

[Density plot of Gamma(16, .125): 90% of the probability mass lies between 1.254 and 2.887, with 5% in each tail.]
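The fractiles shown on these plots can be reproduced with any statistics package. A quick check of the Gamma(16, .125) case (a sketch assuming scipy is available; the slides themselves used other tools):

```python
from scipy.stats import gamma

# Gamma(alpha=16, beta=.125) in the slides' (shape, scale) parameterization
prior = gamma(a=16, scale=0.125)

print(prior.mean())  # 2.0, i.e. alpha * beta

# 90% of the mass lies between roughly 1.25 and 2.89, matching the plot
lo, hi = prior.ppf(0.05), prior.ppf(0.95)
```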

Page 10

Gamma(2, 1)

[Density plot of Gamma(2, 1): 90% of the probability mass lies between 0.355 and 4.744.]

Page 11

Gamma(64, .03125)

[Density plot of Gamma(64, .03125): 90% of the probability mass lies between 1.607 and 2.428, with 5% in each tail.]

Page 12

Gamma Parameters

• How do we pick them?

• Expert

• Data

• Expert + Data

Page 13

Recall Our Data Example

• Go from Data to Gamma Parameters

• We want to pick parameters that reflect the data

• We will have to use our judgment to decide on a final prior parametric estimate

Page 14

Events in a 24-Hour Period

[Bar chart: events per hour over a 24-hour period; x-axis Hour (1–23), y-axis Events (0–7).]

Page 15

Hour  Events        Hour  Events
 1    3             13    2
 2    1             14    3
 3    5             15    2
 4    6             16    5
 5    2             17    1
 6    3             18    3
 7    1             19    4
 8    0             20    2
 9    3             21    3
10    5             22    5
11    4             23    2
12    1             24    3
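Transcribing these counts, the sample mean works out to 69/24 = 2.875 events per hour, which is the natural target for the mean of the gamma prior:

```python
# Hourly event counts for hours 1-24 from the table above
counts = [3, 1, 5, 6, 2, 3, 1, 0, 3, 5, 4, 1,
          2, 3, 2, 5, 1, 3, 4, 2, 3, 5, 2, 3]

mean = sum(counts) / len(counts)
print(mean)  # 2.875
```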

Page 16

Parameterization Ideas

• Distribution Mean = Data Mean

• Equate

– the fitted cumulative distribution to the empirical frequency data

– the sum of fitted distribution frequencies to 1

– the sum of absolute differences to 0

• Pick the criteria that fit best

Page 17

We can formulate and optimize

• Pick the best parameters given what we know

• I used Excel and the Solver add-in

• Any optimization program will work

• Canned probability functions are preferred …
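As a sketch of the kind of fit described above (here in Python with scipy rather than Excel and Solver, and with the gamma mean tied to the data mean; the exact objective used on the slides may differ):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

# Hourly event counts from the earlier table
counts = [3, 1, 5, 6, 2, 3, 1, 0, 3, 5, 4, 1,
          2, 3, 2, 5, 1, 3, 4, 2, 3, 5, 2, 3]
mean = np.mean(counts)  # 2.875
ks, f = np.unique(counts, return_counts=True)
freqs = f / len(counts)  # empirical frequency of each observed count value

def sad(alpha):
    # Constrain the gamma mean to the data mean: beta = mean / alpha,
    # then measure the sum of absolute differences between fitted
    # density values and empirical frequencies
    dist = gamma(a=alpha, scale=mean / alpha)
    return np.sum(np.abs(dist.pdf(ks) - freqs))

res = minimize_scalar(sad, bounds=(1.0, 50.0), method="bounded")
alpha_hat, beta_hat = res.x, mean / res.x
# The slides report Gamma(3.74, .769) from a similar Excel fit;
# this sketch need not land on exactly the same values.
```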

Page 18

Use All the Data

• Several reasonable possibilities

• This will matter for updating purposes

• Use all data for the parameter estimate

• Use some of the data to estimate the gamma prior – and therefore the Poisson parameter – and the rest to illustrate the idea of updating the prior

Page 19

Prior Distribution

• The prior should reflect our degree of certainty, or degree of belief, about the parameter we are estimating

• One way to deal with this is to consider distribution fractiles

• Use fractiles to help us develop the distribution that reflects the synthesis of what we know and what we believe
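One way to turn elicited fractiles into gamma parameters is to solve for the (shape, scale) pair whose 5th and 95th percentiles match the stated beliefs. A sketch assuming scipy, using the 1.254 / 2.887 fractiles from the earlier Gamma(16, .125) plot as the "elicited" values:

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.stats import gamma

def fractile_gap(params, q05, q95):
    # Mismatch between the candidate gamma's fractiles and the elicited ones
    a, scale = np.abs(params)  # keep parameters positive during the search
    d = gamma(a=a, scale=scale)
    return [d.ppf(0.05) - q05, d.ppf(0.95) - q95]

# Elicited: 90% belief that lambda lies between 1.254 and 2.887
a, scale = np.abs(fsolve(fractile_gap, x0=[10.0, 0.2], args=(1.254, 2.887)))
fit = gamma(a=a, scale=scale)
# fit now reproduces the elicited fractiles, close to Gamma(16, .125)
```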

Page 20

Prior + Information

• As we collect information, we can update our prior distribution and get a – we hope – more informative posterior distribution

• Recall what the distribution is for – in this case, a view of our parameter of interest

• The posterior mean is now the estimate for the Poisson lambda, and can be used in decision-making

Page 21

Information

• For our Poisson parameter, information might consist of data similar to what we already collected in our example

• We update the Gamma, take the mean, and that’s our new estimate for the average occurrences of the event per unit of measurement.

Page 22

Gamma(3.74, .769)

[Density plot of Gamma(3.74, .769), fit by minimizing the sum of absolute differences: 90% of the probability mass lies between 0.936 and 5.676.]

Page 23

Gamma(11.1, .259)

[Density plot of Gamma(11.1, .259): 90% of the probability mass lies between 1.617 and 4.426.]

Page 24

Updating

• It’s pretty intuitive

• Add the number of hourly intrusions to alpha

• Add the number of hours (that is, the number of hour intervals) to beta

• Be careful with beta – when it is written as a scale parameter (as here), we update its inverse: 1/β_new = 1/β_old + the number of observed intervals
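The rule is easy to check against the slides’ own numbers: updating the Gamma(14.404, .202) prior with the last two hours of data (2 and 3 events) reproduces the Gamma(19.404, .144) posterior shown a few slides later.

```python
# Conjugate update of a Gamma(alpha, beta) prior for a Poisson rate,
# with beta as a scale parameter (so the number of observed intervals
# is added to the inverse 1/beta)
def update_gamma(alpha, beta, events, intervals):
    alpha_new = alpha + events
    beta_new = 1.0 / (1.0 / beta + intervals)
    return alpha_new, beta_new

a, b = update_gamma(14.404, 0.202, events=2 + 3, intervals=2)
print(round(a, 3), round(b, 3))  # 19.404 0.144
```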

Page 25

Back to our Example

• Use the first 22 observations

• Update with the remaining 2

• What happens to

– Our distribution?

– Our Poisson parameter estimate?

• First, let’s get our new Prior

Page 26

New Prior

• The first one results from minimizing the sum of absolute differences between fitted and empirical probabilities, subject to the fitted probabilities summing to 1

• The second is computed without the latter constraint

Page 27

Gamma(11.166, .261)

[Density plot of Gamma(11.166, .261): 90% of the probability mass lies between 1.643 and 4.481.]

Page 28

Gamma(14.404, .202)

[Density plot of Gamma(14.404, .202): 90% of the probability mass lies between 1.773 and 4.275.]

Page 29

Updates

• What can we say about them vis-à-vis

– The original gamma estimate from all 24 points

– The measures we care about (mean, relative accuracy, etc.)

• Which one is “better”?

Page 30

Gamma(19.404, .144)

[Density plot of Gamma(19.404, .144): 90% of the probability mass lies between 1.839 and 3.913, with 5% in each tail.]

E(Lambda) = 2.79

Page 31

Gamma(22.516, .125)

[Density plot of Gamma(22.516, .125): 90% of the probability mass lies between 1.915 and 3.856, with 5% in each tail.]

E(Lambda) = 2.815

Page 32

Another way to Observe Data

• In this case, we’ll use the next 12 hours

• And we’ll update our prior distributions

• Which one provides more accuracy?

• How would we know in a more realistic situation?

Page 33

Gamma(46.062, .063)

[Density plot of Gamma(46.062, .063): 90% of the probability mass lies between 2.236 and 3.639, with 5% in each tail.]

E(Lambda) = 2.902

Page 34

So, What’s the Conclusion?

• Do our updated priors make sense – especially in light of the original data-driven distribution?

• What can we say about the way in which observed data can impact our posterior distribution and the associated estimate for the Poisson parameter?

• What else can we conclude?

Page 35

Another Prior Distribution

• Of interest in Information Risk – and risk in general – applications is the notion of the probability of a binary outcome

– Intrusion/Non-intrusion

– Bad item/non-bad item

• In this case, we can model the probability of an event happening – or not

• The number of events of interest in a space of interest could be modeled using a binomial distribution

Page 36

Example

• Suppose we know how many intrusion attempts (or any other event) happened in the course of normal operation of our system – and we know how many non-intrusion events happened.

• So our data would look something like the following slide

Page 37

Hour  Events  Total  Prob
 1    3       172    0.017442
 2    1       152    0.006579
 3    5       106    0.047170
 4    6       121    0.049587
 5    2        97    0.020619
 6    3        53    0.056604
 7    1        78    0.012821
 8    0       101    0.000000
 9    3        88    0.034091
10    5        93    0.053763
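The Prob column is just Events divided by Total for each hour, e.g. for hour 1:

```python
# Hour-1 row: 3 intrusion events out of 172 total events
events, total = 3, 172
p = events / total
print(round(p, 6))  # 0.017442
```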

Page 38

Now …

• We might be interested in the probability that a given input is malicious, bad, etc.

• How could we do this risk model?

• The binomial is a clear choice

• We know n for a given period

• We need p

• p seems to vary – what can we do?

Page 39

A Model for p

• Develop a prior distribution for p that combines

– The data

– What we know that might not be in the data

• Use the expectation of the distribution for E(p)

• Use E(p) in our preliminary analysis

Page 40

Another Prior

• The Prior Distribution model for the binomial p is a beta distribution.

• Binomial

f(y) = C(n, y) p^y (1 − p)^(n−y)

• Beta

f(x) = Γ(α+β) / (Γ(α) Γ(β)) · x^(α−1) (1 − x)^(β−1)

Page 41

Beta Prior

E(p) = α / (α + β)

The predictive distribution is theBeta-Binomial (you can look it up)

Like the Gamma prior for the Poisson, thisis very easy to update after observing data
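A sketch of that update (the uniform Beta(1, 1) starting prior and the hour-1 data from the earlier table are illustrative choices, not values from the slides):

```python
# Conjugate Beta update for the binomial p:
# Beta(alpha, beta) prior + (y successes in n trials)
#   -> Beta(alpha + y, beta + n - y) posterior
alpha, beta = 1.0, 1.0  # uniform prior on p (illustrative choice)
y, n = 3, 172           # hour-1 intrusions out of total events

alpha_post = alpha + y
beta_post = beta + (n - y)
e_p = alpha_post / (alpha_post + beta_post)  # posterior mean, E(p) = 4/174
print(round(e_p, 3))  # 0.023
```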

Page 42

Other Estimates

• Outcomes

– These can be in the form of costs, both real and opportunity

– Distributions are better than point estimates if we know that we don’t know the future

• Problem: Expected Value criterion can diminish the importance of our probability modeling efforts for events and outcomes

Page 43

Outcome Distributions

• Unlike our discussion to this point, where the variable of interest has been associated with a discrete distribution, outcome distributions may be continuous in nature

• Normal, Lognormal, Logistic

• Usually estimating more than one parameter

• Possibly more complex prior – info – posterior structure

Page 44

Homework

• I’m going to send you sample datasets

• I need team identification – same ones as today?

• Due at the beginning of class next week

• Presentation, not paper

• Also – please be ready to discuss the Scott paper