CS 589Information Risk Management
6 February 2007
Today
• More Bayesian ideas – Empirical Bayes
• Your presentations
• Prior Distributions for selected distribution parameters
• Updating: priors, then posterior distributions, then updated parameter estimates
References
• A. R. Solow, “An Empirical Bayes Analysis of Volcanic Eruptions”, Mathematical Geology, Vol. 33, No. 1, 2001.
• J. Geweke, Contemporary Bayesian Economics and Statistics. Wiley, 2005.
• S. L. Scott, “A Bayesian Paradigm for Designing Intrusion Detection Systems”, Computational Statistics and Data Analysis, Vol. 45, No. 1, 2003.
Why are we doing this?
• Model risks
• Model outcomes
• Use the models in a model of the decision situation to help us rank alternatives
• Gain deeper understanding of the problem and the context of the problem
Basic Relation
$$ f(\theta \mid x) \;=\; \frac{P(x \mid \theta)\, f(\theta)}{\int P(x \mid \theta)\, f(\theta)\, d\theta} $$

The prior distribution in the numerator should be selected with some care. The distribution in the denominator is known as the predictive distribution.
Recall: Why Bayesian Approach?
• Incorporate prior knowledge into the analysis
• From Scott – synthesize probabilistic information from many sources
• Consider the following exercise:
• P(I) = .01; P(D|I) = .9; P(no alarm | no intrusion) = .95
• An intrusion alarm goes off. What is the probability that it’s really an intrusion?
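The exercise above can be worked directly with Bayes’ rule (a minimal sketch in plain Python; the .154 result follows from the numbers given on the slide):

```python
# Exercise numbers from the slide: P(I) = .01, P(D|I) = .9,
# P(no alarm | no intrusion) = .95, so the false-alarm rate P(D | no I) = .05.
p_i = 0.01              # prior probability of an intrusion
p_d_given_i = 0.90      # detection probability
p_d_given_not_i = 0.05  # false-alarm probability

# Bayes' rule: P(I|D) = P(D|I)P(I) / [P(D|I)P(I) + P(D|~I)P(~I)]
p_d = p_d_given_i * p_i + p_d_given_not_i * (1 - p_i)
p_i_given_d = p_d_given_i * p_i / p_d
print(round(p_i_given_d, 3))  # -> 0.154
```

Even with a 90% detection rate, the low prior P(I) drags the posterior down to about 15%, which is the point of the chart on the next slide.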
Posterior Probabilities as a Function of Priors and Conditional Probabilities
[Figure: P(I|D) plotted against P(I) for P(D|I) = .7, .9, and .95]
Priors
• The conjugate prior for a Poisson parameter is the Gamma
$$ f(x) \;=\; \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}, \qquad x > 0,\ \alpha > 0,\ \beta > 0 $$

$$ \Gamma(\alpha) \;=\; \int_0^{\infty} x^{\alpha-1} e^{-x}\, dx, \qquad E(X) = \alpha\beta $$
Gamma(16, .125)
[Figure: Gamma(16, .125) density; 90% interval (1.254, 2.887)]
Gamma(2, 1)
[Figure: Gamma(2, 1) density; 90% interval (0.355, 4.744)]
Gamma(64, .03125)
[Figure: Gamma(64, .03125) density; 90% interval (1.607, 2.428)]
Gamma Parameters
• How do we pick them?
• Expert
• Data
• Expert + Data
Recall Our Data Example
• Go from Data to Gamma Parameters
• We want to pick parameters that reflect the data
• We will have to use our judgment to decide on a final prior parametric estimate
Events in a 24-Hour Period
Hour:    1  2  3  4  5  6  7  8  9 10 11 12
Events:  3  1  5  6  2  3  1  0  3  5  4  1

Hour:   13 14 15 16 17 18 19 20 21 22 23 24
Events:  2  3  2  5  1  3  4  2  3  5  2  3

[Figure: bar chart of events per hour; 69 events over 24 hours]
Parameterization Ideas
• Distribution Mean = Data Mean
• Equate
– The distribution’s cumulative/frequency values to the data’s
– The sum of the computed distribution frequencies to 1
– The sum of absolute differences (model vs. data) to 0
• Pick the criteria that fit best
We can formulate and optimize
• Pick the best parameters given what we know
• I used Excel and the Solver add-in
• Any optimization program will work
• Canned probability functions are preferred …
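The Solver recipe can be sketched in Python (a sketch, not the instructor’s exact Excel setup: scipy’s gamma and Nelder–Mead stand in for the Solver, and the mean-matching criterion could be added as a constraint):

```python
# Fit a Gamma(alpha, beta) prior to the 24 hourly counts by minimizing the
# sum of absolute differences between gamma unit-bin probabilities and the
# empirical frequencies of the observed counts.
import numpy as np
from scipy.stats import gamma
from scipy.optimize import minimize

counts = [3, 1, 5, 6, 2, 3, 1, 0, 3, 5, 4, 1,
          2, 3, 2, 5, 1, 3, 4, 2, 3, 5, 2, 3]  # from the data slide
values, freq = np.unique(counts, return_counts=True)
emp = freq / len(counts)  # empirical relative frequency of each count

def sad(logparams):
    # optimize over log-parameters so alpha, beta stay positive
    a, b = np.exp(logparams)
    # probability the gamma variate falls in the unit bin around each count
    model = (gamma.cdf(values + 0.5, a, scale=b)
             - gamma.cdf(np.maximum(values - 0.5, 0.0), a, scale=b))
    return np.abs(model - emp).sum()

res = minimize(sad, x0=np.log([2.0, 1.5]), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
print(a_hat, b_hat, a_hat * b_hat)  # fitted shape, scale, implied mean
```

Any optimizer would do here; the point is only that a canned gamma CDF plus a generic minimizer reproduces what Solver does in the spreadsheet.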
Use All the Data
• Several reasonable possibilities
• This will matter for updating purposes
• Use all data for the parameter estimate
• Use some of the data to estimate the gamma prior – and therefore the Poisson parameter – and the rest to illustrate the idea of updating the prior
Prior Distribution
• The prior should reflect our degree of certainty, or degree of belief, about the parameter we are estimating
• One way to deal with this is to consider distribution fractiles
• Use fractiles to help us develop the distribution that reflects the synthesis of what we know and what we believe
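Checking fractiles is easy with canned distribution functions; a sketch using the Gamma(16, .125) prior from the earlier slide: if the 5th/95th fractiles don’t bracket what we actually believe about the parameter, the prior needs revisiting.

```python
# Fractile check for the Gamma(16, .125) prior (shape 16, scale .125, mean 2.0).
from scipy.stats import gamma

prior = gamma(16, scale=0.125)
print(prior.ppf(0.05), prior.ppf(0.95))  # ~1.254 and ~2.887, the slide's 90% band
```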
Prior + Information
• As we collect information, we can update our prior distribution and get a – we hope – more informative posterior distribution
• Recall what the distribution is for – in this case, a view of our parameter of interest
• The posterior mean is now the estimate for the Poisson lambda, and can be used in decision-making
Information
• For our Poisson parameter, information might consist of data similar to what we already collected in our example
• We update the Gamma, take the mean, and that’s our new estimate for the average occurrences of the event per unit of measurement.
Gamma(3.74, .769)
[Figure: Gamma(3.74, .769) density; 90% interval (0.936, 5.676)]
Sum of Absolute Differences Minimized
Gamma(11.1, .259)
[Figure: Gamma(11.1, .259) density; 90% interval (1.617, 4.426)]
Updating
• It’s pretty intuitive
• Add the number of observed intrusions to alpha
• Add the number of hours (that is, the number of hourly intervals) to beta
• Be careful with beta: sometimes it’s written in inverse (scale) form, in which case we add the number of hourly units to the inverse of beta
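The update rule above, applied to the numbers on the later slides (prior Gamma(14.404, scale .202) from the first 22 hours, then hours 23 and 24 with 2 and 3 events), can be sketched as:

```python
# Conjugate gamma-Poisson update, with beta in scale form (E = alpha * beta).
alpha, scale = 14.404, 0.202  # prior fitted to the first 22 observations
events, hours = 2 + 3, 2      # hours 23 and 24 of the data table

alpha_post = alpha + events           # observed event count adds to alpha
scale_post = 1 / (1 / scale + hours)  # beta is a scale, so update its inverse
mean_post = alpha_post * scale_post   # posterior mean = new lambda estimate

print(alpha_post, round(scale_post, 3), round(mean_post, 2))  # -> 19.404 0.144 2.79
```

This reproduces the Gamma(19.404, .144) posterior and E(Lambda) = 2.79 shown on the update slides.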
Back to our Example
• Use the first 22 observations
• Update with the remaining 2
• What happens to
– Our distribution?
– Our Poisson parameter estimate?
• First, let’s get our new Prior
New Prior
• The first comes from minimizing the sum of absolute differences between computed and empirical probabilities, with the computed probabilities constrained to sum to 1
• The second is computed without the sum-to-1 constraint
Gamma(11.166, .261)
[Figure: Gamma(11.166, .261) density; 90% interval (1.643, 4.481)]
Gamma(14.404, .202)
[Figure: Gamma(14.404, .202) density; 90% interval (1.773, 4.275)]
Updates
• What can we say about them vis-à-vis
– The original gamma estimate from all 24 points
– The measures we care about (mean, relative accuracy, etc.)
• Which one is “better”?
Gamma(19.404, .144)
[Figure: Gamma(19.404, .144) density; 90% interval (1.839, 3.913)]
E(Lambda) = 2.79
Gamma(22.516, .125)
[Figure: Gamma(22.516, .125) density; 90% interval (1.915, 3.856)]
E(Lambda) = 2.815
Another way to Observe Data
• In this case, we’ll use the next 12 hours
• And we’ll update our prior distributions
• Which one provides more accuracy?
• How would we know in a more realistic situation?
Gamma(46.062, .063)
[Figure: Gamma(46.062, .063) density; 90% interval (2.236, 3.639)]
E(Lambda) = 2.902
So, What’s the Conclusion?
• Do our updated priors make sense – especially in light of the original data-driven distribution?
• What can we say about the way in which observed data can impact our posterior distribution and the associated estimate for the Poisson parameter?
• What else can we conclude?
Another Prior Distribution
• Of interest in Information Risk applications (and risk in general) is the notion of the probability of a binary outcome
– Intrusion/non-intrusion
– Bad item/non-bad item
• In this case, we can model the probability of an event happening – or not
• The number of events of interest in a space of interest could be modeled using a binomial distribution
Example
• Suppose we know how many intrusion attempts (or any other event) happened in the course of normal operation of our system – and we know how many non-intrusion events happened.
• So our data would look something like the following slide
Hour  Events  Total  Prob
 1      3      172   0.017442
 2      1      152   0.006579
 3      5      106   0.047170
 4      6      121   0.049587
 5      2       97   0.020619
 6      3       53   0.056604
 7      1       78   0.012821
 8      0      101   0.000000
 9      3       88   0.034091
10      5       93   0.053763
Now …
• We might be interested in the probability that a given input is malicious, bad, etc.
• How could we do this risk model?
• The binomial is a clear choice
• We know n for a given period
• We need p
• p seems to vary – what can we do?
A Model for p
• Develop a prior distribution for p that combines– The data– What we know that might not be in the data
• Use the expectation of the distribution for E(p)
• Use E(p) in our preliminary analysis
Another Prior
• The Prior Distribution model for the binomial p is a beta distribution.
• Binomial

$$ f(y) \;=\; \binom{n}{y}\, p^{y} (1-p)^{n-y} $$

• Beta

$$ f(x) \;=\; \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}, \qquad 0 \le x \le 1 $$
Beta Prior
$$ E(p) \;=\; \frac{\alpha}{\alpha+\beta} $$
The predictive distribution is the Beta-Binomial (you can look it up).

Like the Gamma prior for the Poisson, this is very easy to update after observing data.
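The beta-binomial update can be sketched the same way as the gamma-Poisson one. The Beta(1, 1) prior below is a hypothetical choice for illustration (a uniform prior on p, not a value from the slides); the data are hour 1 of the example table, 3 events out of 172.

```python
# Conjugate beta-binomial update: observe y "bad" events out of n inputs.
alpha, beta = 1.0, 1.0  # hypothetical uniform Beta(1, 1) prior on p
y, n = 3, 172           # hour-1 data from the example table

alpha_post = alpha + y       # successes add to alpha
beta_post = beta + (n - y)   # failures add to beta
e_p = alpha_post / (alpha_post + beta_post)  # posterior mean, our E(p)
print(round(e_p, 4))  # -> 0.023
```

E(p) from the posterior is what feeds the preliminary binomial risk model described above.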
Other Estimates
• Outcomes
– These can be in the form of costs, both real and opportunity
– Distributions are better than point estimates if we know that we don’t know the future
• Problem: the Expected Value criterion can diminish the importance of our probability modeling efforts for events and outcomes
Outcome Distributions
• Unlike our discussion to this point, where the variable of interest has been associated with a discrete distribution, outcome distributions may be continuous in nature
• Normal, Lognormal, Logistic
• Usually estimating more than one parameter
• Possibly more complex prior – info – posterior structure
Homework
• I’m going to send you sample datasets
• I need team identification – same ones as today?
• Due at the beginning of class next week
• Presentation, not paper
• Also – please be ready to discuss the Scott paper