7/25/2019 W4INSE6220
INSE 6320 -- Week 4
Risk Analysis for Information and Systems Engineering
Statistical Inference
Weibull Analysis
Survival Analysis
Dr. A. Ben Hamza Concordia University
Risk Assessment
Probabilistic risk assessment (PRA) is characterized by two quantities:
the magnitude (severity) of the possible adverse consequences, and the likelihood (probability) of occurrence of each consequence.
Hazard: the property of a substance or situation with the potential for creating damage
A hazard analysis is a process used to assess risk. The results of a hazard analysis are the identification of unacceptable risks and the selection of means of controlling or eliminating them.
Probability is a way to predict stochastic events
Risk: the likelihood of a specific effect within a specified period
Population at risk
Individual Risk
Individual risk is the risk of fatality or injury to any identifiable (named)
individual who lives within the zone impacted by a hazard, or follows a particular
pattern of life, that might subject him or her to the consequences of a hazard.
Societal Risk
Societal risk is the risk of multiple fatalities or injuries in the society as a whole,
where society would have to carry the burden of a hazard causing a number
of deaths and injuries, as well as financial, environmental, and other losses.
Individual risk
Cause                  Probability / year    Cause                  Probability / year
All causes (illness)   1.19E-02              Rock climbing          8.00E-03
Cancer                 2.80E-03              Canoeing               2.00E-03
Road accidents         1.00E-04              Hang-gliding           1.50E-03
Accidents at home      9.30E-05              Motorcycling           2.40E-04
Fire                   1.50E-05              Mining                 9.00E-04
Drowning               6.00E-06              Fire fighting          8.00E-04
Excessive cold         8.00E-06              Police                 2.00E-04
Lightning              1.00E-07              Accidents at offices   4.50E-06
Individual risk can be calculated as the total risk divided by the population at risk.
For example, if a region with a population of one million people experiences on average 5 deaths from flooding per year, the individual risk
of being killed by a flood in that region is 5/1,000,000 per year, usually expressed in orders of magnitude as $5 \times 10^{-6}$.
How to express risk?
What is the risk of flying by airplane? Is it higher than driving a car?
Bayes Formula
$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_k)\,P(B_k)}$$
where:
$B_i$ = $i$th event of $k$ mutually exclusive and collectively exhaustive events
$A$ = new event that might impact $P(B_i)$
Bayes' theorem is used to revise previously calculated
probabilities based on new information.
Developed by Thomas Bayes in the 18th century (Bayes, Thomas, 1763).
It is an extension of conditional probability.
Bayes Formula Example
A drilling company has estimated a 40% chance of striking oil for their new well.
A detailed test has been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and 20% of unsuccessful wells have had detailed tests.
Given that this well has been scheduled for a detailed test, what is the probability that the well will be successful?
Bayes Formula Example (continued)
Let S = successful well
U = unsuccessful well
P(S) = 0.4, P(U) = 0.6 (prior probabilities)
Define the detailed test event as D
Conditional probabilities:
P(D|S) = 0.6, P(D|U) = 0.2
Goal is to find P(S|D)
Bayes Formula Example (continued)
Apply Bayes' formula:
$$P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D \mid S)\,P(S) + P(D \mid U)\,P(U)} = \frac{(0.6)(0.4)}{(0.6)(0.4) + (0.2)(0.6)} = \frac{0.24}{0.24 + 0.12} = 0.667$$
So the revised probability of success, given that this well has been scheduled for a detailed test, is 0.667
Bayes Formula Example (continued)
Given the detailed test, the revised probability of a successful well has risen to 0.667 from the original estimate of 0.4
Event              Prior Prob.   Conditional Prob.   Joint Prob.          Revised Prob.
S (successful)     0.4           0.6                 (0.4)(0.6) = 0.24    0.24/0.36 = 0.667
U (unsuccessful)   0.6           0.2                 (0.6)(0.2) = 0.12    0.12/0.36 = 0.333
                                                     Sum = 0.36
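The drilling example can be checked numerically. The following is a minimal Python sketch (the slides themselves use MATLAB elsewhere); the function name `bayes_posteriors` and its argument names are hypothetical, chosen only for illustration.

```python
def bayes_posteriors(priors, likelihoods):
    """Return the revised probabilities P(B_i | A) for each event B_i.

    priors[i]      = P(B_i), for mutually exclusive, collectively exhaustive B_i
    likelihoods[i] = P(A | B_i)
    """
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(A | B_i) P(B_i)
    total = sum(joints)                                    # P(A), the sum of joint probabilities
    return [j / total for j in joints]

# Drilling example: events are S (successful) and U (unsuccessful),
# and A is the event D that a detailed test is scheduled.
posterior_s, posterior_u = bayes_posteriors([0.4, 0.6], [0.6, 0.2])
print(round(posterior_s, 3), round(posterior_u, 3))
```

The two printed values correspond to the Revised Prob. column for S and U.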
Parameter Estimation
Parameter estimation
The parameters of a pdf are constants that characterize
its shape, e.g. $f(x; \theta)$, where $x$ is the random variable (r.v.) and $\theta$ is the parameter.
Suppose we have a sample of observed values: $x_1, \ldots, x_n$.
We want to find some function of the data to estimate the
parameter(s): $\hat{\theta}(x_1, \ldots, x_n)$ (estimator written with a hat).
Sometimes we say estimator for the function of $x_1, \ldots, x_n$; estimate for the value of the estimator with a particular data set.
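Maximum Likelihood Estimation
We want to pick the $\theta$ that maximizes the likelihood $L$:
$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$
Often easier to maximize $\ln L$. Both $L$ and $\ln L$ are maximum at the same location. We maximize $\ln L$ rather than $L$ itself because $\ln L$ converts the product into a summation. The new maximization condition is:
$$\left.\frac{\partial \ln L}{\partial \theta}\right|_{\theta = \theta^*} = \left.\frac{\partial}{\partial \theta} \sum_{i=1}^{n} \ln f(x_i; \theta)\right|_{\theta = \theta^*} = 0$$
$\theta$ could be an array of parameters (e.g. slope and intercept) or just a single variable.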
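Example:
At a glass manufacturing company, 30 randomly selected sheets
of glass are inspected. If the number of flaws per
sheet is taken to have a Poisson distribution, how should the parameter $\lambda$ be estimated?
The maximum likelihood estimate:
$$f(x_i, \lambda) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$$
so that the likelihood is
$$L(x_1, \ldots, x_n, \lambda) = \prod_{i=1}^{n} f(x_i, \lambda) = \frac{e^{-n\lambda} \lambda^{(x_1 + \cdots + x_n)}}{(x_1! \cdots x_n!)}$$
The log-likelihood is therefore
$$\ln(L) = -n\lambda + (x_1 + \cdots + x_n) \ln(\lambda) - \ln(x_1! \cdots x_n!)$$
$$\frac{d \ln(L)}{d\lambda} = -n + \frac{(x_1 + \cdots + x_n)}{\lambda} = 0 \quad \text{so that} \quad \hat{\lambda} = \frac{x_1 + \cdots + x_n}{n} = \bar{x}$$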
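As a numerical illustration of this result, the sketch below evaluates the Poisson log-likelihood in Python and checks that no value on a coarse grid beats the sample mean. The 30 flaw counts are hypothetical (the slides give no data), and the function and variable names are assumptions made for the example.

```python
import math

def poisson_log_likelihood(lam, counts):
    """ln L(lam) = -n*lam + (sum of x_i) * ln(lam) - sum of ln(x_i!)."""
    n = len(counts)
    return (-n * lam
            + sum(counts) * math.log(lam)
            - sum(math.lgamma(x + 1) for x in counts))  # lgamma(x + 1) = ln(x!)

# Hypothetical flaw counts for 30 sheets (illustrative only, not from the slides)
flaws = [0, 1, 1, 0, 2, 1, 0, 0, 3, 1,
         0, 1, 2, 0, 1, 0, 0, 1, 1, 2,
         0, 0, 1, 0, 4, 1, 0, 2, 1, 0]

lam_hat = sum(flaws) / len(flaws)  # closed-form MLE: the sample mean

# Sanity check: scan a coarse grid of candidate values 0.05, 0.10, ..., 5.00
grid = [0.05 * k for k in range(1, 101)]
best_on_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, flaws))
print(lam_hat, best_on_grid)
```

The grid maximizer should land on the grid point nearest the sample mean, which is what setting the derivative to zero predicts.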
Weibull Probability Distribution
A random variable $T \sim \text{Weibull}(\beta, \eta)$ is said to have the Weibull probability
distribution with parameters $\beta$ and $\eta$, where $\beta > 0$ and $\eta > 0$, if the probability density function is
$$f(t) = \begin{cases} \dfrac{\beta}{\eta} \left(\dfrac{t}{\eta}\right)^{\beta - 1} e^{-(t/\eta)^{\beta}}, & \text{for } t \ge 0 \\ 0, & \text{elsewhere} \end{cases}$$
where $\beta$ is the shape parameter, $\eta$ is the scale parameter, and $t$ is the mission length (time, cycles, etc.). The scale parameter (also called the characteristic life) is the
time at which 63.2% of the product will have failed.
If $\beta = 1$, the Weibull reduces to the exponential distribution.
The Weibull distribution is frequently used to model fatigue failure, ball bearing failure,
etc. (very long tails)
Weibull Probability Distribution
Probability Density Function
[Figure: Weibull probability density function f(t) versus t (t in multiples of the scale parameter), for shape parameter values 0.5, 1.0, 2.5, 3.44, and 5.0.]
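Weibull Probability Distribution
With shape parameter $\beta$ and scale parameter $\eta$, the cumulative distribution function is
$$F(t) = P(T \le t) = 1 - e^{-(t/\eta)^{\beta}} \quad \text{for } t \ge 0$$
[Figure: F(t) versus t (0 to 200) for shape parameter values 0.5, 1, 3, and 5, with scale parameter 100.]
The reliability of a product is the probability that it does not fail before time t.
It is therefore the complement of the CDF:
$$R(t) = 1 - F(t)$$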
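A short Python sketch of the CDF and reliability functions follows. The names `weibull_cdf`, `weibull_reliability`, `shape`, and `scale` are hypothetical, and the scale value of 100 simply mirrors the figure. The loop illustrates why the scale parameter is called the characteristic life: at t equal to the scale, F(t) is the same for every shape.

```python
import math

def weibull_cdf(t, shape, scale):
    """F(t) = 1 - exp(-(t/scale)^shape) for t >= 0, and 0 for t < 0."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_reliability(t, shape, scale):
    """R(t) = 1 - F(t): probability of not failing before time t."""
    return 1.0 - weibull_cdf(t, shape, scale)

# At t = scale the exponent is -1 regardless of shape, so F = 1 - e^{-1}
for shape in (0.5, 1.0, 3.0, 5.0):
    print(shape, round(weibull_cdf(100.0, shape, 100.0), 3))
```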
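Properties of the Weibull Distribution
Mean or Expected Value
$$E(T) = \eta\,\Gamma\!\left(1 + \frac{1}{\beta}\right)$$
Standard Deviation of T
$$\sigma = \eta \left[\Gamma\!\left(1 + \frac{2}{\beta}\right) - \Gamma^{2}\!\left(1 + \frac{1}{\beta}\right)\right]^{1/2}$$
where $\beta$ is the shape parameter, $\eta$ is the scale parameter, and $\Gamma$ is the gamma function, which satisfies
$$\Gamma(a + 1) = a\,\Gamma(a)$$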
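These two properties can be evaluated directly with the gamma function from Python's standard library. This is a minimal sketch; the function names and the `shape` and `scale` argument names are assumptions for illustration.

```python
import math

def weibull_mean(shape, scale):
    """E(T) = scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)

def weibull_std(shape, scale):
    """sigma = scale * sqrt(Gamma(1 + 2/shape) - Gamma(1 + 1/shape)^2)."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * math.sqrt(g2 - g1 ** 2)

# With shape = 1 the Weibull is the exponential distribution,
# whose mean and standard deviation both equal the scale.
print(weibull_mean(1.0, 100.0), weibull_std(1.0, 100.0))
```

The shape = 1 case is a convenient check because the exponential distribution's mean and standard deviation are both equal to its scale.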
Weibull Distribution - Example
Let T = the ultimate tensile strength (ksi) at -200
degrees F of a type of steel that exhibits cold brittleness at low temperatures. Suppose T has a
Weibull distribution with shape parameter $\beta = 20$ and scale parameter $\eta = 100$. Find:
(a) $P(T \le 105)$
(b) $P(98 \le T \le 102)$
Weibull Distribution - Example Solution
(a) $P(T \le 105) = F(105; 100, 20) = 1 - e^{-(105/100)^{20}} = 1 - 0.070 = 0.930$
(b) $P(98 \le T \le 102) = F(102; 100, 20) - F(98; 100, 20) = e^{-(0.98)^{20}} - e^{-(1.02)^{20}} = 0.513 - 0.226 = 0.287$
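Both parts of the solution are interval probabilities under the Weibull CDF, so they can be reproduced with one small helper. The Python sketch below uses a hypothetical function name, `weibull_interval_prob`, and takes the shape as 20 and the scale as 100, as in the example.

```python
import math

def weibull_interval_prob(lo, hi, shape, scale):
    """P(lo <= T <= hi) = F(hi) - F(lo) for 0 <= lo <= hi."""
    def survival(t):
        # 1 - F(t) = exp(-(t/scale)^shape)
        return math.exp(-((t / scale) ** shape))
    return survival(lo) - survival(hi)

p_a = weibull_interval_prob(0.0, 105.0, shape=20.0, scale=100.0)   # P(T <= 105)
p_b = weibull_interval_prob(98.0, 102.0, shape=20.0, scale=100.0)  # P(98 <= T <= 102)
print(round(p_a, 3), round(p_b, 3))
```

Rounded to three decimals, the results agree with the values 0.930 and 0.287 given in the solution.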
Weibull Distribution - Example
The random variable T can be modeled by a Weibull distribution with shape parameter $\beta = 1/2$ and scale parameter $\eta = 1000$. The spec time limit is set at t = 4000. What is the proportion of items not meeting spec?
$$P(T > 4000) = 1 - P(T \le 4000) = 1 - F(4000) = e^{-(4000/1000)^{1/2}} = e^{-2} = 0.1353$$
That is, about 13.53% of the items will not meet spec.
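Weibull Analysis: Weibull Probability Plot
Derived from double logarithmic transformation of the Weibull
distribution function, with shape parameter $\beta$ and scale parameter $\eta$:
$$F(t) = 1 - e^{-(t/\eta)^{\beta}}$$
Of the form
$$y = ax + b$$
where
$$y = \log \log \frac{1}{1 - F(t)}, \quad a = \beta, \quad x = \log t, \quad b = -\beta \log \eta$$
Any straight line on a Weibull probability plot is a Weibull probability
distribution function with slope $\beta$ and intercept $-\beta \log \eta$, where the ordinate is $\log\{\log(1/[1 - F(t)])\}$ and the abscissa is $\log t$.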
Weibull Probability Plot in MATLAB
>> data = [1.03; 2.20; 1.55; 0.24; 1.83; 0.40; 0.87; 0.03; 2.24; 1.05; 2.05; 0.14; 3.68; 0.48; 0.41];
>> wblplot(data);
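To show what the probability plot encodes, the Python sketch below fits a least-squares line through the transformed points and reads the shape off the slope and the scale off the intercept. This is not a reimplementation of wblplot. The plotting positions use Benard's median-rank approximation, (i - 0.3)/(n + 0.4), which is an assumption of this sketch, and the function name `weibull_plot_fit` is hypothetical.

```python
import math

def weibull_plot_fit(times):
    """Least-squares line through the Weibull-plot points; returns (shape, scale)."""
    t = sorted(times)
    n = len(t)
    xs = [math.log(ti) for ti in t]  # abscissa: log t
    # Ordinate: log(log(1 / (1 - F_i))), with F_i from Benard's median ranks (assumed)
    ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    shape = slope                          # slope = shape parameter
    scale = math.exp(-intercept / slope)   # intercept = -shape * log(scale)
    return shape, scale

# Same data as in the MATLAB example above
data = [1.03, 2.20, 1.55, 0.24, 1.83, 0.40, 0.87, 0.03, 2.24, 1.05,
        2.05, 0.14, 3.68, 0.48, 0.41]
shape_hat, scale_hat = weibull_plot_fit(data)
print(shape_hat, scale_hat)
```

A least-squares fit on the plot is a quick graphical estimate. It will generally differ from a maximum likelihood fit of the same data.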
What is Survival Analysis?
A class of statistical methods for studying the occurrence and timing of events.
A class of methods for analyzing survival times (i.e., times to events).
A class of methods for analyzing survival probabilities at different follow-up times.
Not restricted to data with a certain distribution (non-parametric in nature).
In many biomedical studies, the primary endpoint is time until an event occurs (e.g. death, recurrence, new symptoms, etc.)
Data are typically subject to censoring when a study ends before the event occurs.
Kaplan-Meier Estimate of S(t)
Rank the failure times as $t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}$.
Number of items at risk before $t_{(i)}$ is $n_i$.
Number of items failed at time $t_{(i)}$ is $d_i$.
Estimated hazard function at $t_{(i)}$:
$$\hat{h}_i = \frac{d_i}{n_i}$$
Estimate of survival function:
$$\hat{S}(t) = \prod_{t_{(i)} \le t} \frac{n_i - d_i}{n_i}$$
The Kaplan-Meier estimator is the nonparametric maximum likelihood estimate of S(t).
In medical research, the K-M estimate is often used to measure the fraction of patients living for a certain amount of time after treatment. In economics, it can be used to measure the length of time people remain unemployed after a job loss. In
engineering, it can be used to measure the time until failure of machine parts.
Kaplan-Meier Estimate of S(t)
Failure Rate
The hazard rate at each period is the number of failures in the given period divided by the
number of surviving individuals at the beginning of the period (number at risk).
Failure Time (t)   Hazard Rate (h(t))   Cumulative Hazard Rate (H(t))
0                  0                    0
t1                 d1/n1                d1/n1
t2                 d2/n2                H(t1) + d2/n2
...                ...                  ...
tk                 dk/nk                H(t(k-1)) + dk/nk
Survival Probability
For each period, the survival probability is the product of the complements of the hazard rates up to and including that period.
The initial survival probability at the beginning of the first time period is 1. If the hazard
rate for each period is h(ti), then the survivor probability is as shown.
Time (t)   Survival Probability (S(t))
0          1
t1         1*(1 - h(t1))
t2         S(t1)*(1 - h(t2))
...        ...
tk         S(t(k-1))*(1 - h(tk))
A Data Example
Failure Time t(i)   Number Failed d_i   Number at Risk n_i
4                   3                   7
7                   1                   4
11                  2                   3
12                  1                   1
The number at risk is the total number of survivors at the beginning of each period. The
number at risk at the beginning of the first period is all individuals in the lifetime study. At
the beginning of each remaining period, the number at risk is reduced by the number of
failures plus individuals censored at the end of the previous period.
This life table shows fictitious survival data. At the beginning of the first failure time, there
are seven items at risk. At time 4, three fail. So at the beginning of time 7, there are four
items at risk. Only one fails at time 7, so the number at risk at the beginning of time 11 is
three. Two fail at time 11, so at the beginning of time 12, the number at risk is one. The
remaining item fails at time 12.
A Data Example
We can compute the hazard rate, survival probability, and cumulative distribution
function for the data as follows:
t(i)   Number Failed d_i   Number at Risk n_i   Hazard Rate h(t) = d_i/n_i   Survival Probability S(t)        Cumulative Distribution Function F(t) = 1 - S(t)
4      3                   7                    3/7                          1 - 3/7 = 4/7 = 0.5714           0.4286
7      1                   4                    1/4                          4/7*(1 - 1/4) = 3/7 = 0.4286     0.5714
11     2                   3                    2/3                          3/7*(1 - 2/3) = 1/7 = 0.1429     0.8571
12     1                   1                    1/1                          1/7*(1 - 1) = 0                  1
In MATLAB, we can enter the data and calculate these measures using ecdf.
Suppose the failure times are stored in an array y.
>> y = [4 7 11 12];
>> freq = [3 1 2 1];
>> [f,x] = ecdf(y,'frequency',freq)
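The life-table arithmetic can also be written out step by step. The Python sketch below takes the (time, number failed, number at risk) rows from the table and applies the running product. The function name `kaplan_meier_from_table` is hypothetical, and this is an illustration of the calculation, not a description of how ecdf is implemented.

```python
def kaplan_meier_from_table(rows):
    """rows: (time, number_failed, number_at_risk) tuples, sorted by time.

    Returns a list of (time, hazard, survival, cdf) tuples.
    """
    results = []
    survival = 1.0  # survival probability before the first failure time
    for time, failed, at_risk in rows:
        hazard = failed / at_risk       # h(t) = d_i / n_i
        survival *= (1.0 - hazard)      # S(t) = S(previous) * (1 - h(t))
        results.append((time, hazard, survival, 1.0 - survival))
    return results

table = [(4, 3, 7), (7, 1, 4), (11, 2, 3), (12, 1, 1)]
for time, hazard, survival, cdf in kaplan_meier_from_table(table):
    print(time, round(hazard, 4), round(survival, 4), round(cdf, 4))
```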
A Censored Data Example
When you have censored data, the life table might look like the following:
Time t(i)   Number Failed d_i   Number Censored   Number at Risk n_i   Hazard Rate   Survival Probability          Cumulative Distribution Function
4           2                   1                 7                    2/7           1 - 2/7 = 0.7143              0.2857
7           1                   0                 4                    1/4           0.7143*(1 - 1/4) = 0.5357     0.4643
11          1                   1                 3                    1/3           0.5357*(1 - 1/3) = 0.3571     0.6429
12          1                   0                 1                    1/1           0.3571*(1 - 1) = 0            1.0000
At any given time, the censored items are also counted in the total number at
risk, and the hazard rate formula is based on the number failed and the total number
at risk.
When updating the number at risk at the beginning of each period, the total number
failed and censored in the previous period is subtracted from the number at risk at the
beginning of that previous period.
Notation: 1 for censored data, and 0 for exact failure time.
A Censored Data Example: MATLAB
While using ecdf, you must also enter the censoring information using an array of
binary variables. Enter 1 for censored data, and enter 0 for exact failure time.
>> y = [4 4 4 7 11 11 12];
>> cens= [1 0 0 0 1 0 0];
>> [f,x] = ecdf(y,'censoring',cens)
ecdf, by default, produces the cumulative distribution function values. You have to
specify the survivor function or the hazard function using optional name-value pair
arguments. You can also plot the results as follows.
>> ecdf(y,'censoring',cens,'function','survivor');
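For comparison, here is a minimal Python sketch of the Kaplan-Meier calculation working directly from the raw observations and censoring flags used above (1 for censored, 0 for exact failure). The function name `kaplan_meier` is hypothetical. Items censored at a given time are counted as at risk at that time, matching the life table.

```python
def kaplan_meier(times, censored):
    """times[i]: observed time; censored[i]: 1 if censored, 0 if exact failure.

    Returns a list of (failure_time, survival) for each distinct failure time.
    """
    observations = sorted(zip(times, censored))
    at_risk = len(observations)  # everyone is at risk before the first time
    survival = 1.0
    curve = []
    i = 0
    while i < len(observations):
        t = observations[i][0]
        failed = 0
        removed = 0
        # Gather every observation tied at time t (failures and censored items)
        while i < len(observations) and observations[i][0] == t:
            failed += 1 - observations[i][1]
            removed += 1
            i += 1
        if failed > 0:
            survival *= 1.0 - failed / at_risk
            curve.append((t, survival))
        at_risk -= removed  # failures and censored items both leave the risk set
    return curve

y = [4, 4, 4, 7, 11, 11, 12]
cens = [1, 0, 0, 0, 1, 0, 0]
for t, s in kaplan_meier(y, cens):
    print(t, round(s, 4))
```

The printed survival values can be compared with the Survival Probability column of the censored life table.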