W4INSE6220

7/25/2019 W4INSE6220

1/9

1

INSE 6320 -- Week 4

Risk Analysis for Information and Systems Engineering

Statistical Inference

Weibull Analysis

Survival Analysis

Dr. A. Ben Hamza Concordia University

2

Risk Assessment

Probabilistic risk assessment (PRA) is characterized by two quantities:

the magnitude (severity) of the possible adverse consequences, and the likelihood (probability) of occurrence of each consequence.

Hazard: the property of a substance or situation with the potential for creating damage.

A hazard analysis is a process used to assess risk. The result of a hazard analysis is the identification of unacceptable risks and the selection of means of controlling or eliminating them.

Probability is a way to predict stochastic events

Risk: the likelihood of a specific effect within a specified period

3

Population at risk

Individual Risk

Individual risk is the risk of fatality or injury to any identifiable (named)

individual who lives within the zone impacted by a hazard, or follows a particular

pattern of life, that might subject him or her to the consequences of a hazard.

Societal Risk

Societal risk is the risk of multiple fatalities or injuries in society as a whole, where society would have to carry the burden of a hazard causing deaths, injuries, and financial, environmental, and other losses.

4

Individual risk

Cause                  Probability / year    Cause                  Probability / year
All causes (illness)   1.19E-02              Rock climbing          8.00E-03
Cancer                 2.80E-03              Canoeing               2.00E-03
Road accidents         1.00E-04              Hang-gliding           1.50E-03
Accidents at home      9.30E-05              Motor cycling          2.40E-04
Fire                   1.50E-05              Mining                 9.00E-04
Drowning               6.00E-06              Fire fighting          8.00E-04
Excessive cold         8.00E-06              Police                 2.00E-04
Lightning              1.00E-07              Accidents at offices   4.50E-06

Individual risk can be calculated as the total risk divided by the population at risk.

For example, if a region with a population of one million people experiences on average 5 deaths from flooding per year, the individual risk of being killed by a flood in that region is 5/1,000,000, usually expressed in orders of magnitude as $5 \times 10^{-6}$.
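The flood figure can be reproduced in a couple of lines. This is only a sketch of the arithmetic; the variable names are illustrative.

```matlab
% Individual risk = average deaths per year / population at risk.
deaths_per_year = 5;      % average flood deaths per year (from the example)
population      = 1e6;    % population at risk (from the example)

individual_risk = deaths_per_year / population;
fprintf('Individual risk = %.1e per year\n', individual_risk);
```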

5

How to express risk?

What is the risk of flying by airplane? Is it higher than driving a car?

6

Bayes Formula

$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_k)\,P(B_k)}$$

where:

$B_i$ = $i$th event of $k$ mutually exclusive and collectively exhaustive events

$A$ = new event that might impact $P(B_i)$

Bayes Theorem is used to revise previously calculated

probabilities based on new information.

Developed by Thomas Bayes in the 18th Century.

It is an extension of conditional probability. Bayes, Thomas (1763)

7

Bayes Formula Example

A drilling company has estimated a 40% chance of striking oil for their new well.

A detailed test has been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and 20% of unsuccessful wells have had detailed tests.

Given that this well has been scheduled for a detailed test, what is the probability that the well will be successful?

8

Bayes Formula Example (continued)

Let S = successful well

U = unsuccessful well

P(S) = 0.4, P(U) = 0.6 (prior probabilities)

Define the detailed test event as D

Conditional probabilities:

P(D|S) = 0.6, P(D|U) = 0.2

Goal is to find P(S|D)

9

Bayes Formula Example (continued)

Apply Bayes Formula:

$$P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D \mid S)\,P(S) + P(D \mid U)\,P(U)} = \frac{(0.6)(0.4)}{(0.6)(0.4) + (0.2)(0.6)} = \frac{0.24}{0.24 + 0.12} = 0.667$$

So the revised probability of success, given that this well has been scheduled for a detailed test, is 0.667.

10

Bayes Formula Example (continued)

Given the detailed test, the revised probability of a successful well has risen to 0.667 from the original estimate of 0.4.

Event              Prior Prob.   Conditional Prob.   Joint Prob.          Revised Prob.
S (successful)     0.4           0.6                 (0.4)(0.6) = 0.24    0.24/0.36 = 0.667
U (unsuccessful)   0.6           0.2                 (0.6)(0.2) = 0.12    0.12/0.36 = 0.333
                                                     Sum = 0.36
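The revision table for the drilling example can be checked numerically. The following is a minimal sketch using only base MATLAB; the variable names are illustrative and not part of the original slides.

```matlab
% Bayes revision for the drilling example.
prior = [0.4 0.6];     % P(S), P(U): prior probabilities
cond  = [0.6 0.2];     % P(D|S), P(D|U): conditional probabilities

joint     = prior .* cond;   % joint probabilities P(D and S), P(D and U)
pD        = sum(joint);      % total probability of a detailed test, P(D)
posterior = joint / pD;      % revised probabilities P(S|D), P(U|D)

fprintf('P(D) = %.2f, P(S|D) = %.3f, P(U|D) = %.3f\n', pD, posterior);
```

The same three lines (joint, sum, normalize) apply to any number of mutually exclusive and collectively exhaustive events, as long as `prior` and `cond` have matching lengths.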

11

Parameter Estimation

12

Parameter estimation

The parameters of a pdf are constants that characterize its shape, e.g. $f(x; \theta)$, where $x$ is the random variable (r.v.) and $\theta$ is the parameter.

Suppose we have a sample of observed values: $x_1, \ldots, x_n$.

We want to find some function of the data to estimate the parameter(s): $\hat{\theta}(x_1, \ldots, x_n)$ (the estimator is written with a hat).

Sometimes we say "estimator" for the function of $x_1, \ldots, x_n$, and "estimate" for the value of the estimator with a particular data set.

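13

Maximum Likelihood Estimation

We want to pick the $\theta$ that maximizes the likelihood $L$:

$$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

It is often easier to maximize $\ln L$. Both $L$ and $\ln L$ attain their maximum at the same location. We maximize $\ln L$ rather than $L$ itself because $\ln L$ converts the product into a summation. The new maximization condition is:

$$\left.\frac{\partial \ln L}{\partial \theta}\right|_{\theta = \theta^*} = \left.\frac{\partial}{\partial \theta} \sum_{i=1}^{n} \ln f(x_i; \theta)\right|_{\theta = \theta^*} = 0$$

$\theta$ could be an array of parameters (e.g. slope and intercept) or just a single variable.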

14

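Example:

At a glass manufacturing company, 30 randomly selected sheets of glass are inspected. If the number of flaws per sheet is taken to have a Poisson distribution, how should the parameter $\lambda$ be estimated?

The maximum likelihood estimate:

$$f(x_i, \lambda) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!}$$

so that the likelihood is

$$L(x_1, \ldots, x_n, \lambda) = \prod_{i=1}^{n} f(x_i, \lambda) = \frac{e^{-n\lambda}\, \lambda^{(x_1 + \cdots + x_n)}}{x_1! \cdots x_n!}$$

The log-likelihood is therefore

$$\ln(L) = -n\lambda + (x_1 + \cdots + x_n) \ln(\lambda) - \ln(x_1! \cdots x_n!)$$

$$\frac{d \ln(L)}{d\lambda} = -n + \frac{x_1 + \cdots + x_n}{\lambda} = 0 \quad \text{so that} \quad \hat{\lambda} = \frac{x_1 + \cdots + x_n}{n} = \bar{x}$$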
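As a sanity check on the Poisson result, the closed-form estimate (the sample mean) can be compared with a brute-force maximization of the log-likelihood. This is a sketch under stated assumptions: the 30 flaw counts below are hypothetical values made up for illustration, not data from the example, and the grid search is only one simple way to locate the maximum.

```matlab
% Hypothetical flaw counts for 30 sheets (illustrative values only).
x = [0 1 1 0 2 0 1 3 0 1 0 0 2 1 0 1 1 0 4 0 1 2 0 1 0 1 0 2 1 0];
n = numel(x);

lambda_hat = mean(x);    % closed-form maximum likelihood estimate

% Numerical check: evaluate ln L(lambda) on a grid and locate its maximum.
% gammaln(x + 1) equals ln(x!) and does not depend on lambda.
lambda = 0.05:0.001:3;
logL   = -n*lambda + sum(x)*log(lambda) - sum(gammaln(x + 1));
[~, idx] = max(logL);

fprintf('Closed form: %.3f, grid maximum: %.3f\n', lambda_hat, lambda(idx));
```

The two numbers should agree to within the grid spacing, since the log-likelihood has a single maximum at the sample mean.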

15

Weibull Probability Distribution

A random variable $T \sim \text{Weibull}(\beta, \theta)$ is said to have the Weibull Probability Distribution with parameters $\beta$ and $\theta$, where $\beta > 0$ and $\theta > 0$, if the probability density function is

$$f(t) = \begin{cases} \dfrac{\beta}{\theta} \left(\dfrac{t}{\theta}\right)^{\beta - 1} e^{-(t/\theta)^{\beta}}, & \text{for } t \ge 0 \\ 0, & \text{elsewhere} \end{cases}$$

where $\beta$ is the Shape Parameter, $\theta$ is the Scale Parameter, and $t$ is the mission length (time, cycles, etc.). The scale parameter (also called the characteristic life) is the time at which 63.2% of the product will have failed.

If $\beta = 1$, the Weibull reduces to the Exponential Distribution.

The Weibull distribution is frequently used to model fatigue failure, ball bearing failure, etc. (very long tails).

16

Probability Density Function

[Figure: Weibull probability density function f(t) versus t, with t in multiples of the scale parameter, for shape parameter values 0.5, 1.0, 2.5, 3.44, and 5.0.]


17

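$$F(t) = P(T \le t) = 1 - e^{-(t/\theta)^{\beta}}, \quad \text{for } t \ge 0$$

where $\beta$ is the shape parameter and $\theta$ is the scale parameter.

[Figure: F(t) versus t (0 to 200) for shape parameter values 0.5, 1, 3, and 5, with scale parameter 100.]

The reliability of a product is the probability that it does not fail before time t. It is therefore the complement of the CDF:

$$R(t) = 1 - F(t)$$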
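A short numerical sketch can make the CDF and reliability concrete. It writes the shape parameter as `beta` and the scale parameter as `theta` (names assumed here), and the parameter values are illustrative. It also checks the characteristic-life property: F evaluated at the scale parameter equals 1 - exp(-1), about 63.2%, whatever the shape.

```matlab
beta  = 3;      % shape parameter (illustrative value)
theta = 100;    % scale parameter, i.e. characteristic life (illustrative value)

F = @(t) 1 - exp(-(t ./ theta).^beta);   % Weibull CDF, valid for t >= 0
R = @(t) 1 - F(t);                       % reliability: P(no failure before t)

fprintf('F(theta) = %.4f\n', F(theta));  % equals 1 - exp(-1) for any beta
fprintf('R(50)    = %.4f\n', R(50));
```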

18

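Properties of the Weibull Distribution

Mean or Expected Value

$$\mu = E(T) = \theta\, \Gamma\!\left(1 + \frac{1}{\beta}\right)$$

Standard Deviation of T

$$\sigma = \theta \left[ \Gamma\!\left(1 + \frac{2}{\beta}\right) - \Gamma^{2}\!\left(1 + \frac{1}{\beta}\right) \right]^{1/2}$$

where $\beta$ is the shape parameter, $\theta$ is the scale parameter, $\Gamma(\cdot)$ is the gamma function, which satisfies $\Gamma(a + 1) = a\, \Gamma(a)$, and $\Gamma^{2}(a)$ denotes $[\Gamma(a)]^{2}$.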
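The mean and standard deviation formulas can be evaluated with the base MATLAB `gamma` function. This is a minimal sketch; the names `beta` (shape) and `theta` (scale) are assumed, and the values 20 and 100 are borrowed from the tensile strength example in these slides.

```matlab
beta  = 20;     % shape parameter (value from the tensile strength example)
theta = 100;    % scale parameter (value from the tensile strength example)

mu    = theta * gamma(1 + 1/beta);                             % mean
sigma = theta * sqrt(gamma(1 + 2/beta) - gamma(1 + 1/beta)^2); % std. deviation

fprintf('mean = %.2f, standard deviation = %.2f\n', mu, sigma);
```

With a large shape parameter the distribution is tightly concentrated just below the scale parameter, so the mean should come out slightly under 100.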

19

Weibull Distribution - Example

Let T = the ultimate tensile strength (ksi) at -200 degrees F of a type of steel that exhibits cold brittleness at low temperatures. Suppose T has a Weibull distribution with shape parameter $\beta = 20$ and scale parameter $\theta = 100$. Find:

(a) $P(T \le 105)$

(b) $P(98 \le T \le 102)$

20

Weibull Distribution - Example Solution

(a) $P(T \le 105) = F(105; 100, 20) = 1 - e^{-(105/100)^{20}} = 1 - 0.070 = 0.930$

(b) $P(98 \le T \le 102) = F(102; 100, 20) - F(98; 100, 20) = e^{-(0.98)^{20}} - e^{-(1.02)^{20}} = 0.513 - 0.226 = 0.287$
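Both answers can be verified with an anonymous function for the Weibull CDF. This is a sketch in base MATLAB; `beta` and `theta` are assumed names for the shape (20) and scale (100) parameters.

```matlab
beta  = 20;     % shape parameter
theta = 100;    % scale parameter

F = @(t) 1 - exp(-(t ./ theta).^beta);   % Weibull CDF, valid for t >= 0

pa = F(105);            % (a) P(T <= 105)
pb = F(102) - F(98);    % (b) P(98 <= T <= 102)

fprintf('(a) %.3f   (b) %.3f\n', pa, pb);   % prints (a) 0.930   (b) 0.287
```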

21

Weibull Distribution - Example

The random variable T can be modeled by a Weibull distribution with shape parameter $\beta = 1/2$ and scale parameter $\theta = 1000$. The spec time limit is set at t = 4000. What is the proportion of items not meeting spec?

$$P(T > 4000) = 1 - P(T \le 4000) = 1 - F(4000) = e^{-(4000/1000)^{1/2}} = e^{-2} = 0.1353$$

That is, about 13.53% of the items will not meet spec.

22

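Weibull Analysis: Weibull Probability Plot

Derived from double logarithmic transformation of the Weibull Distribution Function

$$F(t) = 1 - e^{-(t/\theta)^{\beta}}$$

with shape parameter $\beta$ and scale parameter $\theta$.

Of the form

$$y = ax + b$$

where

$$y = \log \log \frac{1}{1 - F(t)}, \qquad a = \beta, \qquad x = \log t, \qquad b = -\beta \log \theta$$

Any straight line on a Weibull Probability Plot is a Weibull Probability Distribution Function with slope $\beta$ and intercept $-\beta \log \theta$, where the ordinate is log{log(1/[1-F(t)])} and the abscissa is log t.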
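A short derivation makes the straight-line form explicit. It writes the shape parameter as $\beta$ and the scale parameter as $\theta$ (symbol names assumed for this sketch) and uses natural logarithms:

```latex
\begin{align*}
1 - F(t) &= e^{-(t/\theta)^{\beta}} \\
\ln \frac{1}{1 - F(t)} &= \left( \frac{t}{\theta} \right)^{\beta} \\
\ln \ln \frac{1}{1 - F(t)} &= \beta \ln t - \beta \ln \theta
\end{align*}
```

The last line has slope $\beta$ and intercept $-\beta \ln \theta$, so a fitted line with slope $a$ and intercept $b$ gives the scale parameter back as $\theta = e^{-b/a}$.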

23

Weibull Probability Plot in MATLAB

>> data = [1.03; 2.20; 1.55; 0.24; 1.83; 0.40; 0.87; 0.03; 2.24; 1.05; 2.05; 0.14; 3.68; 0.48; 0.41];

>> wblplot(data);
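The straight-line relationship behind the plot can also be used to estimate the parameters by least squares. The sketch below is one possible approach, not the method `wblplot` uses internally: it assumes Benard's median-rank approximation, (i - 0.3)/(n + 0.4), as the plotting position for F at the i-th ordered observation, and writes the shape as `beta_hat` and the scale as `theta_hat` (names assumed).

```matlab
data = [1.03; 2.20; 1.55; 0.24; 1.83; 0.40; 0.87; 0.03; 2.24; 1.05; ...
        2.05; 0.14; 3.68; 0.48; 0.41];

t = sort(data);                    % ordered failure times
n = numel(t);
i = (1:n)';
Fhat = (i - 0.3) ./ (n + 0.4);     % assumed plotting positions (Benard)

x = log(t);                        % abscissa: log t
y = log(log(1 ./ (1 - Fhat)));     % ordinate: log log(1/(1 - F))

p = polyfit(x, y, 1);              % p(1) = slope, p(2) = intercept
beta_hat  = p(1);                  % slope estimates the shape parameter
theta_hat = exp(-p(2) / p(1));     % intercept = -beta*log(theta)

fprintf('shape = %.3f, scale = %.3f\n', beta_hat, theta_hat);
```

Estimates from this regression will generally differ somewhat from maximum likelihood estimates, because the result depends on the choice of plotting positions.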

24

What is Survival Analysis?

A class of statistical methods for studying the occurrence and timing of events.

A class of methods for analyzing survival times (i.e., times to events).

A class of methods for analyzing survival probabilities at different follow-up times.

Not restricted to data with a certain distribution (non-parametric in nature).

In many biomedical studies, the primary endpoint is time until an event occurs (e.g. death, recurrence, new symptoms, etc.)

Data are typically subject to censoring when a study ends before the event occurs.

29

Kaplan-Meier Estimate of S(t)

Rank the failure times as $t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}$.

Number of items at risk before $t_{(i)}$ is $n_i$

Number of items failed at time $t_{(i)}$ is $d_i$

Estimated hazard function at $t_{(i)}$:

$$\hat{h}_i = \frac{d_i}{n_i}$$

Estimate of survival function:

$$\hat{S}(t) = \prod_{t_{(i)} \le t} \frac{n_i - d_i}{n_i}$$

In medical research, the K-M estimate is often used to measure the fraction of patients living for a certain amount of time after treatment. In economics, it can be used to measure the length of time people remain unemployed after a job loss. In engineering, it can be used to measure the time until failure of machine parts.

The Kaplan-Meier estimator is the nonparametric maximum likelihood estimate of S(t).

30

Kaplan-Meier Estimate of S(t)

Failure Time (t)   Hazard Rate (h(t))   Cumulative Hazard Rate (H(t))
0                  0                    0
t_1                d_1/n_1              d_1/n_1
t_2                d_2/n_2              H(t_1) + d_2/n_2
...                ...                  ...
t_k                d_k/n_k              H(t_{k-1}) + d_k/n_k

Time (t)   Survival Probability (S(t))
0          1
t_1        1*(1 - h(t_1))
t_2        S(t_1)*(1 - h(t_2))
...        ...
t_k        S(t_{k-1})*(1 - h(t_k))

Failure Rate

The hazard rate at each period is the number of failures in the given period divided by the number of surviving individuals at the beginning of the period (number at risk).

Survival Probability

For each period, the survival probability is the product of the complements of the hazard rates up to and including that period. The initial survival probability at the beginning of the first time period is 1. If the hazard rate for each period is h(t_i), then the survivor probability is as shown in the table.

31

A Data Example

Failure Time t(i)   Number Failed d_i   Number at Risk n_i
4                   3                   7
7                   1                   4
11                  2                   3
12                  1                   1

The number at risk is the total number of survivors at the beginning of each period. The

number at risk at the beginning of the first period is all individuals in the lifetime study. At

the beginning of each remaining period, the number at risk is reduced by the number of

failures plus individuals censored at the end of the previous period.

This life table shows fictitious survival data. At the beginning of the first failure time, there

are seven items at risk. At time 4, three fail. So at the beginning of time 7, there are four

items at risk. Only one fails at time 7, so the number at risk at the beginning of time 11 is

three. Two fail at time 11, so at the beginning of time 12, the number at risk is one. The

remaining item fails at time 12.

32

A Data Example

We can compute the cumulative hazard rate, survival rate, and cumulative distribution function for the following data as follows:

t(i)   Number Failed   Number at Risk   Hazard Rate       Survival Probability           Cumulative Distribution
       d_i             n_i              h(t) = d_i/n_i    S(t)                           Function F(t) = 1 - S(t)
4      3               7                3/7               1 - 3/7 = 4/7 = 0.5714         0.4286
7      1               4                1/4               4/7*(1 - 1/4) = 3/7 = 0.4286   0.5714
11     2               3                2/3               3/7*(1 - 2/3) = 1/7 = 0.1429   0.8571
12     1               1                1/1               1/7*(1 - 1) = 0                1

In MATLAB, we can enter the data and calculate these measures using ecdf. Suppose the failure times are stored in an array y.

>> y = [4 7 11 12];

>> freq = [3 1 2 1];

>> [f,x] = ecdf(y,'frequency',freq)
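The life table can also be reproduced by hand with base MATLAB, which makes the product-of-complements structure visible. This is a minimal sketch; the variable names are illustrative.

```matlab
d = [3 1 2 1];          % number failed at times 4, 7, 11, 12
n = [7 4 3 1];          % number at risk just before each failure time

h = d ./ n;             % hazard rate at each failure time
H = cumsum(h);          % cumulative hazard rate
S = cumprod(1 - h);     % Kaplan-Meier survival probability
F = 1 - S;              % cumulative distribution function

disp([h' H' S' F']);    % columns: hazard, cumulative hazard, S(t), F(t)
```

The `S` and `F` columns should match the survival probability and CDF columns of the table (0.5714, 0.4286, 0.1429, 0 and their complements).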

33

A Censored Data Example

When you have censored data, the life table might look like the following:

Time   Number Failed   Censoring   Number at Risk   Hazard Rate   Survival Probability            Cumulative Distribution
t(i)   d_i                         n_i                                                            Function
4      2               1           7                2/7           1 - 2/7 = 0.7143                0.2857
7      1               0           4                1/4           0.7143*(1 - 1/4) = 0.5357       0.4643
11     1               1           3                1/3           0.5357*(1 - 1/3) = 0.3571       0.6429
12     1               0           1                1/1           0.3571*(1 - 1) = 0              1.0000

At any given time, the censored items are also counted in the number at risk, and the hazard rate formula is based on the number failed and the total number at risk.

When updating the number at risk at the beginning of each period, the total number failed and censored in the previous period is subtracted from the number at risk at the beginning of that period.

Notation: 1 for censored data, and 0 for exact failure time.

34

A Censored Data Example: MATLAB

While using ecdf, you must also enter the censoring information using an array of

binary variables. Enter 1 for censored data, and enter 0 for exact failure time.

>> y = [4 4 4 7 11 11 12];

>> cens= [1 0 0 0 1 0 0];

>> [f,x] = ecdf(y,'censoring',cens)

ecdf, by default, produces the cumulative distribution function values. You have to

specify the survivor function or the hazard function using optional name-value pair

arguments. You can also plot the results as follows.

>> ecdf(y,'censoring',cens,'function','survivor');
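To see how the censoring indicators enter the calculation, the Kaplan-Meier estimate can be computed directly from `y` and `cens` without `ecdf`. This is a sketch in base MATLAB, intended as a cross-check of the censored life table rather than a replacement for `ecdf`; the loop variable names are illustrative.

```matlab
y    = [4 4 4 7 11 11 12];
cens = [1 0 0 0 1 0 0];              % 1 = censored, 0 = exact failure time

t = unique(y(cens == 0));            % distinct exact failure times
S = ones(size(t));
s = 1;                               % survival probability before first failure
for k = 1:numel(t)
    n_k = sum(y >= t(k));                  % at risk: censored items still count
    d_k = sum(y == t(k) & cens == 0);      % exact failures at t(k)
    s   = s * (1 - d_k / n_k);             % multiply by complement of hazard
    S(k) = s;
end

disp([t' S' (1 - S)']);              % columns: time, S(t), F(t) = 1 - S(t)
```

The printed survival values should match the table for the censored example (0.7143, 0.5357, 0.3571, 0).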
