W4INSE6220


  • 7/25/2019 W4INSE6220


    INSE 6320 -- Week 4

    Risk Analysis for Information and Systems Engineering

    Statistical Inference

    Weibull Analysis

    Survival Analysis

    Dr. A. Ben Hamza Concordia University


    Risk Assessment

    Probabilistic risk assessment (PRA) is characterized by two quantities:

    the magnitude (severity) of the possible adverse consequences, and the likelihood (probability) of occurrence of each consequence.

    Hazard: the property of a substance or situation with the potential for creating damage.

    A hazard analysis is a process used to assess risk. The result of a hazard analysis is the identification of unacceptable

    risks and the selection of means of controlling or eliminating them.

    Probability is a way to predict stochastic events

    Risk: the likelihood of a specific effect within a specified period


    Population at risk

    Individual Risk

    Individual risk is the risk of fatality or injury to any identifiable (named)

    individual who lives within the zone impacted by a hazard, or follows a particular

    pattern of life, that might subject him or her to the consequences of a hazard.

    Societal Risk

    Societal risk is the risk of multiple fatalities or injuries in the society as a whole,

    and where society would have to carry the burden of a hazard causing a number

    of deaths, injury, financial, environmental, and other losses.


    Individual risk

    Cause                 Probability / year    Cause                  Probability / year
    All causes (illness)  1.19E-02              Rock climbing          8.00E-03
    Cancer                2.80E-03              Canoeing               2.00E-03
    Road accidents        1.00E-04              Hang-gliding           1.50E-03
    Accidents at home     9.30E-05              Motor cycling          2.40E-04
    Fire                  1.50E-05              Mining                 9.00E-04
    Drowning              6.00E-06              Fire fighting          8.00E-04
    Excessive cold        8.00E-06              Police                 2.00E-04
    Lightning             1.00E-07              Accidents at offices   4.50E-06

    Individual risk can be calculated as the total risk divided by the population at risk.

    For example, if a region with a population of one million people experiences on average 5 deaths from flooding per year, the individual risk

    of being killed by a flood in that region is 5/1,000,000 per year, usually expressed in orders of magnitude as $5 \times 10^{-6}$.
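The flood arithmetic can be checked with a short Python sketch. The function name `individual_risk` is a hypothetical helper introduced here for illustration; it is not part of the slides.

```python
def individual_risk(events_per_year: float, population_at_risk: int) -> float:
    """Annual individual risk = expected events per year / population at risk."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return events_per_year / population_at_risk


# Flood example from the slide: 5 deaths per year in a population of one million.
flood_risk = individual_risk(5, 1_000_000)
print(f"{flood_risk:.1e}")  # 5.0e-06
```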


    How to express risk?

    What is the risk of flying by airplane? Is it higher than driving a car?

    Bayes Formula

    $$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{P(A \mid B_1)\,P(B_1) + P(A \mid B_2)\,P(B_2) + \cdots + P(A \mid B_k)\,P(B_k)}$$

    where:

    $B_i$ = ith event of k mutually exclusive and collectively exhaustive events

    A = new event that might impact $P(B_i)$

    Bayes Theorem is used to revise previously calculated

    probabilities based on new information.

    Developed by Thomas Bayes in the 18th Century.

    It is an extension of conditional probability. Bayes, Thomas (1763)


    Bayes Formula Example

    A drilling company has estimated a 40% chance of striking oil for their new well.

    A detailed test has been scheduled for more information. Historically, 60% of successful wells have had detailed tests, and 20% of unsuccessful wells have had detailed tests.

    Given that this well has been scheduled for a detailed test, what is the probability that the well will be successful?

    Bayes Formula Example (continued)

    Let S = successful well

    U = unsuccessful well

    P(S) = 0.4, P(U) = 0.6 (prior probabilities)

    Define the detailed test event as D

    Conditional probabilities:

    P(D|S) = 0.6, P(D|U) = 0.2

    Goal is to find P(S|D)

    Bayes Formula Example (continued)

    Apply Bayes Formula:

    $$P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D \mid S)\,P(S) + P(D \mid U)\,P(U)} = \frac{(0.6)(0.4)}{(0.6)(0.4) + (0.2)(0.6)} = \frac{0.24}{0.24 + 0.12} = 0.667$$

    So the revised probability of success, given that this well has been scheduled for a detailed test, is 0.667.

    Bayes Formula Example (continued)

    Given the detailed test, the revised probability of a successful well has risen to 0.667 from the original estimate of 0.4.

    Event             Prior Prob.  Conditional Prob.  Joint Prob.         Revised Prob.
    S (successful)    0.4          0.6                (0.4)(0.6) = 0.24   0.24/0.36 = 0.667
    U (unsuccessful)  0.6          0.2                (0.6)(0.2) = 0.12   0.12/0.36 = 0.333
                                                      Sum = 0.36
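The drilling example can be reproduced with a short Python sketch that follows the prior, conditional, joint and revised columns of the table. The helper name `bayes_posterior` and the dictionary layout are illustrative choices, not something taken from the slides.

```python
def bayes_posterior(priors, likelihoods):
    """Return (posterior, evidence) for mutually exclusive, exhaustive events.

    priors[e]      = P(e)
    likelihoods[e] = P(A | e) for the newly observed event A
    """
    joints = {e: priors[e] * likelihoods[e] for e in priors}  # joint probabilities
    evidence = sum(joints.values())                           # P(A), the sum of the joints
    posterior = {e: j / evidence for e, j in joints.items()}  # revised probabilities
    return posterior, evidence


priors = {"S": 0.4, "U": 0.6}        # successful / unsuccessful well
likelihoods = {"S": 0.6, "U": 0.2}   # P(D | S), P(D | U)
posterior, p_d = bayes_posterior(priors, likelihoods)
print(round(p_d, 2), round(posterior["S"], 3), round(posterior["U"], 3))
# 0.36 0.667 0.333
```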

    Parameter Estimation

    The parameters of a pdf are constants that characterize its shape, e.g. $f(x; \theta)$, where $x$ is the random variable (r.v.) and $\theta$ is the parameter.

    Suppose we have a sample of observed values: $x_1, \ldots, x_n$

    We want to find some function of the data to estimate the parameter(s): $\hat{\theta}(x_1, \ldots, x_n)$ (estimator written with a hat)

    Sometimes we say estimator for the function of x1, ..., xn; estimate for the value of the estimator with a particular data set.

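    Maximum Likelihood Estimation

    We want to pick the $\theta$ that maximizes the likelihood $L$:

    $$L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$$

    Often easier to maximize $\ln L$. Both $L$ and $\ln L$ are maximum at the same location. We maximize $\ln L$ rather than $L$ itself because $\ln L$ converts the product into a summation. The new maximization condition is:

    $$\left.\frac{\partial \ln L}{\partial \theta}\right|_{\theta = \theta^*} = \sum_{i=1}^{n} \left.\frac{\partial \ln f(x_i; \theta)}{\partial \theta}\right|_{\theta = \theta^*} = 0$$

    $\theta$ could be an array of parameters (e.g. slope and intercept) or just a single variable.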

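    Example:

    At a glass manufacturing company, 30 randomly selected sheets of glass are inspected. If the distribution of the number of flaws per sheet is taken to be Poisson, how should the parameter $\lambda$ be estimated?

    The maximum likelihood estimate:

    $$f(x_i, \lambda) = \frac{e^{-\lambda}\lambda^{x_i}}{x_i!}$$

    so that the likelihood is

    $$L(x_1, \ldots, x_n, \lambda) = \prod_{i=1}^{n} f(x_i, \lambda) = \frac{e^{-n\lambda}\lambda^{(x_1 + \cdots + x_n)}}{(x_1! \cdots x_n!)}$$

    The log-likelihood is therefore

    $$\ln(L) = -n\lambda + (x_1 + \cdots + x_n)\ln(\lambda) - \ln(x_1! \cdots x_n!)$$

    $$\frac{d\ln(L)}{d\lambda} = -n + \frac{x_1 + \cdots + x_n}{\lambda} = 0 \quad \text{so that} \quad \hat{\lambda} = \frac{x_1 + \cdots + x_n}{n} = \bar{x}$$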
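As a numerical cross-check of the Poisson result, the sketch below evaluates the log-likelihood on a grid and compares its maximizer with the sample mean. The slides do not list the 30 observed counts, so the `counts` list is made-up data used only for illustration; the function name and the grid search are likewise assumptions of this sketch.

```python
import math


def poisson_log_likelihood(lam, counts):
    """ln L = sum over observations of (-lam + x*ln(lam) - ln(x!))."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in counts)


# Hypothetical flaw counts per sheet (NOT from the slides).
counts = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0]

# Closed-form maximum likelihood estimate: the sample mean.
lam_hat = sum(counts) / len(counts)

# Brute-force check: maximize ln L over a grid of candidate values 0.01 .. 5.00.
grid = [0.01 * k for k in range(1, 501)]
best = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(lam_hat, round(best, 2))
```

The grid maximizer should land on the grid point closest to the sample mean, which is what setting the derivative of the log-likelihood to zero predicts.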

    Weibull Probability Distribution

    A random variable $T \sim \text{Weibull}(\beta, \theta)$ is said to have the Weibull Probability Distribution with parameters $\beta$ and $\theta$, where $\beta > 0$ and $\theta > 0$, if the probability density function is

    $$f(t) = \begin{cases} \dfrac{\beta}{\theta^{\beta}}\, t^{\beta - 1} e^{-(t/\theta)^{\beta}}, & \text{for } t \ge 0 \\ 0, & \text{elsewhere} \end{cases}$$

    where $\beta$ is the Shape Parameter, $\theta$ is the Scale Parameter, and t is the mission length (time, cycles, etc.). The scale parameter (also called the characteristic life) is the time at which 63.2% of the product will have failed.

    If $\beta$ = 1, the Weibull reduces to the Exponential Distribution.

    The Weibull distribution is frequently used to model fatigue failure, ball bearing failure, etc. (very long tails)

    Weibull Probability Distribution

    [Figure: probability density function f(t) versus t, with t in multiples of the scale parameter (0 to 2.4) and f(t) from 0 to 1.8; curves for shape parameter values 0.5, 1.0, 2.5, 3.44 and 5.0]

    Weibull Probability Distribution

    $$F(t) = P(T \le t) = 1 - e^{-(t/\theta)^{\beta}}, \quad \text{for } t \ge 0$$

    ($\beta$: shape parameter, $\theta$: scale parameter)

    [Figure: F(t) versus t (0 to 200) for $\beta$ = 0.5, 1, 3, 5 and $\theta$ = 100]

    The reliability of a product is the probability that it does not fail before time t. It is therefore the complement of the CDF:

    $$R(t) = 1 - F(t)$$
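A minimal Python sketch of the CDF and the reliability function is given below. The argument names `beta` (shape parameter) and `theta` (scale parameter) are naming assumptions of this sketch. It also checks the characteristic-life property: at t equal to the scale parameter, about 63.2% of items have failed regardless of the shape.

```python
import math


def weibull_cdf(t, beta, theta):
    """F(t) = 1 - exp(-(t/theta)**beta) for t >= 0, and 0 for t < 0."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-((t / theta) ** beta))


def weibull_reliability(t, beta, theta):
    """R(t) = 1 - F(t): probability of no failure before time t."""
    return 1.0 - weibull_cdf(t, beta, theta)


theta = 100.0
for beta in (0.5, 1.0, 3.0, 5.0):
    # At t = theta the CDF equals 1 - e**-1 for every shape value.
    print(beta, round(weibull_cdf(theta, beta, theta), 3))

# With beta = 1 the Weibull CDF matches the exponential CDF 1 - exp(-t/theta).
exp_check = 1.0 - math.exp(-50.0 / theta)
print(abs(weibull_cdf(50.0, 1.0, theta) - exp_check) < 1e-12)
```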

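    Properties of the Weibull Distribution

    Mean or Expected Value

    $$\mu = E(T) = \theta\,\Gamma\!\left(\frac{1}{\beta} + 1\right)$$

    Standard Deviation of T

    $$\sigma = \theta\left[\Gamma\!\left(\frac{2}{\beta} + 1\right) - \Gamma^2\!\left(\frac{1}{\beta} + 1\right)\right]^{1/2}$$

    where $\beta$ is the shape parameter, $\theta$ is the scale parameter, $\Gamma^2(a) = [\Gamma(a)]^2$ and $\Gamma(a + 1) = a\,\Gamma(a)$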
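The mean and standard deviation can be evaluated with the gamma function in Python's standard library. This is a sketch under the assumption that the shape parameter is called `beta` and the scale parameter `theta`; the function names are illustrative.

```python
import math


def weibull_mean(beta, theta):
    """E(T) = theta * Gamma(1/beta + 1)."""
    return theta * math.gamma(1.0 / beta + 1.0)


def weibull_std(beta, theta):
    """sigma = theta * sqrt(Gamma(2/beta + 1) - Gamma(1/beta + 1)**2)."""
    g1 = math.gamma(1.0 / beta + 1.0)
    g2 = math.gamma(2.0 / beta + 1.0)
    return theta * math.sqrt(g2 - g1 ** 2)


# Sanity check: with beta = 1 the Weibull is exponential, so the mean and
# the standard deviation are both equal to theta.
print(weibull_mean(1.0, 100.0), weibull_std(1.0, 100.0))

# A larger shape value concentrates the distribution near the scale value.
print(round(weibull_mean(20.0, 100.0), 2), round(weibull_std(20.0, 100.0), 2))
```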

    Weibull Distribution - Example

    Let T = the ultimate tensile strength (ksi) at -200 degrees F of a type of steel that exhibits cold brittleness at low temperatures. Suppose T has a Weibull distribution with shape parameter $\beta$ = 20 and scale parameter $\theta$ = 100. Find:

    (a) $P(T \le 105)$

    (b) $P(98 \le T \le 102)$

    Weibull Distribution - Example Solution

    (a) $P(T \le 105) = F(105; 100, 20) = 1 - e^{-(105/100)^{20}} = 1 - 0.070 = 0.930$

    (b) $P(98 \le T \le 102) = F(102; 100, 20) - F(98; 100, 20) = e^{-(0.98)^{20}} - e^{-(1.02)^{20}} = 0.513 - 0.226 = 0.287$

    Weibull Distribution - Example

    The random variable T can be modeled by a Weibull distribution with shape parameter $\beta$ = 1/2 and scale parameter $\theta$ = 1000. The spec time limit is set at t = 4000. What is the proportion of items not meeting spec?

    $$P(T > 4000) = 1 - P(T \le 4000) = 1 - F(4000) = e^{-(4000/1000)^{1/2}} = e^{-2} = 0.1353$$

    That is, only about 13.53% of the items last beyond t = 4000; the remaining 86.47% or so will not meet spec.
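Both worked examples (the tensile strength problem with shape 20 and scale 100, and the spec limit problem with shape 1/2 and scale 1000) can be verified numerically. The sketch below defines its own small CDF helper; the names `shape` and `scale` are labels chosen here.

```python
import math


def weibull_cdf(t, shape, scale):
    """F(t) = 1 - exp(-(t/scale)**shape) for t >= 0."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t >= 0 else 0.0


# Tensile strength example: shape 20, scale 100.
p_a = weibull_cdf(105, 20, 100)                             # P(T <= 105)
p_b = weibull_cdf(102, 20, 100) - weibull_cdf(98, 20, 100)  # P(98 <= T <= 102)

# Spec limit example: shape 1/2, scale 1000, limit t = 4000.
p_exceed = 1.0 - weibull_cdf(4000, 0.5, 1000)               # P(T > 4000) = exp(-2)

print(round(p_a, 3), round(p_b, 3), round(p_exceed, 4))
```

The three printed values should agree with the slide results 0.930, 0.287 and 0.1353 to the precision shown there.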

    Weibull Analysis: Weibull Probability Plot

    Derived from double logarithmic transformation of the Weibull Distribution Function

    $$F(t) = 1 - e^{-(t/\theta)^{\beta}}$$

    Of the form

    $$y = ax + b$$

    where

    $$y = \log\log\frac{1}{1 - F(t)}, \quad a = \beta, \quad x = \log t, \quad b = -\beta\log\theta$$

    Any straight line on a Weibull Probability Plot is a Weibull Probability Distribution Function with slope $\beta$ (the shape parameter) and intercept $-\beta\log\theta$ ($\theta$ being the scale parameter), where the ordinate is log{log(1/[1-F(t)])} and the abscissa is log t.
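The linearization can be illustrated with a short Python sketch: exact Weibull CDF values are transformed with the double logarithm, and an ordinary least-squares line through the transformed points recovers the shape parameter as the slope and the scale parameter from the intercept. Natural logarithms are used throughout, and the parameter values 2.5 and 100 are arbitrary illustrative choices, not data from the slides.

```python
import math


def weibull_plot_coords(t, F):
    """Map (t, F(t)) to Weibull-plot coordinates (ln t, ln ln(1/(1 - F)))."""
    return math.log(t), math.log(math.log(1.0 / (1.0 - F)))


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


beta, theta = 2.5, 100.0                      # assumed "true" shape and scale
times = [20.0, 50.0, 100.0, 150.0, 200.0]
points = [weibull_plot_coords(t, 1.0 - math.exp(-((t / theta) ** beta)))
          for t in times]

slope, intercept = fit_line([p[0] for p in points], [p[1] for p in points])
theta_hat = math.exp(-intercept / slope)      # from intercept = -beta * ln(theta)
print(round(slope, 4), round(theta_hat, 2))
```

With real failure data the CDF values are not known exactly and have to be estimated from the ranked observations, which is what a probability plot such as MATLAB's wblplot does before drawing the points.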


    Weibull Probability Plot in MATLAB

    >> data = [1.03; 2.20; 1.55; 0.24; 1.83; 0.40; 0.87; 0.03; 2.24; 1.05; 2.05; 0.14; 3.68; 0.48; 0.41];

    >> wblplot(data);

    What is Survival Analysis?

    A class of statistical methods for studying the occurrence and timing of events.

    A class of methods for analyzing survival times (i.e., times to events).

    A class of methods for analyzing survival probabilities at different follow-up times.

    Not restricted to data with a certain distribution (non-parametric in nature).

    In many biomedical studies, the primary endpoint is time until an event occurs (e.g. death, recurrence, new symptoms, etc.)

    Data are typically subject to censoring when a study ends before the event occurs.


    Kaplan-Meier Estimate of S(t)

    Rank the failure times as $t_{(1)} \le t_{(2)} \le \cdots \le t_{(n)}$.

    Number of items at risk before $t_{(i)}$ is $n_i$

    Number of items failed at time $t_{(i)}$ is $d_i$

    Estimated hazard function at $t_{(i)}$:

    $$\hat{h}_i = \frac{d_i}{n_i}$$

    Estimate of survival function:

    $$\hat{S}(t) = \prod_{t_{(i)} \le t} \frac{n_i - d_i}{n_i}$$

    In medical research, the K-M estimate is often used to measure the fraction of patients living for a certain amount of time after treatment. In economics, it can be used to measure the length of time people remain unemployed after a job loss. In engineering, it can be used to measure the time until failure of machine parts.

    The Kaplan-Meier estimator is the nonparametric maximum likelihood estimate of S(t).

    Kaplan-Meier Estimate of S(t)

    Failure Rate

    The hazard rate at each period is the number of failures in the given period divided by the number of surviving individuals at the beginning of the period (number at risk).

    Failure Time (t)  Hazard Rate h(t)  Cumulative Hazard Rate H(t)
    0                 0                 0
    t1                d1/n1             d1/n1
    t2                d2/n2             H(t1) + d2/n2
    ...               ...               ...
    tk                dk/nk             H(tk-1) + dk/nk

    Survival Probability

    For each period, the survival probability is the product of the complements of the hazard rates up to and including that period. The initial survival probability at the beginning of the first time period is 1. If the hazard rate for each period is h(ti), then the survivor probability is as shown.

    Time (t)  Survival Probability S(t)
    0         1
    t1        1*(1 - h(t1))
    t2        S(t1)*(1 - h(t2))
    ...       ...
    tk        S(tk-1)*(1 - h(tk))

    A Data Example

    Failure Time t(i)  Number Failed d_i  Number at Risk n_i
    4                  3                  7
    7                  1                  4
    11                 2                  3
    12                 1                  1

    The number at risk is the total number of survivors at the beginning of each period. The

    number at risk at the beginning of the first period is all individuals in the lifetime study. At

    the beginning of each remaining period, the number at risk is reduced by the number of

    failures plus individuals censored at the end of the previous period.

    This life table shows fictitious survival data. At the beginning of the first failure time, there

    are seven items at risk. At time 4, three fail. So at the beginning of time 7, there are four

    items at risk. Only one fails at time 7, so the number at risk at the beginning of time 11 is

    three. Two fail at time 11, so at the beginning of time 12, the number at risk is one. The

    remaining item fails at time 12.

    A Data Example

    We can compute the hazard rate, survival probability, and cumulative distribution function for the data as follows
    (d_i = number failed, n_i = number at risk):

    t(i)  d_i  n_i  Hazard Rate h(t) = d_i/n_i  Survival Probability S(t)         CDF F(t) = 1 - S(t)
    4     3    7    3/7                         1 - 3/7 = 4/7 = 0.5714            0.4286
    7     1    4    1/4                         4/7*(1 - 1/4) = 3/7 = 0.4286      0.5714
    11    2    3    2/3                         3/7*(1 - 2/3) = 1/7 = 0.1429      0.8571
    12    1    1    1/1                         1/7*(1 - 1) = 0                   1

    In MATLAB, we can enter the data and calculate these measures using ecdf.

    Suppose the failure times are stored in an array y.

    >> y = [4 7 11 12];

    >> freq = [3 1 2 1];

    >> [f,x] = ecdf(y,'frequency',freq)
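For readers without MATLAB, the life table for the uncensored example can be reproduced with a few lines of Python. This is a sketch of the product-limit recursion S(t_i) = S(t_{i-1}) * (1 - d_i/n_i), not a reimplementation of ecdf; the function name `kaplan_meier` is an illustrative choice.

```python
def kaplan_meier(failed, at_risk):
    """Product-limit survival estimates from per-period (d_i, n_i) counts."""
    survival = []
    s = 1.0                      # survival probability before the first failure
    for d, n in zip(failed, at_risk):
        s *= (n - d) / n         # multiply by the complement of the hazard rate
        survival.append(s)
    return survival


times = [4, 7, 11, 12]
failed = [3, 1, 2, 1]            # d_i
at_risk = [7, 4, 3, 1]           # n_i

survival = kaplan_meier(failed, at_risk)
cdf = [1.0 - s for s in survival]
for t, s, f in zip(times, survival, cdf):
    print(t, round(s, 4), round(f, 4))
# 4 0.5714 0.4286
# 7 0.4286 0.5714
# 11 0.1429 0.8571
# 12 0.0 1.0
```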

    A Censored Data Example

    When you have censored data, the life table might look like the following:

    Time t(i)  Number Failed d_i  Censoring  Number at Risk n_i  Hazard Rate  Survival Probability         Cumulative Distribution Function
    4          2                  1          7                   2/7          1 - 2/7 = 0.7143             0.2857
    7          1                  0          4                   1/4          0.7143*(1 - 1/4) = 0.5357    0.4643
    11         1                  1          3                   1/3          0.5357*(1 - 1/3) = 0.3571    0.6429
    12         1                  0          1                   1/1          0.3571*(1 - 1) = 0           1.0000

    At any given time, the censored items are also considered in the total of number at

    risk, and the hazard rate formula is based on the number failed and the total number

    at risk.

    When updating the number at risk at the beginning of each period, the total number

    failed and censored in the previous period is subtracted from the number at risk at the

    beginning of that previous period.

    Notation: 1 for censored data, and 0 for exact failure time.

    A Censored Data Example: MATLAB

    While using ecdf, you must also enter the censoring information using an array of

    binary variables. Enter 1 for censored data, and enter 0 for exact failure time.

    >> y = [4 4 4 7 11 11 12];

    >> cens= [1 0 0 0 1 0 0];

    >> [f,x] = ecdf(y,'censoring',cens)

    ecdf, by default, produces the cumulative distribution function values. You have to

    specify the survivor function or the hazard function using optional name-value pair

    arguments. You can also plot the results as follows.

    >> ecdf(y,'censoring',cens,'function','survivor');
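The censored life table can also be reproduced from the raw observations in Python. The sketch below uses the same coding as the slides (1 for a censored observation, 0 for an exact failure time) and counts censored items in the number at risk at their own time point. The function name `kaplan_meier_censored` is hypothetical, and this is an illustration of the bookkeeping rather than a substitute for ecdf.

```python
def kaplan_meier_censored(times, censored):
    """Return [(t, S(t))] at each distinct failure time.

    times:    observed times, one per item
    censored: 1 if the observation is censored, 0 if it is an exact failure
    """
    at_risk = len(times)
    s = 1.0
    estimates = []
    for t in sorted(set(times)):
        d = sum(1 for ti, c in zip(times, censored) if ti == t and c == 0)
        c_count = sum(1 for ti, c in zip(times, censored) if ti == t and c == 1)
        if d > 0:
            s *= (at_risk - d) / at_risk   # hazard uses failures only
            estimates.append((t, s))
        at_risk -= d + c_count             # failed and censored items leave the risk set
    return estimates


y = [4, 4, 4, 7, 11, 11, 12]
cens = [1, 0, 0, 0, 1, 0, 0]
km = kaplan_meier_censored(y, cens)
for t, s in km:
    print(t, round(s, 4), round(1.0 - s, 4))
# 4 0.7143 0.2857
# 7 0.5357 0.4643
# 11 0.3571 0.6429
# 12 0.0 1.0
```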