AN INTRODUCTION TO STATISTICAL ANALYSIS OF SIMULATION OUTPUTS


Sample Statistics

If x1, x2, …, xn are n observations of the value of an unknown quantity X, they constitute a sample of size n for the population on which X is defined.

Sample mean: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$

Sample variance: $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
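The two sample statistics can be sketched in a few lines of standard C++. This is only an illustration: the function names `sample_mean` and `sample_variance` are assumptions made for this sketch and are not part of CSIM.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample mean: (1/n) * sum of the observations.
double sample_mean(const std::vector<double>& x) {
    assert(!x.empty());
    double sum = 0.0;
    for (double v : x) sum += v;
    return sum / static_cast<double>(x.size());
}

// Unbiased sample variance: (1/(n-1)) * sum of squared deviations from the mean.
double sample_variance(const std::vector<double>& x) {
    assert(x.size() > 1);
    const double mean = sample_mean(x);
    double squares = 0.0;
    for (double v : x) squares += (v - mean) * (v - mean);
    return squares / static_cast<double>(x.size() - 1);
}
```

For the observations 1, 2, 3, 4 the mean is 2.5 and the variance is 5/3.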

Central Limit Theorem

If the n mutually independent random variables x1, x2, …, xn have the same distribution, and if their mean μ and their variance σ² exist, then …

Central Limit Theorem

The random variable

$\frac{\frac{1}{n}\sum_{i=1}^{n} x_i - \mu}{\sigma / \sqrt{n}}$

is, for large n, approximately distributed according to the standard normal distribution (zero mean and unit variance).

Estimating a mean

Assume that we have a sample x1, x2, …, xn consisting of n independent* observations of a given population
The sample mean x̄ is an unbiased estimator of the mean μ of the population

* This is the critical assumption. Without it, we cannot apply the formula.

Large populations

For large values of n, the (1 − α)% confidence interval for μ is given by

$\left( \bar{x} - z_{\alpha/2} \frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2} \frac{\sigma}{\sqrt{n}} \right)$

with $F(z_{\alpha/2}) = 1 - \frac{\alpha}{2}$ for the standard normal distribution

Explanations

1 − α, expressed in percent, is the level of the confidence interval
  90% means α = 0.10
  95% means α = 0.05
  99% means α = 0.01
α is the error probability
  Probability that the true mean falls outside the confidence interval

95% Confidence Intervals

For α = 0.05, $z_{\alpha/2}$ = 1.96
Example: if x̄ = 35, σ = 4 and n = 100, the 95% confidence interval for μ is
  35 ± 1.96 × 4/10 = 35 ± 0.784
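The example above can be checked with a short sketch in standard C++. The `Interval` struct and the function name `mean_confidence_interval` are hypothetical helpers introduced for illustration; the caller supplies z (1.96 for 95%).

```cpp
#include <cassert>
#include <cmath>

struct Interval {
    double lower;
    double upper;
};

// Large-sample confidence interval for the mean: xbar +/- z * sigma / sqrt(n).
Interval mean_confidence_interval(double xbar, double sigma, long n, double z) {
    assert(n > 0 && sigma >= 0.0);
    const double half_width = z * sigma / std::sqrt(static_cast<double>(n));
    return {xbar - half_width, xbar + half_width};
}
```

Calling `mean_confidence_interval(35.0, 4.0, 100, 1.96)` gives the interval 35 ± 0.784, that is, roughly (34.216, 35.784).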

When σ is not known

We can replace σ in the preceding formula by the standard deviation s of the sample
When n < 30, we must read the value of $z_{\alpha/2}$ from a table of Student's t-distribution with n − 1 degrees of freedom
When n ≥ 30, we can use the standard values

Confidence Intervals in CSIM

CSIM can automatically compute confidence intervals for the mean value of any table, qtable and so on.
For everything but boxes
  xyx->confidence();
For the elapsed times in a box
  bd->time_confidence();
For the population of a box
  bd->number_confidence();
We get 90, 95 and 98 percent confidence intervals

Confidence Intervals in CSIM

We get 90, 95 and 98 percent confidence intervals
Computed using batch means method
  See next section
But only for the mean

Estimating a proportion

A proportion p represents the probability P(X ≤ a) for some fixed threshold a
  97% of our customers have to wait less than one minute
Confidence intervals for proportions are much easier to compute than confidence intervals for quantiles

Assumptions

Assume that we have n independent observations x1, x2, …, xn of a given population variable X and that this variable has a continuous distribution
Let p represent the proportion we want to estimate, say P(X ≤ a)
Let k represent the number of observations that are ≤ a

Basic property

If p represents the proportion we want to estimate, say P(X ≤ a), and k represents the number of observations that are ≤ a
  The rv k is distributed according to a binomial distribution with mean np and variance np(1 − p)
  The rv k/n is distributed according to a scaled binomial distribution with mean p and variance p(1 − p)/n

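The formula

When n > 29, we can use Wilson's interval

$P\left( \frac{\hat{q} + \frac{z_{\alpha/2}^2}{2n} - z_{\alpha/2} \sqrt{\frac{\hat{q}(1-\hat{q})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}} \le q \le \frac{\hat{q} + \frac{z_{\alpha/2}^2}{2n} + z_{\alpha/2} \sqrt{\frac{\hat{q}(1-\hat{q})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}} \right) \approx 1 - \alpha$

where q̂ is the observed proportion and $z_{\alpha/2}$ = 1.96 for a 95% C.I.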
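A minimal sketch of the standard Wilson score interval in plain C++, assuming the proportion is estimated as k successes out of n observations. The `Interval` struct and the name `wilson_interval` are illustrative and not taken from CSIM.

```cpp
#include <cassert>
#include <cmath>

struct Interval {
    double lower;
    double upper;
};

// Wilson score interval for a proportion estimated from k hits in n trials.
// z is the normal quantile z_{alpha/2}, e.g. 1.96 for a 95% interval.
Interval wilson_interval(long k, long n, double z) {
    assert(n > 0 && k >= 0 && k <= n);
    const double nn = static_cast<double>(n);
    const double q = static_cast<double>(k) / nn;  // observed proportion
    const double z2 = z * z;
    const double denom = 1.0 + z2 / nn;
    const double center = (q + z2 / (2.0 * nn)) / denom;
    const double half =
        z * std::sqrt(q * (1.0 - q) / nn + z2 / (4.0 * nn * nn)) / denom;
    return {center - half, center + half};
}
```

With k = 0 the lower bound collapses to zero (up to rounding) and the upper bound equals z²/(n + z²), so the interval stays usable even when no event was observed.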

Advantages

Produces tighter confidence intervals than the Central Limit Theorem
Works when q̂ is equal to zero

$P\left( 0 \le q \le \frac{z_{\alpha/2}^2}{n + z_{\alpha/2}^2} \right) \approx 1 - \alpha$
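The zero-proportion bound follows directly from the general form of the Wilson score interval. A short derivation, writing z for $z_{\alpha/2}$ and setting q̂ = 0:

```latex
\[
\frac{\frac{z^2}{2n} \pm z\sqrt{\frac{z^2}{4n^2}}}{1 + \frac{z^2}{n}}
= \frac{\frac{z^2}{2n} \pm \frac{z^2}{2n}}{1 + \frac{z^2}{n}}
\quad\Longrightarrow\quad
\left[\, 0,\ \frac{z^2/n}{1 + z^2/n} \,\right]
= \left[\, 0,\ \frac{z^2}{n + z^2} \,\right]
\]
```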

Example

Assume we have 400,000 independent observations without noting any failure
The 95% confidence interval for the probability that the system could fail is

$\left( 0,\ \frac{1.96^2}{400{,}000 + 1.96^2} \right) \approx (0,\ 10^{-5})$


AUTOCORRELATION

The problem

All statistical analysis techniques we have discussed assume that sample values are mutually independent
Generally false for quantities such as
  Waiting times, response times, …
These tend instead to be autocorrelated
  When the waiting lines are long, everybody waits a long time

Traditional solution

Keep the measurements sufficiently apart
  Sample them T minutes apart
Standard solution for collecting observations on a running system
Not practical
  Would require very long simulations

Three good solutions

Batch means
Regenerative method
Time series analysis

Batch means

We group consecutive observations into "batches"
We compute the means of these batches
We observe that autocorrelation among batch means decreases with the size of the batches
  When size increases, each batch includes more observations that are far apart

Example

We collected the following values:
  4, 3, 3, 4, 5, 5, 3, 2, 2, 3
We group them into two batches of five observations:
  4, 3, 3, 4, 5 and 5, 3, 2, 2, 3
The batch means are:
  3.8 and 3
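The grouping step can be sketched as follows in standard C++. This is not CSIM's internal implementation; the function name `batch_means` and the choice to drop a trailing partial batch are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Splits consecutive observations into fixed-size batches and returns
// the mean of each batch. A trailing partial batch is dropped.
std::vector<double> batch_means(const std::vector<double>& obs,
                                std::size_t batch_size) {
    assert(batch_size > 0);
    std::vector<double> means;
    for (std::size_t start = 0; start + batch_size <= obs.size();
         start += batch_size) {
        double sum = 0.0;
        for (std::size_t i = start; i < start + batch_size; ++i) sum += obs[i];
        means.push_back(sum / static_cast<double>(batch_size));
    }
    return means;
}
```

Applied to the ten values above with a batch size of five, it returns the two batch means 3.8 and 3. The batch means can then be treated as (nearly) independent observations when building a confidence interval.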

Batch means in CSIM

CSIM uses fixed-size batches
  To compute confidence intervals
  To control the duration of a simulation (run-length control)

Regenerative method

Most systems with queues go through states that return them to a state identical to their original state
  The system regenerates itself
Examples:
  Whenever a disk array is brought back to its original state
  Whenever a camper rental agency has all its campers available

Key idea

Define a regeneration interval as an interval between two consecutive regeneration points:
  Observations collected during the same regeneration interval can be correlated
  Observations collected during different regeneration intervals are independent

Application

We group together observations that occur within the same regeneration interval
We compute the means of these groups of observations
These group means are independent from each other

Limitations of the approach

Not general:
  System must go through regeneration points
  System must be idle
Leads to complex computations
  We rarely have exactly the same number of observations in two different regeneration intervals

Time series analysis

Treats consecutive observations as elements of a time series
Estimates autocorrelation among the elements of a time series
Includes this autocorrelation in the computation of all confidence intervals
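As an illustration of the estimation step only, here is the usual sample autocorrelation at a given lag, written in standard C++. The function name `autocorrelation` is an assumption of this sketch; how the estimate is then folded into the confidence interval depends on the time series model and is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sample autocorrelation at the given lag:
//   sum_{i} (x_i - mean)(x_{i+lag} - mean) / sum_{i} (x_i - mean)^2
double autocorrelation(const std::vector<double>& x, std::size_t lag) {
    const std::size_t n = x.size();
    assert(n > lag);
    double mean = 0.0;
    for (double v : x) mean += v;
    mean /= static_cast<double>(n);

    double denominator = 0.0;
    for (double v : x) denominator += (v - mean) * (v - mean);
    assert(denominator > 0.0);  // a constant series has no defined correlation

    double numerator = 0.0;
    for (std::size_t i = 0; i + lag < n; ++i) {
        numerator += (x[i] - mean) * (x[i + lag] - mean);
    }
    return numerator / denominator;
}
```

A value near zero at lag 1 suggests the observations behave as if independent; values close to 1 are typical of waiting times in a congested queue.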


RUN LENGTH CONTROL

Objective

Accuracy of confidence intervals increases with duration of simulation
  The 1/√n factor
We would like to be able to stop the simulation once a given accuracy level has been reached for the confidence interval of a specific measurement
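To see what the 1/√n factor implies, the sketch below estimates how many independent observations are needed before the half-width z·σ/√n drops below a relative accuracy ε of the mean. It is a back-of-the-envelope helper in standard C++, not CSIM's stopping rule; the name `required_observations` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Smallest n such that z * sigma / sqrt(n) <= rel_accuracy * |mean|,
// assuming independent observations with known mean and sigma.
long required_observations(double mean, double sigma, double rel_accuracy,
                           double z) {
    assert(mean != 0.0 && sigma >= 0.0 && rel_accuracy > 0.0);
    const double root = z * sigma / (rel_accuracy * std::fabs(mean));
    return static_cast<long>(std::ceil(root * root));
}
```

For a mean of 35, σ = 4, a 1 percent relative accuracy and z = 1.96, about 502 observations are needed. Halving the relative error quadruples that number.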

The CSIM solution (I)

Specify
  Quantity of interest
    Mean value from a table, a qtable, …
  A relative accuracy
    Maximum relative error
  A confidence level (say, 0.90 to 0.99)
  A maximum simulation duration
    In seconds of CPU time

The CSIM solution (II)

void table::run_length(double accuracy, double conf_level, double max_time)
void qtable::run_length(double accuracy, double conf_level, double max_time)
void meter::run_length(double accuracy, double conf_level, double max_time)
void box::time_run_length(double accuracy, double conf_level, double max_time)
void box::number_run_length(double accuracy, double conf_level, double max_time)

The CSIM solution (III)

Example:
  thistable->run_length(.01, .95, 500)
    Specifies that we want an error of less than 1 percent for the 95% confidence interval of the mean of thistable
    Stops simulation after 500 seconds
  thisbox->time_run_length(.01, .99, 500)
    Same for time average of a box

The CSIM solution (IV)

Replace termination test by
  converged.wait();

Example (I)

The campers
We want
  A maximum error of 1 percent (0.01)
  For the 95 percent confidence interval
  Of average number of rented campers
  A maximum simulation time of 100 s
Will use number aspect of agency box

Example (II)

Add
  agency->number_confidence()
after activation of box agency in csim process

Example (III)

Put main loop
  while(simtime() < DURATION) {
    hold(exponential(MIART));
    customer();
  }
in a separate arrivals process
Make it an infinite loop

Example (IV)

Add
  converged.wait();
after call to arrivals process
  arrivals();
  converged.wait();
Best way to let sim process generate customers and wait for termination in parallel

Example (V)

The new arrivals process

void arrivals() {
  create("arrivals"); // REQUIRED
  for(;;) { // forever loop
    hold(exponential(MIART));
    customer();
  } // forever
} // arrivals

Warnings

Confidence intervals do not take into account model inaccuracies
While the batch means method eliminates most effects of measurement autocorrelation, it is not always 100% effective
The max_time parameter of run_length() will not necessarily stop the simulation just after the specified CPU time
  Like the emergency brake of a train

Objective

Partition processes into different classes
  Low priority
  High priority
Obtain separate statistics for each process priority class

Declaring and initializing process classes

To declare a dynamic process class:
  process_class *c;
To initialize a process class before it can be used in any other statement:
  c = new process_class("low priority");

To assign a priority class

Add inside the process
  c->set_process_class();
Processes that have not been assigned a process class belong to the "default" process class

Reporting

Can use
  report()
  report_classes()

Other options

Can change the name of a process class:
  c->set_name("high priority");
Can reset statistics associated with a process class:
  c->reset();
Can do the same for all process classes:
  reset_process_classes();
Can delete a dynamic process class:
  delete c;


RANDOM NUMBERS

Background

We need to distinguish between
  Truly random numbers
    Obtained through observations of a physical random process
      Rolling dice, roulette
      Atomic decay
  Pseudo-random numbers
    Obtained through arithmetic operations

Example

Linear Congruential Generators (LCG)
  Easy to implement and fast
  Defined by the recurrence relation:
    $r_{n+1} = (a\, r_n + c) \bmod m$
  where
    r1, r2, … are the random values
    m is the "modulus"
    r0 is the seed

Two realizations

GCC family of compilers
  $m = 2^{32}$, a = 69069, c = 5
Microsoft Visual/Quick C/C++
  $m = 2^{32}$, a = 214013, c = 2531011
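A linear congruential generator with modulus 2^32 takes only a few lines of standard C++, because unsigned 32-bit arithmetic wraps around and performs the modulo for free. The class name `Lcg` is hypothetical, and the sketch returns the full 32-bit state; real library generators typically expose only some of its bits.

```cpp
#include <cstdint>

// LCG with modulus 2^32: r_{n+1} = (a * r_n + c) mod 2^32.
// Unsigned 32-bit overflow implements the "mod 2^32" step.
class Lcg {
public:
    Lcg(std::uint32_t a, std::uint32_t c, std::uint32_t seed)
        : a_(a), c_(c), state_(seed) {}

    // Advances the recurrence by one step and returns the new state.
    std::uint32_t next() {
        state_ = a_ * state_ + c_;
        return state_;
    }

private:
    std::uint32_t a_;
    std::uint32_t c_;
    std::uint32_t state_;
};
```

For instance, `Lcg g(214013u, 2531011u, 0u);` uses the second parameter set listed above; with a zero seed the first value returned is simply c.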

Problems with pseudorandom number generators

Much shorter periods for some seed states
Lack of uniformity of distribution
Correlation of successive values
…

Better RNGs

Use the Mersenne twister
  Period is $2^{19937} - 1$
Blum-Blum-Shub

A quote

"Any one who considers arithmetical methods of producing random digits is, of course, in a state of sin. For, as has been pointed out several times, there is no such thing as a random number – there are only methods to produce random numbers, and a strict arithmetic procedure of course is not such a method."
  John von Neumann

CSIM RNGs

By default, CSIM uses a single stream of random numbers
Can reset the seed using
  void reseed(stream *s, long n)
as in
  reseed(NIL, 13579)

Continuous distributions supported by CSIM (I)

double uniform(double min, double max)
double triangular(double min, double max, double mode)
double beta(double min, double max, double shape1, double shape2)
double exponential(double mean)
double gamma(double mean, double stddev)
double erlang(double mean, double var)

Continuous distributions supported by CSIM (II)

double hyperx(double mean, double var)
double weibull(double shape, double scale)
double normal(double mean, double stddev)
double lognormal(double mean, double stddev)
double cauchy(double alpha, double beta)
double hypoexponential(double mn, double var)

Continuous distributions supported by CSIM (III)

double pareto(double a)
double zipf(long n)
double zipf_sum(long n, double *sum)

Discrete distributions supported by CSIM

long uniform_int(long min, long max)
long bernoulli(double prob_success)
long binomial(double prob_success, long num_trials)
long geometric(double prob_success)
long negative_binomial(long success_num, double prob_success)
long poisson(double mean)


Empirical distributions supported by CSIM ???

Using multiple streams

In campers example, sequence of RNs used to generate arrivals is affected by the number of campers
  If agency has fewer campers
    More customers will be lost
    Lost customers do not generate any RN
Better to have separate random number streams for arrivals and service times
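The benefit of separate streams can be illustrated outside CSIM with the C++ standard library, using one `std::mt19937` engine per stream. Everything here (the function name, the seeds, the rates) is an assumption chosen for the illustration: however many service times are drawn, the arrival sequence stays the same.

```cpp
#include <random>
#include <vector>

// Draws `count` interarrival times from a dedicated arrival stream while an
// arbitrary number of service times is consumed from a separate stream.
std::vector<double> interarrival_times(unsigned arrival_seed,
                                       unsigned service_seed, int count,
                                       int service_draws_per_arrival) {
    std::mt19937 arrival_stream(arrival_seed);
    std::mt19937 service_stream(service_seed);
    std::exponential_distribution<double> interarrival(1.0);  // illustrative rate
    std::exponential_distribution<double> service(0.5);       // illustrative rate

    std::vector<double> times;
    for (int i = 0; i < count; ++i) {
        times.push_back(interarrival(arrival_stream));
        for (int j = 0; j < service_draws_per_arrival; ++j) {
            (void)service(service_stream);  // consumed, but never touches arrivals
        }
    }
    return times;
}
```

Two model variants that lose different numbers of customers can therefore be compared under exactly the same arrival pattern.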

Declaring and initializing random number streams

Can use:
  stream *s;
  s = new stream();
By default, streams are created with seeds that are spaced 100,000 values apart

Reseeding a stream

Use:
  s->reseed(24680);
where the new seed is a long integer

Other functions

Inspect the current state of a stream
  i = s->state();
Delete a stream
  delete s;

Using a specific stream

Prefix RNG function with name of stream:
  s->uniform(3.0, 7.0)


A CASE STUDY

RAID array revisited

Reliability of a RAID array
  Reliability R(t) of a system is the probability that it will remain operational over a time interval [0, t] given that it was operational at time t = 0
  Not the same as availability
Our focus is evaluating the risk of a data loss during array lifetime

Executing multiple runs

Multiple runs provide statistically independent repetitions of original simulation
Useful for
  Collecting more accurate results
  Constructing confidence intervals
Use rerun() function within a loop
  create("sim") call must be inside that loop

Overview of sim function

extern "C" void sim() { // sim process
  runcount = 0;
  while(runcount < NRUNS) {
    create("sim"); // make it a process
    // usual contents of sim process
    rerun();
    runcount++;
  } // while
  report_hdr(); // produce statistics report
} // sim

Global includes and defines

#include <cpp.h> // CSIM C++ header file
#define NDISKS 5 // number of disks in array
#define NYEARS 5 // useful lifetime of array
#define MTTF 300000.0 // disk MTTF
#define MTTR 24.0 // disk MTTR
#define NRUNS 100000 // no of runs

void disk(int i);

Global declarations

int nfailed; // number of failed disks
int runcount = 0; // number of runs
int ndatalosses = 0; // counter
double lifetime = NYEARS * 365 * 24; // simulation duration
event *dataloss; // set when the array loses data

Sim function (I)

extern "C" void sim() { // sim process
  int i;
  runcount = 0;
  while(runcount < NRUNS) {
    create("sim"); // make this a process
    dataloss = new event("dataloss");
    dataloss->clear();
    nfailed = 0;

Sim function (II)

    // create NDISKS disk processes
    for (i=0; i < NDISKS; i++){
      disk(i);
    } // for
    dataloss->timed_wait(lifetime);
    if (simtime() < lifetime) {
      ndatalosses++;
    } // if
    rerun();
    runcount++;
  } // while

Sim function (III)

  report_hdr(); // produce statistics report
  printf("Array lifetime %d years ", NYEARS);
  printf("or %f hours\n", lifetime);
  printf("Sim time %f\n", simtime());
  printf("Disk MTTF %f hours\n", MTTF);
  printf("Disk MTTR %f hours\n", MTTR);
  printf("Completed runs %d\n", runcount);
  printf("Data losses %d\n", ndatalosses);
} // sim

Disk process (I)

void disk(int i) {
  create("disk");
  while(simtime() < lifetime) {
    hold(exponential(MTTF));
    // disk failed
    nfailed++;

Disk process (II)

    if (nfailed == 2) {
      dataloss->set(); // signal the data loss to the sim process
      terminate();
    } // if
    hold(MTTR); // repair process
    nfailed--; // disk is replaced
  } // while
} // disk

Simulation Outcome

C++/CSIM Simulation Report (Version 19.0 for Linux x86)
Tue Apr 22 10:13:14 2008

Ending simulation time: 0.000
Elapsed simulation time: 0.000
CPU time used (seconds): 0.000

Array lifetime 5 years
which corresponds to 43800 hours
Simulated time 0
Disk MTTF 300000 hours
Disk MTTR 24 hours
Number of runs completed 100000
Number in runs ending in data loss 17

Discussion

Whole program took less than 98 seconds on linux03 server
Simulated time should be equal to 43,800 hours, not zero.
  rerun() artifact?
We observe 17 failures out of 100,000 runs
  Data survival rate is 99.983 percent
    Three nines

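Confidence intervals

Data loss rate and survival rate are distributed according to a binomial law
Since n = 100,000, the distributions of both proportions are approximately normal
Will use

$P\left( \frac{\hat{q} + \frac{z_{\alpha/2}^2}{2n} - z_{\alpha/2} \sqrt{\frac{\hat{q}(1-\hat{q})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}} \le q \le \frac{\hat{q} + \frac{z_{\alpha/2}^2}{2n} + z_{\alpha/2} \sqrt{\frac{\hat{q}(1-\hat{q})}{n} + \frac{z_{\alpha/2}^2}{4n^2}}}{1 + \frac{z_{\alpha/2}^2}{n}} \right) \approx 1 - \alpha$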

CI for data survival rate s

ŝ = 99.983% or 0.99983
95% CI is
  0.999849 ± 0.000083
  [0.999766, 0.999932]
  between three and four nines

What are nines?

A measure of system reliability/availability
  99% is two nines
  99.9% is three nines
More formally, we compute
  $-\log_{10}(1.0 - x)$
Our confidence interval could then be expressed as
  [3.63 nines, 4.17 nines]
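The conversion is a one-liner; a small sketch in standard C++, with `nines` as an illustrative function name:

```cpp
#include <cassert>
#include <cmath>

// Number of nines of a reliability or availability x in [0, 1):
// -log10(1 - x), so 0.99 -> 2 and 0.999 -> 3.
double nines(double x) {
    assert(x >= 0.0 && x < 1.0);
    return -std::log10(1.0 - x);
}
```

Non-integer results are meaningful: a survival rate of 0.999766 is about 3.63 nines.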

CI for data loss rate

0.000189 ± 0.000083
[0.000106, 0.000273]

Analytical approach

Will use a Markov model:
  Works since times to failure and repair times are distributed according to exponential laws
    Obviously not true for repair times


The model

State label indicates number of failed drives

Failure state is an absorbing state

Using ergodic hypothesis

Assume that array is returned to original state after each data loss

Failure rate will be 4λp1

Model equations

  $5\lambda p_0 = (\mu + 4\lambda) p_1$
  $p_0 + p_1 = 1$
Solution is
  $p_0 = \frac{\mu + 4\lambda}{\mu + 9\lambda}, \quad p_1 = \frac{5\lambda}{\mu + 9\lambda}$
Failure rate is
  $L = 4\lambda p_1 = \frac{20\lambda^2}{\mu + 9\lambda}$

Computing five year survival

Decay is
  $e^{-Lt} = \exp\left( -\frac{20\lambda^2 t}{\mu + 9\lambda} \right)$
For MTTF = 300,000 hours, MTTR = 24 hours and t = 5 years
  Data survival rate is 0.999767 or 3.632 nines
    Barely inside the C.I.
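The figure above can be reproduced numerically. The sketch below is standard C++ and assumes λ = 1/MTTF and μ = 1/MTTR (rates per hour) together with the failure rate L = 20λ²/(μ + 9λ) of the five-disk model; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Data loss rate of the five-disk model: L = 4 * lambda * p1
//                                          = 20 * lambda^2 / (mu + 9 * lambda).
double data_loss_rate(double lambda, double mu) {
    assert(lambda > 0.0 && mu > 0.0);
    return 20.0 * lambda * lambda / (mu + 9.0 * lambda);
}

// Probability that no data loss occurs during [0, t]: exp(-L * t).
double survival_probability(double lambda, double mu, double t) {
    return std::exp(-data_loss_rate(lambda, mu) * t);
}
```

With λ = 1/300000, μ = 1/24 and t = 5 × 365 × 24 = 43,800 hours, `survival_probability` returns approximately 0.999767.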

Why?

Markov model assumed repair times were exponentially distributed
  Required by the technique
Simulation model assumed deterministic repair times
  Somewhat more realistic
Difference illustrates a major limitation of stochastic approach

Checking the explanation (I)

We repeat the simulation using repair times that are exponentially distributed
We observe 19 data losses out of 100,000 runs
  q̂ is 0.99981
  Confidence interval is (3.588 nines, 4.080 nines)

Checking the explanation (II)

We repeat the simulation a second time, using repair times that are exponentially distributed and collecting more data
We observe 95 data losses out of 400,000 runs
  q̂ is 0.999763
  Confidence interval is (3.552 nines, 3.734 nines)
Markov model predicted 3.632 nines

Comparing both approaches

Which is the best approach in terms of
  Generality and flexibility?
  Provided results?
Discrete simulation and stochastic modeling complement each other

Extensions (I)

Other repair time distributions
  Exponential (to compare with results of stochastic analysis)
  Ad hoc (80% of repairs within one day, 20% within two days)
Taking account of differences between day and night, weekdays and weekends
  Will map simtime() into a calendar

Extensions (II)

Other disk arrays
  Mirrored disks: NDISKS = 2
  RAID level 6: tolerate two disk failures
  Triplicate disks: tolerate failure of two out of three disks
  SSPiRAL arrays
    See next slide
    Will require more work

SSPiRAL arrays

[Figure: SSPiRAL array layout; disks labeled A, ABC, B, DAB, C, BCD, D, CDA]

Resists all triple failures and most quadruple failures

CONCLUSION

Statistical analysis of outputs is an important aspect of simulation
Standard statistical techniques require independent observations without autocorrelation
  Simplest solution is to use batch means
CSIM tools simplify the construction of confidence intervals for all sorts of means