
Page 1:

Imprecise Probability and Network Quality of Service

Martin Tunnicliffe

Page 2:

Two Kinds of Probability

• Aleatory Probability: The probability of chance.

• Example: “When throwing an unweighted die, the probability of obtaining a 6 is 1/6”.

• Epistemic Probability: The probability of belief.

• Example: “The defendant on trial is probably guilty”.

Page 3:

Probability and Betting Odds

• A “fair bet” is a gamble which, if repeated a large number of times, returns the same amount of money in winnings as the amount of money staked.

• Example: if there is a 1:10 chance of winning a game, then the “odds” for a fair gamble would be 10:1.

• Problems arise when we do not know exactly what the chance of winning is. Under such circumstances, how can we know what constitutes a fair gamble?

• Behavioural interpretation of probability (Bruno de Finetti, 1906-1985): “Probability” in such cases refers to what people will consider or believe a fair bet to be.

• Belief stems from experience, i.e. inductive learning.

Page 4:

Inductive Learning

• Induction is the opposite of deduction, which infers the specific from the general.

• Example: “All dogs have four legs. Patch is a dog. Therefore Patch has four legs.”

• Induction is the opposite: It infers the general from the specific.

• Example: “Patch is a dog. Patch has four legs. Therefore all dogs have four legs.”

Page 5:

Inductive Learning

The last statement has little empirical support. However, consider a larger body of evidence:

Dog       Number of Legs
Patch     4
Lucky     4
Pongo     4
Perdita   4
Freckles  4

The statement “all dogs have four legs” now has significant plausibility or epistemic probability. However, it remains uncertain: Even with a hundred dogs, there is no categorical proof that the hundred-and-first Dalmatian will not have five legs!

Page 6:

Approaches to Inductive Learning

• Frequentist statistics disallows the concept of epistemic probability (We cannot talk about the “probability of a five-legged Dalmatian”). Thus it offers very little framework for inductive learning.

• The Objective Bayesian approach allows epistemic probability, which it represents as a single probability distribution. (This is the Bayesian Dogma of Precision).

• The Imprecise Probability approach uses two distributions representing “upper probability” and “lower probability”.

Page 7:

Marble Problem

Example (shamelessly “ripped off” from P. Walley, J. R. Stat. Soc. B, 58(1), pp. 3-57, 1996):

Marbles are drawn blindly from a bag of coloured marbles. The event θ constitutes the drawing of a red marble.

The composition of the bag is unknown. For all we know, it could contain no red marbles. Alternatively every marble in the bag may be red.

Nevertheless, we are asked to compute the probability associated with a “fair gamble” on θ, both a priori (before any marble is drawn) and after n marbles are drawn, j of which are red. (Marbles are replaced before the next draw.)

Page 8:

Binomial Distribution

If λ is the true (unknown) chance of drawing a red marble, the probability of drawing j reds in n draws is:

P(j | n, λ) = (n choose j) λ^j (1 − λ)^(n−j)

Walley actually considers a more complex “multinomial” situation, where three or more outcomes are possible. However, I am only going to consider two possibilities: θ = red marble and ~θ = any other coloured marble.

This is proportional to the “likelihood” of λ given that j red marbles have been drawn:

L(λ | n, j) ∝ P(j | n, λ)

Page 9:

Bayes’ Theorem

Bayes’ Theorem provides a relationship between likelihood and epistemic probability.

Since λ is a continuous variable, its probability must be described by a “probability density function” or pdf, which we can denote f(λ):

P(λ₁ ≤ λ ≤ λ₂) = ∫_{λ₁}^{λ₂} f(λ) dλ

Let f(λ) be the “prior pdf” (representing our pre-existing beliefs about λ) and f(λ | n, j) the “posterior pdf” (representing our modified beliefs given that n trials have yielded j red marbles). Bayes’ Theorem tells us that:

f(λ | n, j) ∝ f(λ) L(λ | n, j)

Page 10:

Beta Model

We need a formula for f(λ). Let us assume that it follows a beta distribution:

f(λ) ∝ λ^{st−1} (1 − λ)^{s(1−t)−1}

Here t is the first moment (or expectation) of the distribution, representing our prior belief: t = E(λ).

The “hyper-parameter” s is the “prior strength”: the influence this prior belief has upon the posterior probability.

Now from the binomial distribution we know that:

L(λ | n, j) ∝ λ^j (1 − λ)^{n−j}

Page 11:

Beta Model: Prior Distributions

[Figure: beta prior pdfs f(λ) against λ for t = 0.5, with s = 10, s = 50 and s = 100.]

Page 12:

Beta Model: Posterior Distributions

[Figure: posterior pdfs f(λ) against λ for the prior s = 10, t = 0.5, updated with (n = 0, j = 0), (n = 10, j = 7) and (n = 100, j = 70).]

f(λ | n, j) ∝ f(λ) L(λ | n, j) ∝ λ^{j+st−1} (1 − λ)^{n−j+s(1−t)−1}

Thus the beta prior generates a beta posterior (it is the “conjugate prior” for the binomial distribution).

Page 13:

Posterior Expectation

The expectation of the posterior distribution can now be calculated:

E(λ | n, j) = (j + st) / (n + s)

Under the behavioural interpretation, this is viewed as the posterior probability P(θ | n, j) of a red.

Example: suppose we are initially willing to bet 2:1 on a red (t = 1/2). However, the next ten draws produce only 2 reds. Assuming s = 2 gives:

P(θ | n, j) = E(λ | n, j) = (2 + 2 × 1/2) / (10 + 2) = 1/4

Thus in the light of the new information, a fair gamble now requires odds of 4:1 on red, and 4:3 against red.
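As a quick check on the arithmetic, the posterior mean is easily computed; a minimal sketch in Python (the function name is my own, not from the slides):

```python
# Posterior mean E(lambda | n, j) = (j + s*t) / (n + s) of the beta model.
# "posterior_expectation" is an illustrative name, not from the slides.
def posterior_expectation(n, j, s, t):
    """Posterior mean of the chance lambda after j 'reds' in n draws,
    given prior strength s and prior mean t."""
    return (j + s * t) / (n + s)

# The slide's example: prior mean t = 1/2, prior strength s = 2,
# and j = 2 reds observed in n = 10 draws.
p = posterior_expectation(n=10, j=2, s=2, t=0.5)
print(p)  # 0.25, i.e. fair odds of 4:1 on red
```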

Page 14:

Dirichlet Distribution

Walley’s paper uses the generalised Dirichlet distribution:

f(λ₁, λ₂, …, λ_K) ∝ ∏_{k=1}^{K} λ_k^{s t_k − 1}

The beta distribution is the special case of the Dirichlet for which the number of possible outcomes is 2 (the sample set has cardinality 2). This leads to the “Imprecise Dirichlet Model” or IDM. The simpler beta-function model may be called the “Imprecise Beta Model” (IBM).

Page 15:

Objective Bayesian Approach

We need an initial value for t, to represent our belief that θ will occur when we have no data available (j = n = 0). This is called a “non-informative prior”.

Under “Bayes’ Postulate” (in the absence of any information, all possibilities are equally likely), t = 0.5. Under this assumption:

P(θ | n, j) = (j + s/2) / (n + s)

However, a value for s is still needed.

Page 16:

Non-Informative Priors

[Figure: candidate non-informative prior pdfs f(λ) against λ: s = 2 (uniform), s = 1 (Perks), s = 0 (Haldane).]

Bayesians favour setting s to the cardinality of the sample space (in this case 2) to give a “uniform” prior.

Page 17:

Problems with Bayesian Approach

[Figure: posterior success probability against number of trials, for s = 2 and no successes.]

Problem: the Bayesian formula assigns finite probabilities to events which have never been known to happen, and might (for all we know) be physically impossible.

Even after 10 failures to draw a red, the posterior probability is (0 + 1)/(10 + 2) = 1/12, so the model still supports betting 12:1 on a red!

Page 18:

Problems with Bayesian Approach

Strict application of Bayes’ Postulate yields prior (and hence posterior) probabilities which depend on the choice of sample space (which should be arbitrary):

{red, other}: two possibilities, one “successful”, t = 1/2

{red, blue, other}: three possibilities, one “successful”, t = 1/3

{light red, dark red, blue, other}: four possibilities, two “successful”, t = 1/2

The experiment is identical in all three cases: Only its representation is altered. Thus the Representation Invariance Principle (RIP) is violated.

Page 19:

A Quote from Walley

“The problem is not that Bayesians have yet to discover the truly noninformative priors, but rather that no precise probability distribution can adequately represent ignorance.”

(Statistical Reasoning with Imprecise Probabilities, 1991)

What does Walley mean by “precise probability”?

Page 20:

The “Dogma of Precision”

• The Bayesian approach rests upon de Finetti’s “Dogma of Precision”.

• Walley (1991): “… for each event of interest, there is some betting rate which you regard as fair, in the sense that you are willing to accept either side of a bet on the event at that rate.”

• Example: If there is a 1:4 chance of an event θ, I am equally prepared to bet 4:1 on θ and 4:3 against θ.

Page 21:

The Imprecise Probability Approach

The “Imprecise Probability” approach solves the problem by removing the dogma of precision, and thus the requirement for a noninformative prior.

It does this by eliminating the need for a single probability associated with θ, replacing it with an upper probability and a lower probability.

Page 22:

Upper and Lower Probabilities

When no data is available, λ might take any value between 0 and 1. Thus the prior lower and upper probabilities are respectively:

P_lower(θ) = 0 and P_upper(θ) = 1

Walley: Before any marbles are drawn “…I do not have any information at all about the chance of drawing a red marble, so I do not see why I should bet on or against red at any odds’. This is not a very exciting answer, but I believe that it is the correct one.”

Page 23:

Upper and Lower Probabilities

Imprecise Probability   Possibility Theory   Dempster-Shafer Theory
Upper Probability       Possibility          Plausibility
Lower Probability       Necessity            Belief

Lower Probability: The degree to which we are confident that the next marble will definitely be red.

Upper Probability: The degree to which we are worried that the next marble might be red.

Page 24:

Posterior Upper and Lower Probabilities

However, the arrival of new information (j observed reds in n trials) allows these two probabilities to be modified.

The prior upper and lower probabilities (1 and 0) can be substituted for t in the Bayesian formula for the posterior mean probability. Thus we obtain the posterior lower and upper probabilities:

P_lower(θ | n, j) = j / (n + s) and P_upper(θ | n, j) = (j + s) / (n + s)
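The posterior bounds j/(n + s) and (j + s)/(n + s) can be sketched in a few lines of Python (the function name is my own illustration):

```python
# Posterior lower and upper probabilities of the imprecise beta model,
# obtained by substituting the prior extremes t = 0 and t = 1 into the
# Bayesian posterior mean (j + s*t)/(n + s). Name is illustrative.
def posterior_bounds(n, j, s):
    lower = j / (n + s)        # cautious value for a bettor FOR the event
    upper = (j + s) / (n + s)  # cautious value for a bettor AGAINST the event
    return lower, upper

lower, upper = posterior_bounds(n=10, j=2, s=2)
print(lower, upper)   # i.e. 1/6 and 1/3
print(upper - lower)  # the imprecision, s/(n + s) = 1/6
```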

Page 25:

Properties of Upper and Lower Probabilities

The amount of imprecision is the difference between the upper and lower probabilities, i.e.

P_upper(θ | n, j) − P_lower(θ | n, j) = s / (n + s)

This does not depend on the number of “successes” (occurrences of θ). As n → ∞, the imprecision tends to zero and the lower and upper probabilities converge towards j/n, the observed success ratio.

As s → ∞, the prior dominates: the imprecision becomes 1, and the lower and upper probabilities return to 0 and 1 respectively. As s → 0, the new data dominates the prior, and the lower and upper probabilities again converge to j/n (Haldane’s model).

Page 26:

Interpretation of Upper and Lower Probabilities

How do we interpret these upper and lower probabilities? Which do we take as “the probability of red”?

It depends on whether you are betting for or against red.

If you are betting for red then you take the lower probability, since this represents the most cautious expectation of the probability of red.

However, if you are betting against red, you take the upper probability, since this is associated with the lower probability of not-red. Proof:

1 − P_upper(θ | n, j) = 1 − (j + s)/(n + s) = (n − j)/(n + s) = P_lower(~θ | n, j)

Page 27:

Interpretation of Upper and Lower Probabilities

[Figure: posterior lower and upper probabilities against number of trials, for s = 2 and an event θ which occurs once, on test No. 5. Marked values on the graph: 0.1 and 0.7.]

A “fair bet” would be 1/0.7 ≈ 1.429:1 against the event θ.

A “fair bet” would be 1/0.1 = 10:1 in favour of the event θ.

(For consistency, we continue to assume that s = 2.)

Page 28:

Analogy with Possibility Theory

Consider the axiom of possibility theory:

N(X) = 1 − Π(~X)

i.e. the “necessity” of event X occurring is one minus the “possibility” of X not occurring.

Similarly, the expressions for upper and lower probability show us that:

P_lower(θ) = 1 − P_upper(~θ)

Thus upper probability is analogous to possibility, and lower probability to necessity.

Page 29:

Choosing a Value of s

[Figure: posterior lower and upper probabilities against number of trials (0 to 100), compared with the actual value, for s = 2, s = 10 and s = 100.]

Page 30:

Choosing a Value of s

Model          Value of s                   Remarks
Bayes-Laplace  Size of sample set           Intuitively reasonable results.
               (2 for Beta Model)           Violates the RIP.
Jeffreys       Half the size of sample set  Intuitively reasonable results.
               (1 for Beta Model)           Violates the RIP.
Haldane        0                            P_lower(θ|n,j) = P_upper(θ|n,j) = j/n.
                                            Loss of dichotomy between upper and
                                            lower probabilities. Unreasonable
                                            results for small n.
Perks          1                            Reasonable results. Confidence limits
                                            agree with their frequentist values.

Page 31:

Confidence Intervals for λ

You might be tempted to think that the upper and lower probabilities represent some kind of “confidence interval” for the true value of λ.

This is not the case. Upper and lower probabilities are the mean values of belief functions for λ, relevant to people with different agendas (betting for and against θ).

Page 32:

Confidence Intervals for λ

Suppose we want to determine a “credible interval” (λ₋(γ), λ₊(γ)) such that we are at least 100γ per cent “sure” that λ₋(γ) < λ < λ₊(γ):

∫_{0}^{λ₋(γ)} f(λ | n, j) dλ = (1 − γ)/2 and ∫_{λ₊(γ)}^{1} f(λ | n, j) dλ = (1 − γ)/2

Example: γ = 0.95 (95% confidence). 95% of the probability lies within the range (λ₋, λ₊), with 2.5% in each tail.

Page 33:

Confidence Intervals for λ

The posterior pdfs corresponding to the two extreme priors (t = 0 and t = 1) are:

f_lower(λ | n, j) ∝ λ^{j−1} (1 − λ)^{n−j+s−1}

f_upper(λ | n, j) ∝ λ^{j+s−1} (1 − λ)^{n−j−1}

Page 34:

Calculating the Confidence Interval

Integrating the two probability distributions, we find that we can compute the confidence limits by solving the equations:

I_{λ₋}(j, n − j + s) = (1 − γ)/2

I_{λ₊}(j + s, n − j) = (1 + γ)/2

I indicates the “Incomplete Beta Function”. No analytic solution exists, but numerical iteration using the partition method is quite straightforward.
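The incomplete-beta equations can be solved numerically. A sketch in Python using only the standard library; the helper names, the Simpson-rule evaluation of the incomplete beta function, and bisection (repeated partitioning of the interval, which I take to be the kind of iteration meant) are my own choices:

```python
import math

def betainc(a, b, x, n_steps=2000):
    """Regularised incomplete beta function I_x(a, b), evaluated by
    Simpson's-rule integration of the beta pdf (adequate for a, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    f = lambda t: t ** (a - 1) * (1.0 - t) ** (b - 1)
    h = x / n_steps
    total = f(0.0) + f(x)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * f(i * h)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return (total * h / 3.0) / math.exp(log_beta)

def solve_betainc(a, b, target, tol=1e-8):
    """Find x with I_x(a, b) = target by bisection (I_x increases with x)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if betainc(a, b, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# 95% limits for j = 2 reds in n = 10 draws, with s = 1 (Perks):
n, j, s, gamma = 10, 2, 1, 0.95
lam_minus = solve_betainc(j, n - j + s, (1 - gamma) / 2)
lam_plus = solve_betainc(j + s, n - j, (1 + gamma) / 2)
print(lam_minus, lam_plus)
```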

Page 35:

Frequentist Confidence Limits

[Figure: binomial distributions for λ = λ₋(γ) and for λ = λ₊(γ), illustrating the frequentist construction of the confidence limits.]


Page 37:

Comparison – Frequentist vs. Imprecise Probability

When s = 1 (Perks), Imprecise Probability agrees exactly with Frequentism on the upper and lower confidence limits.

Frequentist:

I_{λ₋}(j, n − j + 1) = (1 − γ)/2

I_{λ₊}(j + 1, n − j) = (1 + γ)/2

Imprecise Probability:

I_{λ₋}(j, n − j + s) = (1 − γ)/2

I_{λ₊}(j + s, n − j) = (1 + γ)/2

Page 38:

Applications in Networking

• Network Management and Control often requires decisions to be made based upon limited information.

• This could be viewed as gambling on imprecise probabilities.

• Monitoring Network Quality-of-Service.

• Congestion Window Control in Wired-cum-Wireless Networks.

Page 39:

Quality of Service (QoS)

[Diagram: two hosts/end systems communicating across a network, whose quality of service (QoS) is at issue.]

Different types of applications have different QoS requirements.

FTP and HTTP can tolerate delay, but not errors/losses (transmitted and received messages must be exactly identical).

Real time services (Voice/Video) can tolerate some data losses, but are sensitive to variations in delay.

Page 40:

QoS Metrics

Loss: Percentage of transmitted packets which never reach their intended destination (either due to noise corruption or overflow at a queuing buffer).

Latency: A posh word for “delay”; the time a packet takes to travel between end-points.

Jitter: Loosely defined as the amount by which latency varies during a transmission. (Its precise definition is problematic.) Most important in real-time applications.

Throughput: The rate at which data can be usefully carried.

Page 41:

Quality of Service (QoS) Monitoring

[Diagram: user data and monitor data streams enter the network together; of n monitor packets sent, j “fail” and (n − j) are “successful”, yielding an estimate of the failure probability.]

Page 42:

Simulation Data

Heavily Loaded Network (Average utilisation: 97%)

                   Monitor Stream  Data Stream 1  Data Stream 2  Data Stream 3
Packet Size        54 bytes        53 bytes       100 bytes      200 bytes
Packet Separation  10s             10ms           10ms           10ms
Loss Rate          36.74%          38.94%         69.71%         95.19%
95% Interval       N/A             36.04-42.09%   36.04-42.09%   36.04-42.09%
Mean Latency       0.800s          0.800s         0.801s         0.804s

Page 43:

Simulation Data

Lightly Loaded Network (Average utilisation: 46%)

                   Monitor Stream  Data Stream 1  Data Stream 2  Data Stream 3
Packet Size        54 bytes        53 bytes       100 bytes      200 bytes
Packet Separation  10s             10ms           10ms           10ms
Loss Rate          0%              0%             0%             0%
Mean Latency       0.0025s         0.0026s        0.0034s        0.0055s

Page 44:

Jitter Definition 1

Ref: http://www.slac.stanford.edu/comp/net/wan-mon/dresp-jitter.jpg

Page 45:

Jitter Definition Two

“Simple” Jitter: the difference between successive latencies:

J_i = t_i − t_{i−1},  i ≥ 1

“Smoothed” Jitter (RFC 3550): each value inherits 15/16 of the previous value:

J_1 = t_1 − t_0;  J_i = (15/16) J_{i−1} + (1/16) |t_i − t_{i−1}|,  i ≥ 2
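The two definitions can be sketched as follows (variable names are mine; following RFC 3550, the smoothed version feeds in the absolute latency difference):

```python
def simple_jitter(t):
    """'Simple' jitter J_i = t_i - t_{i-1}: successive latency differences."""
    return [t[i] - t[i - 1] for i in range(1, len(t))]

def smoothed_jitter(t):
    """'Smoothed' jitter: each value keeps 15/16 of the previous one plus
    1/16 of the new absolute latency difference (after RFC 3550)."""
    j = [t[1] - t[0]]  # J_1, the first latency difference
    for i in range(2, len(t)):
        j.append((15 / 16) * j[-1] + (1 / 16) * abs(t[i] - t[i - 1]))
    return j

latencies = [0.010, 0.012, 0.011, 0.020, 0.015]  # made-up example data
print(simple_jitter(latencies))
print(smoothed_jitter(latencies))
```

Note how the smoothing damps the spikes: each new sample contributes only 1/16 of its weight to the running value.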

Page 46:

Jitter Profiles

Interval: 10s. Packet Size: 53 bytes (ATM cell). Network Traffic: Poisson.

[Figures: raw and smoothed jitter (seconds) against time, for the monitor stream (10s interval, 0 to 10000 seconds) and the data stream (10ms interval, 0 to 10 seconds).]

Page 47:

Wired-Cum-Wireless Networks

Wireless Network: Congestion Plus Random Noise

Wired Network: Congestion Only

Page 48:

WTCP: Identifying the Cause of Packet Loss using Interarrival Time

[Diagram: a packet stream in which packets i through j span a block of lost packets, with arrival times t_i, t_{i+1}, …, t_{j−1}, t_j and interarrival times t_{i+1} − t_i, …, t_j − t_{j−1}.]

Mean Interarrival Time: Δ_{i,j} = (t_j − t_i) / (j − i)

Page 49:

WTCP: Identifying the Cause of Packet Loss using Interarrival Time

Assume we already know the mean M and standard deviation σ of the interarrival time when the network is uncongested.

If M - Kσ < Δi,j <M + Kσ (where K is a constant), then the losses are assumed to be random. The sending rate is not altered.

Otherwise, we infer that queue-sizes are varying: An indication that congestion is occurring. The sending rate is reduced to alleviate the problem.

Much work remains to be done on optimising this mechanism to maximise throughput.
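The decision rule can be sketched as follows (the function name and the choice K = 2 are illustrative assumptions, not WTCP's actual code or parameters):

```python
def classify_loss(arrival_times, i, j, M, sigma, K=2.0):
    """Classify a loss episode bracketed by received packets i and j.
    M and sigma are the uncongested mean and standard deviation of the
    interarrival time; K = 2 is an illustrative choice of constant."""
    delta = (arrival_times[j] - arrival_times[i]) / (j - i)  # mean interarrival
    if M - K * sigma < delta < M + K * sigma:
        return "random"      # wireless noise: leave the sending rate alone
    return "congestion"      # queue sizes varying: reduce the sending rate

# Toy example: uncongested interarrival time 10 ms +/- 1 ms.
t = [0.000, 0.010, 0.020, 0.080]  # packet 3 arrives late after a loss block
print(classify_loss(t, i=2, j=3, M=0.010, sigma=0.001))  # congestion
```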