Hashing Out Random Graphs Nick Jones Sean Porter Erik Weyers Andy Schieber Jon Kroening


Hashing Out Random Graphs

Nick Jones

Sean Porter

Erik Weyers

Andy Schieber

Jon Kroening

Introduction

We will be looking at some applications of probability in computer science, namely hash functions, and at applications of probability to random graphs.


Hash Functions

We are going to map a set of n records, denoted r1, r2, …, rn, into m locations (m > n), with at most one record in each location.

A hashing function is a function that maps the record values into the m locations.

We use a sequence of hash functions, denoted h1, h2, h3, …, to map the records ri into the m locations.

The records are placed sequentially: r1 goes to location h1(r1) = m1.

For r2 we try h1(r2), h2(r2), h3(r2), … in turn until an empty location is found, and likewise for each later record.

Every time we are unsuccessful in placing a record (because the location is already full), a collision occurs.

We will let the random variable X denote the number of collisions that occur when placing n records.

We would like to find E[X] and Var(X).
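The placement procedure can be simulated directly. The sketch below is only an illustration: it assumes each hash value is an independent, uniformly random location (an idealization the slides do not state), and `count_collisions` is a name made up for this example.

```python
import random

def count_collisions(n, m, rng):
    """Place n records into m locations and return the number of collisions X.

    Assumption: each hash value h_j(r_k) is an independent uniform draw
    from the m locations, standing in for the sequence h1, h2, h3, ...
    """
    if not 0 <= n <= m:
        raise ValueError("need 0 <= n <= m")
    occupied = set()
    collisions = 0
    for _ in range(n):
        while True:
            slot = rng.randrange(m)      # next hash value for this record
            if slot not in occupied:
                occupied.add(slot)       # empty location found: place the record
                break
            collisions += 1              # location already full: a collision
    return collisions

rng = random.Random(0)
samples = [count_collisions(50, 100, rng) for _ in range(2000)]
print(sum(samples) / len(samples))       # sample mean of X for n = 50, m = 100
```

Averaging many runs gives an empirical estimate of E[X], and the spread of the samples gives an estimate of Var(X).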

These values are hard to compute exactly, but we can derive a formula for each of them.

In order to do this we need to define some other random variables.

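Let $Y_k$ = number of collisions in placing $r_k$.

Therefore,

$$X = Y_1 + Y_2 + \cdots + Y_n = \sum_{k=1}^{n} Y_k$$

Let $Z_k = Y_k + 1$ (geometric with $p = (m-k+1)/m$, since $k-1$ of the $m$ locations are already full when $r_k$ is placed and each hash value is taken to be equally likely to be any location). Then

$$X = Z_1 + Z_2 + \cdots + Z_n - n$$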

We can then find E[Zk].

$$E[Z_k] = \frac{1}{p} = \frac{m}{m-k+1}$$

$$E[X] = E[Z_1] + E[Z_2] + \cdots + E[Z_n] - n = -n + \sum_{k=1}^{n} E[Z_k] = -n + \sum_{k=1}^{n} \frac{m}{m-k+1}$$

$$E[X] = -n + m\left[\frac{1}{m} + \frac{1}{m-1} + \cdots + \frac{1}{m-n+1}\right] \approx -n + m\int_{m-n+1}^{m} \frac{dx}{x} = -n + m\log\frac{m}{m-n+1}$$

$$E[X] \approx m\log\frac{m}{m-n+1} - n$$
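To see how close the logarithmic approximation is to the exact sum for E[X], a small numerical comparison helps. This is a sketch only; the function names are invented for the example, and the approximation is expected to be rough when n is very small.

```python
import math

def expected_collisions_exact(n, m):
    """E[X] = -n + sum_{k=1}^{n} m / (m - k + 1)."""
    return -n + sum(m / (m - k + 1) for k in range(1, n + 1))

def expected_collisions_approx(n, m):
    """Integral approximation E[X] ~ m * log(m / (m - n + 1)) - n."""
    return m * math.log(m / (m - n + 1)) - n

for n, m in [(50, 100), (500, 1000)]:
    print(n, m, expected_collisions_exact(n, m), expected_collisions_approx(n, m))
```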

We would also like to find Var(X).

$$\mathrm{Var}(Z_k) = \frac{1-p}{p^2} = \frac{m(k-1)}{(m-k+1)^2}$$

$$\mathrm{Var}(X) = \sum_{k=1}^{n} \mathrm{Var}(Z_k) = \sum_{k=1}^{n} \frac{m(k-1)}{(m-k+1)^2} = m\left[\frac{1}{(m-1)^2} + \frac{2}{(m-2)^2} + \cdots + \frac{n-1}{(m-n+1)^2}\right]$$

Approximating the sum by an integral,

$$\mathrm{Var}(X) \approx m\int_{1}^{n-1} \frac{x}{(m-x)^2}\,dx$$

We now know the formula for E[X] and the Var(X).

$$\mathrm{Var}(X) \approx m\int_{1}^{n-1} \frac{x}{(m-x)^2}\,dx$$

$$E[X] \approx m\log\frac{m}{m-n+1} - n$$
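The variance formula can be checked numerically in the same spirit. The sketch below compares the exact sum for Var(X) with the integral approximation; the closed form of the integral uses the antiderivative m/(m - x) + log(m - x), which was worked out for this example and does not appear on the slides. Function names are illustrative.

```python
import math

def var_collisions_exact(n, m):
    """Var(X) = sum_{k=1}^{n} m (k - 1) / (m - k + 1)^2."""
    return sum(m * (k - 1) / (m - k + 1) ** 2 for k in range(1, n + 1))

def var_collisions_approx(n, m):
    """m * integral from 1 to n-1 of x / (m - x)^2 dx.

    Uses the antiderivative F(x) = m / (m - x) + log(m - x)
    (an assumption of this sketch, derived by hand).
    """
    def F(x):
        return m / (m - x) + math.log(m - x)
    return m * (F(n - 1) - F(1))

print(var_collisions_exact(500, 1000), var_collisions_approx(500, 1000))
```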

Alfred Renyi

March 20, 1921 – Feb. 1, 1970

Died at age 48

In 1944 the Hungarian mathematician was forced into a fascist labor camp; he escaped and spent six months in hiding

During that time he rescued his parents from a Budapest prison by dressing up in a soldier's uniform

He got his Ph.D. at the University of Szeged in Hungary

Renyi worked with Erdös on random graphs, and they published joint work

He worked on number theory and graph theory, which led him to results about measures of the dependence of random variables


Paul Erdös

“A Mathematician is a machine for turning coffee into theorems”


Born: March 26, 1913

May have been the most prolific mathematician of all time

Wrote or co-authored over 1,475 papers

Erdös was born to two high school math teachers

His mother kept him out of school until his teen years because she feared its influence

At home he did mental arithmetic, and at three he could multiply numbers in his head

Fortified by espresso, Erdös did math for 19 hours a day, 7 days a week

He devoted his life to a single narrow mission: uncovering mathematical truth

He traveled around for six decades with a suitcase, looking for mathematicians to pick his brain

His motto was: “Another roof, another proof”


“Property is a nuisance”

“Erdös posed and solved thorny problems in number theory and other areas and founded the field of discrete mathematics which is a foundation of computer science”

Awarded his doctorate in 1934 by Pázmány Péter University in Budapest


Graphs

A graph consists of a set of elements V called vertices and a set E of pairs of vertices called edges

A sequence of vertices i, i1, i2, …, ik, j for which (i,i1), (i1,i2), …, (ik,j) ∈ E is called a path from i to j


Connected Graphs

A graph is said to be connected if there is a path between each pair of vertices

If a graph is not connected it is called disconnected
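The definition of connectedness translates into a short search procedure. The following is a minimal sketch using breadth-first search, with edges treated as undirected pairs; the function name and the convention that an empty graph counts as connected are choices made for this example.

```python
from collections import deque

def is_connected(vertices, edges):
    """Return True if every pair of vertices is joined by a path.

    Edges are treated as undirected pairs (i, j).
    """
    vertices = list(vertices)
    if not vertices:
        return True
    neighbors = {v: set() for v in vertices}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        v = queue.popleft()
        for w in neighbors[v] - seen:    # unvisited neighbors of v
            seen.add(w)
            queue.append(w)
    return len(seen) == len(vertices)

print(is_connected([1, 2, 3], [(1, 2), (2, 3)]))   # path 1-2-3 joins all vertices
print(is_connected([1, 2, 3], [(1, 2)]))           # vertex 3 is isolated
```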


Random Graphs

In a random graph, we start with a set of vertices and put in edges at random, thus creating paths

An interesting question is to find P(graph is connected), the probability that there is a path between every pair of vertices in the set


James Stirling

Who is James Stirling?

Lived 1692 – 1770.

Family was Roman Catholic in a Protestant England.

Family supported the Jacobite cause.

Matriculated at Balliol College, Oxford.

Believed to have studied and matriculated at two other universities, but this is not certain.

Did not graduate because he refused to take an oath, on account of his Jacobite beliefs.

Spent years studying, traveling, and making friends with people such as Sir Isaac Newton and Nicolaus (I) Bernoulli.

Methodus Differentialis

Stirling became a teacher in London.

There he wrote the book Methodus Differentialis in 1730.

The book’s purpose is to speed up the convergence of a series.

Stirling’s Formula is recorded in this book in Example 2 of Proposition 28.

$$n! \approx \sqrt{2\pi n}\; n^n e^{-n}$$

Stirling’s Formula

Used to approximate n!

Is an asymptotic expansion.

Does not converge.

Can be used to approximate a lower bound in a series.

Percentage error is extremely low.

The bigger the number inserted, the lower the percentage error.

$$n! \approx \sqrt{2\pi n}\; n^n e^{-n}$$

Stirling’s Formula Error Probability

About 8.00% wrong for 1!

About 0.80% wrong for 10!

About 0.08% wrong for 100!

Etc…

The percentage error is close to $\frac{1}{12n}$, so if the formula is multiplied by $1 + \frac{1}{12n}$ it gets even better, with errors only of order $\frac{1}{n^2}$.
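The quoted error figures are easy to reproduce. The sketch below evaluates Stirling's formula in floating point and reports its relative error against `math.factorial`, with and without the correction factor 1 + 1/(12n). The function names are illustrative, and the direct evaluation of n^n limits it to n up to roughly 140 before the float conversion overflows.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * n^n * e^(-n)."""
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

def relative_error(n, corrected=False):
    """Relative error of the approximation to n!, optionally after
    multiplying by the correction factor 1 + 1/(12 n)."""
    approx = stirling(n)
    if corrected:
        approx *= 1 + 1 / (12 * n)
    exact = math.factorial(n)
    return abs(exact - approx) / exact

for n in (1, 10, 100):
    print(n, relative_error(n), relative_error(n, corrected=True))
```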


Probability Background

Normal Distribution and Central Limit Theorem

Poisson Distribution

Multinomial Distribution

The Normal Distribution

A continuous random variable X with pdf

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

is called normal

Normal Distribution

It can be shown that $X \sim N(\mu, \sigma^2)$, that is, X has mean $\mu$ and variance $\sigma^2$


Normal Distribution

Note: When the mean = 0 and standard deviation = 1, we get the standard normal random variable

Z~N(0,1)

Central Limit Theorem

If X1, X2, … are independent and identically distributed with common mean µ and standard deviation σ, then

$$\lim_{n\to\infty} P\left(\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \le x\right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy$$

Central Limit Theorem

If $S_n = \sum_{i=1}^{n} X_i$ and n is large, then $S_n$ is approximately normal

If $X \sim N(\mu, \sigma^2)$, then

$$Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

Poisson Distribution

$$p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots$$

$$\lim_{n\to\infty} \binom{n}{x} p^x (1-p)^{n-x} = p(x), \qquad \text{with } np = \lambda \text{ held fixed}$$

Mean and variance are both equal to λ
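The limit relating the binomial and Poisson distributions can be observed numerically. This sketch holds np = λ fixed and lets n grow; the function names and the choice λ = 2, x = 3 are arbitrary illustrations.

```python
import math

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """p(x) = e^(-lam) * lam^x / x!"""
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 2.0
for n in (10, 100, 10000):
    # p = lam / n keeps the binomial mean n * p equal to lam
    print(n, binomial_pmf(3, n, lam / n), poisson_pmf(3, lam))
```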

Multinomial Distribution

n independent identical trials of events A1, A2, …, Ak with probabilities P1, P2, …, Pk

Define Xi = number of times Ai occurs, i = 1, …, k

(X1 + X2 + … + Xk = n) then,

Multinomial Distribution

$$P(X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k) = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; P_1^{n_1} P_2^{n_2} \cdots P_k^{n_k}$$

where n is the sum of the $n_i$
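The multinomial probability is straightforward to evaluate. The sketch below computes it for given counts and probabilities; `multinomial_pmf` is an illustrative name, and the input is assumed to be a valid probability vector.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1 = n1, ..., Xk = nk) for n = n1 + ... + nk independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)     # stays an exact integer at each step
    return coefficient * prod(p ** c for p, c in zip(probs, counts))

print(multinomial_pmf([1, 1, 1], [0.5, 0.3, 0.2]))  # 3! * 0.5 * 0.3 * 0.2
```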

Connected Graphs

Recall: A random graph G consists of vertices V = {1, 2, …, n} and random variables X(i), i = 1, …, n, along with probabilities

$$P_j = P\{X(i) = j\}, \qquad \sum_{j} P_j = 1$$

Connected Graphs

The set of random edges is then

$$E = \{(i, X(i)) : i = 1, \ldots, n\}$$

where (i, X(i)) is the edge emanating from vertex i
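A random graph of this kind can be sampled and tested for connectedness by simulation. The sketch below assumes the X(i) are drawn independently, uses a union-find structure for the connectivity check, and estimates P{graph is connected} by Monte Carlo for equiprobable vertices; all names and the choice n = 10 are illustrative.

```python
import random

def sample_edges(probs, rng):
    """Draw X(i) for i = 1..n with P{X(i) = j} = probs[j - 1]; return the edges (i, X(i)).

    Assumption: the X(i) are drawn independently of one another.
    """
    n = len(probs)
    vertices = list(range(1, n + 1))
    return [(i, rng.choices(vertices, weights=probs)[0]) for i in vertices]

def is_connected(n, edges):
    """Union-find check that the vertices 1..n form one connected piece."""
    parent = list(range(n + 1))
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for i, j in edges:
        parent[find(i)] = find(j)
    return len({find(v) for v in range(1, n + 1)}) == 1

rng = random.Random(1)
n = 10
trials = 5000
hits = sum(is_connected(n, sample_edges([1 / n] * n, rng)) for _ in range(trials))
print(hits / trials)   # Monte Carlo estimate of P{graph is connected}
```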

Connected Graphs

The probability that a random graph is connected: P{graph is connected} = ?

A special case: suppose vertex 1 is ‘dead’ (doesn’t spawn an edge)

With n = 2 vertices and $P_1 + P_2 = 1$, vertex 2 sends its edge to vertex 1 with probability $P_1$, so

$$P\{\text{graph connected}\} = P_1$$

Dead Vertex Lemma

Consider a random graph consisting of vertices 0, 1, 2, …, r and edges $(i, Y_i)$, i = 1, 2, …, r, where the $Y_i$ are independent and $P\{Y_i = j\} = Q_j$, j = 0, 1, …, r

If $\sum_{j=0}^{r} Q_j = 1$, then $P\{\text{graph connected}\} = Q_0$
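For small r the lemma can be checked by exhaustive enumeration. The sketch below sums the probabilities of every outcome (Y_1, …, Y_r) whose graph is connected, using exact fractions; the particular Q values are arbitrary choices for illustration, and the function names are not from the slides.

```python
from fractions import Fraction
from itertools import product

def connected(num_vertices, edges):
    """Depth-first search over undirected edges on vertices 0..num_vertices-1."""
    neighbors = {v: set() for v in range(num_vertices)}
    for i, j in edges:
        neighbors[i].add(j)
        neighbors[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for w in neighbors[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == num_vertices

def prob_connected(Q):
    """Exact P{graph connected} for vertices 0..r, where vertex 0 is dead and
    each vertex i >= 1 sends an edge to Y_i with P{Y_i = j} = Q[j]."""
    r = len(Q) - 1
    total = Fraction(0)
    for ys in product(range(r + 1), repeat=r):      # every outcome (Y_1, ..., Y_r)
        weight = Fraction(1)
        for y in ys:
            weight *= Q[y]
        if connected(r + 1, [(i + 1, y) for i, y in enumerate(ys)]):
            total += weight
    return total

Q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6), Fraction(0)]
print(prob_connected(Q), Q[0])   # the two values should agree
```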

Dead Vertex Lemma

[Figure: example graph with vertices labeled 1 to 6]

Maximal Non-Self Intersecting (MNSI)

Consider the maximal non-self intersecting path emanating from vertex 1:

$$1,\; X(1),\; X^2(1),\; \ldots,\; X^k(1) = X(X^{k-1}(1))$$

[Figure: example path on vertices 1 to 5, with k = 3]

Maximal Non-Self Intersecting (MNSI)

Define

$$N = \min\{k : X^k(1) \in \{1, X(1), \ldots, X^{k-1}(1)\}\}$$

and set

$$W = P_1 + \sum_{i=1}^{N-1} P_{X^i(1)}$$
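The quantities N and W can be computed by following the path from vertex 1 until it first revisits a vertex. In this sketch the map X and the probabilities P are plain dictionaries, and the specific example graph is made up for illustration.

```python
from fractions import Fraction

def mnsi_path(X, P):
    """Follow 1, X(1), X^2(1), ... until a vertex repeats.

    X maps each vertex to the endpoint of its edge and P maps each vertex j
    to P_j. Returns the path, N, and W.
    """
    path = [1]
    current = X[1]
    while current not in path:
        path.append(current)
        current = X[current]
    N = len(path)                      # first k with X^k(1) already on the path
    W = sum(P[v] for v in path)        # P_1 + sum_{i=1}^{N-1} P_{X^i(1)}
    return path, N, W

X = {1: 2, 2: 4, 4: 3, 3: 2, 5: 1}     # hypothetical edges (i, X(i))
P = {v: Fraction(1, 5) for v in X}     # equiprobable vertices
path, N, W = mnsi_path(X, P)
print(path, N, W)
```

With equiprobable vertices W reduces to N/n, which is the special case used later in the deck.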

Maximal Non-Self Intersecting (MNSI)

By treating the MNSI path as the dead vertex in the Dead Vertex Lemma,

$$P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\} = W$$

[Figure: example graph on vertices 1 to 7, with k = 4]

Conditional Probability

The idea of conditional probability:

$$P\{\text{event}\} = \sum_{\text{scenarios}} P\{\text{event} \mid \text{scenario}\}\; P\{\text{scenario}\}$$

Expectations of discrete random variables are conditional probability averages.

$$E(X) = \sum_{x} x\, P\{X = x\}$$

Conditional Probability

Taking expectations, with “graph connected” as the event and $(N, 1, X(1), \ldots, X^{N-1}(1))$ as the scenario:

$$E(W) = \sum P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\}\; P\{N, 1, X(1), \ldots, X^{N-1}(1)\} = P\{\text{graph connected}\}$$

Conditional Probability

Special Case of Interest: $P_j = \frac{1}{n}$ (equiprobable vertices)

$$W = \frac{N}{n}, \qquad E[W] = \frac{1}{n} E[N], \qquad E[N] = \sum_{i=0}^{n-1} P\{N > i\}$$

Conditional Probability

$$E[W] = \frac{1}{n} \sum_{i=0}^{n-1} P\{N > i\} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)(n-2)\cdots(n-i)}{n^i} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{(n-i-1)!\; n^i}$$

Conditional Probability

$$E[W] = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{(n-i-1)!\; n^i} = \frac{(n-1)!}{n^n} \sum_{i=0}^{n-1} \frac{n^{n-i-1}}{(n-i-1)!}$$

Let $j = n - i - 1$:

$$E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$$
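For small n the closed form for E[W] can be compared against a brute-force count over all n^n equally likely edge maps. The sketch below does both with exact fractions; the function names are illustrative, and the brute-force part is only practical for very small n.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def prob_connected_formula(n):
    """E[W] = (n-1)!/n^n * sum_{j=0}^{n-1} n^j / j!  (equiprobable vertices)."""
    total = sum(Fraction(n ** j, factorial(j)) for j in range(n))
    return Fraction(factorial(n - 1), n ** n) * total

def prob_connected_bruteforce(n):
    """Fraction of all n^n maps X: {1..n} -> {1..n} whose graph is connected."""
    connected_count = 0
    for X in product(range(n), repeat=n):          # vertices relabeled 0..n-1
        label = list(range(n))                     # component label per vertex
        for i, j in enumerate(X):
            old, new = label[i], label[j]
            label = [new if c == old else c for c in label]   # merge components
        connected_count += len(set(label)) == 1
    return Fraction(connected_count, n ** n)

for n in range(2, 6):
    print(n, prob_connected_formula(n), prob_connected_bruteforce(n))
```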

Poisson Distribution

Suppose X is Poisson with mean $\lambda = n$:

$$P\{X = k\} = \frac{e^{-\lambda} \lambda^k}{k!} = \frac{e^{-n} n^k}{k!}$$

Poisson Distribution

So, with X Poisson with mean n,

$$P\{X < n\} = \sum_{k=0}^{n-1} P\{X = k\} = \sum_{k=0}^{n-1} \frac{e^{-n} n^k}{k!} = e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$$

Central Limit Theorem

Recall: $X = X_1 + X_2 + \cdots + X_n$, each $X_i$ Poisson with mean 1

By the Central Limit Theorem, for large n, $S_n = X \approx N(n, n)$, so asymptotically

$$P(X < n) \approx \frac{1}{2}$$

$$e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{1}{2} \quad\Longrightarrow\quad \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{e^n}{2}$$

Conditional Probability

Recall: Stirling’s Formula

$$n! \approx \sqrt{2\pi n}\; n^n e^{-n}$$

$$(n-1)! \approx \sqrt{2\pi(n-1)}\; (n-1)^{n-1} e^{-(n-1)}$$

Recall

$$E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!}$$

So by substitution

$$E[W] \approx \frac{\sqrt{2\pi(n-1)}\; (n-1)^{n-1} e^{-(n-1)}}{n^n} \cdot \frac{e^n}{2}$$

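Conditional Probability

$$E[W] \approx \frac{e\,\sqrt{2\pi(n-1)}\; (n-1)^{n-1}}{2\, n^n} = \frac{e}{2} \sqrt{\frac{2\pi}{n-1}} \left(\frac{n-1}{n}\right)^n = \frac{e}{2} \sqrt{\frac{2\pi}{n-1}} \left(1 - \frac{1}{n}\right)^n$$

Since $\lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = e^x$, we have $\left(1 - \frac{1}{n}\right)^n \approx e^{-1}$ for large n.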

Conditional Probability

$$P\{\text{graph is connected}\} = E[W] \approx \frac{e \cdot e^{-1}}{2} \sqrt{\frac{2\pi}{n-1}} = \frac{1}{2} \sqrt{\frac{2\pi}{n-1}} = \sqrt{\frac{\pi}{2(n-1)}}$$
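As a rough check on the final approximation, the exact value of E[W] for equiprobable vertices can be compared with the large-n expression sqrt(π / (2(n − 1))). The sketch below evaluates the exact sum in log space; the function names are illustrative, and the agreement is only expected to be good for large n.

```python
import math

def prob_connected_exact(n):
    """(n-1)!/n^n * sum_{j=0}^{n-1} n^j/j!, evaluated in log space."""
    log_prefactor = math.lgamma(n) - n * math.log(n)       # log((n-1)!/n^n)
    return sum(
        math.exp(log_prefactor + j * math.log(n) - math.lgamma(j + 1))
        for j in range(n)
    )

def prob_connected_asymptotic(n):
    """Large-n approximation sqrt(pi / (2 (n - 1)))."""
    return math.sqrt(math.pi / (2 * (n - 1)))

for n in (10, 100, 1000):
    print(n, prob_connected_exact(n), prob_connected_asymptotic(n))
```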

Thank You

“The first sign of senility is when a man forgets his theorems. The second is when he forgets to zip up. The third is when he forgets to zip down.”

--Paul Erdös
