Hashing Out Random Graphs
Nick Jones
Sean Porter
Erik Weyers
Andy Schieber
Jon Kroening
Introduction
We will look at some applications of probability in computer science, namely hash functions, and at applications of probability to random graphs.
Hash Functions
We are going to map a set of n records, denoted r1, r2, …, rn, into m locations, m > n, with at most one record in each location.
A hashing function is a function that maps the record values into the m locations.
We use a sequence of hash functions, denoted h1, h2, h3, …, to map the records ri into the m locations.
The records are placed sequentially: r1 goes to location h1(r1).
For r2 we try h1(r2), h2(r2), h3(r2), … in turn until we reach an empty location, and likewise for each later record.
Every time we are unsuccessful in placing a record (because the location is already full), a collision occurs.
We will let the random variable X denote the number of collisions that occur when placing n records.
We would like to find E[X] and Var(X).
These values are hard to compute exactly, but we can derive an approximate formula for each of them.
In order to do this we need to define some other random variables.
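Y_k = number of collisions in placing r_k
Therefore,
\[ X = \sum_{k=1}^{n} Y_k = Y_1 + Y_2 + \cdots + Y_n \]
Let Z_k be the number of attempts needed to place r_k:
\[ Z_k = Y_k + 1, \qquad X = Z_1 + Z_2 + \cdots + Z_n - n \]
When r_k is placed, k - 1 locations are already full. Assuming each hash value is equally likely to be any of the m locations, independently of the others, each attempt succeeds with probability p = (m-k+1)/m, so Z_k is geometric with p = (m-k+1)/m.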
We can then find E[Zk].
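\[ E[Z_k] = \frac{1}{p} = \frac{m}{m-k+1} \]
\[ E[X] = E[Z_1] + E[Z_2] + \cdots + E[Z_n] - n = -n + \sum_{k=1}^{n} E[Z_k] = -n + \sum_{k=1}^{n} \frac{m}{m-k+1} \]
\[ = -n + m\left[ \frac{1}{m} + \frac{1}{m-1} + \cdots + \frac{1}{m-n+1} \right] \approx -n + m \int_{m-n+1}^{m} \frac{dx}{x} = -n + m \log\frac{m}{m-n+1} \]
\[ E[X] \approx m \log\frac{m}{m-n+1} - n \]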
We would also like to find Var(X).
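\[ \operatorname{Var}(Z_k) = \frac{1-p}{p^2} = \frac{m(k-1)}{(m-k+1)^2} \]
Since the Z_k are independent,
\[ \operatorname{Var}(X) = \sum_{k=1}^{n} \operatorname{Var}(Z_k) = \sum_{k=1}^{n} \frac{m(k-1)}{(m-k+1)^2} = m\left[ \frac{1}{(m-1)^2} + \frac{2}{(m-2)^2} + \cdots + \frac{n-1}{(m-n+1)^2} \right] \]
\[ \approx m \int_{1}^{n-1} \frac{x}{(m-x)^2}\, dx \]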
We now know the formulas for E[X] and Var(X):
\[ E[X] \approx m \log\frac{m}{m-n+1} - n \]
\[ \operatorname{Var}(X) \approx m \int_{1}^{n-1} \frac{x}{(m-x)^2}\, dx \]
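The collision count can be checked numerically. The following is a minimal sketch, assuming each probe lands on a uniformly random location independently of the others (the assumption behind the geometric Z_k). The function names are illustrative and do not come from the slides.

```python
import math
import random


def expected_collisions_exact(n, m):
    """E[X] = sum_{k=1}^{n} m/(m-k+1) - n, the exact sum before the integral step."""
    return sum(m / (m - k + 1) for k in range(1, n + 1)) - n


def expected_collisions_log(n, m):
    """Logarithmic approximation m*log(m/(m-n+1)) - n."""
    return m * math.log(m / (m - n + 1)) - n


def variance_collisions_exact(n, m):
    """Var(X) = sum_{k=1}^{n} m(k-1)/(m-k+1)^2."""
    return sum(m * (k - 1) / (m - k + 1) ** 2 for k in range(1, n + 1))


def simulate_collisions(n, m, rng):
    """Place n records into m locations with uniform random probes; count collisions."""
    occupied = set()
    collisions = 0
    for _ in range(n):
        while True:
            slot = rng.randrange(m)
            if slot not in occupied:
                occupied.add(slot)
                break
            collisions += 1  # the probed location was already full
    return collisions


if __name__ == "__main__":
    rng = random.Random(0)
    n, m, trials = 50, 100, 2000
    mean = sum(simulate_collisions(n, m, rng) for _ in range(trials)) / trials
    print("simulated mean:", mean)
    print("exact sum     :", expected_collisions_exact(n, m))
    print("log formula   :", expected_collisions_log(n, m))
    print("exact variance:", variance_collisions_exact(n, m))
```

The integral replaces a sum of n terms, so the logarithmic formula runs slightly below the exact sum. The gap stays small relative to E[X] when m is large.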
Alfred Renyi
March 30, 1921– Feb. 1, 1970
Died at age 48
The Hungarian mathematician spent six months in hiding after being forced into a Fascist labor camp in 1944
During that time he rescued his parents from a Budapest prison by dressing up in a soldier's uniform
He got his Ph.D. at the University of Szeged in Hungary
Renyi worked with Erdös on random graphs, and they published joint work
He worked on number theory and graph theory, which led him to results about the measures of the dependency of random variables
Paul Erdös
“A Mathematician is a machine for turning coffee into theorems”
Born: March 26, 1913
May have been the most prolific mathematician of all time
Wrote or co-authored over 1475 papers
Erdös was born to two high school math teachers
His mother kept him out of school until his teen years because she feared its influence
At home he did mental arithmetic, and at three he could multiply numbers in his head
Fortified by espresso, Erdös did math for 19 hours a day, 7 days a week
He devoted his life to a single narrow mission: uncovering mathematical truth
He traveled around for six decades with a suitcase, looking for mathematicians to pick his brain
His motto was: “Another roof, another proof”
“Property is a nuisance”
“Erdös posed and solved thorny problems in number theory and other areas and founded the field of discrete mathematics which is a foundation of computer science”
Awarded his doctorate in 1934 at the University of Pazmany Peter in Budapest
Graphs
A graph consists of a set of elements V called vertices and a set E of pairs of vertices called edges
A sequence of vertices i, i1, i2, …, ik, j for which (i, i1), (i1, i2), …, (ik, j) ∈ E is called a path from i to j
Connected Graphs
A graph is said to be connected if there is a path between each pair of vertices
If a graph is not connected it is called disconnected
Random Graphs
In a random graph, we start with a set of vertices and put in edges at random, thus creating paths
So an interesting question is to find P(graph is connected), the probability that there is a path between every pair of vertices in the set
James Stirling
Who is James Stirling?
Lived 1692 – 1770.
Family was Roman Catholic in a Protestant England.
Family supported the Jacobite cause.
Matriculated at Balliol College, Oxford.
Believed to have studied and matriculated at two other universities, but this is not certain.
Did not graduate because he refused to take an oath, owing to his Jacobite beliefs.
Spent years studying, traveling, and making friends with people such as Sir Isaac Newton and Nicolaus (I) Bernoulli.
Methodus Differentialis
Stirling became a teacher in London.
There he wrote the book Methodus Differentialis in 1730.
The book’s purpose is to speed up the convergence of a series.
Stirling’s Formula is recorded in this book in Example 2 of Proposition 28.
\[ n! \approx \sqrt{2\pi n}\; n^n e^{-n} \]
Stirling’s Formula
Used to approximate n!
Is an asymptotic expansion.
Does not converge.
Can be used to approximate a lower bound in a series.
Percentage error is extremely low.
The bigger the number inserted, the lower the percentage error.
Stirling’s Formula Error Probability
About 7.8% wrong for 1!
About 0.83% wrong for 10!
About 0.083% wrong for 100!
Etc…
The relative error is close to \( \frac{1}{12n} \), so if the formula is multiplied by \( 1 + \frac{1}{12n} \), it only gets better, with errors only of order \( \frac{1}{n^2} \).
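The error figures for Stirling's Formula are easy to reproduce. This is a small sketch, not taken from the slides. It evaluates the formula in log space (an implementation choice to avoid overflow) and compares it with `math.factorial`, with and without the 1 + 1/(12n) correction. The function names are illustrative.

```python
import math


def stirling(n):
    """Stirling's approximation sqrt(2*pi*n) * n^n * e^(-n), computed via logarithms."""
    return math.exp(0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n)


def relative_error(n, corrected=False):
    """Relative shortfall of the approximation compared with the exact n!."""
    approx = stirling(n)
    if corrected:
        approx *= 1 + 1 / (12 * n)  # first correction term
    return 1 - approx / math.factorial(n)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, relative_error(n), 1 / (12 * n), relative_error(n, corrected=True))
```

The second printed column should track 1/(12n) closely, and the corrected error should be far smaller.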
Probability Background
Normal Distribution and Central Limit theorem
Poisson Distribution
Multinomial Distribution
The Normal Distribution
A continuous random variable X with pdf
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \]
is called normal.
It can be shown that X has mean μ and variance σ²; we write X ~ N(μ, σ²).
Note: When the mean = 0 and standard deviation = 1, we get the standard normal random variable
Z ~ N(0, 1)
Central Limit Theorem
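If X1, X2, … are independent and identically distributed with common mean µ and standard deviation σ, then
\[ \lim_{n\to\infty} P\left\{ \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \le x \right\} = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy \]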
Central Limit Theorem
If \( S_n = \sum_{i=1}^{n} X_i \) and n is large, then \( S_n \) is approximately normal.
If X ~ N(μ, σ²), then
\[ Z = \frac{X - \mu}{\sigma} \sim N(0, 1) \]
Poisson Distribution
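Mean and variance both equal to λ
\[ p(x) = \lim_{n\to\infty} \binom{n}{x} p^x (1-p)^{n-x} \qquad (\text{with } np = \lambda \text{ held fixed}) \]
\[ p(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \qquad x = 0, 1, 2, \ldots \]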
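The limit can be seen numerically. The sketch below is an illustration, not part of the original slides. It fixes λ, sets p = λ/n, and measures the largest gap between the binomial and Poisson probabilities over the first few values of x. The helper names and the choice λ = 3 are arbitrary.

```python
import math


def binomial_pmf(x, n, p):
    """P{X = x} for a binomial(n, p) random variable."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """P{X = x} for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)


def max_gap(n, lam, x_max=8):
    """Largest pointwise difference between the two pmfs for x = 0..x_max-1, with p = lam/n."""
    return max(
        abs(binomial_pmf(x, n, lam / n) - poisson_pmf(x, lam)) for x in range(x_max)
    )


if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, max_gap(n, 3.0))
```

The printed gap should shrink as n grows.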
Multinomial Distribution
n independent, identical trials with possible events A1, A2, …, Ak having probabilities P1, P2, …, Pk
Define Xi = number of times Ai occurs, i = 1, …, k
(X1 + X2 + … + Xk = n). Then
\[ P\{X_1 = n_1, X_2 = n_2, \ldots, X_k = n_k\} = \frac{n!}{n_1!\, n_2! \cdots n_k!}\; P_1^{n_1} P_2^{n_2} \cdots P_k^{n_k} \]
where n is the sum of the n_i.
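As a quick illustration of the multinomial formula, here is a minimal sketch that evaluates it directly. The function name and the example numbers are illustrative only.

```python
import math


def multinomial_pmf(counts, probs):
    """P{X_1 = n_1, ..., X_k = n_k} = n!/(n_1! ... n_k!) * P_1^n_1 ... P_k^n_k."""
    n = sum(counts)
    coefficient = math.factorial(n)
    for c in counts:
        coefficient //= math.factorial(c)  # exact integer division at every step
    probability = 1.0
    for c, p in zip(counts, probs):
        probability *= p**c
    return coefficient * probability


if __name__ == "__main__":
    # Four trials, three outcomes with probabilities 1/2, 1/4, 1/4.
    print(multinomial_pmf((2, 1, 1), (0.5, 0.25, 0.25)))
```

With k = 2 the formula reduces to the binomial distribution, which gives a convenient sanity check.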
Connected Graphs
Recall: A random graph G consists of vertices V = {1, 2, …, n} and random variables X(i), i = 1, …, n, along with probabilities
\[ P_j = P\{X(i) = j\}, \qquad \sum_{j} P_j = 1 \]
The set of random edges is then
\[ E = \{ (i, X(i)) : i = 1, \ldots, n \} \]
where (i, X(i)) is the edge emanating from vertex i.
Connected Graphs
The probability that a random graph is connected: P{graph is connected} = ?
A special case: suppose vertex 1 is ‘dead’ (doesn’t spawn an edge).
With n = 2 vertices and P1 + P2 = 1, the only edge is the one emanating from vertex 2, so
\[ P\{\text{graph connected}\} = P_1 \]
Dead Vertex Lemma
Consider a random graph consisting of vertices 0, 1, 2, …, r and edges (i, Y_i), i = 1, 2, …, r, where the Y_i are independent and P{Y_i = j} = Q_j, j = 0, 1, …, r.
\[ \text{If } \sum_{j=0}^{r} Q_j = 1, \text{ then } P\{\text{graph connected}\} = Q_0 \]
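For small r the lemma can be verified by brute force. The sketch below is an illustrative check, not the authors' proof. It enumerates every possible choice of (Y_1, …, Y_r), tests connectivity with a small union-find, and sums the probabilities exactly using fractions. The function names and the sample values of Q_j are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product


def is_connected(num_vertices, edges):
    """True if the undirected graph on vertices 0..num_vertices-1 is connected."""
    parent = list(range(num_vertices))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(num_vertices)}) == 1


def prob_connected(q):
    """Exact P{graph connected} when vertex 0 is dead and P{Y_i = j} = q[j]."""
    r = len(q) - 1
    total = Fraction(0)
    for ys in product(range(r + 1), repeat=r):
        edges = [(i, ys[i - 1]) for i in range(1, r + 1)]
        if is_connected(r + 1, edges):
            weight = Fraction(1)
            for y in ys:
                weight *= q[y]
            total += weight
    return total


if __name__ == "__main__":
    q = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
    print(prob_connected(q), "should equal Q_0 =", q[0])
```

The enumeration grows like (r+1)^r, so this is only practical for very small graphs.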
[Figure: example graph on vertices 1 to 6]
Maximal Non-Self Intersecting(MNSI)
Consider the maximal non-self-intersecting path emanating from vertex 1:
\[ 1,\; X(1),\; X^2(1),\; \ldots, \qquad \text{where } X^k(1) = X(X^{k-1}(1)) \]
[Figure: example on vertices 1 to 5 with k = 3]
Maximal Non-Self Intersecting(MNSI)
Define
\[ N = \min\{ k : X^k(1) \in \{1, X(1), \ldots, X^{k-1}(1)\} \} \]
and set
\[ W = P_1 + \sum_{i=1}^{N-1} P_{X^i(1)} \]
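To make N and W concrete, here is a minimal sketch that follows the path from vertex 1 until it first revisits a vertex. The dictionary representation of X and P, the function names, and the example map are all assumptions made for illustration.

```python
from fractions import Fraction


def mnsi_path(x, start=1):
    """Vertices start, X(start), X^2(start), ... up to (not including) the first repeat."""
    path = [start]
    seen = {start}
    nxt = x[start]
    while nxt not in seen:
        path.append(nxt)
        seen.add(nxt)
        nxt = x[nxt]
    return path


def n_and_w(x, p):
    """N is the number of vertices on the path; W is the total probability of those vertices."""
    path = mnsi_path(x)
    return len(path), sum(p[v] for v in path)


if __name__ == "__main__":
    # Hypothetical map on 5 vertices: 1 -> 2 -> 4 -> 1, 3 -> 3, 5 -> 4.
    x = {1: 2, 2: 4, 3: 3, 4: 1, 5: 4}
    p = {v: Fraction(1, 5) for v in x}
    print(n_and_w(x, p))  # N = 3 and W = 3/5 for this map
```

With equal probabilities W is simply N/n, as the example shows.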
Maximal Non-Self Intersecting(MNSI)
Treating the MNSI path as the dead vertex in the Dead Vertex Lemma,
\[ P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\} = W \]
[Figure: example on vertices 1 to 7 with k = 4]
Conditional Probability
The idea of conditional probability:
\[ P\{\text{event}\} = \sum_{\text{scenarios}} P\{\text{event} \mid \text{scenario}\}\; P\{\text{scenario}\} \]
Expectations of discrete random variables are conditional probability averages.
\[ E(X) = \sum_{x} x\, P\{X = x\} \]
Conditional Probability
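Taking expectations:
\[ E(W) = \sum P\{\text{graph connected} \mid N, 1, X(1), \ldots, X^{N-1}(1)\}\; P\{N, 1, X(1), \ldots, X^{N-1}(1)\} = P\{\text{graph connected}\} \]
Here the first factor plays the role of P{event | scenario} and the second the role of P{scenario}.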
Conditional Probability
Special case of interest: equiprobable vertices,
\[ P_j = \frac{1}{n}, \qquad W = \frac{N}{n}, \qquad E[W] = \frac{1}{n} E[N] \]
\[ E[N] = \sum_{i=0}^{n-1} P\{N > i\} \]
Conditional Probability
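N > i exactly when X(1), X²(1), …, X^i(1) are all vertices not visited before, so
\[ E[W] = \frac{1}{n} \sum_{i=0}^{n-1} P\{N > i\} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)(n-2)\cdots(n-i)}{n^i} = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{(n-i-1)!\; n^i} \]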
Conditional Probability
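Let j = n - i - 1. Then
\[ \frac{(n-1)!}{(n-i-1)!\; n^i} = \frac{(n-1)!}{n^{n-1}} \cdot \frac{n^j}{j!} \]
\[ E[W] = \frac{1}{n} \sum_{i=0}^{n-1} \frac{(n-1)!}{(n-i-1)!\; n^i} = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \]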
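The two expressions for E[W] in the equiprobable case can be checked against each other, and against a direct count over all n^n possible edge assignments when n is small. The following sketch uses exact fractions. It is an illustration with invented function names, and vertices are numbered from 0 for convenience.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def e_w_first_form(n):
    """(1/n) * sum_{i=0}^{n-1} (n-1)! / ((n-i-1)! * n^i)."""
    return Fraction(1, n) * sum(
        Fraction(factorial(n - 1), factorial(n - i - 1) * n**i) for i in range(n)
    )


def e_w_second_form(n):
    """((n-1)! / n^n) * sum_{j=0}^{n-1} n^j / j!."""
    return Fraction(factorial(n - 1), n**n) * sum(
        Fraction(n**j, factorial(j)) for j in range(n)
    )


def is_connected(targets):
    """Connectivity of the graph with edges (i, targets[i]) on vertices 0..n-1."""
    n = len(targets)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, t in enumerate(targets):
        parent[find(i)] = find(t)
    return len({find(v) for v in range(n)}) == 1


def brute_force_prob_connected(n):
    """Fraction of the n^n equally likely edge assignments that give a connected graph."""
    connected = sum(1 for xs in product(range(n), repeat=n) if is_connected(xs))
    return Fraction(connected, n**n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, e_w_first_form(n), e_w_second_form(n), brute_force_prob_connected(n))
```

All three columns should agree for each n, which is a useful check on the substitution j = n - i - 1.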
Poisson Distribution
Suppose X is Poisson with mean λ:
\[ P\{X = k\} = \frac{e^{-\lambda}\lambda^k}{k!} \]
With λ = n this becomes
\[ P\{X = k\} = \frac{e^{-n} n^k}{k!} \]
Poisson Distribution
So pick λ = n:
\[ P\{X < n\} = \sum_{k=0}^{n-1} P\{X = k\} = \sum_{k=0}^{n-1} \frac{e^{-n} n^k}{k!} = e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \]
Central Limit Theorem
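Recall: X = X_1 + X_2 + … + X_n, with the X_i independent and each Poisson of mean 1, is Poisson with mean n.
By the Central Limit Theorem, for large n, X is approximately N(n, n) (mean n, variance n), so asymptotically
\[ P\{X < n\} \approx \frac{1}{2} \]
\[ e^{-n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{1}{2}, \qquad \text{that is,} \qquad \sum_{j=0}^{n-1} \frac{n^j}{j!} \approx \frac{e^n}{2} \]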
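The claim that a Poisson variable with mean n falls below n about half the time can be checked directly. This sketch sums the probabilities in log space using `math.lgamma`, an implementation choice to avoid overflow and underflow for large n. The function name is illustrative.

```python
import math


def poisson_below_mean(n):
    """P{X < n} for X Poisson with mean n, i.e. e^(-n) * sum_{j=0}^{n-1} n^j / j!."""
    log_n = math.log(n)
    return math.fsum(
        math.exp(-n + j * log_n - math.lgamma(j + 1)) for j in range(n)
    )


if __name__ == "__main__":
    for n in (1, 10, 100, 10_000):
        print(n, poisson_below_mean(n))
```

The values should approach 1/2 from below as n grows, with the remaining gap shrinking roughly like 1/sqrt(n).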
Conditional Probability
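Recall Stirling's Formula:
\[ n! \approx n^n e^{-n} \sqrt{2\pi n}, \qquad (n-1)! \approx (n-1)^{n-1} e^{-(n-1)} \sqrt{2\pi(n-1)} \]
Recall
\[ E[W] = \frac{(n-1)!}{n^n} \sum_{j=0}^{n-1} \frac{n^j}{j!} \]
So by substitution
\[ E[W] \approx \frac{\sqrt{2\pi(n-1)}\; (n-1)^{n-1} e^{-(n-1)}}{n^n} \cdot \frac{e^n}{2} \]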
Conditional Probability
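\[ E[W] \approx \frac{e^n}{2} \cdot \frac{\sqrt{2\pi(n-1)}\; (n-1)^{n-1} e^{-(n-1)}}{n^n} = \frac{e \sqrt{2\pi(n-1)}}{2(n-1)} \left( 1 - \frac{1}{n} \right)^n \]
Since \( \lim_{n\to\infty} \left( 1 + \frac{x}{n} \right)^n = e^x \), we have \( \left( 1 - \frac{1}{n} \right)^n \approx e^{-1} \), so
\[ E[W] \approx \frac{\sqrt{2\pi(n-1)}}{2(n-1)} = \frac{1}{2} \sqrt{\frac{2\pi}{n-1}} \]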
Conditional Probability
\[ P\{\text{graph is connected}\} = E[W] \approx \frac{1}{2} \sqrt{\frac{2\pi}{n-1}} = \sqrt{\frac{\pi}{2(n-1)}} \]
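A short numerical comparison shows how good the final approximation is. The sketch below computes E[W] = (1/n) Σ P{N > i} exactly (up to floating point) by building each P{N > i} from the previous one, and compares it with sqrt(π / (2(n-1))). The function names are illustrative, and the equiprobable case is assumed throughout.

```python
import math


def exact_e_w(n):
    """E[W] = (1/n) * sum_{i=0}^{n-1} P{N > i} with P{N > i} = (n-1)(n-2)...(n-i) / n^i."""
    total = 0.0
    term = 1.0  # P{N > 0}
    for i in range(n):
        total += term
        term *= (n - i - 1) / n  # turns P{N > i} into P{N > i+1}
    return total / n


def asymptotic_e_w(n):
    """Large-n approximation sqrt(pi / (2(n-1)))."""
    return math.sqrt(math.pi / (2 * (n - 1)))


if __name__ == "__main__":
    for n in (10, 100, 1_000, 10_000):
        print(n, exact_e_w(n), asymptotic_e_w(n), exact_e_w(n) / asymptotic_e_w(n))
```

The ratio in the last column should move toward 1 as n grows. For small n the approximation is rough, as expected from an asymptotic argument.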
Thank You
“The first sign of senility is when a man forgets his theorems. The second is when he forgets to zip up. The third is when he forgets to zip down.”
--Paul Erdös
References
http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Erdos.html
http://www-history.mcs.st-andrews.ac.uk/Mathematicians/Renyi.html
http://www.lassp.cornell.edu/sethna/Cracks/Stirling.html
http://www-gap.dcs.st-and.ac.uk/~history/Mathematicians/Stirling.html