Chapter 5. Balls, Bins and Random Graphs
Part II. Application

Probability and Computing
Michael Mitzenmacher and Eli Upfal

Presenter: Kim, Deawoo


Password checker

• Prevent the use of common, easily cracked passwords
• Keep a dictionary of unacceptable passwords
• Check whether a requested password is in the unacceptable set

How to search the unacceptable password list?

• Binary search on the sorted dictionary: $\Theta(\log m)$ time for $m$ words
• Chain hashing: reduces search time
• Bloom filter: saves space

App 1. Hashing

[Figure: unacceptable set containing 12, 123, 345, 12345, 22, 11, lanada]

Goal: reduce search time

Store each unacceptable password in the appropriate bin, chosen by a hash function; each bin holds a linked list (chain).

Searching for an item
1. Hash the input to find the appropriate bin
2. Search sequentially through the linked list in that bin

Hash function $f: U \to [0, n-1]$
• $f$: assumed uniformly random and computable in $O(1)$ time
• $U$: all possible password strings
• $n$: array size (# of bins)
• $m$: # of unacceptable passwords (# of balls)
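The two search steps above can be sketched in a few lines of Python. This is only an illustration: the class name `ChainHashTable` is hypothetical, and SHA-256 is an assumed stand-in for the idealized uniformly random hash function $f$ of the slides.

```python
import hashlib


class ChainHashTable:
    """Chain hashing: n bins, each holding a list (chain) of words."""

    def __init__(self, n):
        self.n = n
        self.bins = [[] for _ in range(n)]

    def _bin(self, word):
        # Assumption: SHA-256 reduced mod n stands in for the random hash f.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.n

    def add(self, word):
        chain = self.bins[self._bin(word)]
        if word not in chain:
            chain.append(word)

    def contains(self, word):
        # Step 1: hash to find the bin. Step 2: scan that bin's chain.
        return word in self.bins[self._bin(word)]


table = ChainHashTable(n=8)
for w in ["12", "123", "345", "12345", "22", "11", "lanada"]:
    table.add(w)

print(table.contains("lanada"), table.contains("s3cure-pass"))
```

A lookup touches only one chain, so its cost is governed by the number of balls that landed in that bin.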

Chain Hashing [1/2]

[Figure: hash table whose bins hold chains such as (lanada), (123, 345, 12345), and (11)]

Balls-and-bins model with $m$ balls in $n$ bins
• The number of balls in a bin is approximately Poisson distributed with mean $\mu = m/n$

Total expected time for searching
• $E[\#\text{ balls in a bin}] = m/n$
• This agrees with the Poisson approximation, where $E[X] = \mu = m/n$
• If $n = m$, then $E[\#\text{ balls in a bin}] = 1$
• The expected time for a search is therefore constant

Maximum time for searching
• With $n = m$, the maximum # of balls in a bin is $\Theta(\ln m / \ln \ln m)$ with high probability
• This is still better than binary search, which takes $\Theta(\log m)$

Drawback: wasted space (many bins remain empty)
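A small simulation can make the average and maximum load concrete. This is a sketch under assumptions: the function name `bin_loads`, the seed, and the choice $m = n = 10{,}000$ are illustrative, and Python's pseudo-random generator stands in for a truly random hash function.

```python
import math
import random
from collections import Counter


def bin_loads(m, n, rng):
    """Throw m balls uniformly at random into n bins; return the load of each bin."""
    counts = Counter(rng.randrange(n) for _ in range(m))
    return [counts[b] for b in range(n)]


rng = random.Random(1)          # fixed seed so the run is repeatable
m = n = 10_000
loads = bin_loads(m, n, rng)

mean_load = sum(loads) / n      # equals m / n by construction
max_load = max(loads)
empty_bins = loads.count(0)     # the "wasted space" of chain hashing
reference = math.log(m) / math.log(math.log(m))  # ln m / ln ln m

print(mean_load, max_load, empty_bins, round(reference, 2))
```

The maximum load is only claimed to be within a constant factor of $\ln m / \ln \ln m$, so the printed values should be compared in order of magnitude rather than exactly.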

Chain Hashing [2/2]

Goal: save space

Bloom filters
• An array of $n$ bits, initially all set to 0
• $k$ independent hash functions $H_1, H_2, \ldots, H_k$
• A Bloom filter represents a set $S = \{s_1, s_2, \ldots, s_m\}$ of $m$ elements: for each $s \in S$, the bits $H_1(s), \ldots, H_k(s)$ are set to 1
• Balls-and-bins model with $k \cdot m$ balls in $n$ bins

Q: Is an element $x$ in $S$? Answer yes only if all of the bits $H_1(x), \ldots, H_k(x)$ are 1.

False positive: the Bloom filter reports that an element is in $S$ when it actually is not.

False positives are possible, but false negatives are not, which makes Bloom filters useful for the password checker.
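A minimal Bloom filter sketch follows. The class name `BloomFilter`, the method names, and the way the $k$ hash functions are derived (SHA-256 salted with the index $i$) are assumptions made for illustration; the slides only assume $k$ independent random hash functions.

```python
import hashlib


class BloomFilter:
    """Array of n bits with k hash functions; supports add and membership query."""

    def __init__(self, n, k):
        self.n = n
        self.k = k
        self.bits = [0] * n

    def _positions(self, item):
        # Assumption: salting SHA-256 with i approximates k independent hashes H_i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.n

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[pos] for pos in self._positions(item))


bad_passwords = ["12", "123", "345", "12345", "22", "11", "lanada"]
bf = BloomFilter(n=64, k=3)
for w in bad_passwords:
    bf.add(w)

print(all(bf.might_contain(w) for w in bad_passwords))
```

Every stored password is reported as present because its $k$ bits were set when it was added; a password that was never added is rejected as soon as one of its bits is 0.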

Bloom Filters [1/3]

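False positive example
• $S = \{x_1, x_2\}$ with $k = 3$ hash functions; $y$ is not in $S$
• $H_1(x_1) = 3$, $H_2(x_1) = 5$, $H_3(x_1) = 11$
• $H_1(x_2) = 5$, $H_2(x_2) = 6$, $H_3(x_2) = 9$, so bits 3, 5, 6, 9, and 11 are set
• If $H_1(y) = 5$, $H_2(y) = 6$, $H_3(y) = 10$: bit 10 is 0, so the filter correctly reports that $y$ is not in $S$
• If $H_1(y) = 3$, $H_2(y) = 6$, $H_3(y) = 9$: all three bits are 1, so the filter reports $y \in S$, a false positive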

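Calculating false positive probability (balls-and-bins model)

Bloom filter
• After all $m$ elements have been hashed, the prob. that a specific bit is still 0 is
  $(1 - 1/n)^{km} \approx e^{-km/n}$

For a requested password that is not in $S$
• Let $p = e^{-km/n}$. Then the prob. of a false positive is
  $f = \left(1 - (1 - 1/n)^{km}\right)^k \approx (1 - e^{-km/n})^k = (1 - p)^k$

Optimize the # of hash functions $k$ to minimize the false positive probability $f$, for given $m$ and $n$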

Bloom Filters [2/3]

• Increasing $k$: gives more chances to find a 0-bit for an element that is not a member of $S$
• Decreasing $k$: increases the fraction of 0-bits in the array

Minimizing false positive probability
• $\min_k f = e^g$, where $g = k \ln(1 - e^{-km/n})$
• Setting $dg/dk = 0$ yields a global minimum at $k = (\ln 2) \cdot (n/m)$
• In this case the prob. is $f = (1/2)^k \approx (0.6185)^{n/m}$

Bloom filters allow a constant prob. of a false positive while keeping $n/m$ constant, i.e. using a constant number of bits per element.

Bloom filters are highly effective even if $n = cm$ for a small constant $c$, such as $c = 8$
• In this case, when $k = 5$ or $k = 6$ the false positive prob. is just over 0.02
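The numbers above can be checked with a short calculation. The function names `false_positive` and `optimal_k` are hypothetical; the formula used is the approximation $f \approx (1 - e^{-km/n})^k$ with $n/m = c$.

```python
import math


def false_positive(bits_per_element, k):
    """Approximate false positive probability (1 - e^{-k m / n})^k, with n/m given."""
    return (1 - math.exp(-k / bits_per_element)) ** k


def optimal_k(bits_per_element):
    """Real-valued minimizer k = (ln 2) * (n / m); round to an integer in practice."""
    return math.log(2) * bits_per_element


c = 8  # n = 8m, i.e. 8 bits per stored element
for k in (4, 5, 6, 7):
    print(k, round(false_positive(c, k), 4))
print("optimal k:", round(optimal_k(c), 3))
```

Because $k$ must be an integer, the two neighbours of the real-valued optimum (here 5 and 6) are the natural candidates, and both give nearly the same probability.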

Bloom Filters [3/3]

Random graph model $G_{n,p}$
• $n$: # of nodes
• $p$: probability that each possible edge is included, independently of the others
• Expected number of edges in the graph is $\binom{n}{2} p$
• Each vertex has expected degree $(n - 1)p$
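Sampling from $G_{n,p}$ can be sketched as below. The function names (`sample_gnp`, `expected_edges`, `expected_degree`) are hypothetical, and the graph is returned as a plain edge list for simplicity.

```python
import itertools
import random


def sample_gnp(n, p, rng):
    """Include each of the C(n, 2) possible edges independently with probability p."""
    return [(u, v)
            for u, v in itertools.combinations(range(n), 2)
            if rng.random() < p]


def expected_edges(n, p):
    return n * (n - 1) / 2 * p


def expected_degree(n, p):
    return (n - 1) * p


rng = random.Random(7)
edges = sample_gnp(100, 0.1, rng)
print(len(edges), expected_edges(100, 0.1), expected_degree(100, 0.1))
```

The sampled edge count fluctuates around the expectation from run to run; only the two expectation functions are deterministic.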

App 2. Random Graphs


Hamiltonian Cycle Problem
• Input: a graph $G = (V, E)$ with $n$ vertices
• Goal: decide whether $G$ has a Hamiltonian cycle

A Hamiltonian cycle is a cycle in the graph that visits every vertex in $G$ exactly once.

The Hamiltonian cycle problem is NP-complete.

Question
• Q: Is the problem hard for most inputs, or only for a relatively small fraction of all graphs?
• A: Finding a Hamiltonian cycle is not hard for suitably randomly selected graphs (balls-and-bins model)

Analysis
• Propose a randomized algorithm for finding a Hamiltonian cycle in random graphs
• Probabilistic analysis over both the random choices and the input distribution, using the balls-and-bins model

Hamiltonian Cycles in Random Graph

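rotate($v_6$, $v_3$): $v_1\, v_2\, v_3\, v_4\, v_5\, v_6 \to v_1\, v_2\, v_3\, v_6\, v_5\, v_4$ (the segment after $v_3$ is reversed, and $v_4$ becomes the new head)

Algorithm Version 1

[Figure: the path $v_1 \ldots v_6$ before and after a step; each edge that is used is removed]

When the head of the path (the "snake") has no unused edge left to follow, the algorithm stops and reports "FAIL".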
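The rotation step can be written as a single list operation. In this sketch the path is a Python list whose last element is the head, and `rotate` is a hypothetical helper that takes the index of the vertex the head connects to.

```python
def rotate(path, i):
    """Rotate using the edge (head, path[i]).

    The prefix path[0..i] is kept, and the remainder is reversed,
    so the old successor of path[i] becomes the new head.
    """
    return path[:i + 1] + path[i + 1:][::-1]


path = ["v1", "v2", "v3", "v4", "v5", "v6"]   # head is v6
rotated = rotate(path, path.index("v3"))      # use edge (v6, v3)
print(rotated)
```

The result is still a simple path because the only new adjacency introduced is between $v_3$ and the old head $v_6$, which is exactly the edge that was used.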

When will the algorithm fail?
• Difficult to analyze directly…
• Modify the algorithm into a "stupid" one that is easier to analyze (Algorithm Version 2)

Failure Case of the Algorithm

Modify the rotation process so that the next head of the list is chosen uniformly at random from among all vertices of the graph

Algorithm Version 2

Current state:
• $P$: $v_4, v_3, v_2, v_1$
• $v_k$: head after the $k$th step ($v_1$)
• $x_k$: used-edge list for $v_k$ (vertices already visited from $v_k$)

Case 1: prob. $= 1/n$
• Reverse $P$, making $v_4$ the head $v_k$

Case 2: prob. $= |x_k|/n$
• Choose a random node $v$ that was previously visited from $v_k$, i.e. select $v \in x_k$
• Current $x_4$: $v_3$
• Choose $v_3$ as the next head: rotate($v_3$, $v_4$)

Case 3: prob. $= 1 - 1/n - |x_k|/n$
• Choose a random node $v$ adjacent to $v_k$ that was not previously visited from $v_k$
• Add $v$ to $x_k$
• Current $x_4$: $v_3$
• Case 3-1: choose $v_2$: rotate($v_4$, $v_2$), giving $x_4$: $v_2, v_3$
• Case 3-2: choose $v_5$: extend($v_5$), giving $x_4$: $v_3, v_5$

[Figure: panels on vertices 1 to 5 illustrating Case 1, Case 3-1, and Case 3-2]
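The three cases can be put together in a compact sketch. Several things here are assumptions for illustration and not taken from the slides: the function name `modified_hamiltonian`, the representation of the graph as an adjacency dictionary of an ordinary undirected graph, the shuffled per-vertex unused-edge lists, and the rule that the run ends when a chosen edge joins the head of a full path to its tail.

```python
import random


def modified_hamiltonian(adj, max_steps, rng):
    """Return a Hamiltonian cycle as a vertex list, or None on failure.

    adj maps each vertex to a list of its neighbours (undirected graph).
    The head of the path is path[-1].
    """
    n = len(adj)
    unused = {v: rng.sample(nbrs, len(nbrs)) for v, nbrs in adj.items()}
    used = {v: [] for v in adj}
    path = [rng.choice(list(adj))]
    for _ in range(max_steps):
        head = path[-1]
        r = rng.random()
        if r < 1 / n:                          # Case 1: reverse the path
            path.reverse()
            continue
        if r < (1 + len(used[head])) / n:      # Case 2: reuse a used edge
            u = rng.choice(used[head])
        else:                                  # Case 3: take an unused edge
            if not unused[head]:
                return None                    # no edge left: FAIL
            u = unused[head].pop(0)
            used[head].append(u)
        if u not in path:
            path.append(u)                     # extend: u becomes the head
        elif len(path) == n and u == path[0]:
            return path                        # edge head-tail closes the cycle
        else:
            i = path.index(u)
            path[i + 1:] = path[i + 1:][::-1]  # rotate around edge (head, u)
    return None


complete = {v: [u for u in range(6) if u != v] for v in range(6)}
cycle = modified_hamiltonian(complete, 2000, random.Random(0))
print(cycle)
```

The run may legitimately return `None`: the algorithm is allowed to fail, and the analysis only bounds how often that happens on suitable random graphs.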

Lemma 5.15

After the $k$th step, if the head $v_k$ has at least one adjacent vertex not yet visited from $v_k$, then for any vertex $u$

$\Pr(V_{k+1} = u \mid V_k = u_k, V_{k-1} = u_{k-1}, \ldots, V_0 = u_0) = 1/n$

That is, every vertex becomes the next head with the same probability $1/n$, regardless of the history.

[Figure: Case 2 illustrated on vertices 1 to 5]

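Proof sketch

Balls-and-bins model with $n$ bins and $O(n \ln n)$ balls

Failure cases (total probability $O(n^{-1})$)
• $\varepsilon_1$: the algorithm ran for $3n \ln n$ steps with no unused-edge list becoming empty, but failed to construct a Hamiltonian cycle
• $\varepsilon_2$: some unused-edge list became empty during the first $3n \ln n$ iterations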

Analysis using Balls-and-Bins Model

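$\varepsilon_1$: the algorithm ran for $3n \ln n$ steps but failed to construct a Hamiltonian cycle

• $\varepsilon_{1a}$: no Hamiltonian path is constructed within the first $2n \ln n$ steps
  • Corresponds to some bin remaining empty after throwing $2n \ln n$ balls
  • Probability that a given bin is empty: $(1 - 1/n)^{2n \ln n} \le e^{-2 \ln n} = n^{-2}$
  • By the union bound over the $n$ bins, the probability is at most $1/n$
• $\varepsilon_{1b}$: the Hamiltonian path is not completed to a cycle within the next $n \ln n$ steps
  • Each step closes the cycle with probability $1/n$, so the probability is $(1 - 1/n)^{n \ln n} \le e^{-\ln n} = 1/n$
• $\Pr(\varepsilon_1) \le \Pr(\varepsilon_{1a}) + \Pr(\varepsilon_{1b}) \le 2/n$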
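The bound $\Pr(\varepsilon_1) \le 2/n$ can be checked numerically for a concrete $n$. The helper name `miss_prob` and the choice $n = 1000$ are illustrative assumptions; the quantities computed are the standard balls-and-bins expressions $(1 - 1/n)^{2n \ln n}$ and $(1 - 1/n)^{n \ln n}$.

```python
import math


def miss_prob(n, throws):
    """Probability that one fixed bin out of n receives none of the throws."""
    return (1 - 1 / n) ** throws


n = 1000
one_bin_empty = miss_prob(n, 2 * n * math.log(n))  # a given vertex never becomes head
some_bin_empty = n * one_bin_empty                 # union bound over the n bins
cycle_not_closed = miss_prob(n, n * math.log(n))   # tail never chosen in n ln n steps

print(one_bin_empty, some_bin_empty, cycle_not_closed)
print(some_bin_empty + cycle_not_closed <= 2 / n)
```

Both terms sit slightly below $1/n$ because $(1 - 1/n)^t < e^{-t/n}$ for every $t > 0$.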

Proof [1/2]

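$\varepsilon_2$: some unused-edge list becomes empty during the first $3n \ln n$ iterations

• $\varepsilon_{2a}$: at least $9 \ln n$ edges were removed from the unused-edge list of at least one vertex in the first $3n \ln n$ steps
  • Corresponds to the maximum load of a bin reaching $9 \ln n$ after throwing $3n \ln n$ balls
• $\varepsilon_{2b}$: at least one vertex initially had fewer than $10 \ln n$ edges in its unused-edge list

The probability that the algorithm fails to find a Hamiltonian cycle in $3n \ln n$ steps is bounded by

$\Pr(\varepsilon_{1a}) + \Pr(\varepsilon_{1b}) + \Pr(\varepsilon_{2a}) + \Pr(\varepsilon_{2b}) = O(1/n)$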

Proof [2/2]

What is the worst case performance of bucket sort?

Why is the balls-and-bins model useful?

What is another example of using the balls-and-bins model to analyze an algorithm?

Review Question