


Page 1: Hashing Algorithm

Hashing Algorithm

9042635 羅正鴻, 9142610 林彥廷, 9142621 戴嘉宏

Page 2: Hashing Algorithm

Introduction

Hashing, a ubiquitous information retrieval strategy for providing efficient access to information based on a key

Information can usually be accessed in constant time

Hashing’s drawbacks

Page 3: Hashing Algorithm

Concept of hashing

The problem at hand is to define and implement a mapping from a domain of keys to a domain of locations

From the performance standpoint, the goal is to avoid collisions (A collision occurs when two or more keys map to the same location)

From the compactness standpoint, no application ever stores all keys in a domain simultaneously unless the size of the domain is small

Page 4: Hashing Algorithm

Concept of hashing (cont’d)

The information to be retrieved is stored in a hash table which is best thought of as an array of m locations, called buckets

The mapping between a key and a bucket is called the hash function

When there are no collisions, the time to store and retrieve data is proportional to the time to compute the hash function

Page 5: Hashing Algorithm

Hashing function

The ideal function, termed a perfect hash function, would distribute all elements across the buckets such that no collisions ever occurred

h(v) = f(v) mod m

Knuth (1973) suggests using a prime number as the value for m
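To see why a prime modulus helps, here is a minimal sketch. It assumes f(v) = v and a hypothetical key set in which every key is a multiple of 8; neither assumption comes from the slides. With m = 16 every key shares a factor with m and the keys pile into two buckets, while the prime m = 17 lets them reach every bucket.

```python
def h(v: int, m: int) -> int:
    """Division-method hash h(v) = f(v) mod m, assuming f(v) = v."""
    return v % m


# Hypothetical keys: 0, 8, 16, ..., 248 (all multiples of 8).
keys = [8 * i for i in range(32)]

buckets_16 = {h(k, 16) for k in keys}  # m shares the factor 8 with every key
buckets_17 = {h(k, 17) for k in keys}  # m is prime

print(len(buckets_16), "of 16 buckets used")
print(len(buckets_17), "of 17 buckets used")
```

A prime m does not prevent collisions; it only removes the systematic ones caused by factors that the keys and m have in common.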

Page 6: Hashing Algorithm

Hashing function (cont’d)

It is usually better to treat v as a sequence of bytes and do one of the following for f(v):

(1) Sum or multiply all the bytes. Overflow can be ignored

(2) Use the last (or middle) byte instead of the first

(3) Use the square of a few of the middle bytes
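The three choices for f(v) can be sketched as follows. The function names, the 32-bit wrap used to model ignored overflow, the two-byte middle window, and the table size m = 101 are all illustrative assumptions, not part of the slides.

```python
def f_sum(v: bytes) -> int:
    """(1) Sum all the bytes; overflow is ignored by wrapping to 32 bits."""
    return sum(v) & 0xFFFFFFFF


def f_last(v: bytes) -> int:
    """(2) Use the last byte instead of the first."""
    return v[-1] if v else 0


def f_mid_square(v: bytes, width: int = 2) -> int:
    """(3) Square the integer formed by a few of the middle bytes."""
    start = max(0, (len(v) - width) // 2)
    middle = int.from_bytes(v[start:start + width], "big")
    return middle * middle


def h(v: bytes, m: int = 101, f=f_sum) -> int:
    """h(v) = f(v) mod m, with a prime m as Knuth suggests."""
    return f(v) % m


print(h(b"abc"))            # byte sum 294, reduced mod 101
print(h(b"abc", f=f_last))  # last byte 99, reduced mod 101
```

Summing the bytes is insensitive to their order, so `h(b"abc")` and `h(b"cba")` land in the same bucket. That is one reason an implementation might prefer a different f(v).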

Page 7: Hashing Algorithm

Implementing hashing

The following operations are usually provided by an implementation of hashing:

(1) Initialization

(2) Insertion

(3) Retrieval

(4) Deletion

Page 8: Hashing Algorithm

Chained hashing

Page 9: Hashing Algorithm

Chained hashing (cont’d)

In the worst case (where all n keys map to a single location), the average time to locate an element will be proportional to n/2.

In the best case (where all chains are of equal length), the time will be proportional to n/m.
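A minimal sketch of chained hashing with the four usual operations (initialization, insertion, retrieval, deletion), followed by a count of key comparisons in the worst and best cases. The class name, the byte-sum default for f(v), and the `comparisons` helper are assumptions made for illustration.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (one list per bucket)."""

    def __init__(self, m=11, hash_fn=None):
        # Initialization: m empty chains. The default f(v) sums the UTF-8
        # bytes of a string key; pass hash_fn for other key types.
        self.m = m
        self.hash_fn = hash_fn or (lambda key: sum(key.encode("utf-8")))
        self.buckets = [[] for _ in range(m)]

    def _chain(self, key):
        return self.buckets[self.hash_fn(key) % self.m]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:  # key already present: overwrite its value
                chain[i] = (key, value)
                return
        chain.append((key, value))

    def retrieve(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return True
        return False

    def comparisons(self, key):
        """Number of keys examined by a successful search for key."""
        for count, (k, _) in enumerate(self._chain(key), start=1):
            if k == key:
                return count
        raise KeyError(key)


def average_comparisons(table, keys):
    return sum(table.comparisons(k) for k in keys) / len(keys)


keys = list(range(10))  # n = 10 integer keys

worst = ChainedHashTable(m=5, hash_fn=lambda key: 0)   # one chain of n keys
best = ChainedHashTable(m=5, hash_fn=lambda key: key)  # five chains of n/m keys
for k in keys:
    worst.insert(k, str(k))
    best.insert(k, str(k))

print(average_comparisons(worst, keys))  # (n + 1) / 2
print(average_comparisons(best, keys))   # (n/m + 1) / 2
```

With n = 10 and m = 5 the averages are 5.5 and 1.5 comparisons, which grow in proportion to n/2 and n/m respectively.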

Page 10: Hashing Algorithm

Open addressing
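Open addressing keeps every key in the table itself and resolves a collision by probing other slots. The sketch below assumes linear probing, which is only one of several possible probe sequences; the source does not name one. The class name and the byte-sum hash are also illustrative.

```python
class LinearProbingTable:
    """Open-addressing hash table using linear probing (insert and retrieve only)."""

    def __init__(self, m=11):
        self.m = m
        self.slots = [None] * m  # each slot holds (key, value) or None

    def _home(self, key):
        # Home slot: byte sum of the string key, reduced mod m.
        return sum(key.encode("utf-8")) % self.m

    def insert(self, key, value):
        """Store the pair and return the slot index that was used."""
        i = self._home(key)
        for _ in range(self.m):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return i
            i = (i + 1) % self.m  # collision: try the next slot
        raise RuntimeError("table is full")

    def retrieve(self, key):
        i = self._home(key)
        for _ in range(self.m):
            if self.slots[i] is None:  # empty slot ends the probe sequence
                return None
            if self.slots[i][0] == key:
                return self.slots[i][1]
            i = (i + 1) % self.m
        return None


table = LinearProbingTable()
print(table.insert("abc", 1))  # home slot of "abc"
print(table.insert("cba", 2))  # same home slot, so it moves one slot on
```

Deletion is left out on purpose. With linear probing, emptying a slot can cut a probe sequence short, so a real implementation marks deleted slots or re-inserts the keys that follow.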

Page 11: Hashing Algorithm

Minimal perfect hash functions

A minimal perfect hash function (MPHF) is a perfect hash function that maps m keys to m buckets with no collisions

Cichelli (1980) and Cercone et al. (1983) proposed two important concepts: (1) using tables of values as the parameters (2) using a mapping, ordering, and searching (MOS) approach
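The difference between a perfect and a minimal perfect hash function can be checked directly. This is a small sketch with a hypothetical four-word key set and two toy hash functions chosen only to make the distinction visible.

```python
def is_perfect(h, keys):
    """True if h maps the keys to distinct values (no collisions)."""
    values = [h(k) for k in keys]
    return len(set(values)) == len(values)


def is_minimal_perfect(h, keys):
    """True if h maps the m keys onto exactly the buckets 0 .. m-1."""
    return sorted(h(k) for k in keys) == list(range(len(keys)))


keys = ["ant", "bee", "cat", "dog"]  # hypothetical key set, m = 4


def first_letter(k):
    return ord(k[0]) - ord("a")      # a -> 0, b -> 1, c -> 2, d -> 3


def doubled(k):
    return 2 * first_letter(k)       # 0, 2, 4, 6: no collisions, but gaps


print(is_minimal_perfect(first_letter, keys))
print(is_perfect(doubled, keys), is_minimal_perfect(doubled, keys))
```

`doubled` is perfect but wastes every other bucket, so it needs a table larger than m; `first_letter` fills buckets 0 to 3 exactly.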

Page 12: Hashing Algorithm

Minimal perfect hash functions (cont’d)

Mapping: transform the key set from the original universe to a new universe

Ordering: place the keys in a sequence that determines the order in which hash values are assigned to keys

Searching: assign hash values to the keys of each level

Mapping → Ordering → Searching

Page 13: Hashing Algorithm

Sager’s method and improvement

Sager (1984, 1985) formalizes and extends Cichelli’s approach

In the mapping step, three auxiliary (hash) functions are defined on the original universe of keys U:

h0 : U → {0, ..., m - 1}

h1 : U → {0, ..., r - 1}

h2 : U → {r, ..., 2r - 1}

Page 14: Hashing Algorithm

Sager’s method and improvement

The class of functions searched is h(k) = (h0(k) + g(h1(k)) + g(h2(k))) mod m, where g is the table of values found by the search

Sager uses a graph that represents the constraints among keys

The mapping step goes from keys to triples to a special bipartite graph, the dependency graph, whose vertices are the h1(k) and h2(k) values and whose edges represent the keys (words)
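The mapping, ordering, and searching steps can be sketched for a tiny key set. This is an illustration under stated assumptions, not Sager's published procedure: the byte-based h0, h1, and h2 are stand-ins for the auxiliary functions, the ordering simply sorts vertices by decreasing degree, and the search is plain backtracking over the values of g. The key set and r = 5 are hypothetical.

```python
def h0(k, m):
    return sum(k.encode("utf-8")) % m      # stand-in for h0 : U -> {0..m-1}


def h1(k, r):
    return k.encode("utf-8")[0] % r        # stand-in for h1 : U -> {0..r-1}


def h2(k, r):
    return r + (k.encode("utf-8")[-1] % r) # stand-in for h2 : U -> {r..2r-1}


def build_mphf(keys, r):
    """Return h with h(k) = (h0(k) + g(h1(k)) + g(h2(k))) mod m, or None."""
    m = len(keys)

    # Mapping: key -> triple; each key is an edge (h1, h2) of the
    # bipartite dependency graph.
    triples = {k: (h0(k, m), h1(k, r), h2(k, r)) for k in keys}
    incident = {v: [] for v in range(2 * r)}
    for k, (_, a, b) in triples.items():
        incident[a].append(k)
        incident[b].append(k)

    # Ordering (simplified): vertices by decreasing degree. Level i holds
    # the keys whose second endpoint is the i-th vertex in the order.
    order = sorted((v for v in incident if incident[v]),
                   key=lambda v: -len(incident[v]))
    position = {v: i for i, v in enumerate(order)}
    levels = [[] for _ in order]
    for k, (_, a, b) in triples.items():
        levels[max(position[a], position[b])].append(k)

    # Searching: choose g(v) level by level so that every key of the
    # level lands in a bucket that is still free; backtrack on failure.
    g, used = {}, set()

    def search(i):
        if i == len(order):
            return True
        v = order[i]
        for value in range(m):
            g[v] = value
            slots = [(t0 + g[a] + g[b]) % m
                     for t0, a, b in (triples[k] for k in levels[i])]
            if len(set(slots)) == len(slots) and not used.intersection(slots):
                used.update(slots)
                if search(i + 1):
                    return True
                used.difference_update(slots)
        del g[v]
        return False

    if not search(0):
        return None
    return lambda k: (h0(k, m) + g.get(h1(k, r), 0) + g.get(h2(k, r), 0)) % m


keys = ["cat", "dog", "emu", "fox", "bee", "ant"]  # hypothetical key set
h = build_mphf(keys, r=5)
print(sorted((h(k), k) for k in keys))
```

For these six keys the dependency graph has no cycle, so a valid g exists and the search finds one without ever backing up. When the graph contains cycles the search may fail, in which case `build_mphf` returns `None` and a different choice of auxiliary functions would be needed.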

Page 15: Hashing Algorithm

Sager’s method and improvement

Page 16: Hashing Algorithm

The algorithm

The mapping step

Page 17: Hashing Algorithm
Page 18: Hashing Algorithm
Page 19: Hashing Algorithm

The algorithm (cont’d)

The ordering step

Page 20: Hashing Algorithm
Page 21: Hashing Algorithm

The algorithm (cont’d)

The searching step

Page 22: Hashing Algorithm
Page 23: Hashing Algorithm

Discussion

Hashing is a constant-time algorithm, and there are always advantages to being able to predict the time needed to locate a key

The MPHF uses a large amount of space