Hashing (DASTAL)

  • 8/14/2019 Hashing (DASTAL)

    Hashing

    Hashing is the transformation of a string of characters into a usually shorter fixed-length value or key that represents the original string. Hashing is used to index and retrieve items in a database because it is faster to find the item using the shorter hashed key than to find it using the original value. It is also used in many encryption algorithms.

    Hash Table

    A data structure that associates keys with values.

    Figure: A small phone book as a hash table.
    http://en.wikipedia.org/wiki/File:HASHTB08.svg

    Hash Table (1)

    The primary operation it supports efficiently is a lookup: given a key (a person's name), find the corresponding value (that person's telephone number). It works by transforming the key using a hash function into a hash, a number that is used as an index in an array to locate the desired location where the values should be.
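
    The lookup described above can be sketched in a few lines. This is only an illustration: the hash function (sum of character codes modulo the array size), the array size of 16, and the names and phone numbers are all made up for the example, and collisions are deliberately ignored here.

```python
TABLE_SIZE = 16  # assumed array size for this sketch


def hash_index(name):
    """Toy hash function: sum of character codes reduced to an array index."""
    return sum(ord(ch) for ch in name) % TABLE_SIZE


def store(table, name, number):
    # Sketch only: a later key with the same index would overwrite this entry.
    table[hash_index(name)] = (name, number)


def lookup(table, name):
    entry = table[hash_index(name)]
    if entry is not None and entry[0] == name:
        return entry[1]
    return None  # nothing stored for this name


phone_book = [None] * TABLE_SIZE
store(phone_book, "Ana", "555-0101")  # hypothetical entries
store(phone_book, "Ben", "555-0102")
print(lookup(phone_book, "Ana"))
```

    The key is never searched for; it is converted directly into the array position where its value is expected to be.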

    Hash Function

    The hashing algorithm. A hash function is any well-defined procedure or mathematical function that converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index into an array. The values returned by a hash function are called hash values, hash codes, hash sums, or simply hashes.

    1. Direct Hashing

    The key is the address without any algorithmic manipulation. The data structure must therefore contain an element for every possible key.

    While the situations where you can use direct hashing are limited, when it can be used it is very powerful because it guarantees that there are no synonyms (no two keys that hash to the same address).
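
    A minimal sketch of direct hashing, assuming keys in the range 1 to 100 and using the sample records from the figure. The names `MAX_KEY`, `records`, and `direct_hash` are chosen for this illustration only.

```python
MAX_KEY = 100  # assumed largest possible key

# One element for every possible key; index 0 is left unused.
records = [None] * (MAX_KEY + 1)


def direct_hash(key):
    """The key is the address: no manipulation at all."""
    return key


for key, name in [(1, "Elmer"), (2, "Markh"), (5, "Reymund"),
                  (7, "Hubert"), (100, "Rollyn")]:
    records[direct_hash(key)] = name

print(records[5])    # record stored under key 005
print(records[100])  # record stored under key 100
```

    Because every key has its own slot, two different keys can never land on the same address, at the cost of reserving space for keys that are never used.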

    Figure: direct hashing. The keys 005, 100, and 002 pass through the hash function unchanged and give the addresses 5, 100, and 2.

    Key   Name
    001   Elmer
    002   Markh
    005   Reymund
    007   Hubert
    100   Rollyn

    2. Subtraction Method

    Sometimes we have keys that are consecutive but do not start from one.

    Example: A company may have only 100 employees, but the employee numbers start from 1001 and go to 1100. In this case, we use a very simple hashing function that subtracts 1000 from the key to determine the address.
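
    As a small sketch of the subtraction method, assuming the base value of 1000 from the example (the function name `subtraction_hash` is made up for illustration):

```python
def subtraction_hash(key, base=1000):
    """Subtract a fixed base from the key to obtain the address."""
    return key - base


print(subtraction_hash(1001))  # employee 1001
print(subtraction_hash(1100))  # employee 1100
```

    Like direct hashing, this produces no synonyms as long as the keys really are consecutive.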

    3. Digit Extraction

    Selected digits are extracted from the key and used as the address.

    Example: Using a six-digit employee number to hash to a three-digit address (000-999), we could select the first, third, and fourth digits.

    379452 → 394
    121267 → 112
    378845 → 388
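
    The extraction in this example can be sketched as follows. The function name and its parameters are assumptions for illustration; positions are counted from 1, and keys are padded to six digits so that leading zeros are kept.

```python
def digit_extraction_hash(key, positions=(1, 3, 4), width=6):
    """Build the address from selected digit positions (1-based) of the key."""
    digits = str(key).zfill(width)  # keep leading zeros, e.g. 45128 -> "045128"
    return int("".join(digits[p - 1] for p in positions))


for key in (379452, 121267, 378845):
    print(key, "->", digit_extraction_hash(key))
```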

    4. Modulo Division

    Divides the key by the array size and uses the remainder + 1 as the address.

    Figure: modulo-division hashing into a list with addresses [001] to [307]. The keys 121267, 045128, and 379452 pass through the hash function and give the addresses 003, 307, and 001.

    Key      Name
    379452   Elmer
    121267   Markh
    378845   Hubert
    160252   Arno
    045128   Rollyn
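
    A minimal sketch of modulo division, assuming the list size of 307 shown by the address range [001] to [307] in the figure. The function name is made up for illustration.

```python
LIST_SIZE = 307  # addresses run from 1 to 307


def modulo_division_hash(key, list_size=LIST_SIZE):
    """Remainder of key / list_size, plus 1 so that addresses start at 1."""
    return key % list_size + 1


# Key 045128 is written as 45128 because Python integers cannot have leading zeros.
for key in (121267, 45128, 379452):
    print(key, "->", modulo_division_hash(key))
```

    The "+ 1" shifts the remainder range 0..306 onto the address range 1..307.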

    5. Midsquare Hashing

    The key is squared and the address is selected from the middle of the squared number.

    Example:

    9452 * 9452 = 89340304 : address is 3403

    As a variation, we can select a portion of the key and square that portion rather than the whole key.

    379452 : 379 * 379 = 143641 : address
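
    The first example can be sketched as below. The function name and the `address_digits` parameter are assumptions for this illustration; when the middle cannot be centred exactly, this sketch leans toward the left, which is only one possible convention.

```python
def midsquare_hash(key, address_digits=4):
    """Square the key and take address_digits digits from the middle."""
    squared = str(key * key)
    start = (len(squared) - address_digits) // 2
    return int(squared[start:start + address_digits])


print(midsquare_hash(9452))  # 9452 * 9452 = 89340304, middle four digits
```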

    6. Folding Methods

    There are two folding methods that are used:

    Fold Shift: the key value is divided into parts whose size matches the size of the required address. Then the left and right parts are shifted and added to the middle part.

    Fold Boundary: the left and right numbers are folded on a fixed boundary between them and the center number. This results in the two outside values being reversed.

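    Figure: folding the key 123456789 into a three-digit address.

    Fold shift:     123 + 456 + 789 = 1368; the leading 1 is discarded, giving 368.
    Fold boundary:  321 + 456 + 987 = 1764 (the digits of 123 and 789 are reversed); the leading 1 is discarded, giving 764.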
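
    Both folding methods can be sketched as follows, using the key 123456789 and a three-digit address as in the figure. The function names are made up for illustration, and discarding the extra leading digit is done here with a modulo, which is an assumption about how the overflow is dropped.

```python
def split_key(key, size):
    """Split the key's digits into parts of the address size, left to right."""
    digits = str(key)
    return [digits[i:i + size] for i in range(0, len(digits), size)]


def fold_shift(key, size=3):
    """Add the parts as they are, then discard any digit beyond the address size."""
    total = sum(int(part) for part in split_key(key, size))
    return total % 10 ** size


def fold_boundary(key, size=3):
    """Reverse the two outside parts before adding, then discard the overflow."""
    parts = split_key(key, size)
    parts[0] = parts[0][::-1]
    parts[-1] = parts[-1][::-1]
    total = sum(int(part) for part in parts)
    return total % 10 ** size


print(fold_shift(123456789))     # 123 + 456 + 789 = 1368
print(fold_boundary(123456789))  # 321 + 456 + 987 = 1764
```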

    Collision

    The event that occurs when a hashing algorithm produces an address for an insertion key and that address is already occupied.

    Home Address
    The address produced by the hashing algorithm.

    Prime Area
    The memory that contains all of the home addresses.

    Probe
    Calculation of an address and test for success.

    Figure: keys A, B, and C are hashed in order (1. hash(A), 2. hash(B), 3. hash(C)) into a list with addresses [1], [5], [9], and [17]. B collides with A, and C collides with B.

    Collision Resolution

    The process of finding an alternate location for a key whose home address is already occupied.

    Collision strategy techniques:

    Separate chaining
    Open addressing
    Coalesced hashing
    Perfect hashing
    Dynamic perfect hashing
    Probabilistic hashing
    Robin Hood hashing
    Cache-conscious collision resolution

    Separate Chaining

    Sometimes called simply chaining or direct chaining. In its simplest form each slot in the array is a linked list, or the head cell of a linked list, where the list contains the elements that hashed to the same location. Insertion requires finding the correct slot, then appending to either end of the list in that slot.

    http://en.wikipedia.org/wiki/File:HASHTB32.svg
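
    A minimal sketch of separate chaining follows. It is an illustration only: Python lists stand in for the linked lists, the class and method names are made up, and the slot is chosen with Python's built-in `hash` reduced modulo the number of slots.

```python
class ChainedHashTable:
    def __init__(self, size=8):
        # One chain (here a Python list) per array slot.
        self.slots = [[] for _ in range(size)]

    def _chain(self, key):
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # key already present: update it
                return
        chain.append((key, value))       # append to the end of the chain

    def find(self, key):
        for existing_key, value in self._chain(key):
            if existing_key == key:
                return value
        return None


table = ChainedHashTable(size=4)
table.insert(1, "first")
table.insert(5, "second")  # 1 and 5 share a slot when size is 4
print(table.find(5))
```

    Colliding keys simply share a chain, so the table never runs out of slots, but lookups slow down as chains grow.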

    Open Addressing

    Open addressing hash tables store the records directly within the array. This approach is also called closed hashing. A hash collision is resolved by probing, or searching through alternate locations in the array (following a probe sequence) until either the target record is found, or an unused array slot is found, which indicates that there is no such key in the table.

    Linear Probing

    Collision is resolved by adding one (1) to the current address.

    Figure: linear probing in a list with addresses [001] to [307]. The figure shows the records below, with the keys 070918 (Redjie) and 166702 (Reymund) passing through the hash function.

    Key      Name
    379452   Elmer
    121267   Markh
    378845   Hubert
    160252   Arno
    045128   Rollyn
    070918   Redjie
    166702   Reymund
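
    Open addressing with linear probing can be sketched as below. The sketch assumes a list size of 307 with addresses 1 to 307 and a modulo-division home address (remainder + 1); the class and method names are made up for illustration. Only the two keys 070918 and 166702 are inserted, written without the leading zero.

```python
class LinearProbingTable:
    def __init__(self, list_size=307):
        self.list_size = list_size
        self.slots = [None] * (list_size + 1)  # index 0 unused; addresses 1..list_size

    def _home(self, key):
        return key % self.list_size + 1

    def _next(self, address):
        # Add one to the current address, wrapping from the last slot back to 1.
        return address % self.list_size + 1

    def insert(self, key, value):
        address = self._home(key)
        for _ in range(self.list_size):
            entry = self.slots[address]
            if entry is None or entry[0] == key:
                self.slots[address] = (key, value)
                return address
            address = self._next(address)
        raise RuntimeError("table is full")

    def find(self, key):
        address = self._home(key)
        for _ in range(self.list_size):
            entry = self.slots[address]
            if entry is None:
                return None          # unused slot: the key is not in the table
            if entry[0] == key:
                return entry[1]
            address = self._next(address)
        return None


table = LinearProbingTable()
print(table.insert(70918, "Redjie"))    # stored at its home address
print(table.insert(166702, "Reymund"))  # same home address, so one slot further
```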

    Quadratic Probing

    The increment is the collision probe number squared. Each new address is the collision location plus the increment.

    Probe   Collision   Probe² and   New
    Num     Location    Increment    Address
    1       1           1² = 1       2
    2       2           2² = 4       6
    3       6           3² = 9       15
    4       15          4² = 16      31
    5       31          5² = 25      56
    6       56          6² = 36      92
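
    The probe sequence can be sketched as follows, assuming each new address is the collision location plus the probe number squared, starting from a first collision at address 1. The function name is made up for illustration, and wrapping around the end of the list (a modulo on the list size) is left out to keep the sketch short.

```python
def quadratic_probe_addresses(first_collision, probes):
    """Return the addresses tried after successive collisions."""
    addresses = []
    location = first_collision
    for probe in range(1, probes + 1):
        location = location + probe * probe  # increment is the probe number squared
        addresses.append(location)
    return addresses


print(quadratic_probe_addresses(1, 6))
```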

    Key Offset

    Is a double hashing method that produces different collision paths for different keys.

    Formula:

    offset = (key / listsize), keeping only the integer quotient
    address = ((offset + old address) modulo listsize) + 1

    For example, if the key is 166702 and the listsize is 307, using the modulo division method (home address 002):

    offset = (166702 / 307) = 543
    address = ((543 + 002) modulo 307) + 1 = 239
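
    The formula can be checked with a short sketch. The function name and parameter names are made up for illustration, and the division is taken as integer division, which matches the quotient of 543 in the example.

```python
def key_offset_address(key, old_address, list_size=307):
    """New address after a collision, using the key's quotient as the offset."""
    offset = key // list_size  # integer quotient, e.g. 166702 // 307 = 543
    return (offset + old_address) % list_size + 1


print(key_offset_address(166702, 2))
```

    Because the offset depends on the key itself, two keys that collide at the same address usually continue along different paths.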

    Figure (accompanying Key Offset): a list with addresses [001] to [307] showing the records below.

    Key      Name
    379452   Elmer
    070918   Redjie
    121267   Markh
    378845   Hubert
    160252   Arno
    045128   Rollyn
    166702   Reymund
    572556   Angelus

    Figure: Hash collision resolved by linear probing (interval = 1).
    http://en.wikipedia.org/wiki/File:HASHTB12.svg