47
File Structure SNU-OOPSLA Lab 1 Chap11 Chap11 . Hashing . Hashing 서서서서서 서서서서서서 서서서서서서서서서서 SNU-OOPSLA-LAB 서서 서 서 서 File Strutures by Folk, Zoellick and Riccardi

File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

Embed Size (px)

DESCRIPTION

File StructureSNU-OOPSLA Lab3 Contents(1) 11.1 Introduction 11.2 A Simple Hashing Algorithm 11.3 Hashing Functions and Record Distribution 11.4 How Much Extra Memory Should Be Used? 11.5 Collision Resolution by Progressive Overflow 11.6 Storing More Than One Record per Address: Buckets 11.7 Making Deletions 11.8 Other Collision Resolution Techniques 11.9 Patterns of Record Access

Citation preview

Page 1: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 1

Chap11Chap11. Hashing. Hashing

서울대학교 컴퓨터공학부객체지향시스템연구실SNU-OOPSLA-LAB

교수 김 형 주

File Strutures by Folk, Zoellick and Riccardi

Page 2: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 2

Chapter ObjectivesChapter Objectives Introduce the concept of hashing Examine the problem of choosing a good hashing algorithm,

presents a reasonable one in detail, and describe some others Explore several approaches for reducing collisions and storage of

several records per address Develop and use mathematical tools for analyzing performance

differences resulting from the use of different hashing techniques Examine problems associated with file deterioration (record

deletions) and discuss some solutions Discuss collision resolution techniques Examine effects of patterns of record access on performance

Page 3: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 3

Contents(1)Contents(1)11.1 Introduction11.2 A Simple Hashing Algorithm11.3 Hashing Functions and Record Distribution11.4 How Much Extra Memory Should Be Used?11.5 Collision Resolution by Progressive Overflow11.6 Storing More Than One Record per Address: Buckets11.7 Making Deletions11.8 Other Collision Resolution Techniques11.9 Patterns of Record Access

Page 4: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 4

Overview(1)Overview(1)

O(1) access to files Variation of the relative file Record number for a record is not arbitrary; rather, it is

obtained by a hashing function H applied to the primary key, H(key)

Record numbers generated should be uniformly and randomly distributed such that 0 < H(key) < N

Overview

Page 5: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 5

Overview(2)Overview(2) A hash function is like a block box that produces an

address every time you drop in a key

All parts of the key should be used by the hashing function H so that a lot of records with similar keys do not all hash to the same location

Given two random keys X, Y and N slots, the probability H(X)=H(Y) is 1/N; in this case, X and Y are called synonyms and a collision occurs

Overview

Page 6: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 6

IntroductionIntroduction

Hash function : h(k) Transforms a key K into an address

Hash vs other index Sequential search : O(N) Binary search : O(log2N) B(B+) Tree index : O(logkN) where k records in an index node Hash : O(1)

11.1 Introduction

Page 7: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 7

A Simple Hashing Scheme(1)A Simple Hashing Scheme(1)

Name ASCII Code forFirst Two Letters Product Home

Address

BALL

LOWELL

TREE

66 65

76 96

84 82

66 X 65 = 4,290

76 X 96 = 6,004

84 X 82 = 6,888

4,290

6,004

6,888

11.1 Introduction

Page 8: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 8

A Simple Hashing Scheme(2)A Simple Hashing Scheme(2)

LOWELL’shome

address

K=LOWELL

h(K)

Address

Address Record

key 1234

0

56

... ...LOWELL . . .

4

11.1 Introduction

Page 9: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 9

Idea behind Hash-based FilesIdea behind Hash-based Files

Record with hash key i is stored in node i All record with hash key h are stored in node h Primary blocks of data level nodes are stored sequentially Contents of the root node can be expressed by a simple functi

on: Address of data level node for record with primary key k = address of node 0 + H(k)

In literature on hash-based files, primary blocks of data level nodes are called buckets

11.1 Introduction

Page 10: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 10

e.g. Hash-based Filee.g. Hash-based File

0 1 2 3 4 5 6

root node

70 15 50 1 30 51 10 45 3 11 60 61124 40 20 55

57 14 15

11.1 Introduction

Page 11: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 11

Hashing(1)Hashing(1)

Hashing Functions : Consider a primary key consisting of a string of 12 letters and a file with 100,000 slots. Since 2612 >> 105,

So synonyms (collisions) are inevitable!

11.1 Introduction

Page 12: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 12

Hashing(2)Hashing(2) Possible means of hashing

First, because 12 characters = 12 bytes = 3(32-bit) words, partition into 3 words and perform folding as follows : either

(a) add modulo 232, or (b) combine using exclusive OR, or (c) invent your own method

Next, let R = V mod N R => Record-number V => Value obtained in the above N => Number of Slots so 0< R < N

If N has many small factors, a poor distribution leading to many collisions can occur. Normally, N is chosen to be a primary number

11.1 Introduction

Page 13: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 13

If M = number of records, N = number of available slots, P(k) = probability of k records hashing to the same slot

then P(k) = where f is the loading factor M/N

As f --> 1, we know that p(0)->1/e and p(1) -> 1/e. The other (1-1/e) of the records must hash into (1-2/e) of the slots, fo

r an average of 2.4 slot. So many synonyms!!

Hashing(3)Hashing(3)

MK

1N x 1

N1- ~~

ek*k!

fk

11.1 Introduction

Page 14: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 14

CollisionCollision Collision

Situation in which a record is hashed to an address that does not have sufficient room to store the record

Perfect hashing : impossible! Different key, same hash value (Different record, same address)

Solutions Spread out the records Use extra memory Put more than one record at a single address

11.1 Introduction

Page 15: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 15

A Simple Hashing AlgorithmA Simple Hashing Algorithm

Step 1. Represent the key in numerical form If the key is a string : take the ASCII code If the key is a number : nothing to be done

e.g.. LOWELL = 76 79 87 69 76 76 32 32 32 32 32 32 L O W E L L ( 6 blanks )

11.2 A Simple Hashing Algorithm

Page 16: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 16

Step 2 . Fold and Add Fold

76 79 | 87 69 | 76 76 | 32 32 | 32 32 | 32 32

Add parts into one integer

(Suppose we use 15bit integer expression, 32767 is limit)

7679 + 8769 + 7676 + 3232 + 3232 + 3232 = 33820 > 32767 (overflow!)

Largest addend : 9090 ( ‘ZZ’ ) Largest allowable result : 32767 - 9090 = 19937 Ensure no intermediate sum exceeds using ‘mod’

( 7679 + 8769 ) mod 19937 = 16448 (16448 + 7676 ) mod 19937 = 4187 ( 4187 + 3232 ) mod 19937 = 7419 ( 7419 + 3232 ) mod 19937 = 10651 ( 10651 + 3232) mod 19937 = 13883

11.2 A Simple Hashing Algorithm

Page 17: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 17

a = s mod n a : home address s : the sum produced in step 2 n : the number of addresses in the file e.g.. a = 13883 mod 100 = 83 (s = 13883, n = 100)

A prime number is usually used for the divisor because primes tend to distribute remainders much more uniformly than do nonprimes

So, we chose a prime number as close as possible to the desired size of the address space (eg, a file with 75 records, a good choice for n is 101, then the file will become 74.3 percent full

Step 3. Divide by size of the address space

11.2 A Simple Hashing Algorithm

Page 18: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 18

Hashing Functions and Record DistributionsHashing Functions and Record Distributions

Distributing records among address

Acceptable

ABCDEFG

123456789

10

Record AddressBest

ABCDEFG

123456789

10

Record AddressABCDEFG

123456789

10

Record AddressWorst

11.3 Hashing Functions and Record Distributions

Page 19: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 19

Some other hashing methodsSome other hashing methods

Better-than-random Examine keys for a pattern Fold parts of the key Divide the key by a number

When the better-than-random methods do not work ----- randomize! Square the key and take the middle Radix transformation

11.3 Hashing Functions and Record Distributions

Page 20: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 20

How Much Extra Memory Should Be Used?How Much Extra Memory Should Be Used?

Packing Density = # of records# of spaces

rN

=

The more records are packed, the more likely a collision will occur

11.4 How Much Extra Memory Should Be Used?

Page 21: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 21

Poisson Distribution

p(x) = (r/N)xe -r/N

x!(poisson distribution)

N = the number of available addressesr = the number of records to be storedx = the number of records assigned to a given address

p(x) : the probability that a given address will have x records assigned to it after the hashing function has been applied to all n records

( x records 가 collision 할 확률 )

11.4 How Much Extra Memory Should Be Used?

Page 22: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 22

Predicting Collisions for Different Packing DensitiesPredicting Collisions for Different Packing Densities # of addresses no record assigned : N X P(0) # of addresses one record assigned : N X P(1) # of addresses more than two assigned :

N X [P(2) + P(3) + P(4) + ...] # of overflows : 1 X NP(2) + 2 X NP(3) +... Percentage of overflow records :

N1 X NP(2) + 2 X NP(3) + 3 X NP(4) ...

X 100

11.4 How Much Extra Memory Should Be Used?

Page 23: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 23

The larger space, the less overflowsThe larger space, the less overflows

Packing Density(%)

Synonym as % of records

102030405060708090100

4.89.413.617.621.424.828.131.234.136.8

11.4 How Much Extra Memory Should Be Used?

N addresses

r records

r/N

Page 24: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 24

Collision Resolution by Progressive OverflowCollision Resolution by Progressive Overflow

Progressive overflow (= linear probing) Insert a new record

1. Take home address if empty 2. Otherwise, next several addresses are searched in sequence,

until an empty one is found 3. If no more space -- wrapping around

11.5 Collision Resolution by Progressive Overflow

Page 25: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 25

Progressive Overflow(Cont’d)

York’s homeaddress (busy)

KeyYorkHashRoutine Address

0

1

5

6

7

8

9

2nd try (busy)3rd try (busy)4th try (open)York’s actual address

....

........

Novak. . .Rosen. . .Jasper. . .Morely. . .

....6

11.5 Collision Resolution by Progressive Overflow

Page 26: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 26

Progressive Overflow(Cont'd)

KeyBlue

HashRoutine Address

01

2

....

........

....

99

97

98

99 Jello...

Wrapping around

11.5 Collision Resolution by Progressive Overflow

Page 27: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 27

Progressive Overflow(Cont'd)Progressive Overflow(Cont'd)

Search a record with a hash function value k: from home address k, look at successive records, until Found, or An open address is encountered

Worst case When the record does not exist and the file is full

11.5 Collision Resolution by Progressive Overflow

Page 28: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 28

Progressive Overflow(Cont'd)Progressive Overflow(Cont'd)

- Search length : # of accesses required to retrieve a record (from secondary memory)

202122232425

Adams. . .Bates. . .Cole. . .Dean. . .Evans. . .

...

......

ActualAddress

Home Address2021212220

Search length

11225

Average search length = total search lengthtotal # of records

= 2.2

11.5 Collision Resolution by Progressive Overflow

Page 29: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 29

Progressive Progressive Overflow(Cont'd)Overflow(Cont'd) With perfect hashing function : average search length = 1

Average search length of no greater than 2.0 are generally considered acceptable

Averagesearchlength

Packing density20 60 80 10040

543

21

11.5 Collision Resolution by Progressive Overflow

Page 30: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 30

Storing More Than One Record per Address : Storing More Than One Record per Address : BucketsBuckets

Bucket : a block of records sharing the same address

Key

GreenHallJerkKingLandMarxNutt

Home Address

30 30 32 33 33 33 33

Green ... Hall ...

Jenks ...King... Land... Marks...

30313233

Bucket contents

(Nutt... is an overflow record)

Bucketaddress

11.6 Storing More Than One Record per Address : Buckets

Page 31: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 31

Effects of Buckets on PerformanceEffects of Buckets on Performance

N : # of addresses b : # of records fit in a bucket bN : # of available locations for records Packing density = r/bN

# of overflow recordsN X [ 1XP(b+1) + 2XP(b+2) + 3XP(b+3)...]

As the bucket size gets larger, performance continues to improve

11.6 Storing More Than One Record per Address : Buckets

Page 32: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 32

Bucket ImplementationBucket Implementation

A full bucket

An empty bucket

Two entries

/ / / / / / / / / / / / / / /

ARNSWORTHJONES / / / / / / / / / / / / / / /

ARNSWORTHJONES STOCKTON BRICE TROOP

0

2

5

/ / / / // / / / /

Collision counter =< bucket size

11.6 Storing More Than One Record per Address : Buckets

Page 33: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 33

Bucket Implementation(Cont'd)Bucket Implementation(Cont'd)

Initializing and Loading Creating empty space Use hash values and find the bucket to store If the home bucket is full, continue to look at successive

buckets Problems when

No empty space exists Duplicate keys occur

11.6 Storing More Than One Record per Address : Buckets

Page 34: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 34

Making DeletionsMaking Deletions

The slot freed by the deletion hinders(disturb) later searches Use tombstones and reuse the freed slots

Record

AdamsJonesMorrisSmith

Homeaddress

5 6 6 5

Adams...Jones...Morris...Smith...

5678

Adams...Jones...######Smith...

5678

A tombstone for Morris

Delete Morris

11.7 Making Deletions

Page 35: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 35

Other Collision Resolution TechniquesOther Collision Resolution Techniques

Double hashing : avoid clustering with a second hash function for overflow records

Chained progressive overflow : each home address contains a pointer to the record with the same address

Chaining with a separate overflow area : move all overflow records to a separate overflow area

Scatter tables : Hash file contains only pointers to records (like indexing)

11.8 Other Collision Resolution Techniques

Page 36: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 36

Overflow FileOverflow File When building the file, if a collision occurs, place the

new synonym into a separate area of the file called the overflow section

This method is not recommended; if there is a high load factor,

there will either be overflow from this overflow section

or it will be organized sequentially and performance suffers; if there is a low load factor, much space is wasted

11.8 Other Collision Resolution Techniques

Page 37: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 37

Linear Probing(1)Linear Probing(1) When a synonym is identified, search forward from the

address given by the hash function (the natural address) until an empty slot is located, and store this record there

This is an example of open addressing (examining a predictable sequence of slots for an empty one)

11.8 Other Collision Resolution Techniques

Page 38: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 38

Linear Probing(2)Linear Probing(2)A S E A R C H I N G E X A M P L E

S

A

A

C

A

E

E

G

H

I

X

E

L

M

N

P

R

key : hash : 1 0 5 1 18 3 8 9 14 7 5 5 1 13 16 12 5

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

inse

rtion

se

quen

ce

Memory SpaceA

AA C

EE E G H I

E E G H I X

11 .

8 O

t he r

Col

lisio

n R

esol

u tio

n Te

chni

ques

Page 39: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 39

Rehashing(1) Rehashing(1) Another form of open addressing In linear probing, if synonym occurred, incremented r by 1

and searched next location In rehashing, use a second hash function for the

displacement: D = (FOLD(key) mod P) + 1, where P < N is another prime number

This method has the advantage of avoiding congestion, because each synonym under the first hash function likely uses a different displacement D, and this examines a different sequence of slots

11.8 Other Collision Resolution Techniques

Page 40: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 40

RehashingRehashing(where P=3)(where P=3)(2)(2)A S E A R C H I N G E X A M P L Ekey :

hash : 1 0 5 1 18 3 8 9 14 7 5 5 1 13 16 12 5 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

S

A

C

E

G

H

I

L

M

N

P

R

inse

rtion

se

quen

ce

Memory SpaceA

A

EE

GH

A

A A

EE

E

N X

H

11 .

8 O

t her

Col

lisio

n R

eso l

u tio

n Te

chn i

que s

Page 41: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 41

Chaining without Replacement(1)Chaining without Replacement(1) Uses pointers to build linked lists within the file; each linked list contains each set of synonyms; the head of the list is the record at the natural address of

these synonyms When a record is added whose natural address is

occupied, it is added to the list whose head is at the natural address

Linear probing usually used to find where to put a new synonym, although rehashing could be used as well

11.8 Other Collision Resolution Techniques

Page 42: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 42

Chaining without Replacement(2)Chaining without Replacement(2) A problem is that linked lists can coalesce: Supp

ose that H(R1) = H(R2) = H(R4) = i and H(R3) = H(R5) = i+1, and records are added in the order R1, R2, R3, R4, R5.

Then the lists for natural addresses i and i+1 coalesce. Periodic reorganization shortens such lists.

Let FWD and BWD be forward and backward pointers alon

g these chains (doubly-linked)

11.8 Other Collision Resolution Techniques

Page 43: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 43

Chaining without Replacement(3)Chaining without Replacement(3)

R1R2R3R4R5

HH

ii+1

11.8 Other Collision Resolution Techniques

Page 44: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 44

Chaining with Replacement(4)Chaining with Replacement(4) Eliminates problem with deletion that caused abandonment to be

necessary Further reduces search lengths When inserting a new record, if the slot at the natural address is occupied by a record for which i

t is not the natural address, then record is relocated so the new record may be replaced at its

natural address Synonym chains can never coalesce, so a record can be deleted even if is the head of a chain, simply by moving the second record on the chain to its natural ad

dress ( ABANDON thus is no longer necessary )

11.8 Other Collision Resolution Techniques

Page 45: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 45

Chaining with Replacement(5)Chaining with Replacement(5)

R1R2 H

R1R3 HR2 H

ii+1

R1R3R2

ii+1

R4R5

after R1, R2 added after R3 hashes to i+1 finally

HH

2 chains : R1 - R2 -R4 R3 - R5

11.8 Other Collision Resolution Techniques

Page 46: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 46

Patterns of Record AccessPatterns of Record Access Pareto Principle ( 80/20 Rule of Thumb): 80 % of the acc

esses are performed on 20 % of the records! The concepts of “the Vital Few and the Trivial Many” 20 % of the fisherman catch 80 % of the fish 20 % of the burglars steal 80 % of the loot

If we know the patterns of record access ahead, we can do many intelligent and effective things!

Sometimes we can know or guess the access patterns Very useful hints for file syetms or DBMSs Intelligent placement of records

fast accesses less collisions

Page 47: File Structure SNU-OOPSLA Lab 1 Chap11. Hashing Chap11. Hashing 서울대학교 컴퓨터공학부 객체지향시스템연구실 SNU-OOPSLA-LAB 교수 김 형 주 File Strutures by

File Structure SNU-OOPSLA Lab 47

Let’s Review !!!Let’s Review !!!11.1 Introduction11.2 A Simple Hashing Algorithm11.3 Hashing Functions and Record Distribution11.4 How Much Extra Memory Should Be Used?11.5 Collision Resolution by Progressive Overflow11.6 Storing More Than One Record per Address : Buckets11.7 Making Deletions11.8 Other Collision Resolution Techniques11.9 Patterns of Record Access