
7-1

Chapter 7

Searching


7-2

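Internal key (embedded key): the key is stored together with the whole record.

table (file)

            name   no.   age
record 1    BB     6     16
record 2    CC     9     16
record 3    AA     8     18
record 4    DD     2     17

External key: the keys form a separate table of their own, each with a pointer to its record.

key table            records
no.   pointer        record   name   age
6     1              1        BB     16
9     2              2        CC     16
8     3              3        AA     18
2     4              4        DD     17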


7-3

Terminology of searching

primary key: unique

secondary key: may not be unique

internal search: data stored in main memory

external search: data stored in auxiliary memory

retrieval: a successful search

a search and insertion algorithm: retrieve the data if the search is successful; insert the data if the search is unsuccessful


7-4

Abstract data type

    typedef KEYTYPE ...     // a type of key
    typedef RECTYPE ...     // a type of record
    RECTYPE nullrec = ...   // a "null" record

    KEYTYPE keyfunct(r)
    RECTYPE r;
    { ... };

    abstract typedef [rectype] TABLE (RECTYPE);

    abstract member(tbl, k)
    TABLE(RECTYPE) tbl;
    KEYTYPE k;
    postcondition if (there exists an r in tbl such that keyfunct(r) == k)
                      then member = TRUE
                      else member = FALSE


7-5

    abstract RECTYPE search(tbl, k)
    TABLE(RECTYPE) tbl;
    KEYTYPE k;
    postcondition (not member(tbl, k) && (search == nullrec))
                  || (member(tbl, k) && keyfunct(search) == k);

    abstract insert(tbl, r)
    TABLE(RECTYPE) tbl;
    RECTYPE r;
    precondition  member(tbl, keyfunct(r)) == FALSE
    postcondition inset(tbl, r);
                  (tbl - [r]) == tbl';

    abstract delete(tbl, k)
    TABLE(RECTYPE) tbl;
    KEYTYPE k;
    postcondition tbl == (tbl' - [search(tbl, k)]);


7-6

Sequential search (linear search)

Applied to an array or a linked list.
The data are not sorted.

e.g. 9 5 6 8 7 2
(1) search 6: successful
(2) search 4: unsuccessful
(3) delete 6: 9 5 2 8 7
(4) insert 4: 9 5 2 8 7 4

time complexity:
successful search: (n+1)/2 comparisons (average) = O(n)
unsuccessful search: n comparisons = O(n)


7-7

Sequential search with C

algorithm:

    for (i = 0; i < n; i++)
        if (key == k[i])
            return(i);
    return(-1);

sentinel: an extra key inserted at the end of the array

    k[n] = key;
    for (i = 0; key != k[i]; i++)
        ;
    if (i < n)
        return(i);
    else
        return(-1);
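The two loops on this slide can be wrapped into complete functions. The following is a minimal sketch, assuming integer keys; the function names seq_search and seq_search_sentinel are chosen here for illustration and do not come from the slides. The sentinel version assumes the caller has allocated room for n + 1 elements.

```c
/* Plain sequential search: returns the index of key in k[0..n-1], or -1. */
int seq_search(const int k[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (k[i] == key)
            return i;
    return -1;
}

/* Sentinel version: k must have room for n + 1 elements (assumption).
   Storing key in k[n] guarantees the loop stops, so the loop body
   needs only one test per element instead of two. */
int seq_search_sentinel(int k[], int n, int key)
{
    int i;
    k[n] = key;                     /* the sentinel */
    for (i = 0; k[i] != key; i++)
        ;
    return (i < n) ? i : -1;        /* i == n means only the sentinel matched */
}
```

With the array 9 5 6 8 7 2 from the previous slide, both functions return 2 for key 6 and -1 for key 4.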


7-8

Move-to-front method

Let p(i) be the probability that record i is retrieved.
p(0) + p(1) + ... + p(n-1) = 1.

average number of comparisons:
p(0) + 2p(1) + 3p(2) + ... + np(n-1)

This number is minimized if
p(0) ≥ p(1) ≥ p(2) ≥ ... ≥ p(n-1).

move-to-front method: the retrieved record is moved to the head of the list.

e.g. 9 5 6 8 7 2
(1) search 6: 6 9 5 8 7 2
(2) search 8: 8 6 9 5 7 2
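As an illustration of the move-to-front rule, here is a minimal sketch on an integer array. The name search_move_to_front is hypothetical. On an array the move costs O(n) shifts; on a linked list it would need only a few pointer changes.

```c
/* Searches k[0..n-1] for key. On success the key is moved to k[0],
   the elements before it are shifted one place to the right, and 0
   (its new index) is returned. Returns -1 if the key is absent. */
int search_move_to_front(int k[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            for (int j = i; j > 0; j--)   /* shift k[0..i-1] right by one */
                k[j] = k[j - 1];
            k[0] = key;
            return 0;
        }
    }
    return -1;
}
```

Starting from 9 5 6 8 7 2, searching 6 and then 8 leaves the array as 8 6 9 5 7 2, as in the example.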


7-9

Transposition method

e.g. 9 5 6 8 7 2
(1) search 6: 9 6 5 8 7 2
(2) search 8: 9 6 8 5 7 2

The retrieved record is interchanged with the preceding record.

The transposition method is more efficient in an unchanging probability distribution.

The move-to-front method is better for a small to medium number of requests and for a quickly changing probability distribution.

Mixed method: use the move-to-front method for the first s searches, then use the transposition method.
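The transposition rule can be sketched in the same way. The name search_transpose is hypothetical, and integer keys in an array are assumed.

```c
/* Searches k[0..n-1] for key. On success the key is swapped with the
   element just before it (unless it is already first) and its new
   index is returned. Returns -1 if the key is absent. */
int search_transpose(int k[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            if (i == 0)
                return 0;           /* already at the head */
            int tmp = k[i - 1];     /* interchange with the preceding record */
            k[i - 1] = k[i];
            k[i] = tmp;
            return i - 1;
        }
    }
    return -1;
}
```

Unlike move-to-front, each retrieval costs only one swap, so the method works equally well on arrays and linked lists.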


7-10

Searching in an ordered table

linear (sequential) searching: n/2 comparisons on average (successful or unsuccessful)

[Figure: a sorted table of (key, record) pairs with the keys 8, 73, 132, 231, 321, 480, 589, 592, 650, 651, 732, 789, 833, 876.]


7-11

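Indexed sequential search (1)

Indexed sequential file:

index (key, pointer): 321, 592, 876

sorted table (key, record): 8, 73, 132, 231, 321, 480, 589, 592, 650, 651, 732, 789, 833, 876

[Figure: each index entry holds a key and a pointer into the sorted table.]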


7-12

Indexed sequential search (2)

The use of an index is applicable to a sorted table stored as an array or a linked list.

Deletion: by a flag

Insertion:

1) shift some elements if there exist some deleted entries. (Pointers need to be changed in the index file.)

2) keep an overflow area
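Searching through an index can be sketched as follows. This is only an illustration under stated assumptions: the table is a sorted integer array, index entry i holds a key kindex[i] and the array position pindex[i] of that key, and the last index entry refers to the last record. The names kindex, pindex, and indexed_seq_search are chosen here and are not taken from the slides.

```c
/* Indexed sequential search on a sorted array k.
   Step 1: scan the (small) index for the first entry whose key is >= key.
   Step 2: scan only the block of the table that ends at that entry.
   Returns the position of key in k, or -1 if it is absent. */
int indexed_seq_search(const int k[], const int kindex[],
                       const int pindex[], int indxsize, int key)
{
    int i;
    for (i = 0; i < indxsize && kindex[i] < key; i++)
        ;
    if (i == indxsize)
        return -1;                  /* key exceeds the largest indexed key */

    int lo = (i == 0) ? 0 : pindex[i - 1] + 1;
    int hi = pindex[i];
    for (int j = lo; j <= hi && k[j] <= key; j++)
        if (k[j] == key)
            return j;
    return -1;
}
```

With the 14-key table of the earlier slide and index keys 321, 592, 876 at positions 4, 7, 13, a search for 650 scans the index up to 876 and then only positions 8 to 13 of the table.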


7-13

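A secondary index

Secondary index: 591, 742

Primary index: 321, 485, 591, 647, 706, 742

Sequential table (key, record): the keys 321, 485, 591, 647, 706, 742 are shown

[Figure: the secondary index points into the primary index, which points into the sequential table.]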


7-14

Binary search

e.g. 2 5 6 7 8 9
search 7: needs 3 comparisons

Time complexity: O(log n)

Used only if the table is sorted and stored in an array.

An insertion or a deletion requires O(n) time.

Improvement: two arrays, one for flags, the other for the sorted keys and some "empty holes".

flag:  f  e  e  f  f  e  f  f  f
data:  A  *  *  D  F  *  G  I  K

flag e: empty
flag f: full
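For reference, a minimal sketch of binary search on a plain sorted integer array (without the flag array) might look like this. The function name binary_search is chosen here for illustration.

```c
/* Binary search on a sorted array k[0..n-1].
   Returns the index of key, or -1 if it is absent. */
int binary_search(const int k[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of low + high */
        if (key == k[mid])
            return mid;
        if (key < k[mid])
            high = mid - 1;                 /* continue in the left half */
        else
            low = mid + 1;                  /* continue in the right half */
    }
    return -1;
}
```

On 2 5 6 7 8 9, a search for 7 probes 6, then 8, then 7, which matches the 3 comparisons stated above.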


7-15

Binary search tree

inorder traversal: 2 5 6 7 8 9

The binary search uses a sorted array as an implicit binary search tree. (The middle element of the array is the root.)

        6
      /   \
     2     8
      \   / \
       5 7   9


7-16

Insertion in a binary search tree

Insert 4: the inserted key is added to the tree as a leaf node.

Before:

        6
      /   \
     2     8
      \   / \
       5 7   9

After:

        6
      /   \
     2     8
      \   / \
       5 7   9
      /
     4
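Insertion as a leaf can be sketched recursively. The struct layout and the name bst_insert are assumptions made for this illustration; integer keys are assumed and duplicate keys are ignored.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Inserts key into the binary search tree rooted at t and returns the
   (possibly new) root. The new key always ends up in a new leaf node. */
struct node *bst_insert(struct node *t, int key)
{
    if (t == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            abort();                /* out of memory: give up in this sketch */
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else if (key > t->key)
        t->right = bst_insert(t->right, key);
    return t;                       /* equal keys are not inserted again */
}
```

Inserting 6, 2, 8, 5, 7, 9 in that order builds the tree shown on the slide, and a further insertion of 4 places it as the left son of 5.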


7-17

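Deletion in a binary search tree (1)

Case 1: The deleted node has no sons. Delete it directly.

Deleting node with key 15.

[Figure: a binary search tree with root 8 containing the keys 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, before and after deleting the leaf node with key 15.]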


7-18

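Deletion in a binary search tree (2)

Case 2: The deleted node has only one subtree. Delete it and move the subtree up.

Deleting node with key 5.

[Figure: the tree with root 8 and keys 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, before and after deleting the node with key 5; the subtree containing 6 and 7 moves up to take its place.]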


7-19

Deletion in a binary search tree (3)

Case 3: The deleted node has two subtrees. Its inorder successor s takes its place. The right son of s takes the place of s. (s has no left son.)

Deleting node with key 11.

[Figure: the tree with root 8 and keys 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, before and after deleting the node with key 11; its inorder successor 12 takes its place.]
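The three cases can be sketched in one recursive function. This is an illustration, not the slides' own code: the struct layout and the names bst_insert and bst_delete are assumptions, and in case 3 the sketch copies the successor's key into the deleted node instead of relinking the successor node itself, which yields the same tree shape.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Helper used to build a tree; duplicate keys are ignored. */
struct node *bst_insert(struct node *t, int key)
{
    if (t == NULL) {
        struct node *n = malloc(sizeof *n);
        if (n == NULL)
            abort();
        n->key = key;
        n->left = n->right = NULL;
        return n;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else if (key > t->key)
        t->right = bst_insert(t->right, key);
    return t;
}

/* Deletes key from the tree rooted at t and returns the new root. */
struct node *bst_delete(struct node *t, int key)
{
    if (t == NULL)
        return NULL;                            /* key not found */
    if (key < t->key) {
        t->left = bst_delete(t->left, key);
    } else if (key > t->key) {
        t->right = bst_delete(t->right, key);
    } else if (t->left != NULL && t->right != NULL) {
        /* Case 3: two subtrees. The inorder successor s is the leftmost
           node of the right subtree; it has no left son. */
        struct node *s = t->right;
        while (s->left != NULL)
            s = s->left;
        t->key = s->key;                        /* s takes the place of t */
        t->right = bst_delete(t->right, s->key);
    } else {
        /* Case 1 (no sons) or case 2 (one subtree): move the subtree up. */
        struct node *child = (t->left != NULL) ? t->left : t->right;
        free(t);
        return child;
    }
    return t;
}
```

This always replaces by the inorder successor, so it corresponds to asymmetric deletion.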


7-20

Deletion in a binary search tree (4)

Asymmetric deletion: the deleted node is replaced by its inorder successor.

Symmetric deletion: the deleted node is replaced by its inorder predecessor and its inorder successor alternately.

Average search time in a binary search tree: O(log n)


7-21

Optimum binary search trees

e.g. sorted data: 2 3 5 7

[Figure: some binary search trees built on the keys 2, 3, 5, 7.]

In an optimum binary search tree, the expected number of comparisons is minimized under a given set of keys and probabilities.


7-22

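pi: probability of a successful search
qi: probability of an unsuccessful search

e.g. k2 at the root, with k1 as its left son and k3 as its right son

[Figure: the tree above, with each key ki labeled pi and the failure positions labeled q0, q1, q2, q3 from left to right.]

expected number of comparisons:
2p1 + p2 + 2p3 + 2q0 + 2q1 + 2q2 + 2q3

e.g. k3 at the root, k1 as its left son, and k2 as the right son of k1

expected number of comparisons:
2p1 + 3p2 + p3 + 2q0 + 3q1 + 3q2 + q3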


7-23

Construction of (near) optimum search trees (1)

(1) Balancing method

e.g.
key (data)                          1   2   3   4   5   6   7
frequencies of successful search    2  10   3   1   4   8   9
partial sum                         2  12  15  16  20  28  37

Select i as the root such that the difference of the costs on the left and the right is minimized.

The binary search tree can be constructed recursively.

Time complexity: O(n)

[Figure: the resulting tree. The root is 5, with total frequency 16 on its left and 17 on its right.]
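The root selection can be sketched with the partial sums. This is an illustration under assumptions: keys are identified by their array index (key i+1 is index i), psum[i] holds the sum of the frequencies of indices 0..i, and ties go to the smaller index. The names choose_root and build_balanced are hypothetical. The sketch scans every candidate root, so it is simpler but slower than the O(n) bound stated above.

```c
#include <stdlib.h>

/* Returns the index r in [lo, hi] that minimizes
   |sum of frequencies left of r - sum of frequencies right of r|. */
int choose_root(const int psum[], int lo, int hi)
{
    int base = (lo == 0) ? 0 : psum[lo - 1];
    int best = lo;
    long best_diff = -1;
    for (int r = lo; r <= hi; r++) {
        long left = (r == lo) ? 0 : psum[r - 1] - base;
        long right = psum[hi] - psum[r];
        long diff = labs(left - right);
        if (best_diff < 0 || diff < best_diff) {
            best_diff = diff;
            best = r;
        }
    }
    return best;
}

/* Builds the tree recursively. left[r] and right[r] receive the index of
   the left and right son of r, or -1 for an empty subtree.
   Returns the root index of the subtree over [lo, hi]. */
int build_balanced(const int psum[], int lo, int hi, int left[], int right[])
{
    if (lo > hi)
        return -1;
    int r = choose_root(psum, lo, hi);
    left[r] = build_balanced(psum, lo, r - 1, left, right);
    right[r] = build_balanced(psum, r + 1, hi, left, right);
    return r;
}
```

For the partial sums 2 12 15 16 20 28 37, the root is index 4 (key 5), whose left and right costs are 16 and 17.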


7-24

Construction of (near) optimum search trees (2)

(2) Median split tree

e.g.
key (data)    1   2   3   4   5   6   7
frequencies   2  10   3   1   4   8   9

The most frequent key is stored in the root (the node key).

The split key is the median of all remaining keys.

The binary search tree can be constructed recursively.

The tree is a balanced tree.

Time complexity: O(n log n)

[Figure: the median split tree for this example, written as (node key, split key): root (2, 4); left son (3, 1) with leaves 1 and 4; right son (7, 5) with leaves 5 and 6.]

How to search?


7-25

Balanced binary tree (AVL tree)

The heights of the two subtrees of every node never differ by more than 1.

balance = (height of left subtree) - (height of right subtree)

Each node in a balanced binary tree has a balance of 1, -1, or 0.

[Figure: a balanced binary tree, with each node labeled by its balance (1, 0, or -1).]


7-26

Rotations of a binary tree

The inorder traversal is the same after a rotation is performed.

(a) Original tree

        D
      /   \
     B     F
    / \   / \
   A   C E   G

(b) Right rotation

     B
    / \
   A   D
      / \
     C   F
        / \
       E   G

(c) Left rotation

         F
        / \
       D   G
      / \
     B   E
    / \
   A   C

left rotation:

    q = right(p)
    r = left(q)
    left(q) = p
    right(p) = r
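The four pointer assignments of the left rotation, and their mirror image for the right rotation, can be written as C functions. The struct layout and the names rotate_left and rotate_right are assumptions made for this sketch; each function returns the new subtree root, which the caller must store in the parent.

```c
#include <stddef.h>

struct tnode {
    char key;
    struct tnode *left, *right;
};

/* Left rotation of the subtree rooted at p (p->right must not be NULL).
   Returns q, the new root of the subtree. */
struct tnode *rotate_left(struct tnode *p)
{
    struct tnode *q = p->right;
    struct tnode *r = q->left;
    q->left = p;
    p->right = r;
    return q;
}

/* Right rotation: the mirror image (p->left must not be NULL). */
struct tnode *rotate_right(struct tnode *p)
{
    struct tnode *q = p->left;
    struct tnode *r = q->right;
    q->right = p;
    p->left = r;
    return q;
}
```

Applying rotate_left to a tree with root D, sons B and F, and leaves A, C, E, G makes F the root, with D as its left son and E as the right son of D. A following rotate_right on F restores the original tree.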


7-27

Insertion into an AVL tree (1)

Case 1: Node C is the first unbalanced node traced up from the newly inserted node.

Right rotation on the subtree rooted at C.

[Figure: the subtree containing nodes C and A and the subtrees T1, T2, T3 (each of height n), with the newly inserted node, before and after the rotation. After the rotation, A and C both have balance 0.]

The height of the subtree is not changed after the new insertion.


7-28

Insertion into an AVL tree (2)

Case 2:

First rotation: left rotation on the subtree rooted at A

[Figure: the subtree containing nodes C, A, and B and the subtrees T1 (height n), T2 (height n-1), T3 (height n-1), and T4 (height n), with the newly inserted node, before and after the first rotation.]


7-29

Second rotation: right rotation on the subtree rooted at C

[Figure: the subtree after the second rotation, containing nodes B, A, and C and the subtrees T1 (height n), T2 (height n-1), T3 (height n-1), and T4 (height n), with the newly inserted node.]

The height of the subtree is not changed after the new insertion.

Insertion requires at most 2 rotations.

Deletion is more complex; it may require O(log n) rotations.


7-30

Multiway search trees

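A multiway search tree of order n:
at most n subtrees
at most n-1 keys in a node

[Figure: a multiway search tree with nodes labeled A to H. The root A holds the keys 12, 50, 85; the other nodes hold the keys 6, 10, 37, 60, 70, 80, 100, 120, 150, 62, 65, 69, 110.]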


7-31

B-trees

B-tree of order m:

⌊(m-1)/2⌋ ≤ # of keys in a nonroot node ≤ m-1

1 ≤ # of keys in the root node ≤ m-1

a B-tree of order 5:

(a) Initial portion of a B-tree

    upper node:  320 540
    node:        430 480
    leaves:      380 395 406 412 | 451 472 | 493 506 511


7-32

(b) After inserting 382

    node:    395 430 480
    leaves:  380 382 | 406 412 | 451 472 | 493 506 511

(c) After inserting 518 and 508

    node:    395 430 480 508
    leaves:  380 382 | 406 412 | 451 472 | 493 506 | 511 518


7-33

a B-tree of order 4:

(a) An initial B-tree twig

    node:    87 140
    leaves:  23 61 74 | 90 100 106 | 152 186 194

(b) Inserting 102 with a left bias

    node:    87 102 140
    leaves:  23 61 74 | 90 100 | 106 | 152 186 194


7-34

(c) Inserting 102 with a right bias

    node:    87 100 140
    leaves:  23 61 74 | 90 | 102 106 | 152 186 194


7-35

Deletion in multiway search trees

(1) The simplest method
a) Mark a deleted key; do not remove it.
b) Disadvantages:
   It wastes space.
   In a nonleaf node, only the same key can reuse the "deleted" space.

(2) A technique similar to that of binary search trees, used in an unrestricted multiway search tree
a) If the key has an empty left or right subtree, remove it. If it is the only key in the node, remove the node.
b) Otherwise, its successor takes its place. (The successor has an empty left subtree.)


7-36

Deletion in B-trees

(i) Shift a key from its father and its brother (borrow)

Delete key 113:

    before:  father 80 120 150;  A = 90 113,  B = 126 135 142
    after:   father 80 126 150;  A = 90 120,  B = 135 142


7-37

(ii) Take a key from its father and combine with its brother

Delete key 120 and consolidate:

    before:  father 80 126 150;  sons 68 73 | 90 120 | 135 142
    after:   father 80 150;      sons 68 73 | 90 126 135 142


7-38

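(iii) do (ii), then do (i) for its father

Delete 65, consolidate and borrow:

    before:  root    60 170
             sons    30 50 | 80 150 | 180 220 280
             leaves  65 72 | 87 96 | 153 162 | 173 178 | 187 202
    after:   root    60 180
             sons    30 50 | 150 170 | 220 280
             leaves  72 80 87 96 | 153 162 | 173 178 | 187 202

(A, B, C, D, E label the subtrees not shown in detail.)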


7-39

(iv) do (ii), then do (ii) for its father.

Deleting 173 and a double consolidation:

    before:  root    60 180 300
             sons    30 50 | 150 170 | 220 280
             leaves  153 162 | 173 178 | 187 202
    after:   root    60 300
             sons    30 50 | 150 180 220 280
             leaves  153 162 170 178 | 187 202

(A, B, C, D, E, F, G label the subtrees not shown in detail.)


7-40

Deletion in B-trees

This may be done up to the root.
If the root has more than one key => no problem.
If the root has only one key => remove the root.

Insertion, deletion or searching in a B-tree requires O(log n) time, where n denotes the number of nodes in the B-tree.


7-41

B+-tree

All keys are maintained in leaf nodes and keys are also replicated in nonleaf nodes.

Finding the next record: O(1) time


7-42

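Digital search tree

Keys: 180, 185, 1867, 195, 207, 217, 2174, 21749, 217493, 226, 27, 274, 278, 279, 2796, 281, 284, 285, 286, 287, 288, 294, 307, 768

[Figure: the digital search tree for these keys, one digit per node; "eok" marks the end of a key.]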


7-43

[Figure, continued: the remainder of the digital search tree; "eok" marks the end of a key.]


7-44

Trie

(1) This is one kind of digital search tree.

(2) Each node contains exactly m pointers. (Some of them are null.)
e.g. m = 10 for numerical data.

(3) It is useful when the set of keys is dense.
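A trie with m = 10 can be sketched as follows. The sketch assumes keys are given as strings of decimal digits; the struct layout, the eok flag (named after the "end of key" marker of the digital search tree), and the function names are assumptions made for this illustration.

```c
#include <stdlib.h>

#define M 10                        /* one pointer per decimal digit */

struct trie {
    struct trie *child[M];          /* some of them are NULL */
    int eok;                        /* nonzero if a key ends at this node */
};

/* Allocates an empty trie node. */
struct trie *trie_new(void)
{
    struct trie *t = malloc(sizeof *t);
    if (t == NULL)
        abort();
    for (int i = 0; i < M; i++)
        t->child[i] = NULL;
    t->eok = 0;
    return t;
}

/* Inserts a key made only of the characters '0'..'9' (assumption). */
void trie_insert(struct trie *root, const char *key)
{
    for (; *key != '\0'; key++) {
        int d = *key - '0';
        if (root->child[d] == NULL)
            root->child[d] = trie_new();
        root = root->child[d];
    }
    root->eok = 1;
}

/* Returns 1 if key was inserted before, 0 otherwise. */
int trie_search(const struct trie *root, const char *key)
{
    for (; *key != '\0'; key++) {
        root = root->child[*key - '0'];
        if (root == NULL)
            return 0;
    }
    return root->eok;
}
```

The search time depends on the number of digits in the key, not on the number of keys stored. Every node carries m pointers even when most are null, which is why the structure pays off mainly for dense key sets.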


7-45

Hashing

hash function: transforms a key into a table index

e.g. data: 18 23 33 13 24 10

hash function: h(k) = k mod 10

hash collision: two records (keys) hash to the same position.

position   key
0          10
1
2
3          23
4          33
5          13
6          24
7
8          18
9


7-46

Resolution of hash collision

(1) open addressing (rehashing)
a) linear probing: place the collided record in the next available position in the array
b) rehashing function:

...
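Linear probing, item a) above, can be sketched as follows. The sketch assumes nonnegative integer keys, a table of size 10 with h(k) = k mod 10, a reserved value EMPTY for free slots, and no deletions. The names table_init, lp_insert, and lp_search are hypothetical.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)                  /* marks a free slot (keys are >= 0) */

/* Marks every slot of the table as free. */
void table_init(int table[])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Inserts key and returns its position, or -1 if the table is full. */
int lp_insert(int table[], int key)
{
    int h = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (h + i) % TABLE_SIZE;     /* next position, wrapping around */
        if (table[pos] == EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Returns the position of key, or -1 if it is not in the table. */
int lp_search(const int table[], int key)
{
    int h = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (h + i) % TABLE_SIZE;
        if (table[pos] == EMPTY)
            return -1;              /* a free slot ends the probe sequence */
        if (table[pos] == key)
            return pos;
    }
    return -1;
}
```

Inserting 18, 23, 33, 13, 24, 10 in that order places them at positions 8, 3, 4, 5, 6, 0, which is the table shown on the hashing slide. The search stops at the first free slot, which is one reason deletion is awkward with open addressing: simply emptying a slot would cut later keys off from their probe sequence.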


7-47

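(2) chaining

Each list node holds a key k, a record r, and a pointer next.

position   chain
0          40 -> 130 -> null
1          91 -> null
2          42 -> 192 -> 372 -> null
3          null
4          null
5          75 -> null
6          66 -> 16 -> null
7          87 -> 67 -> 227 -> 417 -> null
8          null
9          49 -> null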
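Chaining can be sketched with one linked list per table position. The sketch assumes nonnegative integer keys, 10 buckets with h(k) = k mod 10, and nodes that store only the key (the record field r is omitted). The names chain_node, chain_insert, and chain_search are hypothetical.

```c
#include <stdlib.h>

#define NBUCKETS 10

struct chain_node {
    int k;                          /* the key; the record r is omitted here */
    struct chain_node *next;
};

/* Inserts key at the end of its chain unless it is already present.
   Returns the node holding the key. */
struct chain_node *chain_insert(struct chain_node *bucket[], int key)
{
    struct chain_node **pp = &bucket[key % NBUCKETS];
    while (*pp != NULL) {
        if ((*pp)->k == key)
            return *pp;             /* already in the table */
        pp = &(*pp)->next;
    }
    struct chain_node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->k = key;
    n->next = NULL;
    *pp = n;                        /* link the new node at the chain's end */
    return n;
}

/* Returns the node holding key, or NULL if it is absent. */
struct chain_node *chain_search(struct chain_node *bucket[], int key)
{
    struct chain_node *p = bucket[key % NBUCKETS];
    while (p != NULL && p->k != key)
        p = p->next;
    return p;
}
```

All bucket pointers must start out as NULL. Colliding keys simply share a chain, so the table never fills up, and deleting a record is an ordinary linked-list deletion.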


7-48

Issues of hashing

How to choose a hash function?
the division method: h(key) = key mod m
It is best that the table size m is prime.

Advantage of hashing: faster than binary search

Disadvantages of hashing:
1. It needs more memory.
2. Deleting a record is difficult.