
7-1

Chapter 7

Searching

7-2

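Table (file) of records:

            name   no.   age
record 1    BB     6     16
record 2    CC     9     16
record 3    AA     8     18
record 4    DD     2     17

key: the field used to identify a record (here, no.)

internal key (embedded key): the key is stored inside the record, together with the rest of the record.

external key: the keys are kept in a separate table, each with a pointer to its record.

key   pointer        record   name   age
6     1              1        BB     16
9     2              2        CC     16
8     3              3        AA     18
2     4              4        DD     17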

7-3

Terminologies of searching

primary key: unique

secondary key: may not be unique

internal search: data stored in main memory

external search: data stored in auxiliary memory

retrieval: a successful search

a search and insertion algorithm: retrieve the data if the search is successful; insert the data if the search is unsuccessful

7-4

Abstract data type

typedef KEYTYPE ...     // a type of key
typedef RECTYPE ...     // a type of record
RECTYPE nullrec = ...   // a "null" record

KEYTYPE keyfunct(r)
RECTYPE r;
{ ... };

abstract typedef [rectype] TABLE (RECTYPE);

abstract member(tbl, k)
TABLE(RECTYPE) tbl;
KEYTYPE k;
postcondition if (there exists an r in tbl such that keyfunct(r) == k)
                  then member = TRUE
                  else member = FALSE

7-5

abstract RECTYPE search(tbl, k)
TABLE(RECTYPE) tbl;
KEYTYPE k;
postcondition (not member(tbl, k) && (search == nullrec))
              || (member(tbl, k) && keyfunct(search) == k);

abstract insert(tbl, r)
TABLE(RECTYPE) tbl;
RECTYPE r;
precondition member(tbl, keyfunct(r)) == FALSE;
postcondition inset(tbl, r);
              (tbl - [r]) == tbl';

abstract delete(tbl, k)
TABLE(RECTYPE) tbl;
KEYTYPE k;
postcondition tbl == (tbl' - [search(tbl, k)]);

7-6

Sequential search (linear search)

Applied to an array or a linked list.
The data are not sorted.

e.g. 9 5 6 8 7 2
(1) search 6: successful
(2) search 4: unsuccessful
(3) delete 6: 9 5 2 8 7
(4) insert 4: 9 5 2 8 7 4

time complexity:
successful search: (n+1)/2 comparisons on average = O(n)
unsuccessful search: n comparisons = O(n)

7-7

Sequential search with C

algorithm:

    for (i = 0; i < n; i++)
        if (key == k[i])
            return(i);
    return(-1);

sentinel: an extra key inserted at the end of the array

    k[n] = key;
    for (i = 0; key != k[i]; i++)
        ;
    if (i < n)
        return(i);
    else
        return(-1);
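The two loops above can be wrapped into complete functions. The following is a minimal sketch; the function names `seq_search` and `seq_search_sentinel` are illustrative and not from the slides. The sentinel version assumes the caller has allocated room for n + 1 elements, because k[n] is overwritten.

```c
/* Plain sequential search: index of key in k[0..n-1], or -1 if absent. */
int seq_search(const int k[], int n, int key)
{
    for (int i = 0; i < n; i++)
        if (key == k[i])
            return i;
    return -1;
}

/* Sentinel version. ASSUMPTION: k has room for n + 1 elements,
 * since k[n] is overwritten with the search key. */
int seq_search_sentinel(int k[], int n, int key)
{
    int i;

    k[n] = key;                     /* the sentinel guarantees termination */
    for (i = 0; key != k[i]; i++)
        ;                           /* no bounds test inside the loop */
    return (i < n) ? i : -1;        /* i == n means only the sentinel matched */
}
```

The sentinel removes the `i < n` test from every iteration, so each step of the loop performs one comparison instead of two.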

7-8

Move-to-front method

Let p(i) be the probability that record i is retrieved, where p(0) + p(1) + ... + p(n-1) = 1.

average number of comparisons: p(0) + 2p(1) + 3p(2) + ... + np(n-1)

This number is minimized if p(0) ≥ p(1) ≥ p(2) ≥ ... ≥ p(n-1).

move-to-front method: the retrieved record is moved to the head of the list.

e.g. 9 5 6 8 7 2
(1) search 6: 6 9 5 8 7 2
(2) search 8: 8 6 9 5 7 2
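As a rough illustration of the move-to-front method on an array, the sketch below shifts the preceding keys one place to the right and puts the retrieved key at index 0. The function name `search_move_to_front` is an assumption for this example; on a linked list the same effect needs only pointer changes.

```c
/* Search key in k[0..n-1]. On success, move it to the front and return
 * the index at which it was found; return -1 if it is absent. */
int search_move_to_front(int k[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            for (int j = i; j > 0; j--)   /* shift k[0..i-1] right by one */
                k[j] = k[j - 1];
            k[0] = key;                   /* retrieved key becomes the head */
            return i;
        }
    }
    return -1;
}
```

Starting from 9 5 6 8 7 2, searching 6 and then 8 with this function yields 6 9 5 8 7 2 and then 8 6 9 5 7 2, as in the example above.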

7-9

Transposition method

e.g. 9 5 6 8 7 2
(1) search 6: 9 6 5 8 7 2
(2) search 8: 9 6 8 5 7 2

The retrieved record is interchanged with the preceding record.

The transposition method is more efficient when the probability distribution does not change.

The move-to-front method is better for a small to medium number of requests and for a quickly changing probability distribution.

Mixed method: use the move-to-front method for the first s searches, then use the transposition method.
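The transposition method can be sketched in the same style. The name `search_transpose` is illustrative; the function swaps the retrieved key with its predecessor, so a frequently requested key drifts toward the front one position per retrieval.

```c
/* Search key in k[0..n-1]. On success, interchange it with the preceding
 * key (if any) and return the index at which it was found; else -1. */
int search_transpose(int k[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            if (i > 0) {                 /* the first key has no predecessor */
                int t = k[i - 1];
                k[i - 1] = k[i];
                k[i] = t;
            }
            return i;
        }
    }
    return -1;
}
```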

7-10

Searching in an ordered table

linear (sequential) searching: n/2 comparisons on average, for both successful and unsuccessful searches

[Figure: an ordered table of (key, record) pairs with keys 8, 73, 132, 231, 321, 480, 589, 592, 650, 651, 732, 789, 833, 876]

7-11

Indexed sequential search (1)

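Indexed sequential file:

index (key, pointer): 321, 592, 876

sorted table (key, record): 8, 73, 132, 231, 321, 480, 589, 592, 650, 651, 732, 789, 833, 876

Each index entry points to the record with the same key in the sorted table.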

7-12

Indexed sequential search (2)

The use of an index is applicable to a sorted table stored as an array or a linked list.

Deletion: mark the entry with a flag.

Insertion:

1) shift some elements if there exist some deleted entries. (Pointers in the index file need to be changed.)

2) keep an overflow area
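A minimal sketch of the indexed sequential search follows, assuming the table is a sorted array and each index entry stores a key together with the array position of that key. The type `index_entry` and the function `indexed_seq_search` are hypothetical names for this illustration. The index is scanned first to pick a block, and only that block of the table is then searched sequentially.

```c
/* One index entry: a key and the position of that key in the table. */
struct index_entry {
    int key;
    int pos;
};

/* Search key in the sorted table k[0..n-1] with the help of a sorted
 * index idx[0..nidx-1]. Returns the position of key, or -1 if absent. */
int indexed_seq_search(const int k[], int n,
                       const struct index_entry idx[], int nidx, int key)
{
    int j = 0;

    /* Find the first index entry whose key is >= the search key. */
    while (j < nidx && idx[j].key < key)
        j++;

    /* The key, if present, lies after the previous index entry and
     * no later than the entry just found. */
    int lo = (j == 0) ? 0 : idx[j - 1].pos + 1;
    int hi = (j == nidx) ? n - 1 : idx[j].pos;

    for (int i = lo; i <= hi && k[i] <= key; i++)
        if (k[i] == key)
            return i;
    return -1;
}
```

With the 14-key table of the earlier slide and the index entries (321, 4), (592, 7), (876, 13), a search for 589 inspects two index entries and then scans only positions 5 to 7.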

7-13

A secondary index

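Secondary index (keys shown): 591, 742

Primary index (keys shown): 321, 485, 591, 647, 706, 742

Sequential table (key, record; keys shown): 321, 485, 591, 647, 706, 742

[Figure: the secondary index points into the primary index, which in turn points into the sequential table.]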

7-14

Binary search

e.g. 2 5 6 7 8 9
search 7: needs 3 comparisons

Time complexity: O(log n)

It can be used only if the table is sorted and stored in an array.

An insertion or a deletion requires O(n) time.

Improvement: use two arrays, one for flags and the other for the sorted keys and some "empty holes".

flag:  f  e  e  f  f  e  f  f  f
data:  A  *  *  D  F  *  G  I  K

e: empty
f: full
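For reference, a standard iterative binary search on a sorted array can be sketched as follows. The function name `binary_search` is an assumption of this example, and the sketch covers only the plain sorted array, not the variant with flags and empty holes.

```c
/* Binary search in the sorted array k[0..n-1].
 * Returns the index of key, or -1 if it is absent. */
int binary_search(const int k[], int n, int key)
{
    int low = 0, high = n - 1;

    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of low + high */
        if (key == k[mid])
            return mid;
        if (key < k[mid])
            high = mid - 1;                 /* continue in the left half */
        else
            low = mid + 1;                  /* continue in the right half */
    }
    return -1;
}
```

On 2 5 6 7 8 9, a search for 7 probes 6, then 8, then 7, which is the three comparisons counted above.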

7-15

Binary search tree

        6
      /   \
     2     8
      \   / \
       5 7   9

inorder traversal: 2 5 6 7 8 9

The binary search uses a sorted array as an implicit binary search tree. (The middle element of the array is the root.)

7-16

Insertion in a binary search tree

Insert 4: the inserted key is added to the tree as a leaf node.

Before:                 After:

        6                       6
      /   \                   /   \
     2     8                 2     8
      \   / \                 \   / \
       5 7   9                 5 7   9
                              /
                             4
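Insertion as a leaf can be sketched with a pointer-to-pointer walk down the tree. The `struct node` layout and the names `bst_insert` and `bst_search` are assumptions for this illustration; duplicate keys are simply ignored here.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Insert key as a new leaf and return the (possibly new) root.
 * Duplicates are ignored; on allocation failure the tree is unchanged. */
struct node *bst_insert(struct node *root, int key)
{
    struct node **link = &root;          /* the link that will hold the leaf */

    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }

    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return root;
    n->key = key;
    n->left = n->right = NULL;
    *link = n;
    return root;
}

/* Return the node containing key, or NULL if it is absent. */
struct node *bst_search(struct node *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

Inserting 6, 2, 8, 5, 7, 9 in that order builds the example tree, and a further insertion of 4 places it as the left son of 5.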

7-17

Deletion in a binary search tree (1)

Case 1: The deleted node has no sons. Delete it directly.

[Figure: a binary search tree with keys 1, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 (root 8), before and after deleting the leaf node with key 15.]

7-18

Case 2: The deleted node has only one subtree. Delete it and move the subtree up.

[Figure: the same tree before and after deleting the node with key 5; its only subtree, rooted at 6, moves up to become a son of 3.]



7-19

Case 3: The deleted node has two subtrees. Its inorder successor s takes its place. The right son of s takes the place of s. (s has no left son.)

[Figure: the same tree before and after deleting the node with key 11; its inorder successor 12 takes its place, and 13 takes the former place of 12.]



7-20

Asymmetric deletion: the deleted node is always replaced by its inorder successor.

Symmetric deletion: the deleted node is replaced by its inorder predecessor and its inorder successor alternately.

Average search time in a binary search tree: O(log n)
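The three deletion cases can be sketched as one function using asymmetric deletion (always the inorder successor). The `struct node` layout and the names `bst_insert` and `bst_delete` are assumptions of this example; the insertion helper is included only so that the block is self-contained.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Helper: insert key as a new leaf (duplicates are ignored). */
struct node *bst_insert(struct node *root, int key)
{
    struct node **link = &root;
    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return root;
    n->key = key;
    n->left = n->right = NULL;
    *link = n;
    return root;
}

/* Delete key from the tree and return the (possibly new) root. */
struct node *bst_delete(struct node *root, int key)
{
    struct node **link = &root;          /* the link that points to the node */

    while (*link != NULL && (*link)->key != key)
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;

    struct node *d = *link;
    if (d == NULL)
        return root;                     /* key not in the tree */

    if (d->left != NULL && d->right != NULL) {
        /* Case 3: the inorder successor s is the leftmost node
         * of the right subtree, so s has no left son. */
        struct node **slink = &d->right;
        while ((*slink)->left != NULL)
            slink = &(*slink)->left;
        struct node *s = *slink;
        *slink = s->right;               /* right son of s replaces s */
        s->left = d->left;               /* s takes the place of d */
        s->right = d->right;
        *link = s;
    } else {
        /* Case 1 (no sons) and case 2 (one subtree): move the subtree up. */
        *link = (d->left != NULL) ? d->left : d->right;
    }
    free(d);
    return root;
}
```

Building the tree by inserting 8, 3, 11, 1, 5, 9, 14, 6, 10, 12, 15, 7, 13 and then deleting 15, 5 and 11 exercises case 1, case 2 and case 3 respectively.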


7-21

Optimum binary search trees

e.g. sorted data: 2 3 5 7

[Figure: several different binary search trees built from the keys 2, 3, 5, 7.]

In an optimum binary search tree, the expected number of comparisons is minimized for a given set of keys and probabilities.

7-22

pi: probability of a successful search for key ki
qi: probability of an unsuccessful search that ends between ki and ki+1

e.g. k2 is the root, with left son k1 and right son k3:

expected number of comparisons: 2p1 + p2 + 2p3 + 2q0 + 2q1 + 2q2 + 2q3

e.g. k3 is the root, k1 is its left son, and k2 is the right son of k1:

expected number of comparisons: 2p1 + 3p2 + p3 + 2q0 + 3q1 + 3q2 + q3

7-23

Construction of (near) optimum search trees (1)

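(1) Balancing method

e.g.
key (data)                        1   2   3   4   5   6   7
frequency of successful search    2  10   3   1   4   8   9
partial sum                       2  12  15  16  20  28  37

Select i as the root such that the difference of the costs on the left and the right is minimized.

The binary search tree can be constructed recursively.

Time complexity: O(n)

[Figure: the resulting tree. The root is 5 (left cost 16, right cost 17). Its left son is 2 (left cost 2, right cost 4), with sons 1 and 3; 4 is the right son of 3. Its right son is 7 (left cost 8), with left son 6.]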

7-24

Construction of (near) optimum search trees (2)

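(2) Median split tree

e.g.
key (data)    1   2   3   4   5   6   7
frequency     2  10   3   1   4   8   9

Each node stores a node key and a split key.

The most frequent key is stored in the root.

The split key is the median of all remaining keys.

The binary search tree can be constructed recursively.

The tree is a balanced tree. Time complexity: O(n log n)

[Figure: the root has node key 2 and split key 4; its left son has node key 3 and split key 1; its right son has node key 7 and split key 5; the leaves are 1, 4, 5, 6.]

How to search?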

7-25

Balanced binary tree (AVL tree)

The heights of the two subtrees of every node never differ by more than 1.

balance = (height of left subtree) - (height of right subtree)

Each node in a balanced binary tree has a balance of 1, -1, or 0.

[Figure: a balanced binary tree in which every node is labeled with its balance (1, 0, or -1).]

7-26

Rotations of a binary tree

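The inorder traversal is the same after a rotation is performed.

[Figure:
(a) Original tree: root D with sons B and F; B has sons A and C; F has sons E and G.
(b) Right rotation: root B with sons A and D; D has sons C and F; F has sons E and G.
(c) Left rotation: root F with sons D and G; D has sons B and E; B has sons A and C.
The pointers p, q and r mark the nodes used by the rotation.]

left rotation:
    q = right(p)
    r = left(q)
    left(q) = p
    right(p) = r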
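The four pointer assignments of the left rotation translate directly into C. This is a sketch under the assumption of a simple `struct node` with a one-character key; `rotate_left` and `rotate_right` are illustrative names, and each function returns the new root of the rotated subtree so that the caller can relink it.

```c
#include <stddef.h>

struct node {
    char key;
    struct node *left, *right;
};

/* Left rotation about p. ASSUMPTION: p->right is not NULL. */
struct node *rotate_left(struct node *p)
{
    struct node *q = p->right;
    struct node *r = q->left;

    q->left = p;
    p->right = r;
    return q;                 /* q is the new root of this subtree */
}

/* Right rotation about p (mirror image). ASSUMPTION: p->left is not NULL. */
struct node *rotate_right(struct node *p)
{
    struct node *q = p->left;
    struct node *r = q->right;

    q->right = p;
    p->left = r;
    return q;
}
```

A left rotation followed by a right rotation about the new root restores the original tree, and neither changes the inorder sequence.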

7-27

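Insertion into an AVL tree (1)

Case 1: Node C is the first unbalanced node traced up from the newly inserted node.

[Figure: before the insertion, C has balance 1 and A, its left son, has balance 0. A has subtrees T1 and T2, and C has right subtree T3, each of height n. The new node is inserted into T1.]

Perform a right rotation on the subtree rooted at C.

[Figure: after the rotation, A (balance 0) is the root of the subtree, with left subtree T1 and right son C (balance 0); C has subtrees T2 and T3.]

The height of the subtree is the same as it was before the insertion.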

7-28

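Insertion into an AVL tree (2)

Case 2:

[Figure: C is the first unbalanced node. Its left son is A and its right subtree is T4 (H = n). A has left subtree T1 (H = n) and right son B. B has subtrees T2 and T3 (each H = n-1), and the new node is inserted below B.]

First rotation: left rotation on the subtree rooted at A

[Figure: after the first rotation, B is the left son of C and A is the left son of B; A has subtrees T1 and T2, and T3 is the right subtree of B.]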

7-29

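Second rotation: right rotation on the subtree rooted at C

[Figure: after the second rotation, B (balance 0) is the root of the subtree, with left son A (balance 0) and right son C (balance -1). From left to right, the subtrees are T1 and T2 below A, and T3 and T4 below C.]

The height of the subtree is the same as it was before the insertion.

Insertion requires at most 2 rotations.

Deletion is more complex; it may require O(log n) rotations.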

7-30

Multiway search trees

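A multiway search tree of order n:
    at most n subtrees
    at most n-1 keys in a node

[Figure: an example multiway search tree with nodes labeled A to H. The root A holds the keys 12, 50, 85; its sons are B (6, 10), C (37), D (60, 70, 80) and E (100, 120, 150); F, G and H are on the third level.]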
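Searching a multiway search tree generalizes the binary case: within a node, find the first key that is not smaller than the search key, then either stop or descend into the corresponding subtree. The sketch below assumes a hypothetical node layout `struct mnode` of fixed order 4; neither the layout nor the name `mw_search` comes from the slides.

```c
#include <stddef.h>

#define ORDER 4   /* ASSUMPTION: at most 4 subtrees and 3 keys per node */

struct mnode {
    int nkeys;                    /* number of keys actually stored */
    int key[ORDER - 1];           /* keys in ascending order */
    struct mnode *son[ORDER];     /* son[i] holds keys smaller than key[i] */
};

/* Return the node containing key and store its position in *pos,
 * or return NULL if the key is absent. */
struct mnode *mw_search(struct mnode *t, int key, int *pos)
{
    while (t != NULL) {
        int i = 0;
        while (i < t->nkeys && key > t->key[i])
            i++;                              /* first key >= search key */
        if (i < t->nkeys && key == t->key[i]) {
            *pos = i;
            return t;
        }
        t = t->son[i];                        /* descend between the keys */
    }
    return NULL;
}
```

For large nodes the inner loop could be replaced by a binary search over the keys of the node.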

7-31

B-trees

B-tree of order m:
    ⌊(m-1)/2⌋ ≤ # of keys in a nonroot node ≤ m-1
    1 ≤ # of keys in the root node ≤ m-1

a B-tree of order 5:

(a) Initial portion of a B-tree
    upper node:  320 540
    its son:     430 480
    leaves:      380 395 406 412 | 451 472 | 493 506 511

7-32

(b) After inserting 382
    node:    395 430 480
    leaves:  380 382 | 406 412 | 451 472 | 493 506 511

(c) After inserting 518 and 508
    node:    395 430 480 508
    leaves:  380 382 | 406 412 | 451 472 | 493 506 | 511 518

7-33

a B-tree of order 4:

(a) An initial B-tree twig
    node:    87 140
    leaves:  23 61 74 | 90 100 106 | 152 186 194

(b) Inserting 102 with a left bias
    node:    87 102 140
    leaves:  23 61 74 | 90 100 | 106 | 152 186 194

7-34

(c) Inserting 102 with a right bias
    node:    87 100 140
    leaves:  23 61 74 | 90 | 102 106 | 152 186 194

7-35

Deletion in multiway search trees

(1) The simplest method
    a) Mark a deleted key; do not remove it.
    b) Disadvantages:
       It wastes space.
       In a nonleaf node, only the same key can reuse the "deleted" space.

(2) A technique similar to that for binary search trees, used in an unrestricted multiway search tree
    a) If the key has an empty left or right subtree, remove it. If it is the only key in the node, remove the node.
    b) Otherwise, its successor takes its place. (The successor has an empty left subtree.)

7-36

Deletion in B-trees

(i) Shift a key from its father and its brother (borrow)

Delete key 113:

before:  father: 80 120 150    A: 90 113    B: 126 135 142
after:   father: 80 126 150    A: 90 120    B: 135 142

7-37

(ii) Take a key from its father and combine with its brother

Delete key 120 and consolidate:

before:  father: 80 126 150    sons: 68 73 | 90 120 | 135 142
after:   father: 80 150        sons: 68 73 | 90 126 135 142

7-38

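(iii) do (ii), then do (i) for its father

Delete 65, consolidate and borrow:

before:
    root:     60 170
    level 2:  30 50 | 80 150 | 180 220 280
    leaves shown under 80 150:       65 72 | 87 96 | 153 162
    leaves shown under 180 220 280:  173 178 | 187 202
after:
    root:     60 180
    level 2:  30 50 | 150 170 | 220 280
    leaves shown under 150 170:  72 80 87 96 | 153 162 | 173 178
    leaves shown under 220 280:  187 202

(A to E are subtrees that are not shown in detail.)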

7-39

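(iv) do (ii), then do (ii) for its father.

Deleting 173 and a double consolidation:

before:
    root:     60 180 300
    level 2:  30 50 | 150 170 | 220 280
    leaves shown:  153 162 | 173 178 | 187 202
after:
    root:     60 300
    level 2:  30 50 | 150 180 220 280
    leaves shown:  153 162 170 178 | 187 202

(A to G are subtrees that are not shown in detail.)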

7-40

Deletion in B-trees

This may be done up to the root.

If the root has more than one key => no problem.

If the root has only one key => remove the root.

Insertion, deletion or searching in a B-tree requires O(log n) time, where n denotes the number of nodes in the B-tree.

7-41

B+-tree

All keys are maintained in leaf nodes and keys are also replicated in nonleaf nodes.

Finding the next record: O(1) time

7-42

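Digital search tree

Keys: 180, 185, 1867, 195, 207, 217, 2174, 21749, 217493, 226, 27, 274, 278, 279, 2796, 281, 284, 285, 286, 287, 288, 294, 307, 768

[Figure: a digital search tree for these keys. Each node holds one digit, each key is spelled by the digits on a path from the top, and a node marked eok (end of key) ends a key.]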

7-43

[Figure, continued: the part of the digital search tree for the remaining keys; eok marks the end of a key.]

7-44

Trie

(1) A trie is one kind of digital search tree.

(2) Each node contains exactly m pointers. (Some of them are null.)
    e.g. m = 10 for numerical data.

(3) It is useful when the set of keys is dense.
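A trie with m = 10 for keys made of decimal digits can be sketched as follows. The node layout `struct trie`, the `eok` flag stored in the node, and the function names are assumptions of this illustration, not the layout used in the slides.

```c
#include <stdlib.h>

#define M 10   /* one pointer per decimal digit */

struct trie {
    struct trie *next[M];   /* next[d] is followed on digit d; may be NULL */
    int eok;                /* 1 if a key ends at this node ("end of key") */
};

/* Allocate an empty node; returns NULL on allocation failure. */
struct trie *trie_new(void)
{
    struct trie *t = malloc(sizeof *t);
    if (t != NULL) {
        for (int d = 0; d < M; d++)
            t->next[d] = NULL;
        t->eok = 0;
    }
    return t;
}

/* Insert a key given as a string of digits. Returns 1 on success,
 * 0 on an invalid character or allocation failure. */
int trie_insert(struct trie *root, const char *digits)
{
    for (; *digits != '\0'; digits++) {
        int d = *digits - '0';
        if (d < 0 || d >= M)
            return 0;
        if (root->next[d] == NULL && (root->next[d] = trie_new()) == NULL)
            return 0;
        root = root->next[d];
    }
    root->eok = 1;
    return 1;
}

/* Return 1 if the key is in the trie, 0 otherwise. */
int trie_member(const struct trie *root, const char *digits)
{
    for (; root != NULL && *digits != '\0'; digits++) {
        int d = *digits - '0';
        if (d < 0 || d >= M)
            return 0;
        root = root->next[d];
    }
    return root != NULL && root->eok;
}
```

A search takes one pointer step per digit, regardless of how many keys are stored; the cost is the m pointers in every node, which is why a dense key set suits the structure.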

7-45

Hashing

hash function: transforms a key into a table index

e.g. data: 18 23 33 13 24 10

hash function: h(k) = k mod 10

hash collision: two records (keys) are mapped to the same position.

index   key
0       10
1
2
3       23
4       33
5       13
6       24
7
8       18
9

(33, 13 and 24 collide with earlier keys and are placed in the next available positions.)

7-46

Resolution of hash collision

(1) open addressing (rehashing)
    a) linear probing: place the collided record in the next available position in the array
    b) rehashing function:

...
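Linear probing with h(k) = k mod 10 can be sketched as below. The names `hash_init`, `hash_insert` and `hash_search`, and the use of -1 as an "empty" marker, are assumptions of this example; keys are assumed to be nonnegative, and deletion is not handled.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)            /* ASSUMPTION: keys are nonnegative */

static int hash_index(int key)
{
    return key % TABLE_SIZE;  /* h(k) = k mod 10 */
}

/* Mark every position of the table as empty. */
void hash_init(int table[])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key with linear probing. Returns the position used,
 * or -1 if the table is full. */
int hash_insert(int table[], int key)
{
    int start = hash_index(key);

    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (start + i) % TABLE_SIZE;     /* wrap around the array */
        if (table[pos] == EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Return the position of key, or -1 if it is absent. */
int hash_search(const int table[], int key)
{
    int start = hash_index(key);

    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (start + i) % TABLE_SIZE;
        if (table[pos] == key)
            return pos;
        if (table[pos] == EMPTY)                /* an empty slot ends the probe */
            return -1;
    }
    return -1;
}
```

Inserting 18, 23, 33, 13, 24, 10 in that order places them at positions 8, 3, 4, 5, 6 and 0.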

7-47

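(2) chaining

Each table entry is the head of a linked list; each list node has the fields k (key), r (record) and next.

index   chain
0       40 -> 130 -> null
1       91 -> null
2       42 -> 192 -> 372 -> null
3       null
4       null
5       75 -> null
6       66 -> 16 -> null
7       87 -> 67 -> 227 -> 417 -> null
8       null
9       49 -> null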
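Chaining can be sketched with an array of list heads. In this illustration the node keeps only the fields k and next (the record field r is omitted for brevity), new keys are appended to the end of their chain, and the names `cnode`, `chain_insert` and `chain_search` are assumptions rather than the slides' own code.

```c
#include <stdlib.h>

#define NBUCKETS 10

struct cnode {
    int k;                    /* key */
    struct cnode *next;       /* next node in the same chain */
};

static int chain_hash(int key)
{
    return key % NBUCKETS;    /* ASSUMPTION: keys are nonnegative */
}

/* Append key to the end of its chain. Returns 1 on success,
 * 0 if the key is already present or allocation fails. */
int chain_insert(struct cnode *bucket[], int key)
{
    struct cnode **link = &bucket[chain_hash(key)];

    while (*link != NULL) {
        if ((*link)->k == key)
            return 0;
        link = &(*link)->next;
    }

    struct cnode *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->k = key;
    n->next = NULL;
    *link = n;
    return 1;
}

/* Return the node holding key, or NULL if it is absent. */
struct cnode *chain_search(struct cnode *bucket[], int key)
{
    struct cnode *p = bucket[chain_hash(key)];

    while (p != NULL && p->k != key)
        p = p->next;
    return p;
}
```

Unlike open addressing, the table never fills up, and deleting a key only requires unlinking one list node.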

7-48

Issues of hashing

How to choose a hash function?
    the division method: h(key) = key mod m
    It is best that the table size m is prime.

Advantage of hashing: faster than binary search

Disadvantages of hashing:
    1. It needs more memory.
    2. Deleting a record is difficult.
