7-1
Chapter 7
Searching
7-2
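A table (file) of records:

            name   no.   age
record 1    BB     6     16
record 2    CC     9     16
record 3    AA     8     18
record 4    DD     2     17

internal key (embedded key): the key is stored together with the whole record.
external key: the keys form a separate table of their own, each with a pointer to its record.

key (no.)   pointer          record   name   age
6           1                1        BB     16
9           2                2        CC     16
8           3                3        AA     18
2           4                4        DD     17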
7-3
Terminologies of searching
primary key: unique
secondary key: may not be unique
internal search: data stored in main memory
external search: data stored in auxiliary memory
retrieval: a successful search
a search and insertion algorithm:
    retrieve the data if the search is successful
    insert the data if the search is unsuccessful
7-4
Abstract data type
typedef KEYTYPE ...    // a type of key
typedef RECTYPE ...    // a type of record
RECTYPE nullrec = ...  // a "null" record

KEYTYPE keyfunct(r)
RECTYPE r;
{ ... };

abstract typedef [rectype] TABLE (RECTYPE);

abstract member(tbl, k)
TABLE(RECTYPE) tbl;
KEYTYPE k;
postcondition if (there exists an r in tbl such that keyfunct(r) == k)
                  then member = TRUE
                  else member = FALSE
7-5
abstract RECTYPE search(tbl, k)
TABLE(RECTYPE) tbl;
KEYTYPE k;
postcondition (not member(tbl, k) && (search == nullrec))
              || (member(tbl, k) && keyfunct(search) == k);

abstract insert(tbl, r)
TABLE(RECTYPE) tbl;
RECTYPE r;
precondition member(tbl, keyfunct(r)) == FALSE
postcondition inset(tbl, r);
              (tbl - [r]) == tbl';

abstract delete(tbl, k)
TABLE(RECTYPE) tbl;
KEYTYPE k;
postcondition tbl == (tbl' - [search(tbl, k)]);
7-6
Sequential search (linear search)
Applied to an array or a linked list.
Data are not sorted.
e.g. 9 5 6 8 7 2
(1) search 6: successful
(2) search 4: unsuccessful
(3) delete 6: 9 5 2 8 7
(4) insert 4: 9 5 2 8 7 4
time complexity:
successful search: (n+1)/2 comparisons on average = O(n)
unsuccessful search: n comparisons = O(n)
7-7
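Sequential search with C

algorithm:
    /* k[0..n-1] holds the keys; returns the index of key, or -1 if absent */
    for (i = 0; i < n; i++)
        if (key == k[i])
            return(i);
    return(-1);

sentinel: an extra key inserted at the end of the array
    /* k must have room for n+1 keys; the sentinel k[n] always stops the loop */
    k[n] = key;
    for (i = 0; key != k[i]; i++)
        ;   /* empty body: no test against n is needed inside the loop */
    if (i < n)
        return(i);
    else
        return(-1);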
7-8
Move-to-front method
Let p(i) be the probability that record i is retrieved, with p(0) + p(1) + ... + p(n-1) = 1.
average number of comparisons: p(0) + 2p(1) + 3p(2) + ... + np(n-1)
This number is minimized if p(0) ≥ p(1) ≥ p(2) ≥ ... ≥ p(n-1).
move-to-front method: the retrieved record is moved to the head of the list.
e.g. 9 5 6 8 7 2
(1) search 6: 6 9 5 8 7 2
(2) search 8: 8 6 9 5 7 2
7-9
Transposition method
The retrieved record is interchanged with the preceding record.
e.g. 9 5 6 8 7 2
(1) search 6: 9 6 5 8 7 2
(2) search 8: 9 6 8 5 7 2
The transposition method is more efficient when the probability distribution is unchanging.
The move-to-front method is better for a small to medium number of requests and for a quickly changing probability distribution.
Mixed method: use the move-to-front method for the first s searches, then use the transposition method.
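The two self-organizing rules above can be sketched on an array as follows. This is a minimal illustration, not code from the slides: the function names `search_move_to_front` and `search_transpose` are hypothetical, and an array is assumed (on a linked list the move to the front would need no shifting).

```c
/* Search key in k[0..n-1]. On success, shift the records before it one
   place to the right and put the key at the head. Returns 0 (the new
   position) on success, or -1 if the key is absent. */
int search_move_to_front(int k[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            for (int j = i; j > 0; j--)
                k[j] = k[j - 1];
            k[0] = key;
            return 0;
        }
    }
    return -1;
}

/* Search key in k[0..n-1]. On success, interchange it with the preceding
   record (if there is one). Returns the new position, or -1 if absent. */
int search_transpose(int k[], int n, int key)
{
    for (int i = 0; i < n; i++) {
        if (k[i] == key) {
            if (i == 0)
                return 0;
            int t = k[i - 1];
            k[i - 1] = k[i];
            k[i] = t;
            return i - 1;
        }
    }
    return -1;
}
```

Starting from 9 5 6 8 7 2 and searching 6 and then 8, the first function reproduces the move-to-front example and the second reproduces the transposition example.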
7-10
Searching in an ordered table
linear (sequential) searching: n/2 comparisons on average (successful or unsuccessful)
[Figure: a sorted table of (key, record) pairs with keys 8, 73, 132, 231, 321, 480, 589, 592, 650, 651, 732, 789, 833, 876]
7-11
Indexed sequential search (1)
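Indexed sequential file: a sorted table of (key, record) pairs together with an index of (key, pointer) pairs.
[Figure: the index holds the keys 321, 592, 876, each with a pointer into the sorted table, whose keys are 8, 73, 132, 231, 321, 480, 589, 592, 650, 651, 732, 789, 833, 876]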
7-12
Indexed sequential search (2)
The use of an index is applicable to a sorted table stored as an array or a linked list.
Deletion: by a flag
Insertion:
1) shift some elements if there exist some deleted entries. (Pointers need to be changed in the index file.)
2) keep an overflow area
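An indexed sequential search can be sketched as below. The layout is an assumption made for illustration: each index entry stores the key of the last record in its block and that record's position in the sorted table, and the names `struct index_entry` and `indexed_search` are hypothetical. Deletion flags and the overflow area are left out.

```c
/* One index entry (assumed layout): the key of the last record of a block
   and the position of that record in the sorted table. */
struct index_entry {
    int key;
    int pos;
};

/* Returns the position of key in table[], or -1 if it is absent.
   idx[0..nidx-1] is sorted by key and covers the whole table. */
int indexed_search(const struct index_entry idx[], int nidx,
                   const int table[], int key)
{
    int b = 0;

    /* Step 1: scan the (short) index for the first block that can hold key. */
    while (b < nidx && idx[b].key < key)
        b++;
    if (b == nidx)
        return -1;                      /* key is larger than every key */

    /* Step 2: scan only that block of the table. */
    int lo = (b == 0) ? 0 : idx[b - 1].pos + 1;
    for (int i = lo; i <= idx[b].pos; i++)
        if (table[i] == key)
            return i;
    return -1;
}
```

With the 14-key table and the index keys 321, 592, 876, a search for 650 skips the first two index entries and then scans only the last block.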
7-13
A secondary index
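[Figure: three levels. The secondary index holds the keys 591 and 742 and points into the primary index; the primary index holds the keys 321, 485, 591, 647, 706, 742 and points into the sequential table of (key, record) pairs.]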
7-14
Binary search
e.g. 2 5 6 7 8 9
search 7: needs 3 comparisons
Time complexity: O(logn)
Used only if the table is sorted and stored in an array.
An insertion or a deletion requires O(n) time.
Improvement: two arrays, one for flags, the other for the sorted keys and some "empty holes".

flag:  f e e f f e f f f
data:  A * * D F * G I K

e: empty
f: full
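A minimal sketch of binary search on a sorted array follows. The function name `binary_search` and the optional probe counter are additions for illustration; the counter only serves to check the "3 comparisons" claim for key 7.

```c
#include <stddef.h>

/* Returns the index of key in the sorted array k[0..n-1], or -1.
   If probes is not NULL, *probes receives the number of array
   elements that were examined. */
int binary_search(const int k[], int n, int key, int *probes)
{
    int low = 0, high = n - 1;
    int count = 0, found = -1;

    while (low <= high) {
        int mid = low + (high - low) / 2;   /* middle of the current range */
        count++;
        if (k[mid] == key) {
            found = mid;
            break;
        }
        if (key < k[mid])
            high = mid - 1;                 /* continue in the left half */
        else
            low = mid + 1;                  /* continue in the right half */
    }
    if (probes != NULL)
        *probes = count;
    return found;
}
```

For the table 2 5 6 7 8 9 and key 7, the elements examined are 6, 8, and 7.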
7-15
Binary search tree
inorder traversal: 2 5 6 7 8 9
The binary search uses a sorted array as an implicit binary search tree. (The middle element of the array is the root.)

      6
    /   \
   2     8
    \   / \
     5 7   9
7-16
Insertion in a binary search tree
Insert 4
The inserted key is added to the tree as a leaf node.

Before:              After:
      6                    6
    /   \                /   \
   2     8              2     8
    \   / \              \   / \
     5 7   9              5 7   9
                         /
                        4
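Insertion as a leaf can be sketched as follows. The names `struct node`, `bst_insert`, and `bst_inorder` are hypothetical, and ignoring duplicate keys is an assumption that the slides do not state.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Insert key as a new leaf and return the root of the tree.
   Duplicate keys are ignored (an assumption of this sketch); if
   malloc fails, the tree is left unchanged. */
struct node *bst_insert(struct node *root, int key)
{
    struct node **link = &root;     /* the pointer that will hold the leaf */

    while (*link != NULL) {
        if (key == (*link)->key)
            return root;
        link = (key < (*link)->key) ? &(*link)->left : &(*link)->right;
    }
    struct node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->key = key;
        n->left = n->right = NULL;
        *link = n;
    }
    return root;
}

/* Write the keys in inorder into out[] starting at index count;
   returns the new count. */
int bst_inorder(const struct node *t, int out[], int count)
{
    if (t == NULL)
        return count;
    count = bst_inorder(t->left, out, count);
    out[count++] = t->key;
    return bst_inorder(t->right, out, count);
}
```

Inserting 6, 2, 8, 5, 7, 9 and then 4 places 4 as the left son of 5, and the inorder traversal stays sorted.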
7-17
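Deletion in a binary search tree (1)
Case 1: The deleted node has no sons. Delete it directly.
Deleting node with key 15.
[Figure, before: root 8. Left subtree: 3 with sons 1 and 5; 5 has right son 6; 6 has right son 7. Right subtree: 11 with sons 9 and 14; 9 has right son 10; 14 has sons 12 and 15; 12 has right son 13.]
[Figure, after: the same tree without node 15; 14 keeps only its left son 12.]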
7-18
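Deletion in a binary search tree (2)
Case 2: The deleted node has only one subtree. Delete it and move the subtree up.
Deleting node with key 5.
[Figure, before: root 8. Left subtree: 3 with sons 1 and 5; 5 has right son 6; 6 has right son 7. Right subtree: 11 with sons 9 and 14; 9 has right son 10; 14 has sons 12 and 15; 12 has right son 13.]
[Figure, after: 3 has sons 1 and 6, and 6 has right son 7; the right subtree is unchanged.]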
7-19
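Deletion in a binary search tree (3)
Case 3: The deleted node has two subtrees. Its inorder successor s takes its place. The right son of s takes the place of s. (s has no left son.)
Deleting node with key 11.
[Figure, before: root 8. Left subtree: 3 with sons 1 and 5; 5 has right son 6; 6 has right son 7. Right subtree: 11 with sons 9 and 14; 9 has right son 10; 14 has sons 12 and 15; 12 has right son 13.]
[Figure, after: the inorder successor 12 replaces 11, so 12 has sons 9 and 14; 13 takes the old place of 12, so 14 has sons 13 and 15; the left subtree is unchanged.]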
7-20
Deletion in a binary search tree (4)
Asymmetric deletion: the deleted node is always replaced by its inorder successor.
Symmetric deletion: the deleted node is replaced by its inorder predecessor and its inorder successor alternately.
Average search time in a binary search tree: O(logn)
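The three deletion cases can be sketched in C as below, using asymmetric deletion (the inorder successor always takes the place of a node with two subtrees). The names `struct node`, `bst_insert`, and `bst_delete` are hypothetical; the small insert helper is included only so that the sketch is self-contained.

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *left, *right;
};

/* Helper for building a tree; duplicate keys are ignored. */
struct node *bst_insert(struct node *t, int key)
{
    if (t == NULL) {
        t = malloc(sizeof *t);
        if (t != NULL) {
            t->key = key;
            t->left = t->right = NULL;
        }
        return t;
    }
    if (key < t->key)
        t->left = bst_insert(t->left, key);
    else if (key > t->key)
        t->right = bst_insert(t->right, key);
    return t;
}

/* Delete key from the tree rooted at t; returns the new root. */
struct node *bst_delete(struct node *t, int key)
{
    if (t == NULL)
        return NULL;                       /* key not found */
    if (key < t->key) {
        t->left = bst_delete(t->left, key);
        return t;
    }
    if (key > t->key) {
        t->right = bst_delete(t->right, key);
        return t;
    }
    if (t->left == NULL || t->right == NULL) {
        /* Case 1 (no sons) and case 2 (one subtree): move the subtree up. */
        struct node *child = (t->left != NULL) ? t->left : t->right;
        free(t);
        return child;
    }
    /* Case 3: s is the inorder successor, the leftmost node of the
       right subtree, so s has no left son. */
    struct node **slink = &t->right;
    while ((*slink)->left != NULL)
        slink = &(*slink)->left;
    struct node *s = *slink;
    *slink = s->right;                     /* right son of s replaces s */
    s->left = t->left;                     /* s takes the place of t */
    s->right = t->right;
    free(t);
    return s;
}
```

On the 13-node example tree, deleting 15, 5, and 11 exercises cases 1, 2, and 3 respectively.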
7-21
Optimum binary search trees
e.g. sorted data: 2 3 5 7
[Figure: some binary search trees on these four keys]
In an optimum binary search tree, the expected number of comparisons is minimized under a given set of keys and probabilities.
7-22
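pi: probability of a successful search for key ki
qi: probability of an unsuccessful search

e.g. k2 is the root with sons k1 and k3; q0 and q1 hang below k1, and q2 and q3 hang below k3.
expected number of comparisons: 2p1 + p2 + 2p3 + 2q0 + 2q1 + 2q2 + 2q3

e.g. k3 is the root, k1 is its left son, and k2 is the right son of k1.
expected number of comparisons: 2p1 + 3p2 + p3 + 2q0 + 3q1 + 3q2 + q3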
7-23
Construction of (near) optimum search trees (1)
(1) Balancing method
e.g.
key (data)                          1   2   3   4   5   6   7
frequencies of successful search    2  10   3   1   4   8   9
partial sum                         2  12  15  16  20  28  37

Select i as the root such that the difference of the costs on the left and the right is minimized.
The binary search tree can be constructed recursively.
Time complexity: O(n)
[Figure: the resulting tree. Root 5 (left cost 16, right cost 17) with sons 2 and 7; 2 (left cost 2, right cost 4) has sons 1 and 3; 3 has right son 4 (cost 1); 7 has left son 6 (cost 8).]
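The balancing method can be sketched recursively as below. This is a simplified illustration: it finds each root by a linear scan over the partial sums, so it does not achieve the O(n) bound quoted on the slide, and the name `balance_build` and the array representation (keys numbered from 1, with 0 meaning an empty subtree) are assumptions.

```c
/* psum[i] is the partial sum of the frequencies of keys 1..i (psum[0] = 0).
   Builds a tree on keys lo..hi, storing the sons of key i in left[i] and
   right[i] (0 = empty subtree). Returns the root key, or 0 if lo > hi. */
int balance_build(const int psum[], int lo, int hi, int left[], int right[])
{
    if (lo > hi)
        return 0;

    int best = lo, best_diff = -1;
    for (int i = lo; i <= hi; i++) {
        int l = psum[i - 1] - psum[lo - 1];   /* cost of the keys left of i */
        int r = psum[hi] - psum[i];           /* cost of the keys right of i */
        int d = (l > r) ? l - r : r - l;
        if (best_diff < 0 || d < best_diff) {
            best = i;
            best_diff = d;
        }
    }
    left[best] = balance_build(psum, lo, best - 1, left, right);
    right[best] = balance_build(psum, best + 1, hi, left, right);
    return best;
}
```

With the partial sums 2 12 15 16 20 28 37, key 5 is chosen as the root because the left cost 16 and the right cost 17 differ by only 1.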
7-24
Construction of (near) optimum search trees (2)
(2) Median split tree
e.g.
key (data)     1   2   3   4   5   6   7
frequencies    2  10   3   1   4   8   9

The most frequent key is stored in the root.
The split key is the median of all remaining keys.
The binary search tree can be constructed recursively.
The tree is a balanced tree.
Time complexity: O(nlogn)
[Figure: each internal node shows (node key, split key). Root (2, 4); left son (3, 1) with leaves 1 and 4; right son (7, 5) with leaves 5 and 6.]
How to search?
7-25
Balanced binary tree (AVL tree)
The heights of the two subtrees of every node never differ by more than 1.
balance = (height of left subtree) - (height of right subtree)
Each node in a balanced binary tree has a balance of 1, -1, or 0.
[Figure: a balanced binary tree with the balance (1, 0, or -1) written at each node]
7-26
Rotations of a binary tree
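The inorder traversal is the same after a rotation is performed.

(a) Original tree: D is the root with sons B and F; B has sons A and C; F has sons E and G.
(b) Right rotation: B is the root with sons A and D; D has sons C and F; F has sons E and G.
(c) Left rotation: F is the root with sons D and G; D has sons B and E; B has sons A and C.

left rotation (p is the root of the subtree, q its right son, r the left son of q):
    q = right(p)
    r = left(q)
    left(q) = p
    right(p) = r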
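The four assignments of the left rotation translate directly into C; the right rotation is the mirror image. The names `struct node`, `rotate_left`, and `rotate_right` are hypothetical, and each function assumes that the son it rotates around exists.

```c
struct node {
    char key;
    struct node *left, *right;
};

/* Left rotation on the subtree rooted at p (p->right must not be NULL).
   Returns q, the new root of the subtree. */
struct node *rotate_left(struct node *p)
{
    struct node *q = p->right;
    struct node *r = q->left;
    q->left = p;
    p->right = r;
    return q;
}

/* Right rotation on the subtree rooted at p (p->left must not be NULL).
   Returns q, the new root of the subtree. */
struct node *rotate_right(struct node *p)
{
    struct node *q = p->left;
    struct node *r = q->right;
    q->right = p;
    p->left = r;
    return q;
}
```

The caller is responsible for storing the returned pointer in the father of p (or in the root pointer), since the rotation changes the root of the subtree.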
7-27
Insertion of an AVL tree (1)
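Case 1: Node C is the first unbalanced node traced up from the newly inserted node.
[Figure, before: C (balance 1) has left son A (balance 0) and right subtree T3 of height n; A has subtrees T1 and T2, both of height n. The new node is inserted into T1.]
Right rotation on the subtree rooted at C.
[Figure, after: A (balance 0) is the root with left subtree T1, which holds the new node, and right son C (balance 0); C has subtrees T2 and T3.]
The height of the subtree is not changed after the new insertion.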
7-28
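Insertion of an AVL tree (2)
Case 2:
[Figure, before: C (balance 1) has left son A (balance 0) and right subtree T4 of height n; A has left subtree T1 of height n and right son B (balance 0); B has subtrees T2 and T3, both of height n-1. The new node is inserted into T2.]
First rotation: left rotation on the subtree rooted at A.
[Figure, after the first rotation: C (balance 2) has left son B (balance 2) and right subtree T4; B has left son A (balance 0) and right subtree T3; A has subtrees T1 and T2.]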
7-29
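Second rotation: right rotation on the subtree rooted at C.
[Figure, after the second rotation: B (balance 0) is the root with left son A (balance 0) and right son C (balance -1); A has subtrees T1 (height n) and T2 (height n-1 before the insertion, now holding the new node); C has subtrees T3 (height n-1) and T4 (height n).]
The height of the subtree is not changed after the new insertion.
Insertion requires at most 2 rotations.
Deletion is more complex; it requires O(logn) rotations.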
7-30
Multiway search trees
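A multiway search tree of order n:
    at most n subtrees
    at most n-1 keys in a node
[Figure: a multiway search tree with nodes labeled A to H. The root A holds the keys 12 50 85; the nodes B to E are on the second level and F to H on the third. The other nodes shown include 6 10, 37, 60 70 80, 100 120 150, 62 65 69, and 110.]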
7-31
B-trees
B-tree of order m:
    (m-1)/2, rounded down ≦ # of keys in a nonroot node ≦ m-1
    1 ≦ # of keys in the root node ≦ m-1

a B-tree of order 5:
(a) Initial portion of a B-tree
    upper node:  320 540
    its son:     430 480
    leaves:      380 395 406 412 | 451 472 | 493 506 511
7-32
(b) After inserting 382
    father:  395 430 480
    leaves:  380 382 | 406 412 | 451 472 | 493 506 511
(c) After inserting 518 and 508
    father:  395 430 480 508
    leaves:  380 382 | 406 412 | 451 472 | 493 506 | 511 518
7-33
a B-tree of order 4:
(a) An initial B-tree twig
    father:  87 140
    leaves:  23 61 74 | 90 100 106 | 152 186 194
(b) Inserting 102 with a left bias
    father:  87 102 140
    leaves:  23 61 74 | 90 100 | 106 | 152 186 194
7-34
(c) Inserting 102 with a right bias
    father:  87 100 140
    leaves:  23 61 74 | 90 | 102 106 | 152 186 194
7-35
Deletion in multiway search trees
(1) The simplest method
a) Mark a deleted key; do not remove it.
b) Disadvantages:
    It wastes space.
    In a nonleaf node, only the same key can reuse the "deleted" space.
(2) A technique similar to the one for binary search trees, used in an unrestricted multiway search tree
a) If the key has an empty left or right subtree, remove it. If it is the only key in the node, remove the node.
b) Otherwise, its successor takes its place. (The successor has an empty left subtree.)
7-36
Deletion in B-trees
(i) Shift a key from its father and its brother (borrow)
Delete key 113
    before:  father 80 120 150;  A: 90 113;  B: 126 135 142
    after:   father 80 126 150;  A: 90 120;  B: 135 142
7-37
(ii) Take a key from its father and combine with its brother
Delete key 120 and consolidate
    before:  father 80 126 150;  leaves 68 73 | 90 120 | B: 135 142
    after:   father 80 150;      leaves 68 73 | B: 90 126 135 142
7-38
(iii) do (ii), then do (i) for its father
Delete 65, consolidate and borrow
    before:  root 60 170;  second level 30 50 | 80 150 | 180 220 280;  leaves shown 65 72 | 87 96 | 153 162 | 173 178 | 187 202
    after:   root 60 180;  second level 30 50 | 150 170 | 220 280;     leaves shown 72 80 87 96 | 153 162 | 173 178 | 187 202
7-39
(iv) do (ii), then do (ii) for its father.
Deleting 173 and a double consolidation
    before:  root 60 180 300;  second level 30 50 | 150 170 | 220 280;  leaves shown 153 162 | 173 178 | 187 202
    after:   root 60 300;      second level 30 50 | 150 180 220 280;    leaves shown 153 162 170 178 | 187 202
7-40
Deletion in B-trees
This may be done up to the root.
If the root has more than one key => no problem.
If the root has only one key => remove the root.
Insertion, deletion or searching in a B-tree requires O(logn) time, where n denotes the number of nodes in the B-tree.
7-41
B+-tree
All keys are maintained in leaf nodes and keys are also replicated in nonleaf nodes.
Finding the next record: O(1) time
7-42
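Digital search tree
Keys: 180, 185, 1867, 195, 207, 217, 2174, 21749, 217493, 226, 27, 274, 278, 279, 2796, 281, 284, 285, 286, 287, 288, 294, 307, 768
eok: end of key
[Figure: the digital search tree for these keys; each key is spelled out digit by digit along a path from the root and is terminated by an eok node]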
7-43
[Figure, continued: the remaining branches of the digital search tree, again with an eok node at the end of every key]
7-44
Trie
(1) This is one kind of digital search tree.
(2) Each node contains exactly m pointers. (Some of them are null.)
    e.g. m = 10 for numerical data.
(3) It is useful when the set of keys is dense.
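A trie with m = 10 for keys written as decimal digit strings might look like the sketch below. The names `struct trie`, `trie_new`, `trie_insert`, and `trie_member` are hypothetical, and storing the end of key as a flag in the node (rather than as a separate eok node) is a simplification chosen for this illustration.

```c
#include <stdlib.h>

#define M 10                    /* one pointer per decimal digit */

struct trie {
    struct trie *next[M];       /* next[d] is null if no key continues with d */
    int eok;                    /* 1 if a key ends at this node */
};

/* Allocate an empty node; returns NULL if malloc fails. */
struct trie *trie_new(void)
{
    struct trie *t = malloc(sizeof *t);
    if (t != NULL) {
        for (int d = 0; d < M; d++)
            t->next[d] = NULL;
        t->eok = 0;
    }
    return t;
}

/* Insert a key given as a string of digits.
   Returns 1 on success, 0 on a non-digit character or allocation failure. */
int trie_insert(struct trie *root, const char *key)
{
    struct trie *t = root;
    for (; *key != '\0'; key++) {
        if (*key < '0' || *key > '9')
            return 0;
        int d = *key - '0';
        if (t->next[d] == NULL && (t->next[d] = trie_new()) == NULL)
            return 0;
        t = t->next[d];
    }
    t->eok = 1;
    return 1;
}

/* Returns 1 if key was inserted before, 0 otherwise. */
int trie_member(const struct trie *root, const char *key)
{
    const struct trie *t = root;
    for (; t != NULL && *key != '\0'; key++) {
        if (*key < '0' || *key > '9')
            return 0;
        t = t->next[*key - '0'];
    }
    return t != NULL && t->eok;
}
```

Every node costs m pointers whether or not they are used, which is why the structure pays off mainly when the key set is dense.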
7-45
Hashing
hash function: transforms a key into a table index
e.g. data: 18 23 33 13 24 10
hash function: h(k) = k mod 10
hash collision: two records (keys) attempt to occupy the same position.

index   key
0       10
1
2
3       23
4       33
5       13
6       24
7
8       18
9
7-46
Resolution of hash collision
(1) open addressing (rehashing)
a) linear probing: place the collided record in the next available position in the array
b) rehashing function:
...
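Linear probing with h(k) = k mod 10 can be sketched as follows. The names `hash_init`, `hash_insert`, and `hash_search`, the `EMPTY` marker, and the restriction to nonnegative keys are assumptions of this sketch; deletion is not handled.

```c
#define TABLE_SIZE 10
#define EMPTY (-1)              /* marks a free slot; keys must be >= 0 */

void hash_init(int table[])
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
}

/* Insert key with linear probing. Returns the position used,
   or -1 if the table is full. */
int hash_insert(int table[], int key)
{
    int h = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (h + i) % TABLE_SIZE;     /* next position, wrapping around */
        if (table[pos] == EMPTY || table[pos] == key) {
            table[pos] = key;
            return pos;
        }
    }
    return -1;
}

/* Returns the position of key, or -1 if it is not in the table. */
int hash_search(const int table[], int key)
{
    int h = key % TABLE_SIZE;
    for (int i = 0; i < TABLE_SIZE; i++) {
        int pos = (h + i) % TABLE_SIZE;
        if (table[pos] == key)
            return pos;
        if (table[pos] == EMPTY)
            return -1;          /* an empty slot ends the probe sequence */
    }
    return -1;
}
```

Inserting 18, 23, 33, 13, 24, 10 in that order places 33, 13, and 24 in positions 4, 5, and 6, because each of them collides and moves on to the next free slot.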
7-47
(2) chaining
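Each list node holds a key k, a record r, and a pointer next.

bucket   chain
0        40 -> 130 -> null
1        91 -> null
2        42 -> 192 -> 372 -> null
3        null
4        null
5        75 -> null
6        66 -> 16 -> null
7        87 -> 67 -> 227 -> 417 -> null
8        null
9        49 -> null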
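Chaining can be sketched with one linked list per bucket. The field names k, r, and next follow the figure; the record is reduced to an int for illustration, and `chain_insert`, `chain_search`, the bucket count of 10, and the restriction to nonnegative keys are assumptions.

```c
#include <stdlib.h>

#define NBUCKETS 10

struct entry {
    int k;                      /* key */
    int r;                      /* record (reduced to an int in this sketch) */
    struct entry *next;
};

/* Append (k, r) to the end of its chain. Returns 1 on success,
   0 if k is already present or malloc fails. Keys must be >= 0. */
int chain_insert(struct entry *bucket[], int k, int r)
{
    struct entry **link = &bucket[k % NBUCKETS];

    while (*link != NULL) {
        if ((*link)->k == k)
            return 0;
        link = &(*link)->next;
    }
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return 0;
    e->k = k;
    e->r = r;
    e->next = NULL;
    *link = e;
    return 1;
}

/* Returns the entry holding key k, or NULL if there is none. */
struct entry *chain_search(struct entry *bucket[], int k)
{
    for (struct entry *e = bucket[k % NBUCKETS]; e != NULL; e = e->next)
        if (e->k == k)
            return e;
    return NULL;
}
```

A collision only lengthens one chain, so the table never becomes full.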
7-48
Issues of hashing
How to choose a hash function?
the division method: h(key) = key mod m
It is best that the table size m is prime.
Advantage of hashing: faster than binary search
Disadvantages of hashing:
1. It needs more memory.
2. Deleting a record is difficult.