
Source: …ynucc.yu.ac.kr/~hrcho/Courses/FS/Chapter09.pdf

Chapter 9

The B+ Tree Family and Indexed Sequential File Access

TABLE OF CONTENTS

● Indexed Sequential Access
● Maintaining a Sequence Set
● Adding a Simple Index to the Sequence Set
● Separators Instead of Keys
● The Simple Prefix B+ Tree
● Simple Prefix B+ Tree Maintenance
● Index Set Block Size
● Internal Structure of Index Set Blocks
● Loading a Simple Prefix B+ Tree
● B+ Tree
● Perspective ( B / B+ / Simple Prefix B+ Tree )

1. Indexed Sequential Access

● Concept

✔ Supports both indexed access and sequential access
✔ File organization
– Indexed part (for random access)
– Sequential part (for batch processing)

● Examples

✔ Student record systems at universities
✔ Credit card systems
✔ Banking systems

2. Maintaining a Sequence Set

● Problems with a sorted file

✔ Record insertion, deletion, and update are expensive
✔ Reorganizing the whole file incurs I/O overhead

2.1 The Use of Blocks

● Basic Idea
✔ Restrict the effects of insertion or deletion
✔ Collect the records into blocks
✔ Connect the blocks with a linked list (Figure 9.1)
✔ The linked blocks form the sequence set

● Operations (Figure 9.1)
✔ Overflow: allocate a new page and adjust the links
✔ Underflow: redistribution or concatenation

● Costs
✔ Internal fragmentation
✔ No clustering
✔ Sorted order is kept (O), but binary search is not possible (X)

(a)
Block 1: ADAMS . . . BAIRD . . . BIXBY . . . BOONE . . .
Block 2: BYNUM . . . CARSON . . . COLE . . . DAVIS . . .
Block 3: DENVER . . . ELLIS . . .

(b)
Block 1: ADAMS . . . BAIRD . . . BIXBY . . . BOONE . . .
Block 2: BYNUM . . . CARSON . . . CARTER . . .
Block 3: DENVER . . . ELLIS . . .
Block 4: COLE . . . DAVIS . . .

(c)
Block 1: ADAMS . . . BAIRD . . . BIXBY . . . BOONE . . .
Block 2: BYNUM . . . CARSON . . . CARTER . . .
Block 3: COLE . . . DENVER . . . ELLIS . . .
Block 4: Available for reuse

FIGURE 9.1 Block splitting and concatenation due to insertions and deletions in the sequence set. (a) Initial blocked sequence set. (b) Sequence set after insertion of CARTER record – block 2 splits, and the contents are divided between blocks 2 and 4. (c) Sequence set after deletion of DAVIS record – block 4 is less than half full, so it is concatenated with block 3.
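
The split in Figure 9.1(b) can be sketched in C as follows. This is only an illustration under stated assumptions: blocks live in an in-memory array, an array index stands in for a disk address, and each block holds at most four keys. The names `Block`, `CAPACITY`, and `insert_key` are hypothetical and do not come from the slides.

```c
#include <assert.h>
#include <string.h>

#define CAPACITY 4      /* assumed: records per block, as drawn in Figure 9.1 */
#define KEYLEN   16     /* assumed: fixed key buffer size */

typedef struct {
    int  count;                     /* records currently in the block */
    int  next;                      /* index of the next block in key order */
    char keys[CAPACITY][KEYLEN];    /* sorted keys (record bodies omitted) */
} Block;

/* Insert key into blocks[b], keeping the keys sorted. If the block is full,
 * first move its upper half into blocks[spare] and relink the list.
 * Returns the index of the block that received the key. */
int insert_key(Block *blocks, int b, int spare, const char *key)
{
    Block *blk = &blocks[b];
    assert(strlen(key) < KEYLEN);

    if (blk->count == CAPACITY) {
        Block *nb = &blocks[spare];
        int half = CAPACITY / 2;
        nb->count = CAPACITY - half;
        memcpy(nb->keys, blk->keys[half], (size_t)nb->count * KEYLEN);
        blk->count = half;
        nb->next = blk->next;       /* the new block follows the old one */
        blk->next = spare;
        if (strcmp(key, nb->keys[0]) >= 0) {
            b = spare;              /* key belongs in the new block */
            blk = nb;
        }
    }

    /* shift larger keys right to open a slot */
    int i = blk->count;
    while (i > 0 && strcmp(blk->keys[i - 1], key) > 0) {
        memcpy(blk->keys[i], blk->keys[i - 1], KEYLEN);
        i--;
    }
    strcpy(blk->keys[i], key);
    blk->count++;
    return b;
}
```

With block 2 holding BYNUM, CARSON, COLE, DAVIS, inserting CARTER moves COLE and DAVIS to the spare block and leaves CARTER in block 2, matching the figure. Only the two affected blocks are touched.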

2.2 Choice of Block Size

● What block size should the sequence set file use?

● Considerations

1) Several blocks must fit in RAM at the same time
– Needed for block splitting or concatenation
– What about two-to-three splitting?

2) A block should be read or written without an extra seek
– Block size = 1 cluster !

3. Adding a Simple Index to the Sequence Set

● Motivation

✔ Random access to the record with a given key
✔ Index: holds the last key of each block (Figure 9.3)
✔ If the index fits in RAM (Simple Index)
– Binary search is possible
– Efficient index update is possible
✔ If the index is too large to keep in RAM (B+ Tree)
– B-Tree index +
– A sequence set that holds the actual records

Block:  1            2           3            4            5           6
Keys:   ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS

FIGURE 9.2 Sequence of blocks showing the range of keys in each block.

Key      Block number
BERNE    1
CAGE     2
DUTTON   3
EVANS    4
FOLK     5
GADDIS   6

FIGURE 9.3 Simple index for the sequence set illustrated in Fig. 9.2.
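
A lookup in the simple index of Figure 9.3 can be sketched as a binary search for the first entry whose key is not less than the search key. The sketch assumes the whole index is an in-memory array; `IndexEntry`, `find_block`, and `FIG_9_3` are hypothetical names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *last_key;   /* highest key stored in the block */
    int         block;      /* block number in the sequence set */
} IndexEntry;

/* Binary search for the first entry whose last_key >= key.
 * Returns the block that could hold key, or -1 if key lies
 * beyond the last key of the last block. */
int find_block(const IndexEntry *index, size_t n, const char *key)
{
    size_t lo = 0, hi = n;          /* search window is [lo, hi) */
    assert(index != NULL && key != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(index[mid].last_key, key) < 0)
            lo = mid + 1;           /* key lies after this block */
        else
            hi = mid;               /* this block or an earlier one */
    }
    return lo < n ? index[lo].block : -1;
}

/* The index of Figure 9.3. */
const IndexEntry FIG_9_3[] = {
    {"BERNE", 1}, {"CAGE", 2}, {"DUTTON", 3},
    {"EVANS", 4}, {"FOLK", 5}, {"GADDIS", 6},
};
```

For example, `find_block(FIG_9_3, 6, "EMBRY")` selects block 4, because DUTTON sorts before EMBRY and EVANS does not.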

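4. Separators Instead of Keys

● The (non-leaf) index entries of a B+ Tree need not be keys

✔ A separator only has to distinguish between leaf nodes (Figure 9.4)
✔ The index only guides the search ← difference from the B-Tree

● Optimization
✔ Use the shortest string that works as a separator
✔ Benefit: higher fan-out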

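/* Store in sep the shortest prefix of key2 that separates key1 from key2.
   Assumes key1 < key2; sep needs room for strlen(key2) + 1 bytes. */
void find_sep(const char *key1, const char *key2, char *sep)
{
    /* copy key2 up to and including the first character that differs */
    while ((*sep++ = *key2++) == *key1++)
        ;
    *sep = '\0';
}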

Block:      1            2           3            4            5           6
Keys:       ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS
Separator:            BO          CAM           E            F          FOLKS

FIGURE 9.4 Separator between blocks in the sequence set.
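
The separators in Figure 9.4 can be reproduced with a bounds-checked variant of the shortest-separator idea. This is a sketch rather than the slides' own routine: `shortest_separator` and its `cap` parameter are hypothetical, and the function assumes the first key sorts strictly before the second in `strcmp` order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Write into sep the shortest prefix of hi that is still greater than lo.
 * Requires lo < hi in strcmp order. Returns the separator length,
 * or 0 if sep (capacity cap bytes) is too small. */
size_t shortest_separator(const char *lo, const char *hi, char *sep, size_t cap)
{
    size_t n = 0;
    assert(strcmp(lo, hi) < 0);
    /* lo < hi guarantees a mismatch at or before the end of lo */
    while (lo[n] == hi[n])
        n++;
    n++;                            /* include the first differing character */
    if (n + 1 > cap)
        return 0;
    memcpy(sep, hi, n);
    sep[n] = '\0';
    return n;
}
```

Applied to the block boundaries of Figure 9.4, BERNE/BOLEN gives BO, CAGE/CAMP gives CAM, DUTTON/EMBRY gives E, EVANS/FABER gives F, and FOLK/FOLKS gives FOLKS. The last pair shows that when the lower key is a prefix of the upper key, the separator is one character longer than the lower key.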

Block 3: CAMP-DUTTON    Block 4: EMBRY-EVANS

Potential separators: DUTU, DVXGHESJF, DZ, E, EBQX, ELEEMOSYNARY

FIGURE 9.5 A list of potential separators.

5. The Simple Prefix B+ Tree

● Structure (Figure 9.8)
✔ Index set + Sequence set
✔ Simple prefix: the index set contains
– shortest separators, or
– prefixes of the keys

Index set:  root [E]; children [BO CAM] and [F FOLKS]

Block:  1            2           3            4            5           6
Keys:   ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS

FIGURE 9.8 A B-tree index set for the sequence set, forming a simple prefix B+ tree.
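
Searching the tree of Figure 9.8 can be sketched as follows, assuming the usual rule that keys less than a separator go left and keys greater than or equal to it go right. The node layout (`IndexNode`, `MAX_SEPS`) and the function `tree_find_block` are hypothetical, chosen only to fit the small tree in the figure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SEPS 4   /* assumed small fan-out, enough for Figure 9.8 */

typedef struct IndexNode {
    int is_lowest;                                /* 1: children are sequence set blocks */
    int nseps;                                    /* separators in this node */
    const char *seps[MAX_SEPS];                   /* sorted separators */
    const struct IndexNode *child[MAX_SEPS + 1];  /* used when !is_lowest */
    int block[MAX_SEPS + 1];                      /* used when is_lowest */
} IndexNode;

/* Follow separators from the root down to a sequence set block number.
 * Rule assumed here: key < separator goes left, key >= separator goes right. */
int tree_find_block(const IndexNode *node, const char *key)
{
    assert(node != NULL && key != NULL);
    for (;;) {
        int i = 0;
        /* count the separators that are <= key */
        while (i < node->nseps && strcmp(key, node->seps[i]) >= 0)
            i++;
        if (node->is_lowest)
            return node->block[i];
        node = node->child[i];
    }
}
```

With root [E] and children [BO CAM] and [F FOLKS], a search for EMBRY goes right at E, then left of F, and ends at block 4. The search still has to read that block to learn whether the record exists, since separators are not keys.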

6. Simple Prefix B+ Tree Maintenance

6.1 Changes Localized to Single Blocks in a Sequence Set

● Examples

✔ Deleting “EMBRY” from Figure 9.8
– Only the sequence set changes (the index set is unchanged)
✔ Deleting “FOLKS” from Figure 9.8
– The sequence set changes
– The index set could use a different separator ← is it worth the effort?
✔ Inserting “EATON” into Figure 9.9
– The sequence set changes
– Does the separator in the index set have to change?

Index set:  root [E]; children [BO CAM] and [F FOLKS]

Block:  1            2           3            4            5           6
Keys:   ADAMS-BERNE  BOLEN-CAGE  CAMP-DUTTON  ERVIN-EVANS  FABER-FOLK  FROST-GADDIS

FIGURE 9.9 The deletion of the EMBRY and FOLKS records from the sequence set leaves the index set unchanged.

6.2 Changes Involving Multiple Blocks in a Sequence Set

● Handled in the same way as B-Tree insertion/deletion

✔ Block 1 of Figure 9.9 splits
– Figure 9.10
✔ Block 2 of Figure 9.10 underflows
– Blocks 2 and 3 are concatenated
– Figure 9.11

Index set:  root [BO E]; children [AY], [CAM], and [F FOLKS]

Block:  1            7            2           3            4            5           6
Keys:   ADAMS-AVERY  AYERS-BERNE  BOLEN-CAGE  CAMP-DUTTON  EMBRY-EVANS  FABER-FOLK  FOLKS-GADDIS

FIGURE 9.10 An insertion into block 1 causes a split and the consequent addition of block 7. The addition of a block in the sequence set requires a new separator in the index set. Insertion of the AY separator into the node containing BO and CAM causes a node split in the index set B-tree and consequent promotion of BO to the root.

Index set:  root [E]; children [AY BO] and [F FOLKS]

Block:  1            7            2             4            5           6
Keys:   ADAMS-AVERY  AYERS-BERNE  BOLEN-DUTTON  ERVIN-EVANS  FABER-FOLK  FROST-GADDIS

FIGURE 9.11 A deletion from block 2 causes underflow and the consequent concatenation of blocks 2 and 3. After the concatenation, block 3 is no longer needed and can be placed on an avail list. Consequently, the separator CAM is no longer needed. Removing CAM from its node in the index set forces a concatenation of index set nodes, bringing BO back down from the root.

● Effect on the index set

✔ When a block of the sequence set splits, a new separator must be inserted into the index set.
✔ When blocks of the sequence set are concatenated, a separator must be removed from the index set.
✔ When records are redistributed between blocks of the sequence set, the value of a separator in the index set must be changed.
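
The three cases above can be sketched with a deliberately simplified index. The sketch keeps all separators in one flat in-memory array instead of a B-tree, so it shows which separator is inserted, removed, or changed, but not the node splits and concatenations of Figures 9.10 and 9.11. `FlatIndex`, `on_split`, `on_concatenate`, and `on_redistribute` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_BLOCKS 16   /* assumed capacity for the sketch */

typedef struct {
    int nblocks;                        /* the separator count is nblocks - 1 */
    const char *seps[MAX_BLOCKS - 1];   /* seps[i] divides blocks[i] and blocks[i+1] */
    int blocks[MAX_BLOCKS];             /* block numbers in key order */
} FlatIndex;

/* blocks[pos] split: new_block now follows it, divided from it by sep. */
void on_split(FlatIndex *ix, int pos, const char *sep, int new_block)
{
    assert(ix->nblocks < MAX_BLOCKS && pos >= 0 && pos < ix->nblocks);
    int nseps = ix->nblocks - 1;
    memmove(&ix->seps[pos + 1], &ix->seps[pos],
            (size_t)(nseps - pos) * sizeof ix->seps[0]);
    memmove(&ix->blocks[pos + 2], &ix->blocks[pos + 1],
            (size_t)(ix->nblocks - pos - 1) * sizeof ix->blocks[0]);
    ix->seps[pos] = sep;
    ix->blocks[pos + 1] = new_block;
    ix->nblocks++;
}

/* blocks[pos + 1] was concatenated into blocks[pos]: drop it and seps[pos]. */
void on_concatenate(FlatIndex *ix, int pos)
{
    assert(pos >= 0 && pos + 1 < ix->nblocks);
    int nseps = ix->nblocks - 1;
    memmove(&ix->seps[pos], &ix->seps[pos + 1],
            (size_t)(nseps - pos - 1) * sizeof ix->seps[0]);
    memmove(&ix->blocks[pos + 1], &ix->blocks[pos + 2],
            (size_t)(ix->nblocks - pos - 2) * sizeof ix->blocks[0]);
    ix->nblocks--;
}

/* Records moved between blocks[pos] and blocks[pos + 1]:
 * only the separator between them changes. */
void on_redistribute(FlatIndex *ix, int pos, const char *new_sep)
{
    assert(pos >= 0 && pos + 1 < ix->nblocks);
    ix->seps[pos] = new_sep;
}
```

Starting from separators BO, CAM, E, F, FOLKS over blocks 1 to 6, `on_split(&ix, 0, "AY", 7)` followed by `on_concatenate(&ix, 2)` leaves separators AY, BO, E, F, FOLKS over blocks 1, 7, 2, 4, 5, 6, the same sequence set order as Figure 9.11.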

● Index set and sequence set: […]?

7. Index Set Block Size

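● Index blocks and sequence blocks usually have the same size

✔ The factors that determine the sequence set block size = the factors that determine the index set block size
✔ A virtual simple prefix B+ Tree is easier to implement
✔ The index set and the sequence set are stored in the same file
– Advantage: no seek time between separate files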

– Drawback: […] key […] index […]?

8. Internal Structure of Index Set Blocks

● Requirements

✔ Support for variable-length separators
– What about fixed-length separators?
✔ Support for efficient binary search

● Approach

✔ An index page holds the separators, an index to the separators, and pointers to the child blocks

● Block layout (Figures 9.12 to 9.14)
✔ Separator count: needed for binary search
✔ Total length of separators: locates the start of the index
✔ Separators: concatenation of all separators
✔ Index: offset of each separator
✔ Pointer: relative block number

● Consequences

✔ The order of the B+ Tree is variable
– Each block holds a variable number of separators
– Split/Concatenate/Redistribute decisions become more complicated

Concatenated separators:  AsBaBroCChCraDeleEdiErrFaFle
Index to separators:      00 02 04 07 08 10 13 17 20 23 25

FIGURE 9.12 Variable-length separators and corresponding index.

Separator count:             11
Total length of separators:  28
Separators:                  AsBaBroCChCraDeleEdiErrFaFle
Index to separators:         00 02 04 07 08 10 13 17 20 23 25
Relative block numbers:      B00 B01 B02 B03 B04 B05 B06 B07 B08 B09 B10 B11

FIGURE 9.13 Structure of an index set block.

B00 As B01 Ba B02 Bro B03 C B04 Ch B05 Cra B06 Dele B07 Edi B08 Err B09 Fa B10 Fle B11

Separator subscript: 0 1 2 3 4 5 6 7 8 9 10

FIGURE 9.14 Conceptual relationship of separators and relative block numbers.
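
The layout of Figure 9.13 can be sketched as a C structure together with a binary search over the variable-length separators. The field sizes, the names `IndexBlock`, `cmp_key_sep`, and `child_for_key`, and the byte-wise comparison are assumptions made for illustration; an on-disk block would pack these fields rather than use fixed arrays.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEPS 16     /* assumed limits for the sketch */
#define MAX_TEXT 64

typedef struct {
    int  count;                 /* separator count */
    int  total_len;             /* total length of separators */
    char text[MAX_TEXT];        /* separators, concatenated without terminators */
    int  offset[MAX_SEPS];      /* index to separators: where each one starts */
    int  rbn[MAX_SEPS + 1];     /* relative block numbers, count + 1 of them */
} IndexBlock;

/* Compare key with separator i. The separator's length is derived from
 * the next offset, or from total_len for the last separator. */
static int cmp_key_sep(const IndexBlock *b, const char *key, int i)
{
    int end = (i + 1 < b->count) ? b->offset[i + 1] : b->total_len;
    size_t slen = (size_t)(end - b->offset[i]);
    size_t klen = strlen(key);
    int c = memcmp(key, b->text + b->offset[i], klen < slen ? klen : slen);
    if (c != 0)
        return c;
    return (klen > slen) - (klen < slen);   /* shorter string sorts first */
}

/* Binary search on the offsets: return the relative block number to
 * follow for key (keys >= separator i lie to the right of it). */
int child_for_key(const IndexBlock *b, const char *key)
{
    int lo = 0, hi = b->count;  /* every separator before lo is <= key */
    assert(b->count <= MAX_SEPS);
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (cmp_key_sep(b, key, mid) >= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return b->rbn[lo];
}
```

The fixed-size offset array is what makes binary search possible even though the separators themselves vary in length. With the data of Figure 9.13, a key such as Beck falls between Ba and Bro and therefore leads to B02.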

• Data Page

✔ Supports variable-length records
✔ Normal record: Maximum Size = 1 Page

Record(0) | Record(1) | Record(2) | • • • | Record(cnt-1)
PtrRID(cnt-1) | • • • | PtrRID(1) | PtrRID(0) | ThisPage
first free byte | FileID | RIDcnt | PrevPage | NextPage
Free Area

• Record Header

Type (Moved, Not Moved, New Home) | Kind (Normal, Slice, Crumb) | Length

• Long Data Item

✔ Long Data Item = Directory + Slice + Crumb

# of bytes | # of segments | RID 1, Length 1 | • • • | RID n, Length n

• RID ( Record ID )

Volume ID | Page Address | Slot

• Index Structure

Index        B-Tree                                  Hash
Description  Prefix B-Tree                           Extendible Hash
Structure    Root, Internal, Leaf                    Root, Leaf
Scan         Boolean Expression, Range Scan          Boolean Expression, Search Key (Matching)
Etc          Secondary Index: Managing Long Data     Root: Hash Table with ShortPID
             Item for Long RID list

• Root & Internal Node

✔ Header Information: < Key Type, Key Length, Offset into Record >
✔ The Header Information is stored in the Root Node

Header Information | PID 1, Key 1 | PID 2, Key 2 | • • • | PID n, Key n | PID n+1
Free Space
offset n | • • • | offset 2 | offset 1 | Control Information

• Leaf Node

✔ Data Entry: < Key, Count, RID list >
✔ Control Information
– File ID, Page Identifiers ( ThisPage, Next, Previous )
– Count Information ( # of Free Bytes, # of Entries )
– Miscellaneous Information ( type of page, uniqueness of index, … )

Data Entry 1 | Data Entry 2 | • • • | Data Entry n
Free Space
offset n | • • • | offset 2 | offset 1 | Control Information

9. Loading a Simple Prefix B+ Tree

● Two ways to build a B+ Tree

✔ Perform N B+ Tree insertions
– Searching overhead
– Splitting overhead
✔ Build level by level, bottom up from the leaf level

● Example
✔ Figures 9.15 to 9.17
✔ Underflow: […]
✔ Example: […]?

Index set block:  ALWASPBET | 00 03 06

Sequence set:  ACCESS-ALSO | ALWAYS-ASK | ASPECT-BEST | BETTER-CAST
Next sequence set block:  CATCH-CHECK
Next separator:  CAT

FIGURE 9.15 Formation of the first index set block as the sequence set is loaded.

Upper index level:  CAT | 00 -1 -1
Lower index level:  ALWASPBET | 00 03 06
                    (index block containing no separators) | -1 -1 -1

Sequence set:  ACCESS-ALSO | ALWAYS-ASK | ASPECT-BEST | BETTER-CAST | CATCH-CHECK

FIGURE 9.16 Simultaneous building of two index set levels as the sequence set continues to grow.

Upper index level:  CATDR | 00 03 -1   • • •
Lower index level:  ALWASPBET | 00 03 06    CLCOSDE | 00 02 05    EFHIG | 00 02 03   • • •

Sequence set:
ACCESS-ALSO | ALWAYS-ASK | ASPECT-BEST | BETTER-CAST
CATCH-CHECK | CLASS-COPY | COST-DAMAGE | DELETE-DISK
DRUM-EDITOR | EFFORT-GROW | HEAD-IDEAL | IGNORE-ITEM
• • •

FIGURE 9.17 Continued growth of index set built up from the sequence set.
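
The loading pass of Figures 9.15 to 9.17 can be sketched as a single sweep over the sorted sequence set blocks. The sketch assumes three separators per index block (as drawn), builds only two index levels, does not enforce a capacity on the upper level, and keeps everything in memory. `Range`, `IndexBlock`, `append_separator`, and `load_index` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SEPS_PER_BLOCK 3    /* assumed capacity, matching the figures */
#define MAX_INDEX      8    /* assumed limit on lowest-level index blocks */
#define TEXT_LEN       32

typedef struct { const char *first, *last; } Range;  /* one sequence set block */

typedef struct {
    int  nseps;
    char text[TEXT_LEN];    /* concatenated separators (NUL-terminated here) */
} IndexBlock;

/* Append to dst the shortest prefix of hi that is greater than lo (lo < hi). */
static void append_separator(IndexBlock *dst, const char *lo, const char *hi)
{
    size_t n = 0, used = strlen(dst->text);
    assert(strcmp(lo, hi) < 0);
    while (lo[n] == hi[n])
        n++;
    n++;                                /* keep the first differing character */
    assert(used + n < TEXT_LEN);
    memcpy(dst->text + used, hi, n);
    dst->text[used + n] = '\0';
    dst->nseps++;
}

/* One pass over the sorted blocks. Fills low[0..MAX_INDEX-1] (lowest index
 * level) and *parent; returns the number of lowest-level blocks used. */
int load_index(const Range *seq, int nblocks, IndexBlock *low, IndexBlock *parent)
{
    int cur = 0;
    memset(low, 0, sizeof low[0] * MAX_INDEX);
    memset(parent, 0, sizeof *parent);
    for (int i = 1; i < nblocks; i++) {
        if (low[cur].nseps == SEPS_PER_BLOCK) {
            /* current index block is full: this separator moves up a level
             * and a fresh lowest-level index block is started */
            append_separator(parent, seq[i - 1].last, seq[i].first);
            cur++;
            assert(cur < MAX_INDEX);
        } else {
            append_separator(&low[cur], seq[i - 1].last, seq[i].first);
        }
    }
    return cur + 1;
}
```

Run over the twelve sequence set blocks of Figure 9.17, the pass yields the lowest-level blocks ALWASPBET, CLCOSDE, and EFHIG and the upper-level block CATDR. No searching or splitting takes place because every separator is appended at the right-hand edge of the tree.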

● “Building bottom up from the leaf level”: characteristics

✔ Fast B+ Tree construction
– No searching/splitting overhead
✔ The fill factor can be chosen freely
✔ Physical proximity of the index set and the sequence set (×)

10. B+ Trees

● Difference from the Simple Prefix B+ Tree

✔ In a B+ Tree, each separator is a copy of an actual key
✔ Compare Figure 9.15 ↔ Figure 9.18

Index set block:  ALWAYSASPECTBETTER | 00 06 12

Sequence set:  ACCESS-ALSO | ALWAYS-ASK | ASPECT-BEST | BETTER-CAST
Next sequence set block:  CATCH-CHECK
Next separator:  CATCH

FIGURE 9.18 Formation of the first index set block in a B+ tree without the use of shortest separators.

● B+ Tree: drawback and advantage

✔ Drawback: the fan-out is relatively low.
✔ Advantage: simplicity
– Keys can be handled as fixed-length fields
– No compression is needed.

11. B-Tree Family in Perspective

● The B-Tree family is not always the best choice.
✔ If the index is small, use a simple index
✔ If only random access is needed, use hashing

● Common characteristics of the B-Tree family

✔ Paged index structure
✔ Maintain height-balanced trees
✔ The trees grow from the bottom up.
✔ Greater storage efficiency ( 1-to-2, 2-to-3 )
✔ Can be implemented as virtual tree structures
✔ Can handle variable-length records

● B-Tree
✔ (key, info) pairs are stored in all nodes
– Not only in the leaf nodes
✔ Less space than B+ Tree
– But: the case where nodes hold pointers to the data records
– Separator < Key ← effect on tree depth?
✔ Sequential search is possible through in-order traversal
– Data and index: […]? (NO!)

● B+ Tree
✔ Leaf nodes are connected as a linked list
– Efficient sequential access
– Supports range queries
✔ Higher tree order
– Internal nodes hold no pointers to data
✔ Deletion is relatively simple
✔ Leaf and non-leaf nodes can share one entry format
– (Key, Child/Data pointer)
– B-Tree: (Key, Child pointer, Data pointer)

● Simple Prefix B+ Tree
✔ A variant of the B+ Tree
✔ Separator length < Key length
– Higher fan-out of internal nodes
– Smaller tree depth & better space utilization