Lecture Notes #1 (Disk Organization & Performance): File Organization


  • Lecture Notes #1

    (Disk Organization & Performance)

    "A guiding scholar should be a sheep, not a bird.

    The sheep gives its lamb milk; the bird gives its chick regurgitated food."

    File Organization

  • Textbook & References

    File Structures: An Object-Oriented Approach with C++, Michael J. Folk, Bill Zoellick, Greg Riccardi, Addison Wesley, 1998

    Database Design and Implementation, Edward Sciore, John Wiley, 2009

  • Definitions-1

    A file structure is a combination of representations for data in files and of operations for accessing the data on disk.

    Data structures: deal with data in main memory

    File structures: deal with data in secondary storage

    Main operations on a file structure: search, add, remove, update, sort (external sorting), merge

    Main metrics: simplicity, complexity, scalability, programmability, and maintainability

    Structures for static files and for dynamic files differ a lot.

    Structures differ according to the storage medium as well.

    A rough history of access methods:

    Sequential search

    Simple index search (before 1960)

    Tree structures: BST (1960s), AVL (1962), B-tree and B+-tree (1970s)

    Simple hashing (before 1980)

    Dynamic hashing (after 1980)

    The choice of structure is based on:

    usage characteristics of the data

    data type (basic vs. multi-dimensional data)

    physical characteristics of the machine

  • Definitions-2

    A physical file is a particular collection of bytes stored on disk.

    A logical file is the view of a physical file from the standpoint of the application program.

    A disk may hold thousands of physical files, but a program can have only a limited number of logical files open at once (for example, 20).

    The OS makes the connection between the physical file and the logical file.

    Example:

    int fd = open (filename, flags[,mode]);

    Read/Write: first the OS makes the connection, then the program reads/writes using the logical descriptor.

    Physical devices as files: the program accesses the file without knowing whether it comes from disk, tape, another computer, the keyboard (stdin), the screen (stdout)...

    % list.exe > myoutput

    % prog1 | prog2

    % list | sort
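    The open/read/write sequence above can be sketched in Python, whose os module wraps the same system calls. This is only an illustration: the helper name append_byte and the temporary file name are made up for the example.

```python
import os
import tempfile

def append_byte(path: str, ch: bytes) -> int:
    """Append `ch` to `path` through a file descriptor; return bytes written."""
    # os.open asks the OS to connect the physical file to a logical descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # All later I/O names the descriptor, not the physical location.
        return os.write(fd, ch)
    finally:
        os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "textfile")  # hypothetical file name
    append_byte(path, b"P")
    append_byte(path, b"Q")
    with open(path, "rb") as f:
        content = f.read()
```

    The program never mentions cylinders or sectors; the descriptor is the whole interface.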

  • MAIN GOAL

    Increase reliability while increasing speed at the lowest cost.

    In the real world, data is inevitably scattered over disk pages.

    To increase speed, we have to minimize the number of disk accesses by:

    clustering (temporal and spatial locality) as far as possible

    getting everything you need in one access.

    The physical characteristics of the hardware, together with the data structures and algorithms, are used to predict the efficiency of file operations.

    Now we will study the physical characteristics of the hardware...

  • Storage Hierarchy

    [Figure: storage hierarchy. Recoverable labels:]

    Machine instructions

    Cache: 1 MB, 15 cycles (volatile media)

    Main memory: 1-16 GB, 200 cycles (10 nanosec.) (volatile media)

    USB flash storage (non-volatile media)

    Secondary storage: 80 GB, 10 millisec. (non-volatile media); 100,000 times slower (like 10 sec vs. 10 days)

    CD-RW, DVD-RW, floppy disk, magnetic tape storage (serial access)

    Arrow labels: cost per unit increases (~40 times more expensive), capacity decreases, reliability increases.

  • technologies used in storage

    Cache memory: static RAM (SRAM)

    Primary memory: dynamic RAM (DRAM)

    Secondary memory:

    HDD (Hard Disk Drive): stores data magnetically; read and written by a head on a moving arm

    SSD (Solid State Disk): electronic disk

    CD-ROM (Compact Disc Read-Only Memory): stores data optically, read by laser

    WORM (Write Once Read Many)

    DVD (Digital Versatile Disc, originally Digital Video Disc): stores data optically, read by laser (shorter wavelength and different laser type)

    BD (Blu-ray Disc): stores data optically, read by a blue-violet laser

  • Technologies used in storage (continued)

    USB flash memory: EEPROM (NAND-type)

    100 times faster than HDD, 100 times more expensive than HDD

    Wears out! Wear leveling is used to lessen the rewrite-limit problem.

    Magnetic tapes:

    Cheap storage for archival purposes

    Sequential access only

    Characteristics of 9-track tape:

    Tape density: 6250 ~ 30,000 bpi (each frame across the 9 tracks holds one byte plus parity, so bits per inch per track = bytes per inch)

    Tape speed: 30 ~ 200 ips (inches per second)

    Gap size: 0.3 ~ 0.75 inch

    [Figure: 9-track tape, showing tracks, frames, data blocks, and the gaps between blocks (space left for the R/W head to stop and start)]

  • Tape length

    How much tape is needed to store 1 million 100-B records on a tape with 6250 bpi and 0.3-inch gaps?

    g: length of a gap

    n: # of data blocks

    b: length of a data block

    s = n * (b + g)

    Blocking factor (bf) = # of records / block

    bf = 1: b = 100 / 6250 = 0.016 inch, s = 1 million * (0.016 + 0.3) = 316,000 inches = 26,333 feet

    bf = 50: b = 100 * 50 / 6250 = 0.8 inch, s = (1 million / 50) * (0.8 + 0.3) = 22,000 inches = 1833 feet

    Effective recording density (erd) = (# of bytes per data block) / (length of tape required to store a data block, including its gap)

    bf = 1: erd = 100 / 0.316 = 316.4 bpi

    bf = 50: erd = 100 * 50 / 1.1 = 4545.45 bpi

  • Tape data transmission rate

    Nominal transmission rate = tape density (bpi) * tape speed (ips)

    Effective transmission rate = effective recording density (erd) * tape speed (ips)

    If we have a 200 ips tape, what are the nominal and effective transmission rates?

    Nominal transmission rate = 6250 bpi * 200 ips = 1250 KB/sec

    bf = 1: effective transmission rate = 316.4 bpi * 200 ips = 63.3 KB/sec

    bf = 50: effective transmission rate = 4545.45 bpi * 200 ips = 909 KB/sec
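    The tape arithmetic can be checked with a short Python sketch. The function names are invented for this illustration; the formulas are the ones given on the slides (s = n * (b + g), erd = block bytes / (b + g), rate = density * speed).

```python
def tape_length_inches(n_records, record_bytes, density_bpi, gap_inches, bf):
    """Tape needed for n_records when bf records are packed into each block."""
    block_len = bf * record_bytes / density_bpi   # b, in inches
    n_blocks = n_records / bf                     # n
    return n_blocks * (block_len + gap_inches)    # s = n * (b + g)

def effective_density(record_bytes, density_bpi, gap_inches, bf):
    """Useful bytes per inch once the inter-block gaps are counted."""
    block_bytes = bf * record_bytes
    return block_bytes / (block_bytes / density_bpi + gap_inches)

SPEED_IPS = 200  # tape speed used in the transmission-rate example

for bf in (1, 50):
    feet = tape_length_inches(1_000_000, 100, 6250, 0.3, bf) / 12
    erd = effective_density(100, 6250, 0.3, bf)
    rate_kb_per_sec = erd * SPEED_IPS / 1000
    print(f"bf={bf}: {feet:.0f} feet, erd={erd:.1f} bpi, {rate_kb_per_sec:.1f} KB/sec")
```

    A larger blocking factor amortizes each gap over more records, which is why both the tape length and the effective rate improve so sharply.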

  • Basic Organization of a DISK (hard or floppy):

    Since there is only 1 data path to the computer, only 1 read/write head can be active at a time.

  • Disk Organizations: Sector,Cluster

    It has to do with abstraction. Improves disk access by decreasing ts (seek time).

    A sector is the fixed-length, smallest addressable portion of a disk, typically 512 bytes to 4 KB.

    Cylinder-Head-Sector addressing (physical CHS addressing) is used to access the sector and then bring it into memory (a buffer).

    The OS does not use CHS addressing; rather it uses LBA (logical block addressing), which numbers all sectors from 0 to the last sector. If needed, firmware on the disk converts the LBA address to a CHS address.

    A cluster is the smallest unit of space that the OS can allocate to a file. The OS views the file as a series of clusters. A cluster has a fixed number of contiguous sectors.

    The File Manager (a module in the OS) uses the FAT (file allocation table) to tie logical clusters to physical sectors.

    [Figure labels:]

    (A) track

    (B) geometrical sector

    (C) track sector

    (D) cluster
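    The CHS/LBA conversion mentioned above follows a standard formula, sketched here in Python. The geometry in the usage lines (16 heads, 63 sectors per track) is only an assumed example, and the function names are made up for the illustration.

```python
def chs_to_lba(c, h, s, heads_per_cyl, sectors_per_track):
    """Cylinders and heads count from 0; sectors within a track count from 1."""
    return (c * heads_per_cyl + h) * sectors_per_track + (s - 1)

def lba_to_chs(lba, heads_per_cyl, sectors_per_track):
    """Inverse mapping: peel off the cylinder, then the head, then the sector."""
    c, rest = divmod(lba, heads_per_cyl * sectors_per_track)
    h, s0 = divmod(rest, sectors_per_track)
    return c, h, s0 + 1

# Assumed geometry for the example: 16 heads, 63 sectors per track.
first_of_cyl_1 = chs_to_lba(1, 0, 1, 16, 63)
round_trip = lba_to_chs(chs_to_lba(7, 3, 42, 16, 63), 16, 63)
```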

  • Disk performance metrics-1

    Note: B = byte

    1 KB = 1024 B (2^10 B), 1 MB = 1024 KB (2^20 B), 1 GB = 1024 MB (2^30 B)

    Capacity = C

    Seek time (ts), rotational delay (tr), transfer time (tt)

    C = (# of platters) * (# of tracks/platter) * (# of B/track). Example: 80 GB, 160 GB

    ts = the time it takes for the actuator to move the disk head from its current location to the requested track. Ex: ts_min = 0, ts_max = 15-20 msec, ts_ave = (1/3) * ts_max = 5 msec. Seek is the slowest part of the total cost.

    tr = the time spent waiting for the requested sector to rotate under the head. Ex: at 10,000 rpm a full rotation takes 6 msec; on average tr is ~1/2 of a full rotation.

    tt = (# of bytes transferred / # of bytes on a track) * rotation time: the time it takes for the bytes to pass by the disk head, to be transferred to/from memory.

    TOTAL ACCESS TIME = ts + tr + tt

    transfer_rate: B/msec (sample value = 100 MB/sec)

  • Example 1

    10,000 rpm disk (one full rotation = 6 msec)

    Bytes / sector = 512

    Sectors / track = 170

    Tracks / cylinder = 16

    Average seek time = 8 msec

    Average rotational delay = 3 msec

    Transfer rate = (512 * 170) / 6 msec = about 14,500 bytes / msec

    Transfer time for a single sector = (6 / 170) msec = about 0.035 msec

    Example 2: on a real disk we transfer at least a sector. Why?

    10,000 rpm, ts_ave = 5 msec, transfer rate = 83 MB/sec

    Transfer time for 1 B = 1 B / (83 MB/sec) = 0.000012 msec

    Transfer time for 1 KB = 0.012 msec

    Average access to 1 B = 5 + 3 + 0.000012 = 8.000012 msec

    Average access to 1 KB = 5 + 3 + 0.012 = 8.012 msec

    Seek and rotation dominate, so reading 1 KB costs almost the same as reading 1 B. That is why we transfer at least a sector for each access!
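    As a rough check of the access-time formula (ts + tr + tt) with Example 1's disk, here is a small Python sketch. The function names are illustrative only, and the rotational delay is taken as half a rotation, as stated in the metrics slide.

```python
def rotation_ms(rpm):
    """Time of one full rotation in milliseconds."""
    return 60_000 / rpm

def access_time_ms(seek_ms, rpm, n_bytes, bytes_per_track):
    """ts + tr + tt, with tr taken as half a rotation on average."""
    rot = rotation_ms(rpm)
    transfer = (n_bytes / bytes_per_track) * rot
    return seek_ms + rot / 2 + transfer

track_bytes = 512 * 170                          # bytes on one track
rate = track_bytes / rotation_ms(10_000)         # bytes per msec
one_byte = access_time_ms(8, 10_000, 1, track_bytes)
one_sector = access_time_ms(8, 10_000, 512, track_bytes)
print(f"rate={rate:.0f} B/msec, 1 B: {one_byte:.4f} msec, 1 sector: {one_sector:.4f} msec")
```

    The two access times differ by less than 0.04 msec out of about 11 msec, which is the point of transferring whole sectors.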

  • Internal / external fragmentation

    Fragmentation means that something is broken into parts that are detached, isolated, or incomplete.

    It may occur at any level of organization: sector, block, cluster, file...

    Since the sector size is fixed, there is usually no convenient fit between records and sectors. This leads to internal fragmentation within the sectors. Think of similar unused holes at the other levels.

    Sector spanning is a simple solution to this problem. Disadvantage?

    With sector spanning: aaaaa aabbb bbccc

    Without sector spanning: aaaaa aa--- bbbbb ccc--

    A disk may have lots of small chunks of unallocated blocks but no large chunks. Thus it may not be possible to allocate space for a large file even though the disk has plenty of free space. This is called external fragmentation.

  • Block-level interface(block,page )

    A block is a sequence of bytes.

    Advantage: the OS hides hardware details (like different sector sizes and different addressing schemes) behind the block-level interface. The OS maintains the mapping between blocks and sectors.

    Blocking increases throughput (the rate of successful data transfer).

    The block size is at least 1 sector and is determined by the OS. The OS views the disk as a series of blocks. Block numbers start from 0.

    BF = # of records stored in each block.

    A page is a block-sized area allocated in main memory.

    The OS provides several methods to access disk blocks:

    readblock(n,p): read data from block n into page p.

    writeblock(n,p): write the data in page p to block n.

    allocate(k,n): find k contiguous available blocks as close to block n as possible.

    deallocate(k,n): deallocate the k contiguous blocks starting at block n.

    The OS keeps track of which blocks are available for allocation. 2 basic strategies exist:

    Disk map

    Free list
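    A minimal sketch of the disk-map strategy, assuming the map is a simple list of booleans (one per block). The class name DiskMap and the return value -1 for failure are choices made for this example, not part of any real OS interface.

```python
class DiskMap:
    """One flag per block: True means the block is free."""

    def __init__(self, n_blocks: int):
        self.free = [True] * n_blocks

    def allocate(self, k: int, n: int) -> int:
        """Mark k contiguous free blocks as used, as close to block n as possible.

        Returns the first block number, or -1 if no run of k free blocks exists.
        """
        starts = [s for s in range(len(self.free) - k + 1)
                  if all(self.free[s:s + k])]
        if not starts:
            return -1
        best = min(starts, key=lambda s: abs(s - n))
        self.free[best:best + k] = [False] * k
        return best

    def deallocate(self, k: int, n: int) -> None:
        """Mark the k contiguous blocks starting at block n as free again."""
        self.free[n:n + k] = [True] * k

dm = DiskMap(16)
first = dm.allocate(4, 8)     # blocks 8..11
second = dm.allocate(4, 8)    # nearest remaining run
too_big = dm.allocate(8, 0)   # 8 blocks are free, but not contiguously
```

    The last call fails even though half the disk is free, which is external fragmentation in miniature.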

  • Block size:

    In terms of                    Preferred size   Application

    Block contention               Small            OLTP

    Random row access speed        Small            OLTP

    Sequential row access speed    Large            Decision support, data warehouse

    Block contention increases with larger blocks. Thus OLTP and web search applications, which perform mostly random access, prefer small blocks.

    Decision support and data warehouse applications, which perform mostly sequential access, prefer large blocks.

  • Disk performance metrics-2 (Additional metrics)

    Block transfer time (btt) = B / transfer_rate, where B is the block size

    bulk_transfer_rate (btr) = the rate of transferring the useful bytes in the blocks

    btr = B / (B + G) * transfer_rate

    (G is the gap size)

    Bulk time to transfer k consecutive blocks on the same cylinder?

    ts + tr + (k * (B / bulk_transfer_rate))
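    The bulk-transfer formulas translate directly into Python. The sample values used below (4096-byte blocks, 128-byte gaps, 14,500 B/msec, ts = 8 msec, tr = 3 msec) are assumptions picked for the illustration.

```python
def bulk_transfer_rate(block_bytes, gap_bytes, transfer_rate):
    """btr = B / (B + G) * transfer_rate: only the useful bytes count."""
    return block_bytes / (block_bytes + gap_bytes) * transfer_rate

def bulk_read_ms(k, block_bytes, gap_bytes, transfer_rate, seek_ms, rot_delay_ms):
    """ts + tr + k * (B / btr) for k consecutive blocks on one cylinder."""
    btr = bulk_transfer_rate(block_bytes, gap_bytes, transfer_rate)
    return seek_ms + rot_delay_ms + k * (block_bytes / btr)

# Assumed sample values: rate in bytes per msec, times in msec.
ten_blocks = bulk_read_ms(10, 4096, 128, 14_500, 8, 3)
```

    Since B / btr equals (B + G) / transfer_rate, each block effectively pays for its own gap as well.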

  • File-level interface

    The client views a file as a sequence of bytes (no notion of block or sector). The client can directly access any byte in the file.

    The OS hides the details from the client. For example, when a program seeks, reads, and writes through a file handle f, blocks are accessed through pages and I/O buffers are allocated behind the scenes.

    The f.seek() method performs 2 conversions:

    Specified byte position -> logical block reference (simple)

    Logical block reference -> physical block reference (depends on the file system implementation)

    How many disk accesses are required for f.read() and f.write()?

    See the journey of a byte on the following slides.

  • File implementation strategies

    Contiguous allocation: stores each file as a sequence of contiguous blocks. The simplest strategy. Suffers from both internal and external fragmentation.

    Extent-based allocation: similar to contiguous allocation. Reduces internal/external fragmentation by storing a file as a sequence of fixed-length extents. The file is extended 1 extent at a time.

    Indexed allocation: extends the file 1 block at a time. Least possible amount of fragmentation. Keeps track of the allocated blocks of the file with a special index block. Multiple levels of indexing are needed for large files. Ex: the UNIX file system

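  • File implementation strategies (figure)

    Contiguous allocation: logical block 21 of file "junk" -> physical block 53

    Extent-based allocation (extent size: 8 blocks): logical block 21 of file "junk" -> physical block 701

    Indexed allocation: logical block 2 of file "junk" -> physical block 16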
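    The two conversions behind a seek (byte position to logical block, then logical block to physical block) can be sketched for each strategy. The layouts below (start block 32, extent starts 32/480/696, and the index block contents) are hypothetical values chosen so that the results agree with the block numbers on the slide; the 4096-byte block size is also an assumption.

```python
BLOCK_SIZE = 4096    # assumed block size in bytes
EXTENT_SIZE = 8      # blocks per extent, as stated on the slide

def logical_block_of(byte_pos):
    """First conversion: byte position -> (logical block, offset in block)."""
    return divmod(byte_pos, BLOCK_SIZE)

def contiguous(start_block, logical):
    """The file occupies one run of blocks beginning at start_block."""
    return start_block + logical

def extent_based(extent_starts, logical):
    """The file is a list of fixed-length extents, each a contiguous run."""
    extent, offset = divmod(logical, EXTENT_SIZE)
    return extent_starts[extent] + offset

def indexed(index_block, logical):
    """The index block lists the physical block of every logical block."""
    return index_block[logical]

# Hypothetical layouts for the file "junk":
junk_start = 32
junk_extents = [32, 480, 696]
junk_index = [32, 103, 16, 17, 98]

logical, offset = logical_block_of(21 * BLOCK_SIZE + 10)
```

    Only the second conversion differs between strategies; the first is a plain division by the block size.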

  • Example #3 (effect of sector spanning)

    512 bytes per sector, 63 sectors per track

    16 tracks per cylinder, 4092 cylinders

    Disk capacity? 512 * 63 * 16 * 4092 = 2,111,864,832 bytes

    We have a file with 50,000 fixed-length records. How many cylinders does the file require if each data record requires 240 bytes?

    With sector spanning:

    Cylinder capacity = 512 * 63 * 16 = 516,096 bytes

    File size = 50,000 * 240 = 12,000,000 bytes

    Number of cylinders required = 12,000,000 / 516,096 = 23.25

    With internal fragmentation (no sector spanning):

    Only 2 records fit in a sector (2 * 240 = 480 bytes; 32 bytes are wasted), so the file requires 25,000 sectors

    63 * 16 = 1008 sectors per cylinder

    25,000 / 1008 = 24.8 cylinders required.

    Analysis:

    Sector spanning has an advantage because it requires less space for the file.

    On the other hand, some records can be retrieved only by accessing two sectors. This is the disadvantage.

    Observe the fragmentation problem at the other levels (like clusters).
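    The arithmetic of this example can be reproduced with a few lines of Python; the variable names are chosen for the illustration only.

```python
import math

SECTOR_BYTES = 512
SECTORS_PER_TRACK = 63
TRACKS_PER_CYL = 16
CYLINDERS = 4092
N_RECORDS = 50_000
RECORD_BYTES = 240

cyl_bytes = SECTOR_BYTES * SECTORS_PER_TRACK * TRACKS_PER_CYL
capacity = cyl_bytes * CYLINDERS

# With sector spanning, records are packed back to back.
spanning_cyls = N_RECORDS * RECORD_BYTES / cyl_bytes

# Without spanning, a sector holds only whole records.
records_per_sector = SECTOR_BYTES // RECORD_BYTES
sectors_needed = math.ceil(N_RECORDS / records_per_sector)
no_span_cyls = sectors_needed / (SECTORS_PER_TRACK * TRACKS_PER_CYL)

print(f"{capacity} bytes, {spanning_cyls:.2f} vs {no_span_cyls:.2f} cylinders")
```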

  • Example #4 (file access performance)

    10,000 rpm disk, 512 B/sector, 170 sectors/track, 16 tracks/cylinder, ts_ave = 8 msec (full rotation = 6 msec, average rotational delay = 3 msec).

    Allocate a file with 34,000 records, each 256 B long:

    a.) with a cluster size of 4 KB.

    b.) with a cluster size of 1 track.

    File size = 34,000 * 256 B = 8,704,000 B

    Track size = 170 * 512 B = 87,040 B, so the file occupies 100 tracks (not contiguous).

    Cluster size = 4096 B, which is 4096 / (170 * 512) = 1/21.25 of a track. There are 2125 clusters.

    Track-based sequential access = (8 msec + 3 msec + 6 msec) * 100 = 1.7 sec

    Cluster-based random access = (8 msec + 3 msec + (1/21.25) * 6 msec) * 2125 = 23.97 sec
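    A short Python sketch of the two cases, using the disk parameters given in the example. The variable names are illustrative; each track or cluster is charged one average seek plus one average rotational delay, as in the slide's formulas.

```python
ROTATION_MS = 60_000 / 10_000        # 6 msec per rotation at 10,000 rpm
SEEK_MS = 8
ROT_DELAY_MS = ROTATION_MS / 2       # 3 msec on average

track_bytes = 170 * 512
file_bytes = 34_000 * 256
cluster_bytes = 4096

n_tracks = file_bytes // track_bytes        # case b: one cluster = one track
n_clusters = file_bytes // cluster_bytes    # case a: 4 KB clusters

# Case b: read each track in one full rotation.
track_based_ms = (SEEK_MS + ROT_DELAY_MS + ROTATION_MS) * n_tracks

# Case a: each cluster needs its own seek and rotational delay.
cluster_fraction = cluster_bytes / track_bytes
cluster_based_ms = (SEEK_MS + ROT_DELAY_MS + cluster_fraction * ROTATION_MS) * n_clusters

print(f"track-based: {track_based_ms / 1000:.2f} sec, "
      f"cluster-based: {cluster_based_ms / 1000:.2f} sec")
```

    The data volume is identical in both cases; the difference comes entirely from paying the seek and rotational delay 2125 times instead of 100 times.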

  • Example #5 (random sector access time)

    # of platters: 4

    8192 tracks / platter surface

    256 sectors / track

    512 bytes / sector

    Disk diameter: 3.5 inches (1 inch = 2.54 cm)

    Gaps take 10% of the track space.

    Rotation speed: 3840 rpm

    The head takes 1 msec for every 500 cylinders plus 1 msec for start/stop.

    What are the best, worst, and average random sector I/O times?

    min. = 0.05 msec

    max. = 33.05 msec

    ave. = 14.65 msec

    If a block is 4096 bytes, what are the best, worst, and average random I/O times?

    min. I/O = transfer time = 0.5 msec
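    The best and worst cases can be reproduced with the sketch below. It assumes that reading n consecutive sectors means passing n sectors and the n-1 gaps between them under the head, and that the worst-case seek crosses all 8192 cylinders; both are assumptions of this illustration, and the average case is left out because it depends on the average seek distance one assumes.

```python
RPM = 3840
SECTORS_PER_TRACK = 256
CYLINDERS = 8192
GAP_FRACTION = 0.10                  # share of the track taken by gaps

rotation_ms = 60_000 / RPM           # 15.625 msec per rotation

def seek_ms(cylinders_moved):
    """1 msec start/stop plus 1 msec per 500 cylinders; no move costs nothing."""
    return 0.0 if cylinders_moved == 0 else 1 + cylinders_moved / 500

def transfer_ms(n_sectors):
    """Time for n sectors and the n-1 gaps between them to pass the head."""
    slots = n_sectors * (1 - GAP_FRACTION) + (n_sectors - 1) * GAP_FRACTION
    return slots / SECTORS_PER_TRACK * rotation_ms

best_sector = transfer_ms(1)                                      # head already in place
worst_sector = seek_ms(CYLINDERS) + rotation_ms + transfer_ms(1)  # full seek, full rotation
best_block = transfer_ms(4096 // 512)                             # 8 sectors, 7 gaps

print(f"{best_sector:.2f} {worst_sector:.2f} {best_block:.2f}")
```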

  • Example #6 (file access performance)

    # of platters: 4

    8192 tracks / platter surface

    256 sectors / track

    512 bytes / sector

    Disk diameter: 3.5 inches (1 inch = 2.54 cm)

    Gaps take 10% of the track space.

    Rotation speed: 3840 rpm

    The head takes 1 msec for every 500 cylinders plus 1 msec for start/stop.

    How long does it take to read 1 MB of data that is all stored in consecutive tracks?

    Answer: 138.8 msec

    If all sectors are scattered over the disk: 2048 sectors * 14.65 msec = 30,003.2 msec (about half a minute)

    How long does it take to read 1000 MB of data that is all stored in consecutive cylinders?

    Answer: 126,005.8 msec (2 minutes and 6 sec)

    If all sectors are scattered over the disk: 30,003,200 msec (8 hours and 20 minutes)

  • Example #6, continued (effect of disk access algorithms)

    Requested cylinder   Arrival time   Completion time (FIFO)   Completion time (elevator)

    1000                 0              7.85                     7.85 (1st)

    3000                 0              20.7                     20.7 (2nd)

    7000                 0              37.55                    37.55 (3rd)

    2000                 20             56.6                     77.5 (6th)

    8000                 30             77.45                    48.4 (4th)

    5000                 40             92.3                     63.25 (5th)
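    The two scheduling policies can be compared with a small simulation. This is a sketch under assumed conditions: the head starts at cylinder 1000, a seek costs 1 + distance/500 msec (0 if the head does not move), and every request then costs a fixed 7.85 msec for rotational delay plus transfer. Those assumptions are inferred from the first rows of the table; under them the service order and most completion times agree with the table, while a few later entries come out a few tenths of a millisecond away, so the slide presumably used a slightly different service model.

```python
SERVICE_MS = 7.85   # assumed rotational delay + transfer per request

def seek_ms(distance):
    return 0.0 if distance == 0 else 1 + distance / 500

def fifo(requests, head):
    """Serve (cylinder, arrival_ms) requests in arrival order."""
    t, done = 0.0, {}
    for cyl, arrival in sorted(requests, key=lambda r: r[1]):
        t = max(t, arrival) + seek_ms(abs(cyl - head)) + SERVICE_MS
        head, done[cyl] = cyl, t
    return done

def elevator(requests, head):
    """Keep sweeping in one direction; reverse only when nothing lies ahead."""
    t, direction, done = 0.0, 1, {}
    waiting = list(requests)
    while waiting:
        ready = [r for r in waiting if r[1] <= t]
        if not ready:                       # idle until the next arrival
            t = min(r[1] for r in waiting)
            continue
        ahead = [r for r in ready if (r[0] - head) * direction >= 0]
        if not ahead:                       # nothing ahead: reverse the sweep
            direction = -direction
            continue
        req = min(ahead, key=lambda r: abs(r[0] - head))
        t += seek_ms(abs(req[0] - head)) + SERVICE_MS
        head = req[0]
        done[head] = t
        waiting.remove(req)
    return done

REQUESTS = [(1000, 0), (3000, 0), (7000, 0), (2000, 20), (8000, 30), (5000, 40)]
fifo_times = fifo(REQUESTS, head=1000)
elevator_times = elevator(REQUESTS, head=1000)
```

    The elevator policy finishes cylinders 8000 and 5000 much earlier than FIFO, at the price of making the request for cylinder 2000 wait until last.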

  • A journey of a byte

    Goal: write (append) the 1-byte value in ch in the program to textfile:

    write (textfile, ch, 1)

    Logical side:

    1-) The call is a system call to the OS.

    2-) The File Manager (FM) handles the request.

    3-) FM accesses the information about textfile, i.e. the physical location (cylinder, track, ...) of the file.

    4-) FM uses the FAT to locate the sector that is to contain the byte.

    5-) FM finds an available I/O buffer, reads the sector from disk into the system buffer in main memory (MM), then writes ch into the appropriate place in the sector in MM.

    Physical side:

    6-) FM tells the I/O processor where the byte is stored in MM and where it needs to be sent on the disk.

    7-) The I/O processor checks whether the drive is available and buffers the data in chunks of the proper size for the disk.

    8-) The I/O processor sends the data to the disk controller.

    9-) The controller instructs the drive to move the arm to the proper track, waits until the proper sector comes under the head, and then sends the sector bit by bit.

    Questions: The sector is not sent to disk immediately. Why? In which case do we need to send it immediately?
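    The delayed write can be observed from a program. The sketch below uses Python's buffered file objects as a stand-in for the buffering described above: the byte first sits in a buffer, and only an explicit flush (plus os.fsync, which asks the OS to push its own buffers to the device) sends it on. The file name and buffer size are arbitrary choices for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "textfile")
    with open(path, "ab", buffering=4096) as f:
        f.write(b"P")                              # stays in the program's buffer
        size_before_flush = os.path.getsize(path)  # the file is still empty
        f.flush()                                  # hand the buffer to the OS
        os.fsync(f.fileno())                       # ask the OS to write it to the device
        size_after_flush = os.path.getsize(path)
```

    Batching many small writes into one sector-sized transfer avoids paying a seek and rotation per byte. Forcing the write immediately is typically reserved for cases where the data must survive a crash, such as recovery logs.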

  • I/O Processor / direct memory access (DMA) controller

    I/O processor: handles the task of communicating with the disk, working independently of the main CPU.

    The I/O processor (a special-purpose device) takes commands from the OS and communicates with the disk controller.

    Once the buffer is full, the I/O processor sends the sector's bytes, one at a time, as soon as the controller is available.

    [Figure: user program (char c in data area) -> File Manager in OS (char c in system buffer) -> I/O processor / DMA controller -> disk controller -> DISK]

    1-) Move mode / locate mode: to eliminate data transfer overhead (OH).

    2-) Scatter input / gather output (vectored I/O): to eliminate the 2-step process of separating the overhead (OH) from the useful data of a block.
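    Scatter input and gather output are exposed on Unix-like systems as the readv and writev system calls, which Python wraps as os.readv and os.writev. The sketch below writes a header and a payload from two separate buffers in one call, then reads them back into two separate buffers in one call. The header and payload contents are invented for the example, and the code assumes a Unix-like platform.

```python
import os
import tempfile

header = b"HDR1"            # block overhead (hypothetical contents)
payload = b"useful data"    # the part the program actually wants

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "block.bin")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Gather output: two buffers leave memory in a single call.
        written = os.writev(fd, [header, payload])
        os.lseek(fd, 0, os.SEEK_SET)
        # Scatter input: one call fills the two buffers in order.
        head_buf = bytearray(len(header))
        data_buf = bytearray(len(payload))
        read = os.readv(fd, [head_buf, data_buf])
    finally:
        os.close(fd)
```

    Without vectored I/O the program would read the whole block into one buffer and then copy the header and the data apart in a second step.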

  • Disk Controller

    The disk controller controls the disk while hiding the details of disk access.

    The disk controller is an interface between the computer and the disk drive. It transfers read/write requests to and from the disk, controls the disk arm, provides reliability by applying checksums to sectors, and remaps bad sectors.

    The disk controller moves the head to the correct position (correct track, correct sector) for reading and writing.

    ATA (Advanced Technology Attachment) = IDE (Integrated Drive Electronics)

    Ex: 133 MB/s with ATA/133, 150 MB/s with SATA (Serial ATA), ATAPI

    There are 2 IDE ports on a PC; each port can access at most 2 disks, one master and one slave.

    SCSI (Small Computer System Interface): a system bus standard coordinating many types of devices on a single bus. Provides a basis for RAID disks.

    Ex: max 16 devices in Ultra 320 SCSI.

                     IDE        SCSI

    Cost             Cheap      Expensive

    # of devices     2          16

    Maintenance      Easy       Hard

    Usage            At home    Business

    Speed            133 MB/s   320 MB/s

  • Disk Bottleneck, Improving Disk Access Time

    The CPU rate (and the rate of a high-performance network) is dramatically higher than the disk I/O transfer rate. This causes the disk bottleneck.

    Solutions (read Section 3.1.8):

    Multiprogramming (the CPU works on other jobs while waiting for the data to arrive)

    Cylinders (place 2 nearby tracks in the same cylinder)

    Disk cache

    Parallelism (example: disk striping, RAID). This helps to achieve better reliability as well.

    Efficient use of RAM (buffering)

  • Disk Cache

    Disk cache: a kind of buffering! A block of memory set aside to contain blocks of data from the disk. The disk cache is bundled with the disk drive.

    Improves performance.

    When data is requested from secondary storage, the file manager looks into the disk cache to see whether it contains the requested data.

    Compare the following access times (here tr is the time of one full rotation):

    transfer a sector = ts + 1/2 tr + sector rotation time

    transfer a track = ts + tr

    Which one do you prefer: transferring a sector or transferring a track?

    The real value of the disk cache is prefetching.

  • Disk Striping

    Two 20 GB drives are faster than a single 40 GB drive because 2 sectors can be accessed simultaneously. 2 problems arise:

    Cost increases

    Load balancing is required so that the disks work as uniformly as possible.

    Using two 20 GB drives is efficient only if both can always be kept busy.

    To increase efficiency, we have to balance the workload among the multiple disks. Disk striping can be used for balancing the workload.

    Disk striping uses the disk controller to hide the smaller disks from the OS, giving it the illusion of a single large disk.

    Striping distributes the database equally among the small drives.
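    One simple way a controller could map the sectors of the virtual large disk onto the small disks is round-robin sector striping, sketched below. The round-robin scheme and the function name locate are assumptions of this illustration; the slides do not specify the mapping.

```python
def locate(virtual_sector, n_disks):
    """Map a sector of the virtual disk to (disk number, sector on that disk)."""
    sector_on_disk, disk = divmod(virtual_sector, n_disks)
    return disk, sector_on_disk

# Consecutive virtual sectors alternate between the two small disks,
# so a sequential read keeps both drives busy.
placement = [locate(v, 2) for v in range(6)]
```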

  • Disk reliability, improving disk reliability

    2 causes of decreased reliability:

    The magnetic material can degenerate

    Head crash

    2 approaches to increase reliability. These tasks are again governed by the disk controller.

    Mirroring: high speed and robust. Problem: the cost is high because of the high number of disks.

    Storing parity (use a single disk to back up any number of other disks). 2 problems:

    There are 4 disk accesses for a single sector write.

    More vulnerable to non-recoverable multi-disk failure.
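    The parity approach can be illustrated with XOR over byte strings. The sketch also shows one common way to account for the 4 accesses of a single sector write: read the old data, read the old parity, write the new data, write the new parity. Function names and the sample sectors are made up for the example.

```python
from functools import reduce

def parity(sectors):
    """Byte-wise XOR of equally sized sectors."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))

def recover(surviving_sectors, parity_sector):
    """Rebuild the one missing sector from the survivors and the parity."""
    return parity(surviving_sectors + [parity_sector])

def updated_parity(old_data, new_data, old_parity):
    """New parity after rewriting one sector, without reading the other disks."""
    return bytes(o ^ n ^ p for o, n, p in zip(old_data, new_data, old_parity))

data = [b"\x01\x02", b"\x04\x08", b"\x10\x20"]   # one sector from each data disk
p = parity(data)
rebuilt = recover([data[0], data[2]], p)         # the disk holding data[1] failed
new_p = updated_parity(data[1], b"\xff\x00", p)  # rewrite the sector on disk 1
```

    A single failed disk can be rebuilt, but if two disks fail at once the XOR no longer determines the missing sectors, which is the multi-disk vulnerability noted above.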

  • RAID

    In high-speed networks, a Storage Area Network (SAN) provides RAID (redundant array of independent disks) organization. RAID supports large data and provides reliability, resource sharing, performance improvement, and disk striping.

    RAID-0: striping only, no guard against disk failure

    RAID-1: mirrored striping

    RAID-2: bit striping, with error-correcting codes instead of parity (difficult to implement, no longer used)

    RAID-3: byte striping and 1 parity disk

    RAID-4: sector striping and 1 parity disk

    RAID-5: similar to RAID-4, but the parity information is distributed among the disks (every Nth sector of each disk stores parity info).

    RAID-6: similar to RAID-5, but stores 2 kinds of parity info and thus needs another disk for the additional parity info.

  • Buffer Management

    Work with large chunks of data in main memory (MM) so that the number of disk accesses is reduced.

    Read data that is already in memory multiple times (caching).

    System buffer vs. user buffer

    Buffer management by the OS: organizing more than 2 buffers, including system buffers. Coordination uses replacement techniques such as Least Recently Used (LRU), FIFO, and the clock-replacement algorithm.

    How many buffers do we need?

    1: Even if the program transmits data in only one direction, a single buffer causes problems such as I/O-bound processing: the CPU wants to be filling the buffer at the same time that I/O is being performed. Overlapping I/O and CPU work is possible ONLY with at least 2 buffers (Fig. 3.22).

    2: At least 2 (one for input, the other for output): similar problems still occur. Solution: apply a multiple-buffering strategy. Tradeoff: as the cost of memory decreases, using many buffers becomes possible, but the more buffers there are, the more complex the management becomes.

    1 buffer for each page

    Transfer the buffer to the disk with 1 access when either:

    The page is being replaced

    The file is closed

    It is required for data integrity purposes (recovery management)

    [Figure: buffer pool with 4 buffers (pages)]
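    A minimal sketch of a 4-page buffer pool with Least Recently Used replacement. The class and method names (BufferPool, pin, mark_dirty, flush_all) are invented for this illustration, and a Python dict stands in for the disk. A page is written back only when it is replaced or when the pool is flushed, matching the write-back conditions listed above.

```python
from collections import OrderedDict

class BufferPool:
    def __init__(self, disk, n_buffers=4):
        self.disk = disk               # dict: block number -> bytes (stand-in for the disk)
        self.n_buffers = n_buffers
        self.pages = OrderedDict()     # block number -> [bytearray, dirty flag], LRU first
        self.reads = 0
        self.writes = 0

    def _evict(self):
        """Drop the least recently used page, writing it back if modified."""
        block, (data, dirty) = self.pages.popitem(last=False)
        if dirty:
            self.disk[block] = bytes(data)
            self.writes += 1

    def pin(self, block):
        """Return the in-memory page for `block`, reading it only on a miss."""
        if block in self.pages:
            self.pages.move_to_end(block)          # now the most recently used
        else:
            if len(self.pages) >= self.n_buffers:
                self._evict()
            self.pages[block] = [bytearray(self.disk[block]), False]
            self.reads += 1
        return self.pages[block][0]

    def mark_dirty(self, block):
        self.pages[block][1] = True

    def flush_all(self):
        """Write back everything, e.g. when the file is closed."""
        while self.pages:
            self._evict()

disk = {i: bytes([i]) * 4 for i in range(6)}
pool = BufferPool(disk, n_buffers=4)
for b in (0, 1, 2, 3, 0):
    pool.pin(b)                # the second access to block 0 is a cache hit
page = pool.pin(1)
page[0] = 99
pool.mark_dirty(1)
pool.pin(4)                    # pool is full: the LRU page (block 2) is replaced
reads_so_far, writes_so_far = pool.reads, pool.writes
pool.flush_all()
```

    Seven page requests cost only five disk reads, and the single modified page is written exactly once.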