Disk Scheduling Disk Scheduling (Cont.)lsc.fie.umich.mx/~juan/Materias/UofO/cis415/lectures/415disk.pdf · n Disk bandwidth is the total number of bytes transferred divided by time

1

Silberschatz, Galvin and Gagne ”200213.1Operating System Concepts

Chapter 14: Mass-Storage Systems

n Logical Disk Structuren Disk Schedulingn Disk Managementn RAID Structure


Logical Disk Structure

n Disk drives are addressed as large 1-dimensional arraysof logical blocks, where the logical block is the smallestunit of transfer.

n In the simplest arrangement, the 1-dimensional array oflogical blocks is mapped into the sectors of the disksequentially.F Sector 0 is the first sector of the first track on the outermost

cylinder.F Mapping proceeds in order through that track, then the rest

of the tracks in that cylinder, and then through the rest of thecylinders from outermost to innermost.


Disk Scheduling

n The operating system is responsible for using hardwareefficiently — for the disk drives, this means having a fastaccess time and high disk bandwidth.

n Access time has two major componentsF Seek time is the time for the disk are to move the heads to

the cylinder containing the desired sector.F Rotational latency is the additional time waiting for the disk

to rotate the desired sector to the disk head.n Minimize seek timen Seek time ª seek distancen Disk bandwidth is the total number of bytes transferred

divided by time to transfer the data


Disk Scheduling (Cont.)

n Several algorithms exist to schedule the servicing of diskI/O requests.

n We illustrate them with a request queue to access logicalblocks 0-199.

98, 183, 37, 122, 14, 124, 65, 67

Head pointer currently at block 53


FCFS

Illustration shows total head movement of 640 cylinders.


SSTF

n Selects the request with the minimum seek time from thecurrent head position.

n SSTF scheduling is a form of SJF scheduling; may causestarvation of some requests.

n Illustration shows total head movement of 236 cylinders.

2


SSTF (Cont.)


SCAN

n The disk arm starts at one end of the disk, and movestoward the other end, servicing requests until it gets to theother end of the disk, where the head movement isreversed and servicing continues.

n Sometimes called the elevator algorithm.n Illustration shows total head movement of 208 cylinders.


SCAN (Cont.)


C-SCAN

n Provides a more uniform wait time than SCAN.n The head moves from one end of the disk to the other.

servicing requests as it goes. When it reaches the otherend, however, it immediately returns to the beginning ofthe disk, without servicing any requests on the return trip.

n Treats the cylinders as a circular list that wraps aroundfrom the last cylinder to the first one.


C-SCAN (Cont.)


C-LOOK

n Version of C-SCANn Arm only goes as far as the last request in each direction,

then reverses direction immediately, without first going allthe way to the end of the disk.

3


C-LOOK (Cont.)


Selecting a Disk-Scheduling Algorithm

n SSTF is common and has a natural appealn SCAN and C-SCAN perform better for systems that place a

heavy load on the disk.n Either SSTF or LOOK is a reasonable choice for the default

algorithm.n Performance depends on the number and types of requests.

n Requests for disk service can be influenced by the file-allocationmethod.

n The disk-scheduling algorithm is written as a separate moduleof the operating system, allowing it to be replaced with a

different algorithm if necessary.


Physical Disk Management

n Low-level formatting, or physical formatting — Dividing adisk into sectors that the disk controller can read andwrite.

n To use a disk to hold files, the operating system stillneeds to record its own data structures on the disk.F Partition the disk into one or more groups of cylinders.F Logical formatting or “making a file system”.

n Boot block initializes system.F The bootstrap is stored in ROM.F Bootstrap loader program.


MS-DOS Disk Layout


RAID Structure

n RAID – multiple disk drives provides reliability viaredundancy.

n RAID is arranged into six different levels.


RAID (cont)

n Several improvements in disk-use techniques involve theuse of multiple disks working cooperatively.

n Disk striping spreads the blocks in a file across multipledisks in certain patterns.

n RAID schemes improve performance and improve thereliability of the storage system by storing redundant data.F Mirroring or shadowing keeps duplicate of each disk.F Block interleaved parity uses much less redundancy.

4


RAID Levels


n Data is Striped for improved performanceF Distributes data over multiple disks to make them appear as a single fast large diskF Allows multiple I/Os to be serviced in parallel

4 Multiple independent requests serviced in parallel4 A block request may be serviced in parallel by multiple disks

n Data is Redundant for improved reliabilityF Large number of disks in an array lowers the reliability of the array

4 Reliability of N disks = Reliability of 1 disk /N4 Example:

– 50,000 hours / 70 disks = 700 hours– Disk System MTTF drops from 6 years to 1 month

F Arrays without redundancy are unreliable to be useful

RAID


n RAID 0 (Non-redundant)F Stripes Data; but does not employ redundancyF Lowest cost of any RAIDF Best Write performance - no redundant informationF Any single disk failure is catastrophicF Used in environments where performance is more important than reliability.

RAID Silberschatz, Galvin and Gagne ”200213.22Operating System Concepts

D0 D3D2D1

D4 D7D6D5

D8 D11D10D9

D12 D15D14D13

D19D18D17D16

Stripe Unit

Stripe

RAID

Disk 1 Disk 4Disk 3Disk 2


n RAID 1 (Mirrored)F Uses twice as many disks as non-redundant arrays - 100% Capacity Overhead - Two copies of

data are maintainedF Data is simultaneously written to both arraysF Data is read from the array with shorter queuing, seek and rotation delays - Best Read

Performance.F When a disk fails, mirrored copy is still availableF Used in environments where availability and performance (I/O rate) are more important than

storage efficiency.


n RAID 2 (Memory Style ECC)F Uses Hamming code - parity for distinct overlapping subsets of dataF # of redundant disks is proportional to log of total # of disks - Better for large # of disks - e.g.,

4 data disks require 3 redundant disksF If disk fails, other data in subset is used to regenerate lost dataF Multiple redundant disks are needed to identify faulty disk

RAID

5


n RAID 3 (Bit Interleaved Parity)F Data is bit -wise over the data disksF Uses Single parity disk to tolerate disk failures - Overhead is 1/NF Logically a single high capacity, high transfer rate diskF Reads access data disks only; Writes access both data and parity disksF Used in environments that require high BW (Scientific, Image Processing, etc.) , and not high I/O

rates

100100

111010

110111

100101

110011


n RAID 4 (Block Interleaved Parity)F Similar to bit-interleaved parity disk array; except data is block- interleaved (Striping Units)F Read requests smaller than one striping unit, access one Striping unitF Write requests update the data block; and the parity block.F Generating parity requires 4 I/O accesses (RMW)F Parity disk gets updates on all writes - a bottleneck

RAID


n RAID 5 (Block-Interleaved Distributed Parity)F Eliminates the parity disk bottleneck in RAID 4 - Distributes parity among all the disksF Data is distributed among all disksF All disks participates in read requests - Better performance than RAID 4F Write requests update the data block; and the parity block.F Generating parity requires 4 I/O accesses (RMW)F Left symmetry v.s. Right Symmetry - Allows each disk to be traversed once before any disk twice


D0 PD3D2D1

D4 D7PD6D5

D8 D11D10PD9

D12 D15D14D13P

P D19D18D17D16

Stripe Unit

Stripe

RAID


D0’ D0 PD3D2D1

+

+

Old Parity(2. Read)

Old Data1. Read

NewData

D0’ P’D3D2D1

3. WriteNew Data 4. Write New

Parity


n RAID 6 (P + Q Redundancy)F Uses Reed-Solomon codes to protect against up to 2 disk failuresF Data is distributed among all disksF Two sets of parity P & QF Write requests update the data block; and the parity blocks.F Generating parity requires 6 I/O accesses (RMW) - update both P & QF Used in environments that require stringent reliability requirements

RAID

6


n ComparisonsF Read/Write Performance

4 RAID 0 provides the best Write performance4 RAID 1 provides the best Read Performance

F Cost - Total # of Disks4 RAID 1 is most expensive - 100% capacity overhead - 2N Disks4 RAID 0 is least expensive - N Disks - no redundancy4 RAID 2 needs N + ceiling(log2N) + 14 RAID 3, RAID 4 & RAID 5 needs N + 1 disks

RAID

Comparisons


n Preferred EnvironmentsF RAID 0: Performance & capacity are more important than reliabilityF RAID 1: High I/O rate, high availability environmentsF RAID 2: Large I/O Data TransferF RAID 3: High BW Applications (Scientific, Image Processing…)F RAID 4: High bit BW ApplicationsF RAID 5 & RAID 6: Mixed Applications

RAID


RAID Level Small Reads Small Writes Large Reads Large Writes Storage Efficiency

RAID 0 1 1 1 1 1

RAID 1 1 1/2 1 1/2 1/2

RAID 3 1/G 1/G (G-1)/G (G-1)/G (G-1)/G

RAID 5 1 max(1/G,1/4) 1 (G-1)/G (G-1)/G

RAID 6 1 max(1/G,1/6) 1 (G-2)/G (G-2)/G

The table below, which shows Throughput per $$ relative to RAID 0, assumes that G drives in an error correcting group


n What RAID for which applicationF Fast Workstation:

4 Caching is important to improve I/O rate4 If large files are installed, then RAID 0 may be necessary4 It is preferred to put the OS and swap files in separate drives from user drives to

minimize movement between swap file area & user area.F Small Server:

4 RAID 1 is preferredF Mid-Size Server:

4 If more capacity is needed, then RAID 5 is recommendedF Large Server: e.g. Database Servers

4 RAID 5 is preferred4 Separate different I/Os in mechanically independent arrays; place index & data

files in databases in different arrays

RAID


Price per Megabyte of DRAM, From 1981 to 2000


Price per Megabyte of Magnetic Hard Disk, From 1981 to 2000

7


Price per Megabyte of a Tape Drive, From 1984-2000

Documents

Disk Scheduling Disk Scheduling (Cont.)lsc.fie.umich.mx/~juan/Materias/UofO/cis415/lectures/415disk.pdf · n Disk bandwidth is the total number of bytes transferred divided by time