module 3 capp mgu s6



  • 7/26/2019 module 3 capp mgu s6


    Module 3

Memory hierarchy

Computer memory is organized into a hierarchy.

At the highest level (closest to the processor) are the processor registers. Next comes one or more levels of cache; when multiple levels are used, they are denoted L1, L2, and so on. Next comes main memory, which is usually made out of dynamic random-access memory (DRAM). All of these are considered internal to the computer system. The hierarchy continues with external memory, with the next level typically being a fixed hard disk, and one or more levels below that consisting of removable media such as optical disks and tape. As one goes down the memory hierarchy, one finds decreasing cost/bit, increasing capacity, and slower access time.

Fastest access is to the data held in processor registers; registers are at the top of the memory hierarchy. Next is a relatively small amount of memory that can be implemented on the processor chip: the processor cache. Two levels of cache are common. A small Level 1 (L1) cache is on the processor chip, and a larger Level 2 (L2) cache sits between main memory and the processor. Caches are made of SRAM since speed is a concern. The next level is main memory, which is implemented using DRAM; it is much larger, but much slower, than cache memory. The next level is magnetic disks, which provide huge amounts of inexpensive storage.

    Memory characteristics

    Key Characteristics of Computer Memory Systems


The term location in the table refers to whether memory is internal or external to the computer. Internal memory is often equated with main memory, but there are other forms of internal memory: the processor requires its own local memory, in the form of registers. External memory consists of peripheral storage devices, such as disk and tape, that are accessible to the processor via I/O controllers.

An obvious characteristic of memory is its capacity. For internal memory, this is typically expressed in terms of bytes (1 byte = 8 bits) or words. Common word lengths are 8, 16, and 32 bits. External memory capacity is typically expressed in terms of bytes.

A related concept is the unit of transfer. For internal memory, the unit of transfer is equal to the number of electrical lines into and out of the memory module.

Word: The natural unit of organization of memory. The size of the word is typically equal to the number of bits used to represent an integer and to the instruction length. For external memory, data are often transferred in much larger units than a word, and these are referred to as blocks. Another distinction among memory types is the method of accessing units of data. These include the following:

Sequential access: Memory is organized into units of data, called records. Access must be made in a specific linear sequence.

Direct access: As with sequential access, direct access involves a shared read-write mechanism. Individual blocks have a unique address, and access is made by reaching the general vicinity of the desired location and then searching sequentially.

Random access: The time to access a given location is independent of the sequence of prior accesses and is constant.

Associative: This is a random-access type of memory that enables one to make a comparison of desired bit locations within a word for a specified match, and to do this for all words simultaneously.

Another important characteristic of memory is performance. Access time (latency): For random-access memory, this is the time it takes to perform a read or write operation. For non-random-access memory, access time is the time it takes to position the read-write mechanism at the desired location. Memory cycle time: This concept is primarily applied to random-access memory and consists of the access time plus any additional time required before a second access can commence.

Transfer rate: This is the rate at which data can be transferred into or out of a memory unit.

Several physical characteristics of data storage are important. In a volatile memory, information decays naturally or is lost when electrical power is switched off. In a nonvolatile memory, information once recorded remains without deterioration until deliberately changed; no electrical power is needed to retain information. Magnetic-surface memories are nonvolatile. Semiconductor memory may be either volatile or nonvolatile. For random-access memory, organization is a key design issue. By organization is meant the physical arrangement of bits to form words.

Internal organization of semiconductor RAM memories

The basic element of a semiconductor memory is the memory cell. Memory cells share certain properties:


They exhibit two stable (or semistable) states, which can be used to represent binary 1 and 0.

    They are capable of being written into (at least once), to set the state.

    They are capable of being read to sense the state.

    Each memory cell can hold one bit of information.

    Memory cells are organized in the form of an array.

    One row is one memory word.

    All cells of a row are connected to a common line, known as the wordline.

    Word line is connected to the address decoder.

    Sense/write circuits are connected to the data input/output lines of the memory chip.

RAM

One distinguishing characteristic of RAM is that it is possible both to read data from the memory and to write new data into the memory easily and rapidly. Both the reading and writing are accomplished through the use of electrical signals. The other distinguishing characteristic of RAM is that it is volatile. A RAM must be provided with a constant power supply; if the power is interrupted, then the data are lost. Thus, RAM can be used only as temporary storage. The two traditional forms of RAM used in computers are DRAM and SRAM.

SRAM

SRAM is a digital device. In a SRAM, binary values are stored using traditional flip-flop logic-gate configurations. A static RAM will hold its data as long as power is supplied to it.

SRAMs consist of circuits that are capable of retaining their state as long as power is applied. They are volatile memories, because their contents are lost when power is interrupted.

Access times of static RAMs are in the range of a few nanoseconds. However, the cost is usually high.


    SRAM Cell

Two transistor inverters are cross-connected to implement a basic flip-flop. The cell is connected to one word line and two bit lines by transistors T1 and T2. When the word line is at ground level, the transistors are turned off and the latch retains its state.

Read operation: In order to read the state of the SRAM cell, the word line is activated to close switches T1 and T2. The Sense/Write circuits at the bottom monitor the state of bit lines b and b'.

For a write operation, the desired bit value is applied to line b, while its complement is applied to line b'. The required signals are generated by the Sense/Write circuitry. This forces the transistors into the proper state.

    CMOS SRAM Cell is shown below

Four transistors (T1, T2, T3, T4) are cross-connected in an arrangement that produces a stable logic state. In logic state 1, point C1 is high and point C2 is low; in this state, T1 and T4 are off and T2 and T3 are on. In logic state 0, point C1 is low and point C2 is high; in this state, T1 and T4 are on and T2 and T3 are off. Both states are stable as long as the direct current (dc) voltage is applied. No refresh is needed to retain data. The advantage of the CMOS SRAM cell is its low power consumption.

Advantage of SRAM: SRAM can be accessed very quickly, with access times of a few nanoseconds.

    DRAM

A dynamic RAM (DRAM) is made with cells that store data as charge on capacitors. The presence or absence of charge in a capacitor is interpreted as a binary 1 or 0. Because capacitors have a natural tendency to discharge, dynamic RAMs require periodic charge refreshing to maintain data storage. The term dynamic refers to this tendency of the stored charge to leak away, even with power continuously applied.

    Do not retain their state indefinitely.

    Contents must be periodically refreshed.

Contents may be refreshed while accessing them for reading.

    DRAM CELL


The figure shows a typical DRAM structure for an individual cell that stores 1 bit. The address line is activated when the bit value from this cell is to be read or written. The transistor acts as a switch that is closed (allowing current to flow) if a voltage is applied to the address line and open (no current flows) if no voltage is present on the address line. For the write operation, a voltage signal is applied to the bit line; a high voltage represents 1, and a low voltage represents 0. A signal is then applied to the address line, allowing a charge to be transferred to the capacitor. For the read operation, when the address line is selected, the transistor turns on and the charge stored on the capacitor is fed out onto a bit line and to a sense amplifier. The sense amplifier compares the capacitor voltage to a reference value and determines if the cell contains a logic 1 or a logic 0. The readout from the cell discharges the capacitor, which must be restored to complete the operation. Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an analog device. The capacitor can store any charge value within a range; a threshold value determines whether the charge is interpreted as 1 or 0.

    SRAM VERSUS DRAM

Both static and dynamic RAMs are volatile; that is, power must be continuously supplied to the memory to preserve the bit values. A dynamic memory cell is simpler and smaller than a static memory cell. Thus, a DRAM is more dense (smaller cells, more cells per unit area) and less expensive than a corresponding SRAM. On the other hand, a DRAM requires the supporting refresh circuitry. For larger memories, the fixed cost of the refresh circuitry is more than compensated for by the smaller variable cost of DRAM cells. Thus, DRAMs tend to be favored for large memory requirements. A final point is that SRAMs are generally somewhat faster than DRAMs. Because of these relative characteristics, SRAM is used for cache memory (both on and off chip), and DRAM is used for main memory.

Asynchronous DRAMs


    Each row can store 512 bytes. 12 bits to select a row, and 9 bits to select a group in a row. Total of 21 bits.

First apply the row address; the RAS signal latches the row address. Then apply the column address; the CAS signal latches the address.

Timing of the memory unit is controlled by a specialized unit which generates RAS and CAS. This is an asynchronous DRAM.
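The row/column addressing above can be sketched in a few lines. The 12-bit row and 9-bit column widths come from the example in the text (4096 rows of 512 bytes, 21 address bits in total); the function name is just illustrative.

```python
# Sketch: splitting a 21-bit address for the asynchronous DRAM example
# in the text (12 row bits latched by RAS, 9 column bits latched by CAS).
ROW_BITS, COL_BITS = 12, 9

def split_dram_address(addr):
    """Return (row, column) for a 21-bit address."""
    assert 0 <= addr < 1 << (ROW_BITS + COL_BITS)
    row = addr >> COL_BITS               # high-order 12 bits -> row
    col = addr & ((1 << COL_BITS) - 1)   # low-order 9 bits -> column
    return row, col

print(split_dram_address(0x1FFFFF))  # -> (4095, 511), the last byte
```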

Fast page mode

Suppose we want to access consecutive bytes in the selected row. This can be done without having to reselect the row: add a latch at the output of the sense circuits in each row. All the latches are loaded when the row is selected. Different column addresses can then be applied to select and place different bytes on the data lines. A consecutive sequence of column addresses can be applied under the control of the CAS signal, without reselecting the row. This allows a block of data to be transferred at a much faster rate than random accesses. (A small collection or group of bytes is usually referred to as a block.) This transfer capability is referred to as the fast page mode feature.

    Synchronous DRAMs

    Operation is directly synchronized with processor clock signal. The outputs of the sense circuits are connected to a latch.

During a Read operation, the contents of the cells in a row are loaded onto the latches.

During a refresh operation, the contents of the cells are refreshed without changing the contents of the latches. Data held in the latches corresponding to the selected columns are transferred to the output.

For a burst mode of operation, successive columns are selected using a column address counter and the clock. The CAS signal need not be generated externally. New data are placed on the data lines during the rising edge of the clock.

Latency and bandwidth

Memory latency is the time it takes to transfer a word of data to or from memory. Memory bandwidth is the number of bits or bytes that can be transferred in one second.

DDR SDRAMs

The cell array is organized in two banks. Data is transferred on both edges of the clock, hence the bandwidth is doubled.
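The doubled bandwidth can be illustrated with a small calculation. The bus width and clock rate below are illustrative assumptions, not values from the text; only the factor of two comes from the DDR scheme.

```python
# Sketch: peak transfer rate of a DDR SDRAM under assumed parameters.
bus_width_bits = 64        # assumed width of the data bus
clock_hz = 100_000_000     # assumed 100 MHz memory clock

# DDR transfers data on both the rising and falling clock edges,
# so there are two transfers per clock cycle.
peak_bytes_per_sec = (bus_width_bits // 8) * clock_hz * 2
print(peak_bytes_per_sec)  # -> 1600000000 bytes/s
```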

    Flash memory

Flash memory reads the contents of a single cell, but writes the contents of an entire block of cells. Flash devices have greater density, higher capacity, and low storage cost per bit. Power consumption of flash memory is very low, making it attractive for use in equipment that is battery-driven. Like EEPROM, flash memory uses an electrical erasing technology. An entire flash memory can be erased in one or a few seconds, which is much faster than EPROM. In addition, it is possible to erase just blocks of memory rather than an entire chip. Flash memory gets its name because the microchip is organized so that a section of memory cells is erased in a single action, or "flash". Like EPROM, flash memory uses only one transistor per bit, and so achieves high density. Single flash chips are not sufficiently large, so larger memory modules are implemented using flash cards and flash drives.

    Cache memory

Cache memory is an architectural arrangement which makes the main memory appear faster to the processor than it really is.

The cache contains a copy of portions of main memory. When the processor attempts to read a word of memory, a check is made to determine if the word is in the cache. If so, the word is delivered to the processor. If not, a block of main memory, consisting of some fixed number of words, is read into the cache and then the word is delivered to the processor. Because of the phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a single memory reference, it is likely that there will be future references to that same memory location or to other words in the block.

Cache memory is based on the property of computer programs known as locality of reference. Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently. These instructions may be the ones in a loop, a nested loop, or a few procedures calling each other repeatedly. This is called locality of reference.

    Temporal locality of reference:

A recently executed instruction is likely to be executed again very soon.

Spatial locality of reference:

Instructions with addresses close to a recently executed instruction are likely to be executed soon.

When the processor issues a Read request, a block of words is transferred from the main memory to the cache, one word at a time. Subsequent references to the data in this block of words are then found in the cache. At any given time, only some blocks in the main memory are held in the cache. Which blocks in the main memory are in the cache is determined by a mapping function. When the cache is full and a block of words needs to be transferred from the main memory, some block of words in the cache must be replaced. This is determined by a replacement algorithm. If the data is in the cache, it is called a Read or Write hit.

    Read hit:

The data is obtained from the cache.

Write hit:

The cache has a replica of the contents of the main memory. The contents of the cache and the main memory may be updated simultaneously; this is the write-through protocol. Alternatively, update the contents of the cache only, and mark it as updated by setting a bit known as the dirty bit or modified bit; the contents of the main memory are then updated when this block is replaced. This is the write-back or copy-back protocol.
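The two write-hit policies can be sketched as follows. The cache line is modeled as a hypothetical dict with 'data' and 'dirty' fields; the function names are illustrative, not from the text.

```python
# Sketch of the write-through and write-back policies described above.
main_memory = {}

def write_through(cache_line, addr, value):
    """Write-through: cache and main memory are updated together."""
    cache_line['data'] = value
    main_memory[addr] = value

def write_back(cache_line, addr, value):
    """Write-back: update only the cache and set the dirty (modified) bit."""
    cache_line['data'] = value
    cache_line['dirty'] = True

def evict(cache_line, addr):
    """On replacement, a dirty block must be written back to memory."""
    if cache_line.get('dirty'):
        main_memory[addr] = cache_line['data']
        cache_line['dirty'] = False
```

With write-back, main memory stays stale until the block is evicted; with write-through, it is never stale, at the cost of a memory write per store.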

If the data is not present in the cache, then a Read miss or Write miss occurs.

Read miss:

The block of words containing the requested word is transferred from the main memory. After the block is transferred, the desired word is forwarded to the processor. The desired word may also be forwarded to the processor as soon as it is transferred, without waiting for the entire block to be transferred. This is called load-through or early restart.

Write miss:

If the write-through protocol is used, then the contents of the main memory are updated directly.


If the write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is then overwritten with the new information.

Cache coherence problem

A bit called the valid bit is provided for each block. If the block contains valid data, then the bit is set to 1; otherwise it is 0. Valid bits are set to 0 when the power is first turned on. When a block is loaded into the cache for the first time, the valid bit is set to 1. Data transfers between main memory and disk occur directly, bypassing the cache. When the data on a disk changes, the main memory block is also updated; if that data is also resident in the cache, its valid bit is set to 0. If the write-back protocol is being used, the data in the cache may itself have changed, as indicated by the dirty bit, so the copies of the data in the cache and the main memory differ. This is called the cache coherence problem. One option is to force a write-back before the main memory is updated from the disk.

Mapping function

    Mapping functions determine how memory blocks are placed in the cache. A simple processor example:

    Cache consisting of 128 blocks of 16 words each.

    Total size of cache is 2048 (2K) words.

Main memory is addressable by a 16-bit address. Main memory has 64K words.

    Main memory has 4K blocks of 16 words each.

Three mapping functions:

Direct mapping

Associative mapping

Set-associative mapping.


    Direct mapping

Block j of the main memory maps to block j modulo 128 of the cache. Thus block 0 maps to cache block 0, and block 129 maps to cache block 1.

More than one memory block is mapped onto the same position in the cache. This may lead to contention for cache blocks even if the cache is not full. The contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm.

The memory address is divided into three fields. The low-order 4 bits determine one of the 16 words in a block. When a new block is brought into the cache, the next 7 bits determine which cache block this new block is placed in. The high-order 5 bits determine which of the possible 32 memory blocks is currently present in the cache block; these are the tag bits. Direct mapping is simple to implement but not very flexible.
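The three-field split for this direct-mapped example can be sketched directly. The 5/7/4 bit widths come from the example in the text; the function name is illustrative.

```python
# Sketch: field extraction for the direct-mapped example
# (16-bit address: 5 tag bits, 7 cache-block bits, 4 word bits).
def direct_map_fields(addr):
    word = addr & 0xF            # word within the 16-word block
    block = (addr >> 4) & 0x7F   # cache block = (memory block) mod 128
    tag = addr >> 11             # which of 32 possible blocks this is
    return tag, block, word

# Memory block 129 (word address 129*16) -> cache block 1, tag 1
print(direct_map_fields(129 * 16))  # -> (1, 1, 0)
```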

    Associative mapping


A main memory block can be placed into any cache position. The memory address is divided into two fields: the low-order 4 bits identify the word within a block, and the high-order 12 bits, the tag bits, identify a memory block when it is resident in the cache. Replacement algorithms can be used to replace an existing block in the cache when the cache is full. Associative mapping is flexible and uses cache space efficiently. Its cost is higher than that of a direct-mapped cache because of the need to search all 128 tag patterns to determine whether a given block is in the cache.
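The search over all 128 tags can be sketched as a loop; real hardware does these comparisons in parallel. The 4-bit word field and 12-bit tag come from the example in the text; the function name is illustrative.

```python
# Sketch: an associative-mapped lookup over all 128 cache blocks.
WORD_BITS = 4

def associative_lookup(cache_tags, addr):
    """cache_tags: list of 128 tags (None = empty). Return hit block or None."""
    tag = addr >> WORD_BITS
    for block, t in enumerate(cache_tags):  # 128 comparisons, done in
        if t == tag:                        # parallel by real hardware
            return block
    return None

tags = [None] * 128
tags[5] = 0xABC
print(associative_lookup(tags, (0xABC << 4) | 3))  # -> 5 (a hit)
```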


    Set-Associative mapping

Blocks of the cache are grouped into sets, and the mapping function allows a block of the main memory to reside in any block of a specific set. For example, divide the cache into 64 sets, with two blocks per set. Memory blocks 0, 64, 128, etc. map to set 0, and they can occupy either of the two positions within the set. The memory address is divided into three fields: a 6-bit field determines the set number, and the high-order 6 tag bits are compared to the tag fields of the two blocks in the set. Set-associative mapping is a combination of direct and associative mapping. The number of blocks per set is a design parameter. One extreme is to have all the blocks in one set, requiring no set bits (fully associative mapping). The other extreme is to have one block per set, which is the same as direct mapping.
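The field split for this two-way set-associative example can be sketched the same way. The 6/6/4 bit widths come from the example in the text; the function name is illustrative.

```python
# Sketch: field extraction for the two-way set-associative example
# (16-bit address: 6 tag bits, 6 set bits, 4 word bits).
def set_assoc_fields(addr):
    word = addr & 0xF
    set_no = (addr >> 4) & 0x3F   # set = (memory block) mod 64
    tag = addr >> 10
    return tag, set_no, word

# Memory blocks 0, 64, and 128 all fall in set 0, with different tags
for block in (0, 64, 128):
    print(set_assoc_fields(block * 16))
```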

Replacement algorithm

Once the cache has been filled, when a new block is brought into the cache, one of the existing blocks must be replaced. For direct mapping, there is only one possible line for any particular block, and no choice is possible. For the associative and set-associative techniques, a replacement algorithm is needed. To achieve high speed, such an algorithm must be implemented in hardware. The most effective algorithm is least recently used (LRU): replace the block in the set that has been in the cache longest with no reference to it. For two-way set-associative, this is easily implemented. Because we are assuming that more recently used memory locations are more likely to be referenced, LRU should give the best hit ratio. LRU is also relatively easy to implement for a fully associative cache. Because of its simplicity of implementation, LRU is the most popular replacement algorithm. Another possibility is first-in-first-out (FIFO): replace the block in the set that has been in the cache longest. FIFO is easily implemented as a round-robin or circular buffer technique. Still another possibility is least frequently used (LFU): replace the block in the set that has experienced the fewest references. LFU could be implemented by associating a counter with each line.


A technique not based on usage (i.e., not LRU, LFU, FIFO, or some variant) is to pick a line at random from among the candidate lines.
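LRU for one set can be sketched with an ordered list in which the least recently used tag sits at the front. This is a software model of the bookkeeping; the function name and list representation are illustrative, and real caches implement this in hardware.

```python
# Sketch: LRU replacement within a single two-way set.
def access(set_blocks, tag, capacity=2):
    """Simulate a reference to `tag`; return 'hit' or 'miss'."""
    if tag in set_blocks:
        set_blocks.remove(tag)       # move to most-recently-used position
        set_blocks.append(tag)
        return 'hit'
    if len(set_blocks) == capacity:  # set full: evict least recently used
        set_blocks.pop(0)
    set_blocks.append(tag)
    return 'miss'

s = []
for t in [1, 2, 1, 3]:               # tag 2 is LRU when 3 arrives
    access(s, t)
print(s)                             # -> [1, 3]
```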

Measurement and improvement of cache performance

Performance parameters

Hit rate

The number of hits stated as a fraction of all attempted accesses is called the hit rate.

Miss penalty

The extra time needed to bring the desired information into the cache when a miss occurs is called the miss penalty. (The number of misses stated as a fraction of all attempted accesses is the miss rate.)

A high-performance computer should have a high hit rate and a small miss penalty.

Impact of cache on overall performance of the computer

Let h be the hit rate, M the miss penalty, and C the time to access information in the cache. The average access time experienced by the processor is

T_ave = hC + (1 - h)M
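The average access time formula can be evaluated directly. The parameter values below are illustrative, not from the text.

```python
# Sketch: average access time T_ave = h*C + (1 - h)*M.
def t_avg(h, c, m):
    """h: hit rate, c: cache access time, m: miss penalty."""
    return h * c + (1 - h) * m

# e.g. 90% hits at 1 cycle, misses costing 10 cycles
print(round(t_avg(0.9, 1, 10), 3))  # -> 1.9 cycles
```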

The hit rate can be improved by increasing the block size, while keeping the cache size constant, to take advantage of spatial locality. However, the miss penalty increases as the block size increases. The miss penalty can be reduced if the load-through approach is used when loading new blocks into the cache. Block sizes that are neither very small nor very large give the best results.

Enhancing cache performance

In high-performance processors, two levels of caches are normally used: an L1 cache for fast access, and an L2 cache that is slower but larger, to ensure a high hit rate. The average access time in a system with two levels of caches is

T_ave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M

where h1 is the hit rate in the L1 cache, C1 the time to access information in the L1 cache, h2 the hit rate in the L2 cache, C2 the time to access information in the L2 cache, and (1 - h1)(1 - h2) the fraction of accesses that miss in both caches.
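The two-level formula can likewise be evaluated numerically. The parameter values are illustrative, not from the text.

```python
# Sketch: two-level average access time
# T_ave = h1*C1 + (1 - h1)*h2*C2 + (1 - h1)*(1 - h2)*M.
def t_avg2(h1, c1, h2, c2, m):
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m

# e.g. 90% hit in L1 at 1 cycle, 90% of the rest hit in L2 at 10
# cycles, and the remaining 1% go to main memory at 100 cycles
print(round(t_avg2(0.9, 1, 0.9, 10, 100), 2))  # -> 2.8 cycles
```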

    Other Performance Enhancements

Write buffer

A write buffer can be introduced for temporary storage of write requests.

Write-through: Each write operation involves writing to the main memory. If the processor has to wait for the write operation to complete, it slows down the processor, yet the processor does not normally depend on the results of the write operation. A write buffer can therefore be included for temporary storage of write requests: the processor places each write request into the buffer and continues execution. If a subsequent Read request references data which is still in the write buffer, then this data is read from the write buffer.

Write-back:

The block is written back to the main memory when it is replaced. If the processor waits for this write to complete before reading the new block, it is slowed down. A fast write buffer can hold the block to be written, and the new block can be read first.

Prefetching

Normally, new data are brought into the cache when they are first needed, and the processor has to wait until the data transfer is complete. Instead, data can be prefetched into the cache before they are actually needed, or before a Read miss occurs. Prefetching can be accomplished through software by including a special prefetch instruction in the machine language of the processor; inclusion of prefetch instructions increases the length of the programs. Prefetching can also be accomplished using hardware: circuitry that attempts to discover patterns in memory references and then prefetches according to this pattern.


    Lockup-Free Cache

A prefetching scheme does not work if it stops other accesses to the cache until the prefetch is completed. A cache of this type is said to be locked while it services a miss. A cache structure which supports multiple outstanding misses is called a lockup-free cache. Since only one miss can be serviced at a time, a lockup-free cache must include circuitry that keeps track of all the outstanding misses. Special registers may hold the necessary information about these misses.

Virtual memories

Virtual memory is an architectural solution to increase the effective size of the memory system. The physical main memory in a computer is generally not as large as the entire possible addressable space; physical memory typically ranges from a few hundred megabytes to 1G bytes. Large programs that cannot fit completely into the main memory have their parts stored on secondary storage devices such as magnetic disks. Pieces of programs must be transferred to the main memory from secondary storage before they can be executed. Techniques that automatically move program and data between main memory and secondary storage when they are required for execution are called virtual-memory techniques. Programs and processors reference an instruction or data independent of the size of the main memory. The processor issues binary addresses for instructions and data; these binary addresses are called logical or virtual addresses. Virtual addresses are translated into physical addresses by a combination of hardware and software subsystems. If a virtual address refers to a part of the program that is currently in the main memory, it is accessed immediately. If the address refers to a part of the program that is not currently in the main memory, it is first transferred to the main memory before it can be used.

    Virtual memory organization

The memory management unit (MMU) translates virtual addresses into physical addresses. If the desired data or instructions are in the main memory, they are fetched as described previously. If the desired data or instructions are not in the main memory, they must be transferred from secondary storage to the main memory; the MMU causes the operating system to bring the data from the secondary storage into the main memory.

Address translation

Each virtual or logical address generated by a processor is interpreted as a virtual page number (high-order bits) plus an offset (low-order bits) that specifies the location of a particular byte within that page. Programs and data are composed of fixed-length units called pages. A page consists of a block of words that occupy contiguous locations in the main memory. A page is the basic unit of information that is transferred between secondary storage and main memory. The size of a page commonly ranges from 2K to 16K bytes. Pages should not be too small, because the access time of a secondary storage device is much larger than that of the main memory. Pages should not be too large, else a large portion of the page may not be used, and it will occupy valuable space in the main memory. Information about the main memory location of each page is kept in the page table.

The page table holds the main memory address where each page is stored and the current status of the page.

The area of the main memory that can hold a page is called a page frame. The starting address of the page table is kept in the page table base register. The virtual page number generated by the processor is added to the contents of the page table base register. This provides the address of the corresponding entry in the page table. The contents of this location in the page table give the starting address of the page if the page is currently in the main memory.

The page table entry for a page also includes some control bits which describe the status of the page while it is in the main memory. One bit indicates the validity of the page, that is, whether the page is actually loaded into the main memory; it allows the operating system to invalidate the page without actually removing it. One bit indicates whether the page has been modified during its residency in the main memory; this bit determines whether the page should be written back to the disk when it is removed from the main memory. It is similar to the dirty or modified bit of a cache memory. Other control bits enforce various other restrictions that may be imposed; for example, a program may only have read permission for a page, but not write or modify permission.

The page table is used by the MMU for every read and write access to the memory, so the ideal location for the page table is within the MMU. However, the page table is quite large, and the MMU is implemented as part of the processor chip, so it is impossible to include a complete page table on the chip. The page table is therefore kept in the main memory, and a copy of a small portion of it is accommodated within the MMU. This portion consists of the page table entries that correspond to the most recently accessed pages. The small cache that holds them, called the Translation Lookaside Buffer (TLB), is included in the MMU. The TLB holds the page table entries of the most recently accessed pages. A page table entry for a page includes the address of the page frame where the page resides in the main memory, plus some control bits. In addition to the above, for each page the TLB must hold the virtual page number.
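The virtual-page-number/offset split and the page table lookup above can be sketched as follows. A 4K-byte page size is assumed here for illustration (within the 2K-16K range given in the text), and the dict-based page table is a simplification of the real structure.

```python
# Sketch: virtual-to-physical address translation with 4K-byte pages.
PAGE_SIZE = 4096
OFFSET_BITS = 12  # log2(4096)

def translate(vaddr, page_table):
    """page_table maps virtual page number -> page frame number."""
    vpn = vaddr >> OFFSET_BITS        # high-order bits: virtual page number
    offset = vaddr & (PAGE_SIZE - 1)  # low-order bits: byte within the page
    frame = page_table[vpn]           # raises KeyError on a page fault
    return (frame << OFFSET_BITS) | offset

pt = {5: 2}                           # virtual page 5 resides in frame 2
print(hex(translate(0x5ABC, pt)))     # -> 0x2abc
```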

    Associative-mapped TLB

The high-order bits of the virtual address generated by the processor select the virtual page. These bits are compared with the virtual page numbers in the TLB. If there is a match, a hit occurs and the corresponding address of the page frame is read. If there is no match, a miss occurs and the page table within the main memory must be consulted. Set-associative mapped TLBs are found in commercial processors. A control bit is provided in the TLB to invalidate an entry. If an entry is invalidated, then the TLB gets the information for that entry from the page table, following the same process that it would follow if the entry were not found in the TLB, that is, on a miss. If a program generates an access to a page that is not in the main memory, a page fault is said to occur. The whole page must be brought into the main memory from the disk before execution can proceed. Upon detection of a page fault by the MMU, the following actions occur:

The MMU asks the operating system to intervene by raising an exception.

Processing of the active task which caused the page fault is interrupted.

    Control is transferred to the operating system.


The operating system copies the requested page from secondary storage to the main memory.

Once the page is copied, control is returned to the task which was interrupted.

Servicing a page fault requires transferring the requested page from secondary storage to the main memory. This transfer may incur a long delay. While the page is being transferred the operating system may:

Suspend the execution of the task that caused the page fault.

Begin execution of another task whose pages are in the main memory. This enables efficient use of the processor. To ensure that the interrupted task can continue correctly when it resumes execution, there are two possibilities:

Execution of the interrupted task must continue from the point where it was interrupted.

The instruction must be restarted.

When a new page is to be brought into the main memory from secondary storage, the main memory may be full. Some page from the main memory must be replaced with this new page. Choosing which page to replace is similar to the replacement that occurs when the cache is full. The principle of locality of reference can also be applied here, and a replacement strategy similar to LRU can be applied. Since the size of the main memory is relatively large compared to the cache, a relatively large amount of programs and data can be held in the main memory. This minimizes the frequency of transfers between secondary storage and main memory. A page may be modified during its residency in the main memory. A write-through protocol cannot be used, since it would incur a long delay each time a small amount of data is written to the disk.
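The LRU-style replacement described above can be sketched as a small simulation. This is an illustrative model only — the class name, the frame capacity, and the access pattern are invented for the example, and a real MMU implements this in hardware and OS data structures, not like this:

```python
from collections import OrderedDict

class LRUPageFrames:
    """Toy model of main-memory page frames with LRU replacement.

    `capacity` is the number of page frames; pages are identified by
    virtual page number. All names here are hypothetical.
    """
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()   # virtual page number -> frame contents
        self.page_faults = 0

    def access(self, vpn):
        if vpn in self.frames:
            # Hit: mark this page as most recently used.
            self.frames.move_to_end(vpn)
            return "hit"
        # Page fault: bring the page in, evicting the LRU page if full.
        self.page_faults += 1
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # evict least recently used
        self.frames[vpn] = object()           # stand-in for page contents
        return "fault"

mem = LRUPageFrames(capacity=3)
results = [mem.access(p) for p in [1, 2, 3, 1, 4, 2]]
# The access to page 4 evicts page 2 (least recently used at that point),
# so the later access to page 2 faults again.
```

With three frames and the reference string 1, 2, 3, 1, 4, 2, only the second access to page 1 hits; the other five accesses fault.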

MMU

The Memory Management Unit (MMU) translates logical or virtual addresses into physical addresses. The MMU uses the contents of the page table base register to determine the address of the page table to be used in the translation. Changing the contents of the page table base register enables us to use a different page table, and to switch from one space to another. At any given time, the page table base register can point to one page table. Thus, only one page table can be used in the translation process at a given time. Pages belonging to only one space are accessible at any given time.

The processor usually has two states of operation:

    Supervisor state.

User state.

Supervisor state:

    Operating system routines are executed.

    User state:

User programs are executed.

Certain privileged instructions cannot be executed in the user state.

    These privileged instructions include the ones which change page table base register.

This prevents one user from accessing the space of other users.

Secondary memories: magnetic and optical disks

    Magnetic disks

A disk is a circular platter constructed of nonmagnetic material, called the substrate, coated with a magnetizable material. Traditionally, the substrate has been an aluminum or aluminum alloy material. More recently, glass substrates have been introduced.



Magnetic Read and Write Mechanisms

Data are recorded on and later retrieved from the disk via a conducting coil named the head; in many systems, there are two heads, a read head and a write head. During a read or write operation, the head is stationary while the platter rotates beneath it. The write mechanism exploits the fact that electricity flowing through a coil produces a magnetic field. Electric pulses are sent to the write head, and the resulting magnetic patterns are recorded on the surface below, with different patterns for positive and negative currents. The write head itself is made of easily magnetizable material and is in the shape of a rectangular doughnut with a gap along one side and a few turns of conducting wire along the opposite side. An electric current in the wire induces a magnetic field across the gap, which in turn magnetizes a small area of the recording medium. Reversing the direction of the current reverses the direction of the magnetization on the recording medium.

The traditional read mechanism exploits the fact that a magnetic field moving relative to a coil produces an electrical current in the coil. When the surface of the disk passes under the head, it generates a current of the same polarity as the one already recorded. The structure of the head for reading is in this case essentially the same as for writing, and therefore the same head can be used for both. Such single heads are used in floppy disk systems and in older rigid disk systems. Contemporary rigid disk systems use a different read mechanism, requiring a separate read head, positioned for convenience close to the write head. The read head consists of a partially shielded magnetoresistive (MR) sensor. The MR material has an electrical resistance that depends on the direction of the magnetization of the medium moving under it. By passing a current through the MR sensor, resistance changes are detected as voltage signals. The MR design allows higher-frequency operation, which equates to greater storage densities and operating speeds.

    Organization of Data on a Disk


The head is a relatively small device capable of reading from or writing to a portion of the platter rotating beneath it. This gives rise to the organization of data on the platter in a concentric set of rings, called tracks. Each track is the same width as the head. There are thousands of tracks per surface. Adjacent tracks are separated by gaps. This prevents, or at least minimizes, errors due to misalignment of the head or simply interference of magnetic fields.

Data are transferred to and from the disk in sectors. There are typically hundreds of sectors per track, and these may be of either fixed or variable length. In most contemporary systems, fixed-length sectors are used, with 512 bytes being the nearly universal sector size. A bit near the center of a rotating disk travels past a fixed point (such as a read-write head) slower than a bit on the outside. Therefore, some way must be found to compensate for the variation in speed so that the head can read all the bits at the same rate. This can be done by increasing the spacing between bits of information recorded in segments of the disk. The information can then be scanned at the same rate by rotating the disk at a fixed speed, known as the constant angular velocity (CAV). The disk is divided into a number of pie-shaped sectors and into a series of concentric tracks. The advantage of using CAV is that individual blocks of data can be directly addressed by track and sector. To move the head from its current location to a specific address, it only takes a short movement of the head to a specific track and a short wait for the proper sector to spin under the head. The disadvantage of CAV is that the amount of data that can be stored on the long outer tracks is only the same as what can be stored on the short inner tracks.

An example of disk formatting: each track contains 30 fixed-length sectors of 600 bytes each. Each sector holds 512 bytes of data plus control information useful to the disk controller. The ID field is a unique identifier or address used to locate a particular sector. The SYNCH byte is a special bit pattern that delimits the beginning of the field. The track number identifies a track on a surface. The head number identifies a head, because this disk has multiple surfaces. The ID and data fields each contain an error-detecting code.


    Disk Controller

Disk Performance Parameters

On a movable-head system, the time it takes to position the head at the track is known as seek time. In either case,

once the track is selected, the disk controller waits until the appropriate sector rotates to line up with the head. The time it takes for the beginning of the sector to reach the head is known as rotational delay, or rotational latency. The sum of the seek time, if any, and the rotational delay equals the access time, which is the time it takes to get into position to read or write. Once the head is in position, the read or write operation is then performed as the sector moves under the head; this is the data transfer portion of the operation, and the time required for the transfer is the transfer time.

SEEK TIME Seek time is the time required to move the disk arm to the required track. The seek time consists of two key components: the initial startup time, and the time taken to traverse the tracks that have to be crossed once the access arm is up to speed.

TRANSFER TIME The transfer time to or from the disk depends on the rotation speed of the disk in the following fashion:

T = b / (rN)

where T is the transfer time, b is the number of bytes to be transferred, N is the number of bytes on a track, and r is the rotation speed in revolutions per second. The total average access time can then be expressed as

Ta = Ts + 1/(2r) + b/(rN)

where Ts is the average seek time.
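A worked example makes the access-time formula Ta = Ts + 1/(2r) + b/(rN) concrete. All of the disk parameters below (4 ms seek, 7200 rpm, 500 sectors of 512 bytes per track, one-sector transfer) are assumed for illustration; they do not come from the text:

```python
# Illustrative average access time for reading one sector.
Ts = 0.004          # average seek time: 4 ms (assumed)
rpm = 7200          # rotation speed (assumed)
r = rpm / 60.0      # revolutions per second = 120
N = 500 * 512       # bytes per track (assumed: 500 sectors of 512 bytes)
b = 512             # bytes to transfer: one sector

rotational_delay = 1 / (2 * r)   # on average, half a rotation
transfer_time = b / (r * N)      # T = b / (rN)
access_time = Ts + rotational_delay + transfer_time
```

For these numbers the rotational delay (about 4.17 ms) dominates the transfer time (about 17 microseconds), giving a total of roughly 8.2 ms — which is why seek time and rotational latency, not transfer time, govern small random reads.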

    Various types of magnetic disks.

In a fixed-head disk, there is one read-write head per track. All of the heads are mounted on a rigid arm that extends across all tracks. In a movable-head disk, there is only one read-write head. Again, the head is mounted on an arm. Because the head must be able to be positioned above any track, the arm can be extended or retracted for this purpose.

A nonremovable disk is permanently mounted in the disk drive; the hard disk in a personal computer is a nonremovable disk. A removable disk can be removed and replaced with another disk. The advantage of the latter type is that unlimited amounts of data are available with a limited number of disk systems. Furthermore, such a disk may be moved from one computer system to another. Floppy disks and ZIP cartridge disks are examples of removable disks. A floppy disk is a small, flexible platter and the least expensive type of disk. For most disks, the magnetizable coating is applied to both sides of the platter, which is then referred to as double-sided. Some less expensive disk systems use single-sided disks.

    Optical Disks

    Types

CD

Compact Disk. A nonerasable disk that stores digitized audio information. The standard system uses 12-cm disks and can record more than 60 minutes of uninterrupted playing time.

CD-ROM

Compact Disk Read-Only Memory. A nonerasable disk used for storing computer data. The standard system uses 12-cm disks and can hold more than 650 Mbytes.

CD-R

CD Recordable. Similar to a CD-ROM. The user can write to the disk only once.

CD-RW

CD Rewritable. Similar to a CD-ROM. The user can erase and rewrite to the disk multiple times.

DVD

Digital Versatile Disk. A technology for producing digitized, compressed representations of video information, as well as large volumes of other digital data. Both 8 and 12 cm diameters are used, with a double-sided capacity of up to 17 Gbytes. The basic DVD is read-only (DVD-ROM).

DVD-R

DVD Recordable. Similar to a DVD-ROM. The user can write to the disk only once. Only one-sided disks can be used.

DVD-RW

DVD Rewritable. Similar to a DVD-ROM. The user can erase and rewrite to the disk multiple times. Only one-sided disks can be used.

Blu-Ray DVD

High-definition video disk. Provides considerably greater data storage density than DVD, using a 405-nm (blue-violet) laser. A single layer on a single side can store 25 Gbytes.

Compact Disk

The disk is formed from a resin, such as polycarbonate. Digitally recorded information (either music or computer data) is imprinted as a series of microscopic pits on the surface of the polycarbonate. This is done, first of all, with a finely focused, high-intensity laser to create a master disk. The master is used, in turn, to make a die to stamp out copies onto polycarbonate. The pitted surface is then coated with a highly reflective surface, usually aluminum or gold. This shiny surface is protected against dust and scratches by a top coat of clear acrylic. Finally, a label can be silkscreened onto the acrylic.

Information is retrieved from a CD or CD-ROM by a low-powered laser housed in an optical-disk player, or drive unit. The laser shines through the clear polycarbonate while a motor spins the disk past it. The intensity of the reflected light of the laser changes as it encounters a pit. Specifically, if the laser beam falls on a pit, which has a somewhat rough surface, the light scatters and a low intensity is reflected back to the source. The areas between pits


are called lands. A land is a smooth surface, which reflects back at higher intensity. The change between pits and lands is detected by a photosensor and converted into a digital signal. The beginning or end of a pit represents a 1; when no change in elevation occurs between intervals, a 0 is recorded.

To achieve greater capacity, CDs and CD-ROMs do not organize information on concentric tracks. Instead, the disk contains a single spiral track, beginning near the center and spiraling out to the outer edge of the disk. Sectors near the outside of the disk are the same length as those near the inside. Thus, information is packed evenly across the disk in segments of the same size, and these are scanned at the same rate by rotating the disk at a variable speed. The pits are then read by the laser at a constant linear velocity (CLV). The disk rotates more slowly for accesses near the outer edge than for those near the center. Thus, the capacity of a track and the rotational delay both increase for positions nearer the outer edge of the disk. The data capacity for a CD-ROM is about 680 MB.

Data on the CD-ROM are organized as a sequence of blocks. A typical block format consists of the following fields:

Sync: The sync field identifies the beginning of a block. It consists of a byte of all 0s, 10 bytes of all 1s, and a byte of all 0s.

Header: The header contains the block address and the mode byte. Mode 0 specifies a blank data field; mode 1 specifies the use of an error-correcting code and 2048 bytes of data; mode 2 specifies 2336 bytes of user data with no error-correcting code.

Data: User data.

Auxiliary: Additional user data in mode 2. In mode 1, this is a 288-byte error-correcting code.
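The mode-1 field sizes above can be summed to get the physical block size. The 2048-byte data field and 288-byte error-correcting code come from the text; the 12-byte sync (1 + 10 + 1 bytes) also follows from the text, while the 4-byte header (3-byte block address plus the mode byte) and the resulting 2352-byte total are the standard CD-ROM block layout, stated here as an assumption rather than taken from the text:

```python
# Mode-1 CD-ROM block layout (sizes in bytes).
sync = 1 + 10 + 1   # one byte of 0s, 10 bytes of 1s, one byte of 0s = 12
header = 4          # 3-byte block address plus mode byte (assumed split)
data = 2048         # user data in mode 1
ecc = 288           # error-correcting code (the "auxiliary" field in mode 1)

block_size = sync + header + data + ecc   # 2352 bytes per physical block
efficiency = data / block_size            # share of the block holding user data
```

About 87% of each mode-1 block carries user data; the rest is sync, addressing, and error-correction overhead.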

The CD-ROM has two advantages:

The optical disk, together with the information stored on it, can be mass-replicated inexpensively.

The optical disk is removable, allowing the disk itself to be used for archival storage.

The disadvantages of CD-ROM are as follows:

It is read-only and cannot be updated.

It has an access time much longer than that of a magnetic disk drive, as much as half a second.


CD RECORDABLE

To accommodate applications in which only one or a small number of copies of a set of data is needed, the write-once read-many CD, known as the CD recordable (CD-R), has been developed. For CD-R, a disk is prepared in such a way that it can be subsequently written once with a laser beam of modest intensity. For a CD-R, the medium includes a dye layer. The dye is used to change reflectivity and is activated by a high-intensity laser. The resulting disk can be read on a CD-R drive or a CD-ROM drive. The CD-R optical disk is attractive for archival storage of documents and files. It provides a permanent record of large volumes of user data.

    CD REWRITABLE

The CD-RW optical disk can be repeatedly written and overwritten. The phase change disk uses a material that has two significantly different reflectivities in two different phase states. There is an amorphous state, in which the molecules exhibit a random orientation that reflects light poorly; and a crystalline state, which has a smooth surface that reflects light well. A beam of laser light can change the material from one phase to the other. The primary disadvantage of phase change optical disks is that the material eventually and permanently loses its desirable properties. The CD-RW has the obvious advantage over CD-ROM and CD-R that it can be rewritten and thus used as a true secondary storage.

Digital Versatile Disk

The DVD takes video into the digital age. It delivers movies with impressive picture quality, and it can be randomly accessed like audio CDs, which DVD machines can also play. Vast volumes of data can be crammed onto the disk, currently seven times as much as a CD-ROM. The DVD's greater capacity is due to three differences from CDs:

Bits are packed more closely on a DVD.

The DVD employs a second layer of pits and lands on top of the first layer. A dual-layer DVD has a semireflective layer on top of the reflective layer, and by adjusting focus, the lasers in DVD drives can read each layer separately. This technique almost doubles the capacity of the disk.

The DVD-ROM can be two-sided, whereas data are recorded on only one side of a CD. This brings total capacity up to 17 GB.

High-Definition Optical Disks

High-definition optical disks are designed to store high-definition videos and to provide significantly greater storage capacity compared to DVDs. The higher bit density is achieved by using a laser with a shorter wavelength, in the blue-violet range. The data pits, which constitute the digital 1s and 0s, are smaller on the high-definition optical disks compared to DVD because of the shorter laser wavelength. Two competing disk formats and technologies initially competed for market acceptance: HD DVD and Blu-ray DVD. The HD DVD scheme can store 15 GB on a single layer on a single side. Blu-ray positions the data layer on the disk closer to the laser. This enables a tighter focus and less distortion and thus smaller pits and tracks. Blu-ray can store 25 GB on a single layer. Three versions are available: read only (BD-ROM), recordable once (BD-R), and rerecordable (BD-RE).

    Multiple I/O devices may be connected to the processor and the memory via a bus.

The bus consists of three sets of lines to carry address, data, and control signals.

Each I/O device is assigned a unique address. To access an I/O device, the processor places the address on the address lines.

    The device recognizes the address, and responds to the control signals.


    I/O devices and the memory may share the same address space:

Memory-mapped I/O. Any machine instruction that can access memory can be used to transfer data to or from an I/O device.

    Simpler software.

    I/O devices and the memory may have different address spaces:

    Special instructions to transfer data to and from I/O devices. I/O devices may have to deal with fewer address lines.

I/O address lines need not be physically separate from memory address lines.

In fact, address lines may be shared between I/O devices and memory, with a control signal to indicate whether it is a memory address or an I/O address.

An I/O device is connected to the bus using an I/O interface circuit, which has an address decoder, control circuit, and data and status registers.

The address decoder decodes the address placed on the address lines, thus enabling the device to recognize its address.

The data register holds the data being transferred to or from the processor. The status register holds information necessary for the operation of the I/O device.

Data and status registers are connected to the data lines, and have unique addresses.

    I/O interface circuit coordinates I/O transfers.
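The interface circuit just described — an address decoder plus memory-mapped data and status registers — can be modeled in a few lines. This is a toy software model, not hardware: the register addresses, the READY bit, and the class name are all invented for illustration:

```python
class IOInterface:
    """Toy model of an I/O interface circuit: an address decoder plus
    memory-mapped data and status registers. All addresses are invented."""
    DATA_ADDR = 0x4000     # hypothetical address of the data register
    STATUS_ADDR = 0x4004   # hypothetical address of the status register
    READY = 0x01           # status bit: device has data available

    def __init__(self):
        self.registers = {self.DATA_ADDR: 0, self.STATUS_ADDR: 0}

    def decode(self, address):
        # The address decoder recognizes only this device's addresses.
        return address in self.registers

    def read(self, address):
        if not self.decode(address):
            raise ValueError("address not mapped to this device")
        return self.registers[address]

    def write(self, address, value):
        if not self.decode(address):
            raise ValueError("address not mapped to this device")
        self.registers[address] = value

dev = IOInterface()
dev.write(IOInterface.STATUS_ADDR, IOInterface.READY)   # device signals ready
dev.write(IOInterface.DATA_ADDR, 0x5A)                  # device deposits a byte
status = dev.read(IOInterface.STATUS_ADDR)
data = dev.read(IOInterface.DATA_ADDR) if status & IOInterface.READY else None
```

The point of the sketch is that, with memory-mapped I/O, the processor reaches these registers with ordinary load and store instructions at their assigned addresses; the decoder is what makes the device respond only to its own addresses.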

I/O Accessing techniques

The rate of transfer to and from I/O devices is slower than the speed of the processor. This creates the need for mechanisms to synchronize data transfers between them. Three techniques for this purpose are:

Programmed I/O

Interrupt-driven I/O

Direct memory access (DMA)

With programmed I/O, data are exchanged between the processor and the I/O module. The processor executes a program that gives it direct control of the I/O operation, including sensing device status, sending a read or write command, and transferring the data. When the processor issues a command to the I/O module, it must wait until the I/O operation is complete. If the processor is faster than the I/O module, this is wasteful of processor time. With interrupt-driven I/O, the processor issues an I/O command, continues to execute other instructions, and is interrupted by the I/O module when the latter has completed its work. With both programmed and interrupt-driven I/O, the processor is responsible for extracting data from main memory for output and storing data in main memory for input. The alternative is known as direct memory access (DMA).

Programmed I/O

The processor repeatedly monitors a status flag to achieve the necessary synchronization; that is, the processor polls the I/O device. When the processor is executing a program and encounters an instruction relating to I/O, it executes that instruction by issuing a command to the appropriate I/O module. With programmed I/O, the I/O module will perform the requested action and then set the appropriate bits in the I/O status register. It is the responsibility of the processor to periodically check the status of the I/O module until it finds that the operation is complete. There are four types of I/O commands that an I/O module may receive when it is addressed by a processor:


    Control: Used to activate a peripheral and tell it what to do.

Test: Used to test various status conditions associated with an I/O module and its peripherals.

Read: Causes the I/O module to obtain an item of data from the peripheral and place it in an internal buffer.

Write: Causes the I/O module to take an item of data (byte or word) from the data bus and subsequently transmit the data item to the peripheral.
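The status-checking cycle of programmed I/O can be sketched as a busy-wait loop. The device here is simulated (the class, its two-polls-per-word timing, and the data values are all invented), but the loop structure — poll the status flag, then read one word, repeated for the whole block — is the technique itself:

```python
class SimulatedDevice:
    """Stand-in for an I/O module; entirely hypothetical timing and data."""
    def __init__(self, words):
        self.words = list(words)
        self.checks = 0            # how many times status was polled

    def status_ready(self):
        self.checks += 1
        # Pretend the device needs two polls before each word is ready.
        return self.checks % 2 == 0

    def read_word(self):
        return self.words.pop(0)

def programmed_io_read(device, count):
    """Read `count` words via programmed I/O: busy-wait on the status
    flag before each word. The wait loop is wasted processor time."""
    buffer = []
    for _ in range(count):
        while not device.status_ready():   # status-checking cycle
            pass
        buffer.append(device.read_word())
    return buffer

device = SimulatedDevice([10, 20, 30])
block = programmed_io_read(device, 3)
```

Every iteration of the inner `while` loop is a processor cycle spent doing nothing useful — which is exactly the disadvantage the text goes on to name.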

Figure gives an example of the use of programmed I/O to read in a block of data from a peripheral device (e.g., a record from tape) into memory. Data are read in one word (e.g., 16 bits) at a time. For each word that is read in, the processor must remain in a status-checking cycle until it determines that the word is available in the I/O module's data register.

Disadvantage of this technique: it is a time-consuming process that keeps the processor busy needlessly.

Interrupt-driven I/O

In program-controlled I/O, when the processor continuously monitors the status of the device, it does not perform any

useful tasks. An alternate approach would be for the I/O device to alert the processor when it becomes ready. This is done by sending a hardware signal called an interrupt to the processor. At least one of the bus control lines, called an interrupt-request line, is dedicated for this purpose. The processor can perform other useful tasks while it is waiting for the device to be ready. Interrupt-driven I/O is shown below.


The processor issues a READ command. It then goes off and does something else (e.g., the processor may be working on several different programs at the same time). When the interrupt from the I/O module occurs, the processor saves the context (e.g., program counter and processor registers) of the current program and processes the interrupt. In this case, the processor reads the word of data from the I/O module and stores it in memory. It then restores the context of the program it was working on (or some other program) and resumes execution. Interrupt-driven I/O still consumes a lot of processor time, because every word of data that goes from memory to the I/O module or from the I/O module to memory must pass through the processor.

    Interrupt Processing

    Processor is executing the instruction located at address i when an interrupt occurs.

The routine executed in response to an interrupt request is called the interrupt-service routine.

When an interrupt occurs, control must be transferred to the interrupt-service routine.

But before transferring control, the current contents of the PC (i+1) must be saved in a known location. This will enable the return-from-interrupt instruction to resume execution at i+1.

The return address, or the contents of the PC, are usually stored on the processor stack.

The occurrence of an interrupt triggers a number of events, both in the processor hardware and in software. Figure shows a typical sequence.


When an I/O device completes an I/O operation, the following sequence of hardware events occurs:

1. The device issues an interrupt signal to the processor.

2. The processor finishes execution of the current instruction before responding to the interrupt.

3. The processor tests for an interrupt, determines that there is one, and sends an acknowledgment signal to the device that issued the interrupt. The acknowledgment allows the device to remove its interrupt signal.

4. The processor now needs to prepare to transfer control to the interrupt routine. To begin, it needs to save the information needed to resume the current program at the point of interrupt. The minimum information required is (a) the status of the processor, which is contained in a register called the program status word (PSW), and (b) the location of the next instruction to be executed, which is the latest value of the program counter (PC).

If the processor uses a polling mechanism, it polls the status registers of the I/O devices to determine which device is requesting an interrupt.

In this case the priority is determined by the order in which the devices are polled.

The first device with its status bit set to 1 is the device whose interrupt request is accepted.

    Devices are connected to form a daisy chain.

Devices share the interrupt-request line, and the interrupt-acknowledge line is connected to form a daisy chain. When devices raise an interrupt request, the interrupt-request line is activated.

The processor in response activates interrupt-acknowledge.

This signal is received by device 1; if device 1 does not need service, it passes the signal to device 2.

    Device that is electrically closest to the processor has the highest priority.
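The daisy-chain propagation of the acknowledge signal can be sketched directly: walk the chain in order of electrical closeness, and the first requesting device absorbs the acknowledge. The function name and the boolean-list representation are invented for illustration:

```python
def daisy_chain_acknowledge(requesting):
    """Propagate the interrupt-acknowledge signal down the daisy chain.

    `requesting` lists, in order of electrical closeness to the processor,
    whether each device has raised an interrupt request. The first device
    that wants service absorbs the acknowledge; the rest never see it.
    """
    for position, wants_service in enumerate(requesting):
        if wants_service:
            return position        # this device claims the acknowledge
    return None                    # no device claims it

# Devices at positions 1 and 3 both request; device 1, being closer
# to the processor, wins.
winner = daisy_chain_acknowledge([False, True, False, True])
```

This makes the priority rule explicit: position in the chain is the priority, which is exactly why the electrically closest device always wins.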


    Priority bus arbitration

    Each device has a separate interrupt-request and interrupt-acknowledge line.

    Each interrupt-request line is assigned a different priority level.

Interrupt requests received over these lines are sent to a priority arbitration circuit in the processor. If the interrupt request has a higher priority level than the priority of the processor, then the request is accepted.
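The acceptance rule above can be stated as a small function: pick the highest-priority pending line, and accept it only if it exceeds the processor's current priority. The dictionary representation and function name are invented for illustration:

```python
def arbitrate(pending, processor_priority):
    """Model of a priority arbitration circuit.

    `pending` maps priority level -> whether that request line is active.
    Returns the accepted priority level, or None if no request qualifies.
    """
    active = [level for level, requested in pending.items() if requested]
    if not active:
        return None
    winner = max(active)                       # highest-priority request
    return winner if winner > processor_priority else None

# Levels 1 and 3 are requesting; the processor runs at priority 2,
# so only the level-3 request is accepted.
granted = arbitrate({1: True, 3: True, 5: False}, processor_priority=2)
```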

    Drawbacks of Interrupt-Driven I/O

    Requires the active intervention of the processor to transfer data between memory and an I/O module, and any data

    transfer must traverse a path through the processor.

Drawbacks of Programmed I/O and Interrupt-Driven I/O

The I/O transfer rate is limited by the speed with which the processor can test and service a device.

The processor is tied up in managing an I/O transfer; a number of instructions must be executed for each I/O transfer.

Direct Memory Access

When large volumes of data are to be moved, a more efficient technique is required: direct memory access (DMA).

The DMA module transfers the entire block of data, one word at a time, directly to or from memory, without going through the processor. A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor. The control unit which performs these transfers is a part of the I/O device's interface circuit, and is called a DMA controller. The DMA controller performs functions that would normally be carried out by the processor: for each word, it provides the memory address and all the control signals. To transfer a block of data, it increments the memory addresses and keeps track of the number of transfers.

    DMA Function

DMA involves an additional module on the system bus. The DMA module shown in the figure is capable of mimicking the processor and, indeed, of taking over control of the system from the processor. It needs to do this to transfer data to and from memory over the system bus. For this purpose, the DMA module must use the bus only when the processor does not need it, or it must force the processor to suspend operation temporarily. The latter technique is


more common and is referred to as cycle stealing, because the DMA module in effect steals a bus cycle. When the processor wishes to read or write a block of data, it issues a command to the DMA module, sending the DMA module the following information:

Whether a read or write is requested, using the read or write control line between the processor and the DMA module.

The address of the I/O device involved, communicated on the data lines.

The starting location in memory to read from or write to, communicated on the data lines and stored by the DMA module in its address register.

The number of words to be read or written, again communicated via the data lines and stored in the data count register.

The DMA module transfers the entire block of data, one word at a time, directly to or from memory, without going through the processor. When the transfer is complete, the DMA module sends an interrupt signal to the processor. Thus, the processor is involved only at the beginning and end of the transfer, as shown.
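The setup-transfer-interrupt sequence just described can be sketched as a toy controller: the processor supplies the starting address, word count, and direction; the controller then moves the whole block and raises an interrupt. All register and parameter names are invented, and real DMA happens over the bus in hardware, not as a Python loop:

```python
class DMAController:
    """Toy model of a DMA controller. The processor calls start() with the
    starting address, word count, and direction; the controller moves the
    whole block and then signals completion with an interrupt."""
    def __init__(self, memory):
        self.memory = memory              # shared main memory (a list)
        self.interrupt_pending = False

    def start(self, start_address, word_count, direction, device_words=None):
        if direction == "to_memory":      # device -> memory (input)
            for i in range(word_count):
                self.memory[start_address + i] = device_words[i]
            result = None
        else:                             # memory -> device (output)
            result = [self.memory[start_address + i]
                      for i in range(word_count)]
        self.interrupt_pending = True     # completion interrupt to processor
        return result

memory = [0] * 16
dma = DMAController(memory)
# Processor initiates the transfer, then is free until the interrupt.
dma.start(start_address=4, word_count=3, direction="to_memory",
          device_words=[7, 8, 9])
```

Note that the processor appears only in the call to `start()` and in checking `interrupt_pending` afterwards — the per-word address incrementing and counting, which programmed I/O would burn processor instructions on, happen inside the controller.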

A DMA controller can be used to transfer a block of data from an external device to the processor, without requiring any help from the processor. As a result the processor is free to execute other programs. However, the DMA controller should perform the task of transferring data to or from an I/O device only for a program that is being executed by the processor. That is, the DMA controller does not and should not have the capability to determine when a data transfer operation should take place. The processor must initiate the DMA transfer of data when it is required by the program that is being executed by the processor. When the processor determines that the program that is being executed requires a DMA transfer, it informs the DMA controller (which sits in the interface circuit of the device) of three things, namely, the starting address of the memory location, the number of words that need to be transferred, and the direction of transfer, that is, whether the data need to be transferred from the I/O device to the memory or from the memory to the I/O device. After initiating the DMA transfer, the processor suspends the program that initiated the transfer, and continues with the execution of some other program. The program whose execution is suspended is said to be in the blocked state.

Let us consider a memory organization with two DMA controllers. In this memory organization, a DMA controller is used to connect a high-speed network to the computer bus. In addition, a disk controller, which controls two disks, also has DMA capability. The disk controller provides two DMA channels, and can perform two independent DMA operations, as if each disk has its own


DMA controller. Each DMA controller has three registers: one to store the memory address, one to store the word count, and the last to store the status and control information. There are two copies of these three registers in order to perform independent DMA operations; that is, these registers are duplicated. The processor also has to transfer data to and from the main memory. Also, the DMA controller is responsible for transferring data between the I/O device and the main memory. Both the processor and the DMA controller have to use the external bus to talk to the main memory. Usually, DMA controllers are given higher priority than the processor to access the bus. Now, we also need to decide the priority among the different DMA devices that may need to use the bus. Among these different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device. Usually, the processor originates most cycles on the bus. The DMA controller can be said to steal memory access cycles from the processor on the bus. Thus, the processor and the DMA controller use the bus in an interwoven fashion. This interweaving technique is called cycle stealing. An alternate approach would be to give DMA controllers exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as the block mode or the burst mode of operation.

Processor and DMA controllers both need to initiate data transfers on the bus and access main memory. The process of using the bus to perform a data transfer operation is called the initiation of a transfer operation. At any point in time only one device is allowed to initiate transfers on the bus. The device that is allowed to initiate transfers on the bus at any given time is called the bus master. When the current bus master releases control of the bus, another device can acquire the status of the bus master. How does one determine which is the next device to acquire the status of the bus master? Note that there may be several DMA controllers plus the processor which require access to the bus. The process by which the next device to become the bus master is selected, and bus mastership is transferred to it, is called bus arbitration. There are two types of bus arbitration processes: centralized arbitration and distributed arbitration. In the case of centralized arbitration, a single bus arbiter performs the arbitration, whereas in the case of distributed arbitration all devices which need to initiate data transfers on the bus participate, or are involved, in the selection of the next bus master.

Centralized Bus Arbitration

The bus arbiter may be the processor or a separate unit connected to the bus. Normally, the processor is the bus master, unless it grants bus mastership to one of the DMA controllers. A DMA controller requests control of the bus by asserting the Bus-Request (BR) line. In response, the processor activates the Bus-Grant (BG1) line, indicating that the controller may use the bus when it is free. The BG1 signal is connected to all DMA controllers in a daisy-chain fashion. When the Bus-Busy (BBSY) signal is 0, it indicates that the bus is busy. When BBSY becomes 1, the DMA controller that asserted BR can acquire control of the bus.
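The daisy-chain grant propagation can be sketched as below. This is a behavioral model under the assumption that controllers are ordered by their position on the BG1 chain; it is not a circuit-level description:

```python
# Sketch of daisy-chained bus granting: the grant (BG1) enters the first
# DMA controller and propagates down the chain; the first controller with
# a pending request consumes it and does not pass it on.
def daisy_chain_grant(requests):
    """requests: booleans ordered by position on the chain (closest to the
    processor first). Returns the index of the controller that receives
    the grant, or None if no one is requesting."""
    for position, requesting in enumerate(requests):
        if requesting:
            return position    # grant is consumed here
    return None                # grant propagates off the end unused
```

For example, if controllers 0 and 2 both request the bus, `daisy_chain_grant([True, False, True])` returns 0: position on the chain fixes the priority, which is the defining property (and the main fairness drawback) of the daisy-chain scheme.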


    Distributed arbitration

All devices waiting to use the bus share the responsibility of carrying out the arbitration process. Since the process does not depend on a central arbiter, distributed arbitration has higher reliability. Each device is assigned a 4-bit ID number. All the devices are connected using five lines: four arbitration lines to transmit the ID, and one line for the Start-Arbitration signal. To request the bus, a device asserts the Start-Arbitration signal and places its 4-bit ID number on the arbitration lines. The pattern that appears on the arbitration lines is the logical OR of all the 4-bit device IDs placed on them.

Arbitration process:

Each device compares the pattern that appears on the arbitration lines to its own ID, starting with the most significant bit (MSB). If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions. If the pattern on the lines is the same as its own ID, that device has won the arbitration; the device with the highest ID number wins.
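The winner-selection rule above can be checked with a short simulation. This is a sketch of the logical behavior (OR-ed lines, MSB-first withdrawal), not of the electrical open-collector implementation:

```python
# Sketch of distributed arbitration: the arbitration lines carry the OR of
# all IDs still being driven; a device that sees a 1 on a line where its own
# bit is 0 withdraws from that and all lower bit positions. Highest ID wins.
def distributed_arbitration(requesting_ids, width=4):
    contenders = set(requesting_ids)
    for bit in range(width - 1, -1, -1):       # compare MSB first
        line = any((dev >> bit) & 1 for dev in contenders)
        if line:
            # devices with a 0 in this position detect the difference
            # and stop driving this and all lower bits
            contenders = {dev for dev in contenders if (dev >> bit) & 1}
    winner, = contenders                        # exactly one device remains
    return winner

print(distributed_arbitration({5, 6}))   # 6  (0110 beats 0101)
```

With IDs 5 (0101) and 6 (0110): the two agree on the top two bits, but at bit 1 device 5 sees a 1 where it drives a 0, so it withdraws, leaving 6 as the winner.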

Buses

The processor, main memory, and I/O devices are interconnected by means of a bus. The device that initiates a data transfer on the bus by issuing read or write control signals is called a master. The device that is being addressed by the master is called a slave or a target. The bus provides a communication path for the transfer of data, and it also includes lines to support interrupts and arbitration. A bus protocol is the set of rules that governs the behavior of the various devices connected to the bus: when to place information on the bus, when to assert control signals, and so on.

Bus lines may be grouped into three types:

    Data

    Address

    Control

Control signals specify:

Whether it is a read or a write operation.
The required size of the data, when several operand sizes (byte, word, long word) are possible.
Timing information to indicate when the processor and I/O devices may place data on or receive data from the bus.

Schemes for timing of data transfers over a bus can be classified into:

Synchronous
Asynchronous


    Synchronous bus

Timing of an input transfer (read) on a synchronous bus

At t0, the master places the device address and command on the bus, and indicates that it is a Read operation. At t1, the addressed slave places the data on the data lines. At t2, the master strobes the data on the data lines into its input buffer, completing the Read operation. In the case of a Write operation, the master places the data on the bus along with the address and commands at time t0, and the slave strobes the data into its input buffer at time t2.

Once the master places the device address and command on the bus, it takes time for this information to propagate to the devices; this time depends on the physical and electrical characteristics of the bus. Also, all the devices have to be given enough time to decode the address and control signals, so that the addressed slave can place the data on the bus. The width of the pulse t1 - t0 depends on:

The maximum propagation delay between two devices connected to the bus.
The time taken by all the devices to decode the address and control signals, so that the addressed slave can respond at time t1.

At the end of the clock cycle, at time t2, the master strobes the data on the data lines into its input buffer if it is a Read operation. To strobe means to capture the values of the data and store them into a buffer. When data are to be loaded into a storage buffer register, the data should be available for a period longer than the setup time of the device. The width of the pulse t2 - t1 should be longer than:

The maximum propagation time of the bus plus the setup time of the input buffer register of the master.
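The two constraints above fix a lower bound on the clock period. A back-of-the-envelope check, with made-up example delays (the nanosecond figures are illustrative, not from any real bus specification):

```python
# Illustrative timing budget for one synchronous bus cycle (values in ns).
bus_propagation = 10   # max propagation delay between any two devices
decode_time = 15       # time for all devices to decode address/control
buffer_setup = 5       # setup time of the master's input buffer register

# t1 - t0 must cover propagation plus decode time
address_phase = bus_propagation + decode_time
# t2 - t1 must cover propagation plus the buffer's setup time
data_phase = bus_propagation + buffer_setup

clock_period = address_phase + data_phase   # minimum t2 - t0
print(clock_period)  # 40
```

The point of the exercise: the clock period is dictated by the sum of worst-case delays, which is why a single-cycle synchronous bus runs at the speed of its slowest participant.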

    Detailed timing diagram for read operation


Signals do not appear on the bus as soon as they are placed on it, due to the propagation delay in the interface circuits. Signals reach the devices after a propagation delay that depends on the characteristics of the bus. Data must remain on the bus for some time after t2, equal to the hold time of the buffer. The data transfer has to be completed within one clock cycle, so the clock period t2 - t0 must be such that the longest propagation delay on the bus and the slowest device interface are accommodated. This forces all the devices to operate at the speed of the slowest device. The processor simply assumes that the data are available at t2 in the case of a Read operation, or are read by the device in the case of a Write operation. Most buses therefore have control signals to carry a response from the slave. These control signals serve two purposes:

They inform the master that the slave has recognized the address and is ready to participate in a data transfer operation.
They enable the duration of the data transfer operation to be adjusted based on the speed of the participating slaves.

Input transfer using multiple clock cycles: a high-frequency bus clock is used, and the data transfer spans several clock cycles instead of just one as in the earlier case. The Slave-ready signal is an acknowledgement from the slave to the master confirming that valid data have been sent. Depending on when the Slave-ready signal is asserted, the duration of the data transfer can change.

    Asynchronous bus

Data transfers on the bus are controlled by a handshake between the master and the slave. The common clock of the synchronous bus is replaced by two timing control lines:

Master-ready
Slave-ready

The Master-ready signal is asserted by the master to indicate to the slave that it is ready to participate in a data transfer. The Slave-ready signal is asserted by the slave in response to Master-ready, and it indicates to the master that the slave is ready to participate in a data transfer. Data transfer using the handshake protocol:

The master places the address and command information on the bus.
It asserts the Master-ready signal to indicate to the slaves that the address and command information has been placed on the bus.
All devices on the bus decode the address.
The addressed slave performs the required operation, and informs the processor it has done so by asserting the Slave-ready signal.
The master removes all the signals from the bus once Slave-ready is asserted.
If the operation is a Read operation, the master also strobes the data into its input buffer.


    Handshake control of data transfer during an input operation

t0 - The master places the address and command information on the bus.
t1 - The master asserts the Master-ready signal. Master-ready is asserted at t1 instead of t0 to allow time for the address and command to settle on the bus.
t2 - The addressed slave places the data on the bus and asserts the Slave-ready signal.
t3 - The Slave-ready signal arrives at the master.
t4 - The master removes the address and command information.
t5 - The slave receives the transition of the Master-ready signal from 1 to 0. It removes the data and the Slave-ready signal from the bus.
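The causal ordering of the handshake can be captured in a toy model. This sketch models the protocol's event dependencies only, not real electrical timing; the class and method names are illustrative:

```python
# Toy model of the asynchronous handshake for an input (read) transfer.
# Each method checks that the protocol ordering described above is respected.
class AsyncBus:
    def __init__(self):
        self.master_ready = False
        self.slave_ready = False
        self.address = None
        self.data = None

    def master_request(self, address):
        self.address = address            # t0: address/command on the bus
        self.master_ready = True          # t1: assert Master-ready

    def slave_respond(self, memory):
        assert self.master_ready          # slave acts only after Master-ready
        self.data = memory[self.address]  # t2: data on the bus,
        self.slave_ready = True           #     Slave-ready asserted

    def master_strobe(self):
        assert self.slave_ready           # t3: Slave-ready reached the master
        captured = self.data              # strobe data into the input buffer
        self.master_ready = False         # t4: master removes its signals
        return captured

    def slave_release(self):
        assert not self.master_ready      # t5: slave sees Master-ready drop
        self.data = None
        self.slave_ready = False

memory = {0x10: 99}
bus = AsyncBus()
bus.master_request(0x10)
bus.slave_respond(memory)
value = bus.master_strobe()
bus.slave_release()
print(value)  # 99
```

Each side advances only in response to the other's signal transition, which is exactly how the handshake accommodates a slave of any speed without a shared clock.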

    Asynchronous vs. Synchronous bus

Advantages of an asynchronous bus:

It eliminates the need for synchronization between the sender and the receiver.
It can accommodate varying delays automatically, using the Slave-ready signal.

Disadvantages of an asynchronous bus:

The data transfer rate with a full handshake is limited by two round-trip delays. A data transfer on a synchronous bus involves only one round-trip delay, so a synchronous bus can achieve faster transfer rates.


    Bus standards.

    A system using different interface standards

    Bridge circuit translates signals and protocols from processor bus to PCI bus.

    Three widely used bus standards:

PCI (Peripheral Component Interconnect)
SCSI (Small Computer System Interface)

    USB (Universal Serial Bus)

    PCI Bus

Peripheral Component Interconnect
Introduced in 1992
Low-cost bus
Processor independent
Plug-and-play capability

In today's computers, most memory transfers involve a burst of data rather than just one word. The PCI bus is designed primarily to support this mode of operation. The bus supports three independent address spaces: memory, I/O, and configuration.

So far we assumed that the master maintains the address information on the bus until the data transfer is completed. But the address is needed only long enough for the slave to be selected. Thus, the address is needed on the bus for one clock cycle only, freeing the address lines to be used for sending data in subsequent clock cycles. The result is a significant cost reduction.

A master is called an initiator in PCI terminology. The addressed device that responds to read and write commands is called a target.

Data transfer signals on the PCI bus


    A read operation on the PCI bus

    Device Configuration

When an I/O device is connected to a computer, several actions are needed to configure both the device and the software that communicates with it. PCI incorporates in each I/O device interface a small configuration ROM that stores information about that device. The configuration ROMs of all devices are accessible in the configuration address space. The PCI initialization software reads these ROMs and determines whether the device is a printer, a keyboard, an Ethernet interface, or a disk controller. It can further learn about various device options and characteristics. Devices are assigned addresses during the initialization process.


This means that during the bus configuration operation, devices cannot be accessed based on their address, as they have not yet been assigned one. Hence, the configuration address space uses a different mechanism: each device has an input signal called Initialization Device Select (IDSEL#).

Electrical characteristics:

The PCI bus has been defined for operation with either a 5 V or a 3.3 V power supply.

SCSI Bus

The acronym SCSI stands for Small Computer System Interface. It refers to a standard bus defined by the American National Standards Institute (ANSI) under the designation X3.131. In the original specification of the standard, devices such as disks are connected to a computer via a 50-wire cable, which can be up to 25 meters in length and can transfer data at rates up to 5 megabytes/s. The SCSI bus standard has undergone many revisions, and its data transfer capability has increased very rapidly, almost doubling every two years. SCSI-2 and SCSI-3 have been defined, and each has several options. Because of these various options, a SCSI connector may have 50, 68, or 80 pins. Devices connected to the SCSI bus are not part of the address space of the processor.

The SCSI bus is connected to the processor bus through a SCSI controller. This controller uses DMA to transfer data packets from the main memory to the device, or vice versa. A packet may contain a block of data, commands from the processor to the device, or status information about the device.

A controller connected to a SCSI bus is one of two types: an initiator or a target. An initiator has the ability to select a particular target and to send commands specifying the operations to be performed. The disk controller operates as a target; it carries out the commands it receives from the initiator. The initiator establishes a logical connection with the intended target. Once this connection has been established, it can be suspended and restored as needed to transfer commands and bursts of data. While a particular connection is suspended, other devices can use the bus to transfer information. This ability to overlap data transfer requests is one of the key features of the SCSI bus that leads to its high performance.

Data transfers on the SCSI bus are always controlled by the target controller.

To send a command to a target, an initiator requests control of the bus and, after winning arbitration, selects the controller it wants to communicate with and hands control of the bus over to it. Then the controller starts a data transfer operation to receive a command from the initiator.

Assume that the processor needs to read a block of data from a disk drive and that the data are stored in disk sectors that are not contiguous. The processor sends a command to the SCSI controller, which causes the following sequence of events to take place:

The SCSI controller, acting as an initiator, contends for control of the bus.
When the initiator wins the arbitration process, it selects the target controller and hands over control of the bus to it.
The target starts an output operation (from initiator to target); in response, the initiator sends a command specifying the required read operation.
The target, realizing that it first needs to perform a disk seek operation, sends a message to the initiator indicating that it will temporarily suspend the connection between them. Then it releases the bus.
The target controller sends a command to the disk drive to move the read head to the first sector involved in the requested read operation. Then, it reads the data stored in that sector and stores them in a data buffer. When it is ready to begin transferring data to the initiator, the target requests control of the bus. After it wins arbitration, it reselects the initiator controller, thus restoring the suspended connection.


    The SCSI bus signals.

Main phases involved:

Arbitration
A controller requests the bus by asserting BSY and by asserting its associated data line.
When BSY becomes active, all controllers that are requesting the bus examine the data lines.

Selection
The controller that won the arbitration selects the target by asserting SEL and the data line of the target. After that, the initiator releases the BSY line.
The target responds by asserting the BSY line.
The target controller has control of the bus from then on.

Information Transfer
Handshaking signals are used between the initiator and the target.
At the end, the target releases the BSY line.

Reselection

Arbitration and selection on the SCSI bus.


    Device 6 wins arbitration and selects device 2.
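The arbitration phase can be sketched as follows. This is a behavioral model of the rule that each requesting controller asserts the data line matching its ID and the highest asserted line wins; the 8-line width corresponds to a narrow SCSI bus:

```python
# Sketch of SCSI arbitration on the shared data lines: each requesting
# controller asserts the data line whose number equals its ID. Every
# controller then examines the lines; any controller that sees a line
# asserted above its own ID withdraws, so the highest ID wins.
def scsi_arbitrate(requesting_ids, num_lines=8):
    # the shared data lines carry the OR of every line being driven
    lines = [i in requesting_ids for i in range(num_lines)]
    # the surviving controller is the one on the highest asserted line
    return max(i for i, asserted in enumerate(lines) if asserted)

# The figure's scenario: controllers 6 and 2 both request the bus.
winner = scsi_arbitrate({6, 2})
print(winner)  # 6
```

Device 6 sees no line above its own asserted, so it wins and proceeds to the selection phase, asserting SEL and device 2's data line.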

    USB

Universal Serial Bus (USB) is an industry standard developed through a collaborative effort of several computer and communication companies, including Compaq, Hewlett-Packard, Intel, Lucent, Microsoft, Nortel Networks, and Philips.

    Speed

Low-speed (1.5 Mb/s)
Full-speed (12 Mb/s)
High-speed (480 Mb/s)

    Port Limitation

    Device Characteristics

    Plug-and-play

    Electrical Characteristics

The cables used for USB connections consist of four wires. Two are used to carry power: +5 V and Ground. A hub or an I/O device may be powered directly from the bus (bus-powered), or it may have its own external power connection (self-powered). The other two wires are used to carry data. Different signaling schemes are used for different speeds of transmission. At low speed, 1s and 0s are transmitted by sending a high-voltage state (5 V) on one or the other of the two signal wires. For high-speed links, differential transmission is used.


    Universal Serial Bus tree structure

To accommodate a large number of devices that can be added or removed at any time, the USB has the tree structure shown in the figure. Each node of the tree has a device called a hub, which acts as an intermediate control point between the host and the I/O devices. At the root of the tree, a root hub connects the entire tree to the host computer. The leaves of the tree are the I/O devices being served (for example, a keyboard, an Internet connection, a speaker, or a digital TV).

In normal operation, a hub copies a message that it receives from its upstream connection to all its downstream ports. As a result, a message sent by the host computer is broadcast to all I/O devices, but only the addressed device will respond to that message. However, a message from an I/O device is sent only upstream towards the root of the tree and is not seen by other devices. Hence, the USB enables the host to communicate with the I/O devices, but it does not enable these devices to communicate with each other.

Addressing

When a USB is connected to a host computer, its root hub is attached to the processor bus, where it appears as a single device. The host software communicates with individual devices attached to the USB by sending packets of information, which the root hub forwards to the appropriate device in the USB tree. Each device on the USB, whether it is a hub or an I/O device, is assigned a 7-bit address. This address is local to the USB tree and is not related in any way to the addresses used on the processor bus.

A hub may have any number of devices or other hubs connected to it, and addresses are assigned arbitrarily. When a device is first connected to a hub, or when it is powered on, it has the address 0. The hardware of the hub to which this device is connected is capable of detecting that the device has been connected, and it records this fact as part of its own status information. Periodically, the host polls each hub to collect status information and learn about new devices that may have been added or disconnected.

When the host is informed that a new device has been connected, it uses a sequence of commands to send a reset signal on the corresponding hub port, read information from the device about its capabilities, send configuration information to the device, and assign the device a unique USB address. Once this sequence is completed, the device begins normal operation and responds only to the new address.
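The enumeration sequence above can be sketched in a few lines. This model keeps only the addressing logic (default address 0, unique 7-bit assignment); the reset and capability-reading steps are omitted, and the class names are illustrative:

```python
# Sketch of USB address assignment: a newly attached device answers at
# address 0 until the host assigns it a unique 7-bit address (1..127).
class USBDevice:
    def __init__(self):
        self.address = 0             # default address after attach/power-on

class USBHost:
    def __init__(self):
        self.next_address = 1        # address 0 is reserved for new devices
        self.devices = {}

    def enumerate(self, device):
        assert device.address == 0   # new device responds at the default
        address = self.next_address
        assert address < 128         # addresses are 7 bits
        device.address = address     # from now on the device responds
        self.devices[address] = device   # only to its assigned address
        self.next_address += 1
        return address

host = USBHost()
keyboard, mouse = USBDevice(), USBDevice()
print(host.enumerate(keyboard), host.enumerate(mouse))  # 1 2
```

Because every device passes through address 0 during enumeration, the host configures devices one at a time, which is why hot-plugged devices are brought up sequentially.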

    USB Protocols

All information transferred over the USB is organized in packets, where a packet consists of one or more bytes of information. There are many types of packets that perform a variety of control functions. The information transferred on the USB can be divided into two broad categories: control and data. Control packets perform such tasks as addressing a device to initiate a data transfer, acknowledging that data have been received correctly, or indicating an error. Data packets carry information that is delivered to a device.

A packet consists of one or more fields containing different kinds of information. The first field of any packet is called the packet identifier (PID), which identifies the type of that packet.


The PID bits are transmitted twice. The first time they are sent with their true values, and the second time with each bit complemented. The four PID bits identify one of 16 different packet types. Some control packets, such as ACK (Acknowledge), consist only of the PID byte.
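The duplicated-and-complemented PID scheme gives the receiver a simple integrity check, which can be sketched as follows (the sample PID value is illustrative, not the real USB encoding of ACK):

```python
# Sketch of the PID integrity check: the 4 PID bits are followed by their
# bitwise complement, and the receiver verifies that the two halves match.
def encode_pid(pid):
    assert 0 <= pid < 16             # 4-bit PID: one of 16 packet types
    return (pid, pid ^ 0xF)          # true bits, then complemented bits

def check_pid(true_bits, complemented_bits):
    return (true_bits ^ 0xF) == complemented_bits

SAMPLE_PID = 0b0010                  # illustrative value only
sent = encode_pid(SAMPLE_PID)
print(check_pid(*sent))                       # True: intact packet
print(check_pid(sent[0], sent[1] ^ 0b0100))   # False: a flipped bit is caught
```

Any single-bit error in either half breaks the complement relationship, so the receiver can reject a corrupted PID without needing a separate checksum field for it.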

    An output transfer

Isochronous Traffic on USB

One of the key objectives of the USB is to support the transfer of isochronous data. Devices that generate or receive isochronous data require a time reference to control the sampling process. To provide this reference, transmission over the USB is divided into frames of equal length. A frame is 1 ms long for low- and full-speed data. The root hub generates a Start-of-Frame (SOF) control packet precisely once every 1 ms to mark the beginning of a new frame. The arrival of an SOF packet at any device constitutes a regular clock signal that the device can use for its own purposes. To assist devices that may need longer periods of time, the SOF packet carries an 11-bit frame number. Following each SOF packet, the host carries out input and output transfers for isochronous devices. This means that each device will have an opportunity for an input or output transfer once every 1 ms.
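The 11-bit frame number behaves as a counter that wraps around, which can be sketched as:

```python
# Sketch of the 11-bit frame number carried in each SOF packet: it
# increments once per 1 ms frame and wraps after 2**11 = 2048 frames.
FRAME_BITS = 11

def next_frame_number(current):
    return (current + 1) % (1 << FRAME_BITS)   # wraps 2047 -> 0

frame = 2046
history = []
for _ in range(3):
    frame = next_frame_number(frame)
    history.append(frame)
print(history)  # [2047, 0, 1]
```

A device needing a period longer than 1 ms counts SOF packets (or reads the frame number directly) and must simply account for the wraparound every 2.048 seconds.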