Lec4b Supp

Embed Size (px)

Citation preview

  • 7/29/2019 Lec4b Supp

    1/56

    Chapter 4B: The Processor,Part B

    Mary Jane Irwin ( www.cse.psu.edu/~mji )

    [Adapted from Computer Organization and Design, 4th Edition,

    CSE431 Chapter 4B.1 Irwin, PSU, 2008

    , ,

  • 7/29/2019 Lec4b Supp

    2/56

    Review: Why Pipeline? For Performance!

    me c oc cyc es

    A Once the

    I

    n

    s

    Inst 1

    LUeg eg

    AL

    IM Re DM Re

    p pe ne s u ,

    one instructionis completed

    r.

    O Inst 2

    U

    ALUIM Reg DM Reg

    ,CPI = 1

    r

    d

    e Inst 3ALUIM Reg DM Reg

    Inst 4ALUIM Reg DM Reg

    CSE431 Chapter 4B.2 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    3/56

    Review: MIPS Pipeline Data and Control Paths

    ID/EXEX/MEM

    Control

    PCSrc

    Add

    4 ShiftAdd

    IF/ID

    MEM/WB

    RegWriteBranch

    ReadAddress

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    Register

    File

    ReadData 1

    ALU

    DataMemory

    Address Read

    MemtoReg

    ALUSrc

    Write Data

    eaData 2 Write Data

    Si n

    ALU

    cntrlMemWrite MemRead

    16 32Extend ALUOp

    CSE431 Chapter 4B.3 Irwin, PSU, 2008

    eg s

  • 7/29/2019 Lec4b Supp

    4/56

    Review: Can Pipelining Get Us Into Trouble?

    Yes: Pipeline Hazards

    z structural hazards: attempt to use the same resource by twodifferent instructions at the same time

    z data hazards: attempt to use data before it is ready- An instructions source operand(s) are produced by a prior

    z control hazards: attempt to make a decision about program

    control flow before the condition has been evaluated and the

    - branch and jump instructions, exceptions

    Pipeline control must detect the hazard and then takeaction to resolve hazards

    CSE431 Chapter 4B.4 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    5/56

    Review: Register Usage Can Cause Data Hazards

    Value of $1 10 10 10 10 10/-20 -20 -20 -20 -20

    ALUIM Reg DM Regadd$1,

    ALUIM Reg DM Regsub $4,$1,$5

    ALUIM Reg DM Regand $6,$1,$7

    LUIM Reg DM Reg

    A

    or $8,$1,$9

    CSE431 Chapter 4B.5 Irwin, PSU, 2008

    U, ,

  • 7/29/2019 Lec4b Supp

    6/56

    One Way to Fix a Data Hazard

    Iadd$1,

    AL

    IM Reg DM Reg

    Can fix datahazard by

    stall

    n

    s

    t

    but impacts CPI

    stall

    .

    O

    r

    d

    e

    rsub $4,$1,$5

    ALUIM Reg DM Reg

    and $6,$1,$7ALUIM Reg DM Reg

    CSE431 Chapter 4B.6 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    7/56

    Another Way to Fix a Data Hazard

    Iadd$1,

    ALUIM Reg DM Reg

    Fix data hazardsby forwarding

    results as soon as

    st

    r.sub $4,$1,$5

    ALUIM Reg DM Reg

    they are availableto where they are

    needed

    O

    r and $6,$1,$7

    ALUIM Reg DM Reg

    e

    r or $8,$1,$9ALUIM Reg DM Reg

    xor $4,$1,$5ALUIM Reg DM Reg

    CSE431 Chapter 4B.7 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    8/56

    Another Way to Fix a Data Hazard

    ALUIM Reg DM Reg

    Fix data hazardsby forwarding

    results as soon asIadd$1,

    ALUIM Reg DM Reg

    they are availableto where they are

    needed

    n

    st sub $4,$1,$5

    ALUIM Reg DM Reg

    .

    O

    r and $6,$1,$7

    ALUIM Reg DM Reg

    e

    r or $8,$1,$9

    ALUIM Reg DM Regxor $4,$1,$5

    CSE431 Chapter 4B.8 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    9/56

    Data Forwarding (aka Bypassing)

    a e e resu rom e ear es po n a ex s s n anyof the pipeline state registers and forward it to thefunctional units (e.g., the ALU) that need it that cycle

    For ALU functional unit: the inputs can come from anypipeline register rather than just from ID/EX by

    z adding multiplexors to the inputs of the ALU

    z connecting the Rd write data in EX/MEM or MEM/WB to either (or

    both) of the EXs stage Rs and Rt ALU mux inputs

    z adding the proper control hardware to control the new muxes

    Other functional units may need similar forwarding logic

    e.g., e With forwarding can achieve a CPI of 1 even in the

    CSE431 Chapter 4B.9 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    10/56

    Data Forwarding Control Conditions

    1. EX Forward Unit:i f ( EX/ MEM. RegWr i t eand ( EX/ MEM. Regi st er Rd ! = 0)

    Forwards theresult from the

    . .

    For war dA = 10i f ( EX/ MEM. RegWr i t eand ( EX/ MEM. Regi st er Rd ! = 0)

    previous instr.to either inputof the ALU

    and ( EX/ MEM. Regi st er Rd = I D/ EX. Regi st er Rt ) )For war dB = 10

    Forwards theresult from the

    . i f ( MEM/ WB. RegWr i t eand ( MEM/ WB. Regi st er Rd ! = 0)and ( MEM/ WB. Regi st er Rd = I D/ EX. Regi st er Rs) )

    previous instr.to either input

    For war dA = 01i f ( MEM/ WB. RegWr i t eand ( MEM/ WB. Regi st er Rd ! = 0)

    CSE431 Chapter 4B.10 Irwin, PSU, 2008

    . .For war dB = 01

  • 7/29/2019 Lec4b Supp

    11/56

    Forwarding Illustration

    I add$1,AL

    IM Reg DM Reg

    n

    s

    t sub $4,$1,$5ALUIM Reg DM Reg

    .

    O

    r and $6 $7 $1

    A

    LUIM Reg DM Reg

    d

    e

    r

    EX forwarding MEM forwarding

    CSE431 Chapter 4B.11 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    12/56

    Yet Another Complication!

    no er po en a a a azar can occur w en ere s aconflict between the result of the WB stage instructionand the MEM stage instruction which should be

    forwarded?

    I

    n

    add$1,$1,$2AL

    UIM Reg DM Reg

    t

    r. add$1,$1,$3ALUIM Reg DM Reg

    r

    d

    e

    add $1,$1,$4ALUIM Reg DM Reg

    CSE431 Chapter 4B.12 Irwin, PSU, 2008

    r

  • 7/29/2019 Lec4b Supp

    13/56

    Yet Another Complication!

    no er po en a a a azar can occur w en ere s aconflict between the result of the WB stage instructionand the MEM stage instruction which should be

    forwarded?

    I

    n

    add$1,$1,$2ALUIM Reg DM Reg

    t

    r. add$1,$1,$3ALUIM Reg DM Reg

    r

    d

    e

    add $1,$1,$4ALUIM Reg DM Reg

    CSE431 Chapter 4B.13 Irwin, PSU, 2008

    r

  • 7/29/2019 Lec4b Supp

    14/56

    Corrected Data Forwarding Control Conditions

    1. EX Forward Unit:i f ( EX/ MEM. RegWr i t eand ( EX/ MEM. Regi st er Rd ! = 0)and ( EX/ MEM. Regi st er Rd = I D/ EX. Regi st er Rs) )

    Forwards theresult from the

    =

    i f ( EX/ MEM. RegWr i t eand ( EX/ MEM. Regi st er Rd ! = 0)and ( EX/ MEM. Regi st er Rd = I D/ EX. Regi st er Rt ) )

    previous instr.

    to either inputof the ALU

    2. MEM Forward Unit:i f ( MEM/ WB. RegWr i t e

    For war dB = 10

    and ( MEM/ WB. Regi st er Rd ! = 0)and ( EX/ MEM. Regi st er Rd ! = I D/ EX. Regi st er Rs)and ( MEM/ WB. Regi st er Rd = I D/ EX. Regi st er Rs) )

    =

    Forwards theresult from the

    i f ( MEM/ WB. RegWr i t eand ( MEM/ WB. Regi st er Rd ! = 0)

    secondprevious instr.

    CSE431 Chapter 4B.14 Irwin, PSU, 2008

    an EX MEM. Reg st er R = I D EX. Reg st er Rtand ( MEM/ WB. Regi st er Rd = I D/ EX. Regi st er Rt ) )

    For war dB = 01

    of the ALU

  • 7/29/2019 Lec4b Supp

    15/56

    Datapath with Forwarding HardwarePCSrc

    ID/EXEX/MEM

    Control

    Add

    4 ShiftAdd

    IF/ID

    MEM/WBBranch

    ReadAddress

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    Register

    File

    ReadData 1

    ALU

    DataMemory

    Address Read

    Write Data

    eaData 2

    16 32

    Write Data

    Sign

    ALU

    cntrl

    Forward

    CSE431 Chapter 4B.15 Irwin, PSU, 2008

    Unit

  • 7/29/2019 Lec4b Supp

    16/56

    Datapath with Forwarding HardwarePCSrc

    ID/EXEX/MEM

    Control

    Add

    4 ShiftAdd

    IF/ID

    MEM/WBBranch

    ReadAddress

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    Register

    File

    ReadData 1

    ALU

    DataMemory

    Address Read

    Write Data

    eaData 2

    16 32

    Write Data

    Sign

    ALU

    cntrl

    ForwardID/EX.RegisterRt

    EX/MEM.RegisterRd

    MEM/WB.RegisterRd

    CSE431 Chapter 4B.16 Irwin, PSU, 2008

    UnitID/EX.RegisterRs

  • 7/29/2019 Lec4b Supp

    17/56

    Memory-to-Memory Copies

    or oa s mme a e y o owe y s ores memory- o-memory copies) can avoid a stall by adding forwardinghardware from the MEM/WB register to the data memoryinput.

    z Would need to add a Forward Unit and a mux to the MEM stage

    I

    str.

    lw $1,4($2) LUIM Reg DM Reg

    Ord

    sw $1,4($3)ALUIM Reg DM Reg

    CSE431 Chapter 4B.17 Irwin, PSU, 2008

    r

  • 7/29/2019 Lec4b Supp

    18/56

    Forwarding with Load-use Data Hazards

    I

    n

    lw $1,4($2)

    ALUIM Reg DM Reg

    s

    tr. A

    A

    LUIM Reg DM Regsub $4,$1,$5

    O

    r

    d

    and $6,$1,$7LUeg eg

    AL

    IM Re DM Ree

    r

    , , U

    AL

    IM Reg DM Reg

    ALUIM Reg DM

    CSE431 Chapter 4B.18 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    19/56

    Forwarding with Load-use Data Hazards

    I

    n

    lw $1,4($2)

    ALUIM Reg DM Reg

    stalls

    tr. A

    A

    LUIM Reg DM Regsub $4,$1,$5

    O

    r

    d

    su , ,LUeg eg

    AL

    IM Re DM Re

    an , ,

    e

    r

    , ,

    or $8,$1,$9

    U

    AL

    IM Reg DM Regxor $4,$1,$5

    xor $4,$1,$5ALUIM Reg DM

    CSE431 Chapter 4B.19 Irwin, PSU, 2008 Will still need one stall cycle even with forwarding

  • 7/29/2019 Lec4b Supp

    20/56

    Load-use Hazard Detection Unit

    ee a azar e ec on n n e s age a nser sa stall between the load and its use

    .

    i f ( I D/ EX. MemReadand ( ( I D/ EX. Regi st er Rt = I F/ I D. Regi st er Rs)or I D/ EX. Re i st er Rt = I F/ I D. Re i st er Rtst al l t he pi pel i ne

    The first line tests to see if the instruction now in the EXstage is a l w; the next two lines check to see if thedestination register of the l wmatches either source

    -instruction)

    After this one c cle stall, the forwardin lo ic can handle

    CSE431 Chapter 4B.20 Irwin, PSU, 2008

    the remaining data hazards

  • 7/29/2019 Lec4b Supp

    21/56

    Hazard/Stall Hardware

    , Prevent the instructions in the IF and ID stages from

    ro ressin down the i eline done b reventin the

    PC register and the IF/ID pipeline register from changingz Hazard detection Unit controls the writing of the PC (PC. wr i t e)

    .

    Insert a bubble between the l winstruction (in the EX

    stage) and the load-use instruction (in the ID stage) (i.e.,insert a noop in the execution stream)

    z Set the control bits in the EX, MEM, and WB control fields of theID/EX i eline re ister to 0 noo . The Hazard Unit controls the

    mux that chooses between the real control values and the 0s.

    Let the l winstruction and the instructions after it in the

    CSE431 Chapter 4B.21 Irwin, PSU, 2008

    p pe ne e ore n e co e procee norma y own e

    pipeline

  • 7/29/2019 Lec4b Supp

    22/56

    Adding the Hazard/Stall HardwarePCSrc

    ID/EXEX/MEM

    Hazard

    Unit0

    Add

    4 ShiftAdd

    IF/ID

    MEM/WB

    Control

    Branch

    1

    ReadAddress

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    Register

    File

    ReadData 1

    ALU

    DataMemory

    Address Read

    Write Data

    eaData 2

    16 32

    Write Data

    Sign

    ALU

    cntrl

    Forward

    CSE431 Chapter 4B.22 Irwin, PSU, 2008

    Unit

  • 7/29/2019 Lec4b Supp

    23/56

    Adding the Hazard/Stall HardwarePCSrc

    ID/EXEX/MEM

    Hazard

    Unit0

    ID/EX.MemRead

    Add

    4 ShiftAdd

    IF/ID

    MEM/WB

    Control

    Branch

    1

    0

    ReadAddress

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    Register

    File

    ReadData 1

    ALU

    DataMemory

    Address Read

    Write Data

    eaData 2

    16 32

    Write Data

    Sign

    ALU

    cntrl

    Forward

    CSE431 Chapter 4B.23 Irwin, PSU, 2008

    UnitID/EX.RegisterRt

  • 7/29/2019 Lec4b Supp

    24/56

    Control Hazards

    (i.e., PC = PC + 4); incurred by change of flow instructions

    z Unconditional branches (j , j al , j r )

    z onditional branches beq, bne

    z Exceptions

    z Stall (impacts CPI)

    z Move decision point as early in the pipeline as possible, therebyreducing the number of stall cycles

    z Delay decision (requires compiler support)

    z Predict and ho e for the best !

    Control hazards occur less frequently than data hazards,but there is nothing as effective against control hazards as

    CSE431 Chapter 4B.24 Irwin, PSU, 2008

    forwarding is for data hazards

  • 7/29/2019 Lec4b Supp

    25/56

    Datapath Branch and Jump HardwareJump

    ID/EXEX/MEM

    PCSrc

    Shift

    left 2

    Add

    4

    IF/ID

    MEM/WB

    Control

    BranchShift Add

    PC+4[31-28]

    ReadAddress

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    Register

    File

    ReadData 1

    ALU

    DataMemory

    Address Read

    Write Data

    eaData 2

    16 32

    Write Data

    Sign

    ALU

    cntrl

    Forward

    CSE431 Chapter 4B.25 Irwin, PSU, 2008

    Unit

  • 7/29/2019 Lec4b Supp

    26/56

    Jumps Incur One Stall

    umps no eco e un , so one us s nee ez To flush, set I F. Fl ush to zero the instruction field of the

    IF/ID pipeline register (turning it into a noop)

    I jALUIM Reg DM Reg

    Fix jump

    flush

    n

    s

    t

    waiting

    flushALUIM

    RegDM Reg

    .

    O

    rj target

    ALUIM Reg DM Reg

    e

    r

    CSE431 Chapter 4B.26 Irwin, PSU, 2008

    or una e y, umps are very n requen on y o e

    SPECint instruction mix

  • 7/29/2019 Lec4b Supp

    27/56

    Two Types of Stalls

    Noop ns ruc on or u e nser e e ween woinstructions in the pipeline (as done for load-usesituations)

    z Keep the instructions earlierin the pipeline (later in the code)from progressing down the pipeline for a cycle (bounce them inplace with write control signals)

    z Insert noop by zeroing control bits in the pipeline register at theappropriate stage

    z Let the instructions later in the pipeline (earlier in the code)y w

    Flushes (or instruction squashing) were an instruction in

    for instructions located sequentially afterj instructions)z Zero the control bits for the instruction to be flushed

    CSE431 Chapter 4B.27 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    28/56

    Supporting ID Stage JumpsJump

    ID/EXEX/MEM

    PCSrc

    Shift

    left 2

    Add

    4

    IF/ID

    MEM/WB

    Control

    BranchShift Add

    PC+4[31-28]

    ReadAddress

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    Register

    File

    ReadData 1

    ALU

    DataMemory

    Address Read

    0

    Write Data

    eaData 2

    16 32

    Write Data

    Sign

    ALU

    cntrl

    Forward

    CSE431 Chapter 4B.28 Irwin, PSU, 2008

    Unit

  • 7/29/2019 Lec4b Supp

    29/56

    Review: Branch Instrs Cause Control Hazards

    A

    I

    ns

    LUeg eg

    AL

    IM Re DM Re

    r.

    O Inst 3

    U

    A

    LUIM Reg DM Reg

    r

    d

    e Inst 4ALUIM Reg DM Reg

    CSE431 Chapter 4B.29 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    30/56

    One Way to Fix a Branch Control Hazard

    I

    n

    beqALUIM Reg DM Reg

    Fix branchhazard bywaitin

    flush

    s

    tr.

    flushbutaffects CPI

    ALUIM Reg DM Reg

    flushOr

    d

    LUIM Reg DM Reg

    ALIM Re DM Re

    flusher

    be tar et

    ALUIM Reg DM Reg

    U

    ALU

    Inst 3IM Reg DM

    CSE431 Chapter 4B.30 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    31/56

    Another Way to Fix a Branch Control Hazard

    the pipeline as possible i.e., during the decode cycle

    I beq Fix branchALUIM Reg DM Reg

    flush

    n

    s

    t

    waiting

    flushALUIM Reg DM Reg

    .

    O

    rbeqtarget

    ALUIM Reg DM Reg

    e

    r Inst 3

    ALUIM Reg DM

    CSE431 Chapter 4B.31 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    32/56

    Reducing the Delay of Branches

    z Reduces the number of stall (flush) cycles to two

    z Adds an and gate and a 2x1 mux to the EX timing path

    Add hardware to compute the branch target address andevaluate the branch decision to the ID stage

    z Reduces the number of stall (flush) cycles to one(like with jumps)

    - But now need to add forwardin hardware in ID sta e

    z Computing branch target address can be done in parallel withRegFile read (done for all instructions only used when needed)

    ,comparing and updating the PC adds a mux, a comparator, and anand gate to the ID timing path

    CSE431 Chapter 4B.32 Irwin, PSU, 2008

    For deeper pipelines, branch decision points can be even

    laterin the pipeline, incurring more stalls

  • 7/29/2019 Lec4b Supp

    33/56

    ID Branch Forwarding Issues

    is taken care of by thenormal RegFile write

    ,MEM add2 $3,EX add1 $4,ID be 1 2 LooIF next _seq_i nst r

    EX/MEM pipeline stage to

    the ID comparison

    ,MEM add2 $1,EX add1 $4,

    ar ware or cases e IF next _seq_i nst r

    i f ( I Dcont r ol . Br anchand ( EX/ MEM. Regi st erRd ! = 0) Forwards the

    and ( EX MEM. Regi st er Rd = I F I D. Regi st er Rs) )For war dC = 1

    i f ( I Dcont r ol . Br anchand EX/ MEM. Re i st er Rd ! = 0

    second

    previous instr.

    CSE431 Chapter 4B.33 Irwin, PSU, 2008

    and ( EX/ MEM. Regi st er Rd = I F/ I D. Regi st er Rt ) )For war dD = 1

    of the compare

  • 7/29/2019 Lec4b Supp

    34/56

    ID Branch Forwarding Issues, cont

    e ns ruc on mme a e ybefore the branch producesone of the branch source

    WB add3 $3,MEM add2 $4,EX add1 $1,

    operands, then a stall needsto be inserted (between the

    eq , , oop

    IF next _seq_i nst r

    occurring at the same time as the ID stage branch

    compare operationz Bounce the beq (in ID) and next_seq_instr (in IF) in place

    (ID Hazard Unit deasserts PC. Wr i t e and I F/ I D. Wr i t e)

    z Insert a stall between the add in the EX stage and the beq in

    the ID stage by zeroing the control bits going into the ID/EXpipeline register (done by the ID Hazard Unit)

    CSE431 Chapter 4B.34 Irwin, PSU, 2008

    ,instruction currently in IF (I F. Fl ush)

  • 7/29/2019 Lec4b Supp

    35/56

    Supporting ID Stage BranchesBranch

    PCSrc

    ID/EXEX/MEM

    Hazard

    Unit10

    4 Shift

    left 2

    Add

    IF/ID

    MEM/WB

    Control

    pare

    Add

    lush

    ReadAddress

    InstructionMemory

    P

    C

    Read Addr 1

    Read Addr 2

    Write Addr

    RegFile

    Read Data 1 ALU

    DataMemory

    Read Data

    Co

    IF.F

    0

    Write Data

    ReadData 2

    16

    Write Data

    SignExtend

    ALU

    cntrl

    Forward

    ForwardUnit

    CSE431 Chapter 4B.35 Irwin, PSU, 2008

    Unit

  • 7/29/2019 Lec4b Supp

    36/56

    Delayed Branches

    ,then we can eliminate all branch stalls with delayedbranches which are defined as always executing the next

    branch takes effect afterthat next instructionz MIPS compiler moves an instruction to immediately after the

    thereby hiding the branch delay

    With deeper pipelines, the branch delay grows requiringmore than one delay slot

    expensive but more flexible (dynamic) hardware branch prediction

    z Growth in available transistors has made hardware branch

    CSE431 Chapter 4B.36 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    37/56

    Scheduling Branch Delay Slots

    add $1, $2, $3i f $2=0 t hen

    . . .add $1, $2, $3i f $1=0 t hen

    sub $4, $5, $6

    add $1, $2, $3

    i f $1=0 t hendelay slot

    sub $4, $5, $6

    becomes becomes becomes

    i f $2=0 t hen

    add $1, $2, $3

    i f $1=0 t hen

    add $1,$2,$3add $1, $2, $3i f $1=0 t hen

    sub $4,$5,$6

    A is the best choice, fills delay slot and reduces IC

    , ,

    CSE431 Chapter 4B.37 Irwin, PSU, 2008

    n an , e sub ns ruc on may nee o e cop e , ncreas ng

    In B and C, must be okay to execute sub when branch fails

  • 7/29/2019 Lec4b Supp

    38/56

    Static Branch Prediction

    and proceeding without waiting to see the actual branchoutcome

    1. Predict not taken always predict branches will not betaken, continue to fetch from the sequential instruction,

    z If taken, flush instructions afterthe branch (earlier in the pipeline)

    - in IF, ID, and EX stages if branch logic in MEM three stalls- In IF and ID stages if branch logic in EX two stalls

    - in IF stage if branch logic in ID one stall

    z ensure that those flushed instructions havent changed themac ne s a e au oma c n e p pe ne s nce mac nestate changing operations are at the tail end of the pipeline(MemWrite (in MEM) or RegWrite (in WB))

    CSE431 Chapter 4B.38 Irwin, PSU, 2008

  • 7/29/2019 Lec4b Supp

    39/56

    Flushing with Misprediction (Not Taken)

    4 beq $1,$2,2In

    ALUIM Reg DM Reg

    s

    tr.

    AL

    UIM Reg DM Reg

    8 sub $4,$1,$5

    O

    r

    de

    r

    To flush the IF stage instruction, assert I F. Fl ush tozero the instruction field of the IF/ID pipeline register

    CSE431 Chapter 4B.39 Irwin, PSU, 2008

    rans orm ng n o a noop

  • 7/29/2019 Lec4b Supp

    40/56

    Flushing with Misprediction (Not Taken)

    4 beq $1,$2,2In

    ALUIM Reg DM Reg

    flush

    s

    tr.

    AL

    UIM Reg DM Reg

    8 sub $4,$1,$5

    O

    r

    d

    16 and $6,$1,$7LUIM Reg DM Reg

    A

    e

    r

    or r , , U

    To flush the IF stage instruction, assert I F. Fl ush tozero the instruction field of the IF/ID pipeline register

    CSE431 Chapter 4B.40 Irwin, PSU, 2008

    (transforming it into a noop)

  • 7/29/2019 Lec4b Supp

    41/56

    Branching Structures

    branching structures

    Loop: beq $1, $2, Out1nd l oop i nst r

    .

    ..

    l ast l oop i nst r

    bottom of the loop to return to thetop of the loop and incur theum stall overhead

    j LoopOut : f al l out i nstr

    loop branching structures Loop: 1st l oop i nst r

    2nd l oop i nst r.

    .

    .

    l ast l oop i nst r

    CSE431 Chapter 4B.41 Irwin, PSU, 2008

    ne , , oopf al l out i nst r

  • 7/29/2019 Lec4b Supp

    42/56

    Static Branch Prediction, cont

    and proceeding

    .

    z Predict taken always incurs one stall cycle (if branchdestination hardware has been moved to the ID stage)

    z Is there a way to cache the address of the branch targetinstruction ??

    As the branch penalty increases (for deeper pipelines),a simple static prediction scheme will hurt performance.

    more ar ware, s poss e o ry o pre cbranch behaviordynamically during program execution

    -

    CSE431 Chapter 4B.42 Irwin, PSU, 2008

    . time using run-time information

  • 7/29/2019 Lec4b Supp

    43/56

    Dynamic Branch Prediction

    (BHT)) in the IF stage addressed by the lower bits of thePC, contains bit(s) passed to the ID stage through the

    taken the last time it was executez Prediction bit may predict incorrectly (may be a wrong prediction

    with the same low order PC bits) but the doesnt affectcorrectness, just performance

    - Branch decision occurs in the ID sta e after determinin that thefetched instruction is a branch and checking the prediction bit(s)

    z If the prediction is wrong, flush the incorrect instruction(s) inpipeline, restart the pipeline with the right instruction, and invert

    e pre c on s

    - A 4096 bit BHT varies from 1% misprediction (nasa7, tomcatv) to18% (eqntott)

    CSE431 Chapter 4B.43 Irwin, PSU, 2008

    B h T t B ff

  • 7/29/2019 Lec4b Supp

    44/56

    Branch Target Buffer

    ,tell where its taken to!

    z A branch target buffer (BTB) in the IF stage caches the branch,

    instruction. The prediction bit in IF/ID selects which next

    instruction will be loaded into IF/ID at the next clock edge- Would need a two read port

    instruction memory

    BTB

    z Or the BTB can cache thebranch taken instruction while theinstruction memory is fetching the Read

    InstructionMemory

    C0

    If the rediction is correct stalls can be avoided no matter

    next sequential instruction

    Address

    CSE431 Chapter 4B.44 Irwin, PSU, 2008

    which direction they go

    1 bit P di ti A

  • 7/29/2019 Lec4b Supp

    45/56

    1-bit Prediction Accuracy

    -z Assume predict_bit = 0 to start (indicating

    branch not taken) and loop control is ate o om o e oop co e

    1. First time through the loop, the predictormispredicts the branch since the branch is

    oop: oop ns r2nd l oop i nst r

    .

    .

    a en ac o e op o e oop; nverprediction bit (predict_bit = 1)

    2. As long as branch is taken (looping),

    .

    l ast l oop i nst rbne $1, $2, Loop

    f al l out i nst rprediction is correct

    3. Exiting the loop, the predictor againmispredicts the branch since this time the

    For 10 times throu h the loo we have a 80 rediction

    branch is not taken falling out of the loop;invert prediction bit (predict_bit = 0)

    CSE431 Chapter 4B.45 Irwin, PSU, 2008

    accuracy for a branch that is taken 90% of the time

    2 bit P di t

  • 7/29/2019 Lec4b Supp

    46/56

    2-bit Predictors

    -must be wrong twice before the prediction bit is changed

    Taken

    oop: oop ns r2nd l oop i nst r

    .

    .

    PredictTaken

    PredictTaken

    Not taken

    Taken

    .

    l ast l oop i nst rbne $1, $2, Loop

    f al l out i nst r

    Predict

    Not taken

    Not taken

    Taken

    Not Taken Not TakenNot taken

    Taken

    CSE431 Chapter 4B.46 Irwin, PSU, 2008

    2 bit P di t

  • 7/29/2019 Lec4b Supp

    47/56

    2-bit Predictors

    -must be wrong twice before the prediction bit is changed

    Taken

    oop: oop ns r2nd l oop i nst r

    .

    .wrong on loopfall out

    PredictTaken

    PredictTaken

    Not taken

    Taken

    .

    l ast l oop i nst rbne $1, $2, Loop

    f al l out i nst r

    11

    1011

    Predict

    Not taken

    Not taken

    Taken right on 1s

    iteration

    000

    Not Taken Not TakenNot taken

    Taken

    stores theinitial FSM

    01

    CSE431 Chapter 4B.47 Irwin, PSU, 2008

    state

    Dealing with Exceptions

  • 7/29/2019 Lec4b Supp

    48/56

    Dealing with Exceptions

    control hazard. Exceptions arise fromz R-type arithmetic overflow

    z ry ng o execu e an un e ne ns ruc on

    z

    An I/O device requestz An OS service request (e.g., a page fault, TLB exception)

    z

    The pipeline has to stop executing the offending

    instruction in midstream, let all prior instructionscomp e e, us a o ow ng ns ruc ons, se a reg s er oshow the cause of the exception, save the address of theoffending instruction, and then jump to a prearranged

    a ress e a ress o e excep on an er co e The software (OS) looks at the cause of the exception

    and deals with it

    CSE431 Chapter 4B.48 Irwin, PSU, 2008

    Two Types of Exceptions

  • 7/29/2019 Lec4b Supp

    49/56

    Two Types of Exceptions

    z caused by external events

    z may be handled between instructions, so can let theinstructions currently active in the pipeline complete before

    passing control to the OS interrupt handlerz simply suspend and resume user program

    Traps (Exception) synchronous to program execution

    z caused by internal events

    z condition must be remedied by the trap handler forthatinstruction, so much stop the offending instruction midstream

    z the offending instruction may be retried (or simulated by theOS) and the program may continue or it may be aborted

    CSE431 Chapter 4B.49 Irwin, PSU, 2008

    Where in the Pipeline Exceptions Occur

  • 7/29/2019 Lec4b Supp

    50/56

    Where in the Pipeline Exceptions Occur

    ALUIM Reg DM Reg

    Arithmetic overflow

    Stage(s)? Synchronous?

    Undefined instruction

    I/O service request

    Hardware malfunction

    CSE431 Chapter 4B.50 Irwin, PSU, 2008

    Where in the Pipeline Exceptions Occur

  • 7/29/2019 Lec4b Supp

    51/56

    Where in the Pipeline Exceptions Occur

    ALUIM Reg DM Reg

    Arithmetic overflow

    Stage(s)? Synchronous?

    EX yes

    Undefined instruction yesID

    IF MEM

    I/O service request noany

    Hardware malfunction noany

    CSE431 Chapter 4B.51 Irwin, PSU, 2008

    eware a mu p e excep ons can occursimultaneously in a single clock cycle

    Multiple Simultaneous Exceptions

  • 7/29/2019 Lec4b Supp

    52/56

    Multiple Simultaneous Exceptions

    I Inst 0ALUIM Reg DM Reg

    n

    st Inst 1ALUIM Reg DM Reg

    .

    O

    r

    Inst 2ALUIM Reg DM Reg

    e

    rInst 3

    ALUIM Reg DM Reg

    Inst 4

    LUIM Reg DM Reg

    CSE431 Chapter 4B.52 Irwin, PSU, 2008

    Hardware sorts the exceptions so that the earliest

    instruction is the one interrupted first

    Multiple Simultaneous Exceptions

  • 7/29/2019 Lec4b Supp

    53/56

    Multiple Simultaneous Exceptions

    I Inst 0ALUIM Reg DM Reg

    n

    st Inst 1ALUIM Reg DM Reg

    D$ page fault

    .

    O

    r

    Inst 2ALUIM Reg DM Reg

    ar me c over ow

    e

    rInst 3

    ALUIM Reg DM Reg

    Inst 4

    LUIM Reg DM Reg

    I$ page fault

    CSE431 Chapter 4B.53 Irwin, PSU, 2008

    Hardware sorts the exceptions so that the earliest

    instruction is the one interrupted first

    Additions to MIPS to Handle Exceptions (Fig 6 42)

  • 7/29/2019 Lec4b Supp

    54/56

    Additions to MIPS to Handle Exceptions (Fig 6.42)

    in Cause the exceptions and a signal to control writes to it(CauseWr i t e)

    EPC register (records the addresses of the offending

    instructions) hardware to record in EPC the address ofthe offendin instruction and a si nal to control writes to it(EPCWr i t e)

    z Exception software must match exception to instruction

    A way to load the PC with the address of the exceptionhandler

    the exception handler address - (e.g., 8000 0180hex for arithmeticoverflow)

    CSE431 Chapter 4B.54 Irwin, PSU, 2008

    follow it

    Datapath with Controls for Exceptions

  • 7/29/2019 Lec4b Supp

    55/56

    Datapath with Controls for ExceptionsBranch

    PCSrc

    ID/EXEX/MEM

    Hazard

    Unit10

    8000 0180hexEX.Flush

    0

    ID.Flush

    4 Shift

    left 2

    Add

    IF/ID

    MEM/WB

    Control

    pareAdd

    lush

    CauseEPC

    0

    Read

    Address

    InstructionMemory

    PC

    Read Addr 1

    Read Addr 2

    Write Addr

    RegFile

    Read Data 1

    ALU

    DataMemory

    Read Data

    Co

    IF.F

    0

    Write Data

    ReadData 2

    16

    Write Data

    SignExtend

    ALU

    cntrl

    Forward

    ForwardUnit

    CSE431 Chapter 4B.55 Irwin, PSU, 2008

    Unit

    Summary

  • 7/29/2019 Lec4b Supp

    56/56

    Summary

    All modern day processors use pipelining forperformance (a CPI of 1 and fast a CC)

    Pi eline clock rate limited b slowest i eline sta e so designing a balanced pipeline is important

    Must detect and resolve hazards

    correctly

    z Data hazards

    - Stall (impacts CPI)- Forward (requires hardware support)

    z Control hazards put the branch decision hardware in asearly a stage in the pipeline as possible

    - a mpac s

    - Delay decision (requires compiler support)- Static and dynamic prediction (requires hardware support)

    CSE431 Chapter 4B.56 Irwin, PSU, 2008