Computer Architecture Chapter 4-2 The Processor: Enhancing …nyx.skku.ac.kr/wp-content/uploads/2017/08/CA-lec4-2-1.pdf · 2018-10-29 · Computer Pipelining Hazards • Structural

Sungkyunkwan University, College of Software, Dongkun Shin

Computer Architecture

Chapter 4-2

The Processor: Enhancing Performance with Pipelining


Pipelining is Natural!

• Laundry Example

• Ann, Brian, Cathy and Don each have one load of clothes to wash, dry, fold, and store

• Washer takes 30 minutes

• Dryer takes 30 minutes

• Folder takes 30 minutes

• Storer takes 30 minutes to put clothes into drawers


Sequential Laundry

• Sequential laundry takes 8 hours for 4 loads

• If they learned pipelining, how long would laundry take?

[Figure: sequential laundry timeline; axis label: Task Order]


Pipelined Laundry: Start work ASAP

• Pipelined laundry takes 3.5 hours for 4 loads!

[Figure: pipelined laundry timeline; axis label: Task Order]
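The two laundry totals follow from a simple count. A minimal sketch, assuming the four equal 30-minute stages listed earlier; the function names are illustrative and not from the slides:

```python
STAGE_MINUTES = 30  # washer, dryer, folder, storer each take 30 minutes
NUM_STAGES = 4
NUM_LOADS = 4


def sequential_minutes(loads: int, stages: int, stage_time: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_time


def pipelined_minutes(loads: int, stages: int, stage_time: int) -> int:
    """The first load takes `stages` steps; each later load finishes one step after it."""
    return (stages + loads - 1) * stage_time


print(sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES) / 60)  # 8.0 hours
print(pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES) / 60)   # 3.5 hours
```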


Pipelining Lessons

• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload

• Multiple tasks operating simultaneously using different resources

• Potential speedup = number of pipeline stages

• Pipeline rate limited by slowest pipeline stage

• Unbalanced lengths of pipe stages reduce speedup

• Time to fill pipeline and time to drain it reduces speedup

• Stall for Dependences

[Figure: laundry timeline; axis label: Task Order]
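The effect of the slowest stage and of fill/drain time can be checked numerically. This is a rough sketch: the unbalanced stage times (30/40/20/30 minutes) are hypothetical values chosen only to keep the same total work as the balanced case.

```python
def sequential_time(tasks: int, stage_times: list) -> int:
    """No overlap: every task pays the sum of all stage times."""
    return tasks * sum(stage_times)


def pipelined_time(tasks: int, stage_times: list) -> int:
    """The pipeline advances at the pace of its slowest stage, plus fill/drain."""
    step = max(stage_times)
    return (len(stage_times) + tasks - 1) * step


def speedup(tasks: int, stage_times: list) -> float:
    return sequential_time(tasks, stage_times) / pipelined_time(tasks, stage_times)


balanced = [30, 30, 30, 30]
unbalanced = [30, 40, 20, 30]  # hypothetical, same total of 120 minutes

print(round(speedup(4, balanced), 2))       # 2.29: fill and drain dominate
print(round(speedup(1000, balanced), 2))    # 3.99: approaches the 4 stages
print(round(speedup(1000, unbalanced), 2))  # 2.99: limited by the 40-minute stage
```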


Pipelining

Cafeteria Assembly line


The Five Stages of Load

• Ifetch: Instruction Fetch

– Fetch the instruction from the Instruction Memory

• Reg/Dec: Register Fetch and Instruction Decode

• Exec: Calculate the memory address

• Mem: Read the data from the Data Memory

• Wr: Write the data back to the register file


Pipelining

• Improve performance by increasing instruction throughput

Single-cycle (Tc = 800 ps)

Pipelined (Tc = 200 ps)


Ideal Speedup

• Q: Ideal speedup is number of stages in the pipeline.

• Do we achieve this?

– Not quite: stages are imperfectly balanced and pipelining adds overhead

• For 1003 instructions:

– non-pipelined: 1003*800ps = 802,400ps

– pipelined: 1003*200ps+4*200ps = 201,400ps

Pipelining improves performance by increasing instruction throughput, NOT by decreasing the execution time of an individual instruction

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$

$$\frac{8024}{2014} \approx \frac{8}{2} = 4$$
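The 1003-instruction figures can be reproduced directly. A minimal sketch using the clock periods quoted on the slides (800 ps and 200 ps) and assuming a five-stage pipeline, so four extra cycles are needed to fill it; the helper names are illustrative:

```python
SINGLE_CYCLE_TC_PS = 800
PIPELINED_TC_PS = 200
NUM_STAGES = 5  # assumption: the five-stage pipeline described earlier


def single_cycle_time_ps(instructions: int) -> int:
    return instructions * SINGLE_CYCLE_TC_PS


def pipelined_time_ps(instructions: int) -> int:
    # One instruction completes per cycle once the pipeline is full;
    # the first instruction needs NUM_STAGES - 1 extra cycles to get through.
    return (instructions + NUM_STAGES - 1) * PIPELINED_TC_PS


n = 1003
print(single_cycle_time_ps(n))  # 802400
print(pipelined_time_ps(n))     # 201400
print(round(single_cycle_time_ps(n) / pipelined_time_ps(n), 3))  # 3.984
```

The ratio approaches 800/200 = 4 rather than the stage count of 5, because the pipelined clock is set by the slowest stage.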


Instruction Set for Pipelining

• MIPS is made for pipelining!

– All instructions are the same length:

• Helps the Ifetch & Decode stages

– Few instruction formats, with source operand fields in fixed positions

• Can read operands & decode opcode at the same time

– Memory operands only appear in loads/stores

• Calculate the memory address in Execute stage

– Operands must be aligned in memory

• One data memory access for a single data transfer instruction
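Why fixed fields help can be seen by decoding two different formats the same way. The sketch below is illustrative only: it uses the standard MIPS32 field positions, and the two encoded words were built by hand for this example.

```python
def decode_fields(word: int) -> dict:
    """Slice a 32-bit MIPS instruction word at its fixed field positions."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,   # first source register, same bits in every format
        "rt": (word >> 16) & 0x1F,   # second source (or destination for loads)
        "rd": (word >> 11) & 0x1F,   # destination in R-format
        "imm": word & 0xFFFF,        # 16-bit immediate in I-format
    }


# lw $10, 20($1): I-format, opcode 0x23
LW_WORD = (0x23 << 26) | (1 << 21) | (10 << 16) | 20
# sub $11, $2, $3: R-format, opcode 0, funct 0x22
SUB_WORD = (0 << 26) | (2 << 21) | (3 << 16) | (11 << 11) | 0x22

for word in (LW_WORD, SUB_WORD):
    fields = decode_fields(word)
    # rs and rt can be sent to the register file before the opcode is decoded
    print(hex(word), fields["opcode"], fields["rs"], fields["rt"])
```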


Can pipelining get us into trouble?

• Yes: Pipeline Hazards

– structural hazards: attempt to use the same resource two different ways at the same time

– data hazards: attempt to use item before it is ready

• instruction depends on result of prior instruction still in the pipeline

– control hazards: attempt to make a decision before condition is evaluated

• branch instructions

• Can always resolve hazards by waiting

– pipeline control must detect the hazard

– take action (or delay action) to resolve hazards


Computer Pipelining Hazards

• Structural Hazards

– Conflict for use of a resource

– With a single memory, instruction fetch would have to stall whenever a load or store accesses data in the same cycle

– Cause a pipeline “bubble”

– Require separate instruction/data memories (or caches)
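The single-memory conflict can be located by lining up stage timings. A rough sketch, assuming an ideal five-stage schedule with no other stalls, where instruction i is fetched in cycle i + 1 and reaches MEM three cycles later; the function name is hypothetical:

```python
MEMORY_OPS = {"lw", "sw"}
IF_TO_MEM_CYCLES = 3  # IF -> ID -> EX -> MEM


def fetch_conflicts(ops: list) -> list:
    """Indices of instructions whose fetch coincides with an earlier lw/sw in MEM."""
    conflicts = []
    for j in range(len(ops)):
        i = j - IF_TO_MEM_CYCLES  # instruction occupying MEM while j is in IF
        if i >= 0 and ops[i] in MEMORY_OPS:
            conflicts.append(j)
    return conflicts


# The fourth instruction is fetched in the cycle where lw reads data memory.
print(fetch_conflicts(["lw", "add", "sub", "or", "and"]))  # [3]
```

With separate instruction and data memories the two accesses go to different resources, so the list of conflicts is empty by construction.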



• Data Hazards


Forwarding/Bypassing

• Observation:

– Don’t need to wait for the instruction to complete.

– As soon as the ALU computes the sum for the add, we can supply it as an input for the subtract


Forwarding/Bypassing

• Load-Use Data Hazard

• Can’t always avoid stalls by forwarding

Stall even with forwarding
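The reason a load-use pair still stalls can be worked out from when a value exists and when it is needed. This is a sketch under stated assumptions: a five-stage pipeline (IF=0, ID=1, EX=2, MEM=3, WB=4), and, for the no-forwarding case, a register file written in the first half of a cycle and read in the second half.

```python
# Stage at whose END the produced value exists
READY_AFTER = {"alu": 2, "load": 3}  # EX for ALU results, MEM for loaded data
NEEDED_AT = 2                        # the consumer needs its operand at the start of EX


def stalls_with_forwarding(producer_kind: str, distance: int) -> int:
    """Stall cycles when the consumer issues `distance` instructions after the producer."""
    first_usable_cycle = READY_AFTER[producer_kind] + 1
    needed_cycle = distance + NEEDED_AT
    return max(0, first_usable_cycle - needed_cycle)


def stalls_without_forwarding(distance: int) -> int:
    """Consumer must read in ID no earlier than the producer's WB cycle."""
    write_cycle = 4            # producer's WB
    read_cycle = distance + 1  # consumer's ID
    return max(0, write_cycle - read_cycle)


print(stalls_with_forwarding("alu", 1))   # 0: the sum is forwarded from EX
print(stalls_with_forwarding("load", 1))  # 1: loaded data arrives one cycle too late
print(stalls_without_forwarding(1))       # 2
```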


Pipeline Scheduling

• Compiler can re-schedule instructions to avoid stalls

Assumption: No memory-to-memory forwarding


Pipeline Scheduling

• Compiler can re-schedule instructions to avoid stalls
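The benefit of rescheduling can be counted. A minimal sketch, assuming full forwarding so that only a load immediately followed by a user of its result costs one stall. The instruction sequence and register names are illustrative (a common textbook pattern for a = b + e; c = b + f) and are not taken from the slide figure.

```python
# Each instruction: (opcode, destination register or None, source registers)
ORIGINAL = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),  # uses $t2 right after its load
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),  # uses $t4 right after its load
    ("sw", None, ("$t5", "$t0")),
]

# Same instructions with the third load moved up
RESCHEDULED = [ORIGINAL[0], ORIGINAL[1], ORIGINAL[4],
               ORIGINAL[2], ORIGINAL[3], ORIGINAL[5], ORIGINAL[6]]


def load_use_stalls(program: list) -> int:
    """Count adjacent pairs where a load's destination is read by the next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return stalls


def total_cycles(program: list, stages: int = 5) -> int:
    return len(program) + (stages - 1) + load_use_stalls(program)


print(load_use_stalls(ORIGINAL), total_cycles(ORIGINAL))        # 2 13
print(load_use_stalls(RESCHEDULED), total_cycles(RESCHEDULED))  # 0 11
```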



• Control Hazards: Bubble or Pipeline Stall on Branch

Assumption: In the second stage, the branch is resolved, the target address is computed, and the PC is updated



• Control Hazards: Branch prediction

Prediction correct

Prediction incorrect
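The cost of the two strategies can be compared with a simple CPI model. This is a sketch only: the branch fraction and misprediction rate are hypothetical numbers, and the one-cycle penalty assumes the branch is resolved in the second stage.

```python
def cpi_stall_on_branch(branch_fraction: float, bubble_cycles: int = 1) -> float:
    """Every branch inserts a bubble, whether taken or not."""
    return 1.0 + branch_fraction * bubble_cycles


def cpi_with_prediction(branch_fraction: float, mispredict_rate: float,
                        penalty_cycles: int = 1) -> float:
    """Only mispredicted branches pay the penalty."""
    return 1.0 + branch_fraction * mispredict_rate * penalty_cycles


BRANCH_FRACTION = 0.20  # hypothetical: 20% of instructions are branches
MISPREDICT_RATE = 0.10  # hypothetical: 10% of predictions are wrong

print(f"{cpi_stall_on_branch(BRANCH_FRACTION):.2f}")                   # 1.20
print(f"{cpi_with_prediction(BRANCH_FRACTION, MISPREDICT_RATE):.2f}")  # 1.02
```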



• Control Hazards: Delayed branch

– Solution used in MIPS

– If you run SPIM in bare mode, you must pay attention to delayed branches

– Compiler fills delayed branch slots


Single-Cycle Datapath

• What do we need to add to actually split the datapath into stages?


Pipelined Datapath


LW: IF Stage


LW: ID Stage


LW: EX Stage


LW: MEM Stage


LW: WB Stage

Can you find a BUG here?


Corrected Version for Load


EX for Store


MEM for Store


WB for Store


Graphically Representing Pipelines

• Can help with answering questions like:

– how many cycles does it take to execute this code?

– what is the ALU doing during cycle 4?

– use this representation to help understand datapaths


Conventional Pipelined Execution Representation


Single-Cycle Pipeline Diagram


Single Cycle, Multiple Cycle, vs. Pipeline


Why Pipeline?

• Suppose we execute 100 instructions

• Single Cycle Machine

– 45 ns/cycle x 1 CPI x 100 inst = 4500 ns

• Multicycle Machine

– 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns

• Ideal pipelined machine

– 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
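The three totals can be recomputed side by side. A minimal sketch using only the cycle times, CPI values, and drain cycles quoted in the bullets; the function names are illustrative:

```python
def single_cycle_ns(instructions: int, cycle_ns: float = 45.0) -> float:
    return instructions * 1 * cycle_ns  # CPI = 1, long cycle


def multicycle_ns(instructions: int, cycle_ns: float = 10.0, cpi: float = 4.6) -> float:
    return instructions * cpi * cycle_ns  # short cycle, several cycles per instruction


def pipelined_ns(instructions: int, cycle_ns: float = 10.0, drain_cycles: int = 4) -> float:
    return (instructions * 1 + drain_cycles) * cycle_ns  # CPI = 1 plus drain


n = 100
for name, t in [("single cycle", single_cycle_ns(n)),
                ("multicycle", multicycle_ns(n)),
                ("pipelined", pipelined_ns(n))]:
    print(f"{name}: {t:.0f} ns")

print(f"speedup over single cycle: {single_cycle_ns(n) / pipelined_ns(n):.2f}")  # 4.33
```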


Why Pipeline? Because the resources are there!


Pipelined Control


Pipelined Control

• Divide control lines into five groups according to pipeline stages

• What needs to be controlled in each stage?

– Instruction Fetch and PC Increment

– Instruction Decode / Register Fetch

– Execution

– Memory Stage

– Write Back


Pipelined Control

• Pass control signals along just like the data


Pipelined Control

Control signals for the last three stages are created in ID
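One way to picture this is as three groups of signals that shrink as the instruction moves down the pipeline. The sketch below is an assumption-laden illustration: the signal names and values follow the usual textbook MIPS datapath and are not taken from the slide figure, and `None` marks a don't-care value.

```python
# Control values per instruction class, grouped by the stage that consumes them.
CONTROL_TABLE = {
    "R-type": {
        "EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
    "sw": {
        "EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
        "WB": {"RegWrite": 0, "MemtoReg": None},
    },
}


def decode(instr_class: str) -> dict:
    """ID stage: produce all three signal groups at once."""
    groups = CONTROL_TABLE[instr_class]
    return {stage: dict(signals) for stage, signals in groups.items()}


def trace_control(instr_class: str) -> dict:
    """Show which signal groups each pipeline register still carries."""
    id_ex = decode(instr_class)                  # carries EX, M, WB
    ex_mem = {g: id_ex[g] for g in ("M", "WB")}  # EX group was used in EX
    mem_wb = {"WB": ex_mem["WB"]}                # M group was used in MEM
    return {"ID/EX": list(id_ex), "EX/MEM": list(ex_mem), "MEM/WB": list(mem_wb)}


print(trace_control("lw"))
# {'ID/EX': ['EX', 'M', 'WB'], 'EX/MEM': ['M', 'WB'], 'MEM/WB': ['WB']}
```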


Pipelined Execution Example

lw $10, 20($1)

sub $11, $2, $3

and $12, $4, $5

or $13, $6, $7

add $14, $8, $9
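The clock-by-clock figures that follow can be summarized as a stage table. A minimal sketch, assuming the five instructions are independent (none reads a register written by an earlier one here), so no stalls occur; the helper names are illustrative:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def stage_at(instr_index: int, clock: int):
    """Stage occupied by an instruction at a 1-based clock, or None if not in the pipeline."""
    offset = clock - 1 - instr_index
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def total_clocks(num_instructions: int) -> int:
    return num_instructions + len(STAGES) - 1


for clock in range(1, total_clocks(len(PROGRAM)) + 1):
    row = [stage_at(i, clock) or "--" for i in range(len(PROGRAM))]
    print(f"Clock {clock}: " + " ".join(f"{s:>3}" for s in row))
```

Each column is one instruction in program order. For example, in clock 4 the `sub` is in EX while the `lw` is in MEM, and the last instruction leaves WB in clock 9.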


Clock 1


Clock 2


Clock 3


Clock 4


Clock 5


Clock 6


Clock 7


Clock 8


Clock 9
