Computer Architecture Chapter 4-2 The Processor: Enhancing...

Sungkyunkwan University, College of Software, Dongkun Shin

Computer Architecture

Chapter 4-2

The Processor: Enhancing Performance with Pipelining

Pipelining is Natural!

• Laundry Example

• Ann, Brian, Cathy and Don each have one load of clothes to wash, dry, fold, and store

• Washer takes 30 minutes

• Dryer takes 30 minutes

• Folder takes 30 minutes

• Storer takes 30 minutes to put clothes into drawers

Sequential Laundry

• Sequential laundry takes 8 hours for 4 loads

• If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP

• Pipelined laundry takes 3.5 hours for 4 loads!
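The two laundry totals can be checked with a small calculation. This is a minimal sketch that assumes every stage takes the same time and that a new load can start as soon as the washer is free; the function names are illustrative, not from the slides.

```python
def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes every stage before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """The first load needs `stages` steps to get through the pipeline;
    each later load finishes one step after the previous one."""
    return (stages + loads - 1) * stage_minutes


if __name__ == "__main__":
    seq = sequential_minutes(loads=4, stages=4, stage_minutes=30)
    pipe = pipelined_minutes(loads=4, stages=4, stage_minutes=30)
    print(seq / 60, pipe / 60)  # prints: 8.0 3.5
```

With 4 loads and 4 stages of 30 minutes each, this reproduces the 8 hours and 3.5 hours quoted on the slides.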

Pipelining Lessons

• Pipelining doesn’t reduce the latency of a single task; it improves the throughput of the entire workload

• Multiple tasks operate simultaneously using different resources

• Potential speedup = number of pipeline stages

• Pipeline rate is limited by the slowest pipeline stage

• Unbalanced lengths of pipe stages reduce speedup

• Time to fill the pipeline and time to drain it reduce speedup

• Stalls for dependences also reduce speedup

Pipelining

Cafeteria Assembly line

The Five Stages of Load

• Ifetch: Instruction Fetch

– Fetch the instruction from the Instruction Memory

• Reg/Dec: Register Fetch and Instruction Decode

• Exec: Calculate the memory address

• Mem: Read the data from the Data Memory

• Wr: Write the data back to the register file

Pipelining

• Improve performance by increasing instruction throughput

Single-cycle (Tc = 800 ps)

Pipelined (Tc = 200 ps)

Ideal Speedup

• Q: Ideal speedup is the number of stages in the pipeline.

• Do we achieve this?

– Imperfect balance & Pipeline overhead

• For 1003 instructions:

– non-pipelined: 1003*800ps = 802,400ps

– pipelined: 1003*200ps+4*200ps = 201,400ps

Pipelining improves performance by increasing instruction throughput, NOT by decreasing the execution time of an individual instruction

Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of pipe stages

Speedup = 802,400 ps / 201,400 ps = 8024 / 2014 ≈ 8 / 2 = 4
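The 1003-instruction comparison can be reproduced with a short calculation. This is a sketch under the slide's assumptions (800 ps single-cycle clock, 200 ps pipelined clock, five stages, no stalls); the function names are made up for illustration.

```python
def nonpipelined_time_ps(instructions: int, cycle_ps: int) -> int:
    """One instruction completes per (long) clock cycle."""
    return instructions * cycle_ps


def pipelined_time_ps(instructions: int, cycle_ps: int, stages: int) -> int:
    """One instruction completes per cycle once the pipeline is full,
    plus (stages - 1) extra cycles to fill/drain it."""
    return (instructions + stages - 1) * cycle_ps


if __name__ == "__main__":
    single = nonpipelined_time_ps(1003, 800)
    piped = pipelined_time_ps(1003, 200, stages=5)
    print(single, piped, round(single / piped, 2))  # prints: 802400 201400 3.98
```

The speedup approaches 800/200 = 4 rather than the stage count of 5, because the clock is set by the slowest stage and the fill/drain cycles add overhead.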

Instruction Set for Pipelining

• MIPS is made for pipelining!

– All instructions are the same length:

• Helps the Ifetch & Decode stages

– Few instruction formats, with fixed source operand fields

• Can read operands & decode opcode at the same time

– Memory operands only appear in loads/stores

• Calculate the memory address in Execute stage

– Operands must be aligned in memory

• One data memory access for a single data transfer instruction

Can pipelining get us into trouble?

• Yes: Pipeline Hazards

– structural hazards: attempt to use the same resource two different ways at the same time

– data hazards: attempt to use item before it is ready

• instruction depends on result of prior instruction still in the pipeline

– control hazards: attempt to make a decision before condition is evaluated

• branch instructions

• Can always resolve hazards by waiting

– pipeline control must detect the hazard

– take action (or delay action) to resolve hazards

Computer Pipelining Hazards

• Structural Hazards

– Conflict for use of a resource

– With a single memory, instruction fetch would have to stall whenever a load or store accesses data in the same cycle

– The stall causes a pipeline “bubble”

– Pipelined datapaths therefore require separate instruction/data memories (or caches)

Computer Pipelining Hazards

• Data Hazards

Forwarding/Bypassing

• Observation:

– Don’t need to wait for the instruction to complete.

– As soon as the ALU computes the sum for the add, we can supply it as an input for the subtract

Forwarding/Bypassing

• Load-Use Data Hazard

• Can’t always avoid stalls by forwarding

Stall even with forwarding
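The difference between a hazard that forwarding removes and a load-use hazard can be sketched as a check on two back-to-back instructions. This is a simplified model, assuming a five-stage pipeline with full forwarding; the `Instr` type, the `hazard_between` helper, and the example instructions are hypothetical, and special cases such as writes to `$zero` are ignored.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                  # mnemonic, e.g. "add" or "lw"
    dest: Optional[str]      # register written, or None (e.g. for sw)
    srcs: Tuple[str, ...]    # registers read


def hazard_between(producer: Instr, consumer: Instr) -> str:
    """Classify the dependence between two adjacent instructions."""
    if producer.dest is None or producer.dest not in consumer.srcs:
        return "none"
    if producer.op == "lw":
        # Loaded data is available only after MEM, one cycle too late
        # for the next instruction's EX stage: one bubble is needed.
        return "stall"
    # ALU result is available at the end of EX and can be forwarded.
    return "forward"


add = Instr("add", "$s0", ("$t0", "$t1"))
lw = Instr("lw", "$s0", ("$t1",))
sub = Instr("sub", "$t2", ("$s0", "$t3"))

print(hazard_between(add, sub))  # prints: forward
print(hazard_between(lw, sub))   # prints: stall
```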

Pipeline Scheduling

• Compiler can re-schedule instructions to avoid stalls

Assumption: No memory-to-memory forwarding

Pipeline Scheduling

• Compiler can re-schedule instructions to avoid stalls
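The effect of re-scheduling can be illustrated by counting load-use stalls before and after moving a load earlier. The instruction sequence below is an assumed example (the slide's own figure is not reproduced in this text), and the model is a sketch: five stages, full forwarding, and one bubble whenever an instruction reads the register loaded by the immediately preceding `lw`.

```python
from typing import List, Optional, Tuple

# (mnemonic, destination register or None, source registers)
Instr = Tuple[str, Optional[str], Tuple[str, ...]]


def load_use_stalls(program: List[Instr]) -> int:
    """Count adjacent pairs where a lw result is used by the next instruction."""
    stalls = 0
    for (op, dest, _), (_, _, next_srcs) in zip(program, program[1:]):
        if op == "lw" and dest in next_srcs:
            stalls += 1
    return stalls


def total_cycles(program: List[Instr], stages: int = 5) -> int:
    """Instructions + pipeline fill cycles + stall bubbles."""
    return len(program) + (stages - 1) + load_use_stalls(program)


original: List[Instr] = [
    ("lw", "$t1", ("$t0",)),         # lw  $t1, 0($t0)
    ("lw", "$t2", ("$t0",)),         # lw  $t2, 4($t0)
    ("add", "$t3", ("$t1", "$t2")),  # add $t3, $t1, $t2  (stalls on $t2)
    ("sw", None, ("$t3", "$t0")),    # sw  $t3, 12($t0)
    ("lw", "$t4", ("$t0",)),         # lw  $t4, 8($t0)
    ("add", "$t5", ("$t1", "$t4")),  # add $t5, $t1, $t4  (stalls on $t4)
    ("sw", None, ("$t5", "$t0")),    # sw  $t5, 16($t0)
]

# Hoist the third load above the first add so no load is used immediately.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(total_cycles(original), total_cycles(reordered))  # prints: 13 11
```

Moving the third load up is safe here because it reads a different address than the store it passes and does not touch the registers used in between.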

Computer Pipelining Hazards

• Control Hazards: Bubble or Pipeline Stall on Branch

Assumption: In the second stage, the branch is resolved, the target address is computed, and the PC is updated

Computer Pipelining Hazards

• Control Hazards: Branch prediction

Prediction correct

Prediction incorrect

Computer Pipelining Hazards

• Control Hazards: Delayed branch

– Solution used in MIPS

– If you run SPIM in bare mode, you must pay attention to delayed branches

– Compiler fills delayed branch slots

Single-Cycle Datapath

• What do we need to add to actually split the datapath into stages?

Pipelined Datapath

LW: IF Stage

LW: ID Stage

LW: EX Stage

LW: MEM Stage

LW: WB Stage

Can you find a BUG here?

Corrected Version for Load

EX for Store

MEM for Store

WB for Store

Graphically Representing Pipelines

• Can help with answering questions like:

– how many cycles does it take to execute this code?

– what is the ALU doing during cycle 4?

– use this representation to help understand datapaths

Conventional Pipelined Execution Representation

Single-Cycle Pipeline Diagram

Single Cycle, Multiple Cycle, vs. Pipeline

Why Pipeline?

• Suppose we execute 100 instructions

• Single Cycle Machine

– 45 ns/cycle x 1 CPI x 100 inst = 4500 ns

• Multicycle Machine

– 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns

• Ideal pipelined machine

– 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
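The three totals follow from one formula: time = cycle time × (CPI × instruction count + extra cycles). The sketch below plugs in the slide's numbers; the function name and the `extra_cycles` parameter are illustrative, and results are printed rounded because 4.6 is not exactly representable in floating point.

```python
def total_time_ns(cycle_ns: float, cpi: float, instructions: int,
                  extra_cycles: int = 0) -> float:
    """Execution time for a run of `instructions`, plus any fill/drain cycles."""
    return cycle_ns * (cpi * instructions + extra_cycles)


single_cycle = total_time_ns(cycle_ns=45, cpi=1, instructions=100)
multicycle = total_time_ns(cycle_ns=10, cpi=4.6, instructions=100)
pipelined = total_time_ns(cycle_ns=10, cpi=1, instructions=100, extra_cycles=4)

for name, t in [("single-cycle", single_cycle),
                ("multicycle", multicycle),
                ("pipelined", pipelined)]:
    print(f"{name}: {t:.0f} ns, {t / pipelined:.2f}x the pipelined time")
```

The pipelined machine combines the short clock of the multicycle design with the CPI of 1 of the single-cycle design, which is where its advantage comes from.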

Why Pipeline? Because the resources are there!

Pipelined Control

Pipelined Control

• Divide control lines into five groups according to pipeline stages

• What needs to be controlled in each stage?

– Instruction Fetch and PC Increment

– Instruction Decode / Register Fetch

– Execution

– Memory Stage

– Write Back

Pipelined Control

• Pass control signals along just like the data

Pipelined Control

Control signals for the last three stages are created in ID

Pipelined Execution Example

lw $10, 20($1)

sub $11, $2, $3

and $12, $4, $5

or $13, $6, $7

add $14, $8, $9
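Which instruction occupies which stage in each clock can be tabulated with a few lines of code. This is a minimal sketch assuming the five instructions above flow through a five-stage pipeline with no stalls (they share no registers, so none are needed); the stage names and the `occupancy` helper are illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

# The last instruction enters IF in clock len(PROGRAM) and needs
# len(STAGES) - 1 more clocks to reach WB.
TOTAL_CLOCKS = len(PROGRAM) + len(STAGES) - 1


def occupancy(clock: int) -> dict:
    """Map each busy stage to the instruction it holds in a 1-based clock."""
    result = {}
    for stage_index, stage in enumerate(STAGES):
        instr_index = clock - 1 - stage_index
        if 0 <= instr_index < len(PROGRAM):
            result[stage] = PROGRAM[instr_index]
    return result


if __name__ == "__main__":
    for clock in range(1, TOTAL_CLOCKS + 1):
        print(f"Clock {clock}: {occupancy(clock)}")
```

In clock 5 every stage is busy (`lw` in WB through `add` in IF), and the run finishes in clock 9 when `add` reaches WB.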

Clock 1

Clock 2

Clock 3

Clock 4

Clock 5

Clock 6

Clock 7

Clock 8

Clock 9
