Computer Architecture Chapter 4-2 The Processor: Enhancing...

Sungkyunkwan University, College of Software, Dongkun Shin

Computer Architecture

Chapter 4-2

The Processor: Enhancing Performance with Pipelining

Pipelining is Natural!

• Laundry Example

• Ann, Brian, Cathy and Don each have one load of clothes to wash, dry, fold, and store

• Washer takes 30 minutes

• Dryer takes 30 minutes

• Folder takes 30 minutes

• Storer takes 30 minutes to put clothes into drawers

Sequential Laundry

• Sequential laundry takes 8 hours for 4 loads

• If they learned pipelining, how long would laundry take?

Pipelined Laundry: Start work ASAP

• Pipelined laundry takes 3.5 hours for 4 loads!
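The two laundry totals can be checked with a short calculation. This is a minimal sketch that assumes every stage takes exactly 30 minutes and that a new load can start as soon as the washer is free; the function names are illustrative only.

```python
STAGE_MINUTES = 30   # washer, dryer, folder, storer: 30 minutes each (from the slide)
NUM_STAGES = 4
NUM_LOADS = 4

def sequential_minutes(loads, stages, stage_minutes):
    # Each load runs through every stage before the next load starts.
    return loads * stages * stage_minutes

def pipelined_minutes(loads, stages, stage_minutes):
    # The first load needs `stages` steps; each later load finishes one step after it.
    return (stages + loads - 1) * stage_minutes

seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
print(seq / 60, "hours sequential;", pipe / 60, "hours pipelined")
```

The speedup here is 8 / 3.5, about 2.3, not 4, because only four loads pass through and the fill and drain time is a large share of the total.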

Pipelining Lessons

• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload

• Multiple tasks operating simultaneously using different resources

• Potential speedup = number of pipeline stages

• Pipeline rate is limited by the slowest pipeline stage

• Unbalanced lengths of pipe stages reduce speedup

• Time to fill the pipeline and time to drain it reduce speedup

• Stall for Dependences
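The effect of unbalanced stages can be sketched numerically. The per-stage latencies below are assumed values, not taken from these slides; they are chosen only so that they add up to the 800 ps single-cycle time and give the 200 ps pipelined cycle time used later in the deck.

```python
# Assumed stage latencies in picoseconds (illustrative values only).
stage_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

single_cycle_tc = sum(stage_ps.values())   # one long cycle covers every stage
pipelined_tc = max(stage_ps.values())      # clock is set by the slowest stage

steady_state_speedup = single_cycle_tc / pipelined_tc
ideal_speedup = len(stage_ps)              # reached only with perfectly balanced stages

print(single_cycle_tc, pipelined_tc, steady_state_speedup, ideal_speedup)
```

With these assumed numbers the steady-state speedup is 4 rather than the ideal 5, because the 100 ps stages sit idle for half of every 200 ps cycle.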

Pipelining

Cafeteria Assembly line

The Five Stages of Load

• Ifetch: Instruction Fetch

– Fetch the instruction from the Instruction Memory

• Reg/Dec: Register Fetch and Instruction Decode

• Exec: Calculate the memory address

• Mem: Read the data from the Data Memory

• Wr: Write the data back to the register file

Pipelining

• Improve performance by increasing instruction throughput

Single-cycle (Tc= 800ps)

Pipelined (Tc= 200ps)

Ideal Speedup

• Q: Ideal speedup is number of stages in the pipeline.

• Do we achieve this?

– Imperfect balance & Pipeline overhead

• For 1003 instructions:

– non-pipelined: 1003*800ps = 802,400ps

– pipelined: 1003*200ps+4*200ps = 201,400ps

Pipelining improves performance by increasing instruction throughput, NOT by decreasing the execution time of an individual instruction

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$
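The 1003-instruction example can be reproduced as follows. This is a small sketch assuming a five-stage pipeline with no stalls; the function names are illustrative.

```python
def nonpipelined_ps(num_instructions, cycle_ps):
    # Every instruction occupies one full (long) cycle.
    return num_instructions * cycle_ps

def pipelined_ps(num_instructions, cycle_ps, num_stages):
    # One instruction completes per cycle, plus (num_stages - 1) cycles to fill the pipeline.
    return (num_instructions + num_stages - 1) * cycle_ps

non = nonpipelined_ps(1003, 800)
pipe = pipelined_ps(1003, 200, 5)
speedup = non / pipe

print(non, pipe, round(speedup, 3))
```

The measured speedup is just under 4, which matches the ratio of the two cycle times (800 ps / 200 ps) rather than the stage count of 5.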

Instruction Set for Pipelining

• MIPS is made for pipelining!

– All instructions are the same length:

• Helps Ifetch & Decode stages

– A few instruction formats & fixed source operand fields

• Can read operands & decode opcode at the same time

– Memory operands only appear in loads/stores

• Calculate the memory address in Execute stage

– Operands must be aligned in memory

• One data memory access for a single data transfer instruction
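Because the source register fields sit at fixed bit positions, they can be extracted without knowing the opcode first. The sketch below decodes the standard 32-bit MIPS field layout; the helper name `decode_fields` and the sample word (an encoding of `lw $10, 20($1)`) are chosen here for illustration.

```python
def decode_fields(word):
    # Every field is sliced from a fixed position, independent of the opcode,
    # so register reads can start while the opcode is still being decoded.
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31..26
        "rs": (word >> 21) & 0x1F,      # bits 25..21
        "rt": (word >> 16) & 0x1F,      # bits 20..16
        "rd": (word >> 11) & 0x1F,      # bits 15..11 (meaningful for R-format only)
        "imm": word & 0xFFFF,           # bits 15..0  (meaningful for I-format only)
    }

LW_WORD = 0x8C2A0014  # lw $10, 20($1): opcode 35, rs = 1, rt = 10, imm = 20
fields = decode_fields(LW_WORD)
print(fields)
```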

Can pipelining get us into trouble?

• Yes: Pipeline Hazards

– structural hazards: attempt to use the same resource two different ways at the same time

– data hazards: attempt to use item before it is ready

• instruction depends on result of prior instruction still in the pipeline

– control hazards: attempt to make a decision before condition is evaluated

• branch instructions

• Can always resolve hazards by waiting

– pipeline control must detect the hazard

– take action (or delay action) to resolve hazards

Computer Pipelining Hazards

• Structural Hazards

– Conflict for use of a resource

– With a single memory, Instruction fetch would have to stall

– Cause a pipeline “bubble”

– Require separate instruction/data memories (or caches)

• Data Hazards

Forwarding/Bypassing

• Observation:

– Don’t need to wait for the instruction to complete.

– As soon as the ALU computes the sum for the add, we can supply it as an input for the subtract

Forwarding/Bypassing

• Load-Use Data Hazard

• Can’t always avoid stalls by forwarding

Stall even with forwarding
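The difference between an ALU result and a loaded value can be sketched as a stall count. This is a simplified model, not the hazard detection unit itself: it assumes a five-stage pipeline, a register file that is written in the first half of a cycle and read in the second half, and forwarding paths from the end of EX and the end of MEM. The function name and parameters are hypothetical.

```python
def stall_cycles(producer_is_load, distance, forwarding):
    """Bubbles needed before a consumer that reads the producer's result.

    distance = 1 means the consumer immediately follows the producer.
    """
    if not forwarding:
        # The consumer's ID stage must line up with the producer's WB stage.
        return max(0, 3 - distance)
    if producer_is_load:
        # Loaded data exists only after MEM, one stage later than an ALU result.
        return max(0, 2 - distance)
    # An ALU result is forwarded straight from EX to the next instruction's EX.
    return 0

print(stall_cycles(False, 1, False))  # add then sub, no forwarding
print(stall_cycles(False, 1, True))   # add then sub, with forwarding
print(stall_cycles(True, 1, True))    # load-use, with forwarding
```

Under these assumptions the load-use case is the only one that still needs a bubble once forwarding is in place.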

Pipeline Scheduling

• Compiler can re-schedule instructions to avoid stalls

Assumption: No memory-to-memory forwarding
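The benefit of rescheduling can be illustrated by counting load-use stalls before and after moving a load earlier. The instruction sequence below is a hypothetical example, not taken from these slides, and the model assumes forwarding, a five-stage pipeline, and one bubble whenever an instruction reads a register loaded by the instruction directly before it (including a store reading its data register, in line with the no memory-to-memory forwarding assumption).

```python
def load_use_stalls(program):
    # Each instruction is (opcode, destination register or None, source registers).
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        if prev_op == "lw" and prev_dest in curr[2]:
            stalls += 1
    return stalls

def total_cycles(program, stages=5):
    # Fill time plus one cycle per instruction plus one cycle per bubble.
    return len(program) + stages - 1 + load_use_stalls(program)

original = [
    ("lw", "$t1", ["$t0"]),
    ("lw", "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),  # uses $t2 right after it is loaded
    ("sw", None, ["$t3", "$t0"]),
    ("lw", "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),  # uses $t4 right after it is loaded
    ("sw", None, ["$t5", "$t0"]),
]
# Hoist the third load so that no add directly follows the load it depends on.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(total_cycles(original), total_cycles(reordered))
```

In this sketch the reordering preserves every dependence and removes both bubbles, so the cycle count drops from 13 to 11.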

• Control Hazards: Bubble or Pipeline Stall on Branch

Assumption: In the second stage, the branch is resolved, the target address is computed, and the PC is updated

• Control Hazards: Branch prediction

[Figure panels: prediction correct; prediction incorrect]
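The cost of stalling versus predicting can be compared with a rough CPI estimate. The branch fraction and misprediction rate below are hypothetical numbers chosen for illustration, and the one-cycle penalty follows from the assumption above that the branch is resolved in the second stage.

```python
BRANCH_FRACTION = 0.17       # assumed share of instructions that are branches
MISPREDICTION_RATE = 0.50    # assumed share of branches predicted wrongly
PENALTY_CYCLES = 1           # one bubble when the branch resolves in the second stage

def cpi_always_stall(branch_fraction, penalty):
    # Every branch inserts the bubble, whatever its outcome.
    return 1 + branch_fraction * penalty

def cpi_with_prediction(branch_fraction, misprediction_rate, penalty):
    # Only mispredicted branches pay the bubble.
    return 1 + branch_fraction * misprediction_rate * penalty

stall_cpi = cpi_always_stall(BRANCH_FRACTION, PENALTY_CYCLES)
predict_cpi = cpi_with_prediction(BRANCH_FRACTION, MISPREDICTION_RATE, PENALTY_CYCLES)
print(stall_cpi, predict_cpi)
```

Even a simple predictor helps in this model, since a correct prediction costs nothing and an incorrect one costs no more than the stall would have.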

• Control Hazards: Delayed branch

– Solution used in MIPS

– If you run SPIM in a bare mode, you must pay attention to delayed branches

– Compiler fills delayed branch slots

Single-Cycle Datapath

• What do we need to add to actually split the datapath into stages?

Pipelined Datapath

LW: IF Stage

LW: ID Stage

LW: EX Stage

LW: MEM Stage

LW: WB Stage

Can you find a BUG here?

Corrected Version for Load

EX for Store

MEM for Store

WB for Store

Graphically Representing Pipelines

• Can help with answering questions like:

– how many cycles does it take to execute this code?

– what is the ALU doing during cycle 4?

– use this representation to help understand datapaths
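Both questions can be answered mechanically once each instruction is assumed to enter the pipeline one cycle after the previous one. The sketch below assumes a five-stage pipeline with no stalls; the three-instruction program and the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def total_cycles(num_instructions):
    # The last instruction enters IF in cycle n and needs len(STAGES) cycles in total.
    return num_instructions + len(STAGES) - 1

def instruction_in_stage(program, stage, cycle):
    # Cycles are 1-indexed; instruction i (0-indexed) is in IF during cycle i + 1.
    index = cycle - 1 - STAGES.index(stage)
    return program[index] if 0 <= index < len(program) else None

program = ["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)"]
print(total_cycles(len(program)))
print(instruction_in_stage(program, "EX", 4))  # what the ALU is working on in cycle 4
```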

Conventional Pipelined Execution Representation

Single-Cycle Pipeline Diagram

Single Cycle, Multiple Cycle, vs. Pipeline

Why Pipeline?

• Suppose we execute 100 instructions

• Single Cycle Machine

– 45 ns/cycle x 1 CPI x 100 inst = 4500 ns

• Multicycle Machine

– 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns

• Ideal pipelined machine

– 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
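The three totals can be recomputed directly from the figures on this slide. The variable names are illustrative.

```python
NUM_INSTRUCTIONS = 100

# Single-cycle machine: one long cycle per instruction.
single_cycle_ns = 45 * 1 * NUM_INSTRUCTIONS

# Multicycle machine: short cycles, but 4.6 of them per instruction on average.
multicycle_ns = 10 * 4.6 * NUM_INSTRUCTIONS

# Ideal pipelined machine: short cycles, CPI of 1, plus 4 cycles to drain.
pipelined_ns = 10 * (1 * NUM_INSTRUCTIONS + 4)

print(single_cycle_ns, multicycle_ns, pipelined_ns)
print(round(single_cycle_ns / pipelined_ns, 2))  # speedup over the single-cycle machine
```

The pipelined machine keeps the short clock of the multicycle design and the CPI of the single-cycle design, which is where its advantage over both comes from.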

Why Pipeline? Because the resources are there!

Pipelined Control

• Divide control lines into five groups according to pipeline stages

• What needs to be controlled in each stage?

– Instruction Fetch and PC Increment

– Instruction Decode / Register Fetch

– Execution

– Memory Stage

– Write Back

Pipelined Control

• Pass control signals along just like the data

Pipelined Control

Control signals for the last three stages are created in ID
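The idea of carrying control signals forward can be modeled as dictionaries that shrink at each pipeline register. This is a sketch only: the signal names and values follow the usual textbook MIPS subset and are an assumption here, since the slides show them only in figures, and the helper names are hypothetical.

```python
# Control values per instruction class, grouped by the stage that consumes them.
CONTROL_TABLE = {
    "R-format": {
        "EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 0},
    },
    "lw": {
        "EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "WB": {"RegWrite": 1, "MemtoReg": 1},
    },
}

def id_stage(kind):
    # ID generates all three groups at once and latches them into ID/EX.
    return {group: dict(signals) for group, signals in CONTROL_TABLE[kind].items()}

def advance(pipeline_register, consumed_group):
    # The group used in the current stage is dropped; the rest move on.
    return {g: s for g, s in pipeline_register.items() if g != consumed_group}

id_ex = id_stage("lw")
ex_mem = advance(id_ex, "EX")     # EX signals are used up in the EX stage
mem_wb = advance(ex_mem, "MEM")   # MEM signals are used up in the MEM stage
print(sorted(id_ex), sorted(ex_mem), sorted(mem_wb))
```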

Pipelined Execution Example

lw $10, 20($1)

sub $11, $2, $3

and $12, $4, $5

or $13, $6, $7

add $14, $8, $9

Clock 1 through Clock 9
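The contents of the pipeline in each of the nine clock cycles can be listed with a short script. This sketch assumes the five instructions above flow through without stalls, which holds here because none of them reads a register written by an earlier one; the function name `snapshot` is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]

def snapshot(clock):
    # Instruction i (0-indexed) is in stage number s (0-indexed) during clock i + s + 1.
    result = {}
    for position, stage in enumerate(STAGES):
        index = clock - 1 - position
        result[stage] = PROGRAM[index] if 0 <= index < len(PROGRAM) else None
    return result

for clock in range(1, len(PROGRAM) + len(STAGES)):
    occupied = {stage: instr for stage, instr in snapshot(clock).items() if instr}
    print("Clock", clock, occupied)
```

Clock 5 is the only cycle in which all five stages are busy; before it the pipeline is filling and after it the pipeline is draining.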
