GAS STATION Pipelining & Hazards II Lecture 4 EECS 470 Slide 6 © Wenisch 2016 -- Portions ©...


EECS 470

Lecture 4

Pipelining & Hazards II Winter 2021

Jon Beaumont

http://www.eecs.umich.edu/courses/eecs470


Slides developed in part by Profs. Austin, Brehob, Falsafi, Hill, Hoe, Lipasti, Martin, Roth, Shen, Smith, Sohi, Tyson, Vijaykumar, and Wenisch of Carnegie Mellon University, Purdue University, University of Michigan, University of Pennsylvania, and University of Wisconsin.

Class Question

Which of the following best explains why pipelining results in speedup?

a) Instructions are executed with shorter latency

b) Clock period is reduced

c) More instructions are executed at the same time

d) Magnets

Announcements

• Reminder
  Lab #1 due tomorrow by 12:30p: get checked off by GSI/IA
  Verilog assignment #1 due tomorrow: submit to autograder by 11:59p
  HW #1 due Thursday 2/4: submit through Gradescope by 11:59p

• I have OH today from 3-4
  OH format for all staff: join the Zoom link, put yourself on the Office Hour Queue
  You will be let into a breakout room when you are at the head

Last Time

• Baseline processor discussion
  Review 5-stage pipeline from EECS 370

• Hazards
  Detection
  Resolution
    Software (avoidance)
    Hardware (stalling, forwarding)

Lingering Questions

• "How recent was the pipeline method developed? What will be the next best method?"
  Basic pipelines have been used since the very early days of computing (1930s)
  Deep pipelines became very popular with vector processors in the 1970s
    Less popular now; we'll discuss why
  Recent trends have been not towards better performance, but better reliability and power efficiency
  EECS 573 (Microarchitectures) covers a lot of these interesting topics

• Remember, you can submit lingering questions to cover next lecture at: https://bit.ly/3oSr5FD

Balancing Pipeline Stages

T_IF = 6 units

T_ID = 2 units

T_EX = 9 units

T_MEM = 5 units

T_WB = 8 units

Can we do better in terms of either performance or efficiency?

Balancing Pipeline Stages

Two Methods for Stage Quantization:
  Merging of multiple stages
  Further subdividing a stage
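To make the balancing question concrete, the sketch below computes the clock period and steady-state speedup for the five stage latencies on the previous slide, and then for one merged and one subdivided variant. It is a rough model only: latch overhead and hazards are ignored, the choice to merge IF with ID is just one option, and the even 4.5/4.5 split of EX is a hypothetical chosen for illustration.

```python
# Stage latencies from the slide, in abstract time units.
STAGES = {"IF": 6, "ID": 2, "EX": 9, "MEM": 5, "WB": 8}


def clock_period(latencies):
    """The clock can be no faster than the slowest stage."""
    return max(latencies)


def steady_state_speedup(latencies, work_per_instruction):
    """Speedup over an unpipelined design that needs work_per_instruction
    units per instruction; the pipeline finishes one per clock period."""
    return work_per_instruction / clock_period(latencies)


baseline = list(STAGES.values())
work = sum(baseline)  # 30 units of logic per instruction

# Merging: fold ID into IF (6 + 2 = 8), which still fits under the 9-unit clock.
merged = [STAGES["IF"] + STAGES["ID"], STAGES["EX"], STAGES["MEM"], STAGES["WB"]]

# Subdividing: split EX into two hypothetical 4.5-unit halves.
subdivided = [6, 2, 4.5, 4.5, 5, 8]

for name, stages in [("baseline", baseline), ("merged", merged), ("subdivided", subdivided)]:
    print(name, len(stages), "stages, period", clock_period(stages),
          "speedup %.2f" % steady_state_speedup(stages, work))
```

Under these assumptions, merging keeps the 9-unit period with one fewer stage (same performance, less hardware), while subdividing EX lowers the period to 8 units, at which point WB becomes the bottleneck.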

Recent Trends:
  Deeper pipelines (more and more stages)
    Pipeline depth growing more slowly since Pentium 4. Why?
  Multiple pipelines
  Pipelined memory/cache accesses (tricky)

The Cost of Deeper Pipelines

Instruction pipelines are not ideal, i.e., instructions in different stages can have dependencies

Suppose:

add 1 2 3
nand 3 4 5

[Figure: ideal pipeline timing over t0-t5, with Inst0 and Inst1 each passing through F D E M W]

[Figure: timing for add followed by nand over t0-t5; nand stalls because of the read-after-write dependency]

Terminology

Pipeline Hazards:
  Potential violations of program dependences
  Must ensure program dependences are not violated

Hazard Resolution:
  Static Method: Performed at compile time in software
  Dynamic Method: Performed at run time using hardware

Pipeline Interlock:
  Hardware mechanisms for dynamic hazard resolution
  Must detect and enforce dependences at run time

Handling Data Hazards

Avoidance (static)
  Make sure there are no hazards in the code

Detect and Stall (dynamic)
  Stall until earlier instructions finish

Detect and Forward (dynamic)
  Get correct value from elsewhere in pipeline

Handling Data Hazards: Avoidance

Programmer/compiler must know implementation details
Insert noops between dependent instructions

add 1 2 3
noop
noop
nand 3 4 5

write R3 in cycle 5

read R3 in cycle 6
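As a rough illustration of static avoidance, the sketch below pads a program with noops so that a consumer never follows its producer too closely. The function name `insert_noops`, the tuple encoding of instructions, and the `gap` parameter (how many instructions must separate a producer from its consumer) are all hypothetical; the right value of `gap` depends on the pipeline implementation, which is exactly why the compiler must know implementation details.

```python
NOOP = ("noop", 0, 0, 0)


def reads(inst):
    """Registers read, using LC-2K style fields (op, regA, regB, third field)."""
    op, a, b, _ = inst
    if op in ("add", "nand", "sw", "beq"):
        return {a, b}
    if op == "lw":
        return {a}
    return set()


def writes(inst):
    """Registers written: add/nand write the third field, lw writes regB."""
    op, _, b, c = inst
    if op in ("add", "nand"):
        return {c}
    if op == "lw":
        return {b}
    return set()


def insert_noops(program, gap):
    """Return a copy of program with enough noops inserted that every
    read-after-write pair is separated by at least `gap` instructions."""
    out = []
    for inst in program:
        needed = 0
        # Look back at the last `gap` emitted instructions for a producer.
        for dist in range(1, gap + 1):
            if dist <= len(out) and writes(out[-dist]) & reads(inst):
                needed = max(needed, gap - dist + 1)
        out.extend([NOOP] * needed)
        out.append(inst)
    return out


program = [("add", 1, 2, 3), ("nand", 3, 4, 5)]
for inst in insert_noops(program, gap=2):
    print(inst)
```

With `gap=2` this reproduces the add, noop, noop, nand sequence above; calling it with a larger `gap` pads more, which is the binary compatibility concern with avoidance.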

Problems with Avoidance

Binary compatibility
  New implementations may require more noops

Code size
  Higher instruction cache footprint
  Longer binary load times
  Worse in machines that execute multiple instructions / cycle
    Intel Itanium: 25-40% of instructions are noops

Slower execution
  CPI=1, but many instructions are noops
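The "CPI=1, but many instructions are noops" point can be put into numbers with a small sketch. It assumes a hypothetical single-issue pipeline in which every instruction, noop or not, takes one cycle; the 25% and 40% inputs are borrowed from the Itanium figure above purely as illustrative noop fractions, not as a model of that machine.

```python
def cycles_per_useful_instruction(noop_fraction, cpi=1.0):
    """Cycles spent per non-noop instruction when noop_fraction of the
    instruction stream is noops and every instruction takes `cpi` cycles."""
    if not 0.0 <= noop_fraction < 1.0:
        raise ValueError("noop_fraction must be in [0, 1)")
    return cpi / (1.0 - noop_fraction)


for fraction in (0.0, 0.25, 0.40):
    print(fraction, round(cycles_per_useful_instruction(fraction), 3))
```

Even though the pipeline never stalls, the cost per useful instruction rises as the noop fraction grows.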

Handling Data Hazards: Detect & Stall

Detection
  Compare regA & regB with DestReg of preceding insn.
    3-bit comparators

Stall
  Do not advance pipeline register for Fetch/Decode
  Pass noop to Execute
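The detection step can be sketched in software as a set of equality comparisons between the source registers of the instruction in Decode and the destination registers of older instructions still in flight. This is only a behavioral sketch: the function names and the tuple encoding are hypothetical, and real hardware performs these comparisons in parallel on 3-bit register fields.

```python
def src_regs(inst):
    """Source register fields (regA, regB) that the instruction actually reads."""
    op, a, b, _ = inst
    if op in ("add", "nand", "sw", "beq"):
        return (a, b)
    if op == "lw":
        return (a,)
    return ()


def dest_reg(inst):
    """Destination register, or None if the instruction writes no register."""
    op, _, b, c = inst
    if op in ("add", "nand"):
        return c
    if op == "lw":
        return b
    return None


def must_stall(decoding, in_flight):
    """True if the instruction in Decode reads a register that an older,
    not yet written-back instruction will write."""
    for older in in_flight:
        dest = dest_reg(older)
        if dest is not None and dest in src_regs(decoding):
            return True  # hold Fetch/Decode and pass a noop to Execute
    return False


add = ("add", 1, 2, 3)
nand = ("nand", 3, 4, 5)
print(must_stall(nand, [add]))                   # add has not written reg 3 yet
print(must_stall(nand, [("noop", 0, 0, 0)]))     # nothing in flight writes reg 3
```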

Which of the "Avoidance" issues does "Detect & Stall" fix? (select all)

a) Binary compatibility

b) Code size

c) Slower execution

[Figure sequence: five-stage pipelined datapath (Fetch, Decode, Execute, Memory, WB) showing PC, instruction memory, instruction fields labeled Bits 0-2, Bits 16-18, and Bits 22-24, offset, PC+1, target, result, and eq? signals. The detect-and-stall example is stepped through cycle by cycle. Captions, in order: End of Cycle 1; End of Cycle 2; Hazard detection; First half of cycle 3 (compare, Hazard detected); End of cycle 3; Hazard; End of cycle 4; No Hazard; End of cycle 5.]

Problems with Detect & Stall

CPI increases on every hazard

Are these stalls necessary? Not always!
  The new value for R3 is in the EX/Mem register
  Reroute the result to the nand
    Called “forwarding” or “bypassing”

Handling Data Hazards: Detect & Forward

Detection
  Same as detect and stall, but…
    each possible hazard requires different forwarding paths

Forward
  Add data paths for all possible sources
  Add mux in front of ALU to select source

“Bypassing logic” is often a critical path in wide-issue (i.e., superscalar) machines
  Number of paths grows quadratically with machine width
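The mux in front of the ALU can be sketched as a priority selection: take the value from the youngest in-flight producer of the source register, and fall back to the register file otherwise. This is a simplified sketch with only two forwarding sources; the function name and the `(dest, value)` encoding of the EX/Mem and Mem/WB pipeline registers are assumptions for illustration.

```python
def forward_operand(src, regfile_value, ex_mem, mem_wb):
    """Select the ALU operand for source register `src`.

    ex_mem and mem_wb are (dest_reg_or_None, value) pairs describing the
    results held in the EX/Mem and Mem/WB pipeline registers.
    """
    # Check the youngest producer first so the most recent write wins.
    for dest, value in (ex_mem, mem_wb):
        if dest is not None and dest == src:
            return value
    return regfile_value


# nand needs reg 3; the add ahead of it has its result in EX/Mem.
print(forward_operand(3, regfile_value=0, ex_mem=(3, 99), mem_wb=(None, 0)))
# No in-flight producer: use the register file value.
print(forward_operand(3, regfile_value=7, ex_mem=(None, 0), mem_wb=(4, 1)))
```

A load is a special case not modeled here: its data is not available until after the Memory stage, so a dependent instruction immediately behind a lw would still need to stall before forwarding can supply the value.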

Sample Code Reminder

Run the following code on a pipelined datapath:

nand 3 4 5 ; reg 5 = reg 3 ~& reg 4
add 6 3 7 ; reg 7 = reg 6 + reg 3
lw 3 6 10 ; reg 6 = Mem[reg 3 + 10]
sw 6 2 12 ; Mem[reg 6 + 12] = reg 2

Poll: How many data dependencies are here? How many stalls will we see?

[Figure sequence: the same five-stage datapath with forwarding paths (fwd) added, stepping the sample code through the pipeline. Captions, in order: Hazard; End of cycle 3; New Hazard; End of cycle 4; No Hazard; End of cycle 5; Hazard; End of cycle 6; Hazard; End of cycle 7; End of cycle 8.]

Control Hazards

beq 1 1 10
sub 3 4 5

[Figure: pipeline timing diagram (F D E M W over t0-t5) for beq followed by sub; sub is squashed]

Handling Control Hazards

Avoidance (static)
  No branches?
  Convert branches to predication
    Control dependence becomes data dependence

Detect and Stall (dynamic)
  Stop fetch until branch resolves

Speculate and squash (dynamic)
  Keep going past branch, throw away instructions if wrong

Avoidance: if-conversion

if (a == b) {
  x++;
  y = n / d;
}

With a branch:
sub t1 a, b
jnz t1, PC+2
add x x, #1
div y n, d

Predicated:
sub t1 a, b
add(t1) x x, #1
div(t1) y n, d

Conditional moves:
sub t1 a, b
add t2 x, #1
div t3 n, d
cmov(t1) x t2
cmov(t1) y t3
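The effect of if-conversion can be mimicked in a high-level sketch: compute both results unconditionally, then select with a predicate instead of branching. The two functions below are hypothetical illustrations of the branchy and conditional-move versions. Python's conditional expression stands in for cmov here; it models the data flow, not the hardware.

```python
def with_branch(a, b, x, y, n, d):
    """Control-dependent version: the updates run only if a == b."""
    if a == b:
        x = x + 1
        y = n // d
    return x, y


def if_converted(a, b, x, y, n, d):
    """Data-dependent version in the style of the cmov sequence.
    Assumes d != 0 whenever a == b."""
    p = (a == b)                     # predicate, playing the role of t1
    t2 = x + 1                       # always computed
    t3 = n // d if d != 0 else 0     # models a divide that must not fault when its result is unused
    x = t2 if p else x               # cmov x, t2
    y = t3 if p else y               # cmov y, t3
    return x, y


print(with_branch(1, 1, 5, 0, 10, 2), if_converted(1, 1, 5, 0, 10, 2))
print(with_branch(1, 2, 5, 0, 10, 2), if_converted(1, 2, 5, 0, 10, 2))
```

The guarded divide highlights a practical cost of this style: work on the not-taken path is still executed, and it must be safe to execute even when its result is thrown away.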

If you're interested:

https://en.wikipedia.org/wiki/Predication_(computer_architecture)

Handling Control Hazards: Detect & Stall

Detection
  In decode, check if opcode is branch or jump

Stall
  Hold next instruction in Fetch
  Pass noop to Decode

Problems with Detect & Stall

CPI increases on every branch

Are these stalls necessary? Not always!
  Branch is only taken half the time

Assume branch is NOT taken
  Keep fetching, treat branch as noop
  If wrong, make sure bad instructions don’t complete

Handling Control Hazards: Speculate & Squash

Speculate
  Assume branch is not taken

Squash
  Overwrite opcodes in Fetch, Decode, Execute with noop
  Pass target to Fetch
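A back-of-the-envelope comparison of the two dynamic schemes is sketched below. The 3-cycle penalty is an assumption taken from the three stages (Fetch, Decode, Execute) whose contents are squashed, and it is applied to both schemes for simplicity; the 20% branch frequency is a hypothetical workload parameter, while the 50% taken rate follows the "taken half the time" remark.

```python
def cpi_detect_and_stall(branch_fraction, penalty):
    """Every branch pays the full penalty."""
    return 1.0 + branch_fraction * penalty


def cpi_speculate_not_taken(branch_fraction, taken_fraction, penalty):
    """Only taken branches (wrong guesses) pay the penalty."""
    return 1.0 + branch_fraction * taken_fraction * penalty


BRANCH_FRACTION = 0.20  # hypothetical: 1 in 5 instructions is a branch
TAKEN_FRACTION = 0.50   # "branch is only taken half the time"
PENALTY = 3             # assumed: Fetch, Decode, Execute slots are lost

print("stall:    %.2f" % cpi_detect_and_stall(BRANCH_FRACTION, PENALTY))
print("speculate: %.2f" % cpi_speculate_not_taken(BRANCH_FRACTION, TAKEN_FRACTION, PENALTY))
```

Under these assumed numbers, speculating not-taken halves the branch overhead; the better the guess, the smaller the remaining cost.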


Problems with Speculate & Squash

Always assumes branch is not taken

Can we do better? Yes.
  Predict branch direction and target!
  Why possible? Program behavior repeats.
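To see why repeating behavior helps, the sketch below compares always predicting not-taken with a very simple scheme that predicts whatever the branch did last time. Both the last-outcome scheme and the loop-branch trace (taken nine times, then not taken, repeated twice) are hypothetical illustrations, not a description of any particular predictor design.

```python
def accuracy_always_not_taken(outcomes):
    """Fraction of branches guessed correctly when always predicting not-taken."""
    return sum(1 for taken in outcomes if not taken) / len(outcomes)


def accuracy_last_outcome(outcomes, initial=False):
    """Fraction guessed correctly when predicting the previous outcome."""
    prediction = initial
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember what the branch just did
    return correct / len(outcomes)


# A loop-closing branch: taken 9 times, then falls through; run twice.
loop_branch = ([True] * 9 + [False]) * 2

print("always not-taken:", accuracy_always_not_taken(loop_branch))
print("last outcome:    ", accuracy_last_outcome(loop_branch))
```

On this trace the static not-taken guess is right only on the loop exits, while remembering the last outcome is wrong only around each exit.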
