cs252-8

7/28/2019 cs252-8



Copyright M. Baltrush (CS252-8)

Data Hazard

Execute requires more than one clock cycle: Stall (Bubble)

[Pipeline timing diagram, clock time 1 to 9]
Instruction 1: F1 D1 E1 W1
Instruction 2: F2 D2 E2 W2
Instruction 3: F3 D3 E3 W3
Instruction 4: F4 D4 E4 W4


Instruction (Control) Hazard

Instruction is not in cache: Stall (Bubble)

[Pipeline timing diagram, clock time 1 to 9]
Instruction 1: F1 D1 E1 W1
Instruction 2: F2 D2 E2 W2
Instruction 3: F3 D3 E3 W3

Decode idle: 3, 4, 5
Execute idle: 4, 5, 6
Write idle: 5, 6, 7
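The idle cycles listed on this slide can be reproduced with a small model. This is only a sketch: it assumes the cache miss stretches the fetch of instruction 2 to four cycles (cycles 2 to 5), that Decode, Execute and Write each take one cycle, and the names `schedule` and `idle_cycles` are made up for the illustration.

```python
STAGES = ["F", "D", "E", "W"]

def schedule(fetch_cycles):
    """Return {stage: [cycle in which each instruction uses that stage]}.

    fetch_cycles[i] is how many cycles instruction i spends in Fetch.
    For "F" the recorded cycle is the first fetch cycle of the instruction.
    Decode, Execute and Write are assumed to take one cycle each.
    """
    start = {s: [] for s in STAGES}
    fetch_free = 1  # first cycle in which the fetch unit is available
    for n in fetch_cycles:
        fetch_done = fetch_free + n - 1
        start["F"].append(fetch_free)
        fetch_free = fetch_done + 1
        # Later stages take one cycle each, so nothing downstream can block.
        for offset, stage in enumerate(STAGES[1:], start=1):
            start[stage].append(fetch_done + offset)
    return start

def idle_cycles(busy):
    """Cycles between first and last use in which the stage does nothing."""
    return [c for c in range(busy[0], busy[-1] + 1) if c not in busy]

# Assumption: the miss makes F2 take 4 cycles; F1 and F3 hit in the cache.
timeline = schedule([1, 4, 1])
for stage in STAGES[1:]:
    print(stage, idle_cycles(timeline[stage]))
# D [3, 4, 5]
# E [4, 5, 6]
# W [5, 6, 7]
```

Under these assumptions the third instruction writes its result in cycle 9, which matches the clock axis on the slide.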


Data Hazard

• Data is not available when needed

• A ← 3 + A
  B ← 4 × A

• Sequentially = 32

• Without regard to data hazard = 20
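The two results can be checked with a few lines of arithmetic. The slide does not give the starting value of A; the sketch below assumes A = 5, the only value for which both 32 and 20 come out. The function names are hypothetical.

```python
def run_sequential(a):
    a = 3 + a          # A <- 3 + A
    b = 4 * a          # B <- 4 x A, reads the updated A
    return a, b

def run_ignoring_hazard(a):
    stale_a = a        # the second operation reads A before the first writes it
    a = 3 + a          # A <- 3 + A
    b = 4 * stale_a    # B <- 4 x A, computed from the old A
    return a, b

print(run_sequential(5))       # (8, 32)
print(run_ignoring_hazard(5))  # (8, 20)
```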




Source is previous Destination

Mul R2, R3, R4
Add R5, R4, R6

[Pipeline timing diagram, clock time 1 to 9, with Stall (Bubble) and D2a marked]
Instruction 1: F1 D1 E1 W1
Instruction 2: F2 D2 E2 W2
Instruction 3: F3 D3 E3 W3
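A dependency of this kind can be found mechanically. The sketch below assumes the operand order implied by the slide title, with the destination register written last (so Mul writes R4 and Add reads it). It only compares adjacent instructions, and `parse` and `raw_hazards` are hypothetical helper names.

```python
def parse(line):
    """Split 'Mul R2, R3, R4' into (opcode, sources, destination)."""
    opcode, operands = line.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return opcode, regs[:-1], regs[-1]   # assumption: destination is last

def raw_hazards(program):
    """Yield (producer index, consumer index, register) for adjacent pairs."""
    parsed = [parse(line) for line in program]
    for i in range(len(parsed) - 1):
        _, _, dest = parsed[i]
        _, sources, _ = parsed[i + 1]
        if dest in sources:
            yield i, i + 1, dest

program = ["Mul R2, R3, R4", "Add R5, R4, R6"]
print(list(raw_hazards(program)))  # [(0, 1, 'R4')]
```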


Operand Forwarding

• Send data from source operation to destination operation without passing through register file.

[Datapath diagram labels: Src1, Src2; E: execute (ALU); Result; W: write (Register File); Forwarding Path]


Software Solution

• Mul R2, R3, R4

• Add R5, R4, R6

Compiler at work

• Mul R2, R3, R4

• NOP

• NOP

• Add R5, R4, R6
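What the compiler does here can be sketched as a small pass over the instruction list. This is an illustration only: it assumes the destination is the last operand, uses the gap of two NOPs shown on the slide, checks adjacent instructions only, and `insert_nops` is a made-up name.

```python
def insert_nops(program, gap=2):
    """Insert `gap` NOPs after an instruction whose result the next one reads."""
    def operands(line):
        parts = line.split(maxsplit=1)
        if len(parts) < 2:          # e.g. "NOP" has no operands
            return []
        return [r.strip() for r in parts[1].split(",")]

    out = []
    for i, line in enumerate(program):
        out.append(line)
        if i + 1 < len(program):
            regs = operands(line)
            next_regs = operands(program[i + 1])
            # Assumption: last operand is the destination, the rest are sources.
            if regs and regs[-1] in next_regs[:-1]:
                out.extend(["NOP"] * gap)
    return out

for line in insert_nops(["Mul R2, R3, R4", "Add R5, R4, R6"]):
    print(line)
# Mul R2, R3, R4
# NOP
# NOP
# Add R5, R4, R6
```

A real compiler would first try to fill the gap with useful independent instructions and fall back to NOPs only when none are available.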




Instruction Queue Pre-fetch

• Queue instructions for dispatch

• Fetch unit recognizes branch and obtains instruction – branch folding

• Masks cache misses


Hardware Change

[Block diagram: Fetch → Instruction Queue → Dispatch/Decode → Execute → Write]


Conditional Branch

Delayed Branch

Original code

• loop  shift_left R1
        decrement R2
        branch = 0
  next  add R1, R3

Rearranged by compiler

• loop  decrement R2
        branch = 0
        shift_left R1
  next  add R1, R3

Assumes a pipelined architecture




Conditional Branch

Static Branch Prediction

• Speculative execution – hardware assumes branch not taken/taken

• Conditional branches not random

• Compiler sets/resets branch prediction bit in instruction


Conditional Branch

Dynamic Branch Prediction

[State diagram: four states SNT, LNT, LT, ST, connected by transitions labeled BT and BNT]

BT: Branch Taken
BNT: Branch Not Taken

ST: Strongly Likely Taken
LT: Likely Taken
LNT: Likely Not Taken
SNT: Strongly Not Taken
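A four-state predictor of this kind can be sketched in a few lines. The arrows of the slide's diagram did not survive, so the transitions below are an assumption: the common saturating arrangement in which each taken branch moves one state toward ST and each not-taken branch moves one state toward SNT. The course's diagram may use different transitions. The function names and the LNT starting state are also illustrative choices.

```python
STATES = ["SNT", "LNT", "LT", "ST"]  # ordered from strongly not taken to strongly taken

def predict(state):
    """Predict taken in the two 'taken' states."""
    return state in ("LT", "ST")

def update(state, taken):
    """Assumed saturating transitions: BT moves up one state, BNT moves down one."""
    i = STATES.index(state)
    i = min(i + 1, len(STATES) - 1) if taken else max(i - 1, 0)
    return STATES[i]

def run(outcomes, state="LNT"):
    """Return (number of correct predictions, final state)."""
    correct = 0
    for taken in outcomes:
        correct += predict(state) == taken
        state = update(state, taken)
    return correct, state

# A loop branch taken nine times, then not taken on exit.
print(run([True] * 9 + [False]))  # (8, 'LT')
```

With these transitions the predictor misses once while warming up and once on loop exit, and it still predicts taken the next time the loop is entered.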


Instruction Set Influence: Addressing

Complex addressing mode

• Load (X(R1)), R2

Requires 3 memory accesses to obtain operand – stalls pipeline

Fewer instructions required

Simple addressing modes

• Add #X, R1, R2
  Load (R2), R2
  Load (R2), R2

Both require 7 cycles to finish execution
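That the single Load and the three-instruction sequence fetch the same operand can be checked with a toy memory model. This sketch only compares the loaded value and does not model timing or count accesses. It assumes (X(R1)) means the contents of the location whose address is stored at X + R1; the memory contents, register values and function names are hypothetical.

```python
def load_complex(mem, regs, x):
    """Load (X(R1)), R2: R2 <- mem[mem[X + R1]]."""
    regs = dict(regs)                 # work on a copy
    regs["R2"] = mem[mem[x + regs["R1"]]]
    return regs

def load_simple(mem, regs, x):
    """Add #X, R1, R2 followed by two Load (R2), R2 instructions."""
    regs = dict(regs)
    regs["R2"] = x + regs["R1"]       # Add #X, R1, R2
    regs["R2"] = mem[regs["R2"]]      # Load (R2), R2
    regs["R2"] = mem[regs["R2"]]      # Load (R2), R2
    return regs

mem = {108: 200, 200: 42}             # hypothetical memory contents
regs = {"R1": 100, "R2": 0}           # hypothetical register values

print(load_complex(mem, regs, 8)["R2"])  # 42
print(load_simple(mem, regs, 8)["R2"])   # 42
```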




Instruction Set Influence: Condition Codes

• Flexibility in reordering – as few instructions as possible change condition codes

• Compiler knows which instructions can change condition codes


Superscalar Operation

• Issue two instructions at once

– Requires multiple resources

– reorder buffer

– commitment unit

• Deadlocks
