
cs252-8


Copyright M. Baltrush (CS252-8) 4

Data Hazard

Pipeline timing (clock cycles 1-9):

I1: F1 D1 E1 W1
I2: F2 D2 E2 W2
I3: F3 D3 E3 W3
I4: F4 D4 E4 W4

Execute requires more than one clock cycle: stall (bubble)


Instruction (Control) Hazard

Pipeline timing (clock cycles 1-9):

I1: F1 D1 E1 W1
I2: F2 D2 E2 W2
I3: F3 D3 E3 W3

Instruction is not in cache: stall (bubble)

Decode idle: 3, 4, 5
Execute idle: 4, 5, 6
Write idle: 5, 6, 7
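The idle cycles listed on this slide can be reproduced with a small timing model. This is only a sketch: the function names `schedule` and `idle_cycles` are hypothetical, and the assumption that the fetch of I2 occupies four clock cycles (2 through 5) is inferred from the three idle cycles per stage, not stated on the slide.

```python
STAGES = ["F", "D", "E", "W"]

def schedule(fetch_latencies):
    """Return {stage: {cycle: instruction_number}} for a 4-stage pipeline.

    fetch_latencies[i] is the number of cycles the fetch of instruction i+1
    takes (1 = cache hit, more than 1 = assumed cache miss). Fetches are
    serialized; D, E and W each take one cycle and follow the fetch directly.
    """
    busy = {stage: {} for stage in STAGES}
    fetch_start = 1
    for number, latency in enumerate(fetch_latencies, start=1):
        fetch_end = fetch_start + latency - 1
        for cycle in range(fetch_start, fetch_end + 1):
            busy["F"][cycle] = number
        for offset, stage in enumerate(STAGES[1:], start=1):
            busy[stage][fetch_end + offset] = number
        fetch_start = fetch_end + 1
    return busy

def idle_cycles(busy, stage):
    """Cycles between a stage's first and last use in which it does nothing."""
    used = busy[stage]
    return [c for c in range(min(used), max(used) + 1) if c not in used]

# Assumed latencies: I1 hits, I2 misses (4-cycle fetch), I3 hits.
busy = schedule([1, 4, 1])
for stage, name in [("D", "Decode"), ("E", "Execute"), ("W", "Write")]:
    print(name, "idle:", idle_cycles(busy, stage))
```

Under these assumptions the last write (W3) lands in clock cycle 9, which matches the 1-9 time axis of the diagram.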


Data Hazard

• Data is not available when needed

• A ← 3 + A

• B ← 4 × A

• Sequentially = 32

• Without regard to data hazard = 20
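The two results on this slide can be checked with a few lines of arithmetic. The sketch assumes an initial value of A = 5, which the slide does not state but which is the only value consistent with both 32 and 20; the function names are hypothetical.

```python
def sequential(a):
    """Second operation sees the updated A."""
    a = 3 + a          # A <- 3 + A
    b = 4 * a          # B <- 4 x A (new A)
    return a, b

def ignoring_hazard(a):
    """Second operation reads A before the first has written it back."""
    stale_a = a        # operand fetched too early
    a = 3 + a          # A <- 3 + A
    b = 4 * stale_a    # B <- 4 x A (old A)
    return a, b

print(sequential(5))       # (8, 32)
print(ignoring_hazard(5))  # (8, 20)
```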


Source is previous Destination

Mul R2, R3, R4
Add R5, R4, R6

Pipeline timing (clock cycles 1-9):

I1: F1 D1 E1 W1
I2: F2 D2 D2a E2 W2
I3: F3 D3 E3 W3

Stall (bubble)


Operand Forwarding

• Send data from source operation to destination operation without passing through register file.

Diagram labels: E: execute (ALU); W: write (Register File); Src1, Src2; Result; Forwarding Path
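To see what the forwarding path buys, the sketch below counts stall cycles for a dependent instruction with and without forwarding. It assumes a four-stage F, D, E, W pipeline that issues one instruction per cycle, reads operands in D, and cannot read a register in the same cycle it is written; these timing rules and the name `stall_cycles` are assumptions made for illustration.

```python
def stall_cycles(distance, forwarding):
    """Stall cycles for a consumer `distance` instructions after its producer.

    The producer is fetched in cycle 1, so it executes in cycle 3 and writes
    the register file in cycle 4 (assumed 4-stage F, D, E, W pipeline).
    """
    producer_execute, producer_write = 3, 4
    if forwarding:
        # Result goes straight from the ALU output to the ALU input,
        # so the consumer only needs to execute after the producer does.
        consumer_execute = 3 + distance
        return max(0, producer_execute + 1 - consumer_execute)
    # Without forwarding the consumer reads the register file in decode,
    # which must come after the producer's write stage.
    consumer_decode = 2 + distance
    return max(0, producer_write + 1 - consumer_decode)

print(stall_cycles(1, forwarding=False))  # 2
print(stall_cycles(1, forwarding=True))   # 0
```

With these assumptions a back-to-back dependency costs two cycles without forwarding and none with it.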


Software Solution

• Mul R2, R3, R4

• Add R5, R4, R6

Compiler at work:

• Mul R2, R3, R4

• NOP

• NOP

• Add R5, R4, R6
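A compiler pass of this kind might look roughly like the sketch below. It assumes the three-operand format used on the slide, with the destination as the last operand, and a required gap of two instruction slots between producer and consumer (matching the two NOPs shown). The names `parse` and `insert_nops` are hypothetical.

```python
def parse(instruction):
    """Split 'Mul R2, R3, R4' into (op, sources, destination)."""
    op, _, rest = instruction.partition(" ")
    regs = [r.strip() for r in rest.split(",")] if rest else []
    sources = regs[:-1]
    destination = regs[-1] if regs else None  # assumed: destination is last
    return op, sources, destination

def insert_nops(program, gap=2):
    """Insert NOPs so a consumer is at least `gap` slots behind its producer."""
    out = []
    for instruction in program:
        _, sources, _ = parse(instruction)
        needed = 0
        # Look back over the most recent `gap` emitted slots for a producer.
        for back in range(1, min(gap, len(out)) + 1):
            _, _, destination = parse(out[-back])
            if destination is not None and destination in sources:
                needed = max(needed, gap - back + 1)
        out.extend(["NOP"] * needed)
        out.append(instruction)
    return out

for line in insert_nops(["Mul R2, R3, R4", "Add R5, R4, R6"]):
    print(line)
```

A real compiler would first try to fill the slots with independent instructions and fall back to NOPs only when none are available.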


Instruction Queue Pre-fetch

• Queue instructions for dispatch

• Fetch unit recognizes branch and obtains instruction – branch folding

• Masks cache misses


Hardware Change

Fetch -> Instruction Queue -> Dispatch/Decode -> Execute -> Write


Conditional Branch: Delayed Branch

Original code:

• loop  shift_left R1
        decrement R2
        branch = 0
  next  add R1, R3

Rearranged by compiler:

• loop  decrement R2
        branch = 0
        shift_left R1
  next  add R1, R3

Assumes a pipelined architecture
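The equivalence of the two orderings can be checked with a tiny interpreter. This is a sketch under several assumptions not stated on the slide: `branch = 0` is read literally as "branch to loop if R2 is zero", the branch has exactly one delay slot, and `add R1, R3` writes its result to the last operand (R3). The function names are hypothetical.

```python
ORIGINAL = ["shift_left R1", "decrement R2", "branch = 0", "add R1, R3"]
REARRANGED = ["decrement R2", "branch = 0", "shift_left R1", "add R1, R3"]

def step(op, regs):
    """Execute one non-branch instruction."""
    if op == "shift_left R1":
        regs["R1"] <<= 1
    elif op == "decrement R2":
        regs["R2"] -= 1
    elif op == "add R1, R3":
        regs["R3"] += regs["R1"]   # assumed: destination is the last operand
    else:
        raise ValueError(op)

def run(program, delay_slot, r1, r2, r3):
    """Run the program; with delay_slot=True the instruction after the
    branch always executes before the branch takes effect."""
    regs = {"R1": r1, "R2": r2, "R3": r3}
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op == "branch = 0":
            taken = regs["R2"] == 0          # assumed branch condition
            if delay_slot:
                step(program[pc + 1], regs)  # delay slot always executes
                pc = 0 if taken else pc + 2
            else:
                pc = 0 if taken else pc + 1
        else:
            step(op, regs)
            pc += 1
    return regs

print(run(ORIGINAL, False, r1=1, r2=1, r3=10))
print(run(REARRANGED, True, r1=1, r2=1, r3=10))
```

Both runs end with the same register contents, which is the point of the rearrangement: the shift moves into the delay slot, so the cycle after the branch does useful work instead of being wasted.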


Conditional Branch

Static Branch Prediction

• Speculative execution – hardware assumes branch not taken/taken

• Conditional branches not random

• Compiler sets/resets branch prediction bit in instruction


Conditional Branch

Dynamic Branch Prediction

State diagram: states SNT, LNT, LT, ST; transitions labeled BT (branch taken) and BNT (branch not taken)

ST: Strongly Likely Taken
LT: Likely Taken
LNT: Likely Not Taken
SNT: Strongly Not Taken
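A four-state predictor of this kind can be sketched as follows. The exact arrows of the slide's state diagram are not reproduced here, so the code assumes the common 2-bit saturating-counter transitions (each taken branch moves one state toward ST, each not-taken branch one state toward SNT); the slide's diagram may use different transitions. All function names are hypothetical.

```python
STATES = ["SNT", "LNT", "LT", "ST"]  # list index doubles as the 2-bit counter

def predict(state):
    """Predict taken in LT and ST, not taken in SNT and LNT."""
    return STATES.index(state) >= 2

def update(state, taken):
    """Assumed saturating-counter transition on BT (taken) or BNT (not taken)."""
    i = STATES.index(state)
    i = min(i + 1, 3) if taken else max(i - 1, 0)
    return STATES[i]

def simulate(outcomes, state="LNT"):
    """Return (number of correct predictions, final state)."""
    correct = 0
    for taken in outcomes:
        if predict(state) == taken:
            correct += 1
        state = update(state, taken)
    return correct, state

# A loop branch taken three times, then not taken, run twice.
history = [True, True, True, False] * 2
print(simulate(history))  # (5, 'LT')
```

Because a single not-taken outcome only drops the state from ST to LT, the predictor still guesses "taken" when the loop is entered again, which is why the second pass mispredicts only at the loop exit.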


Instruction Set Influence: Addressing

• Load (X(R1)), R2

• Add #X, R1, R2
  Load (R2), R2
  Load (R2), R2

Both require 7 cycles to finish execution

Requires 3 memory accesses to obtain operand – stalls pipeline

Fewer instructions required
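The two instruction sequences on this slide are meant to load the same operand. The sketch below models that under assumed semantics: the destination is the last operand, `(R2)` means "memory at the address in R2", and `(X(R1))` means "memory at the address stored at X + R1". The memory contents, the value of X, and the function names are made up for illustration.

```python
def load_complex(memory, regs, x):
    """Load (X(R1)), R2 as one instruction with a complex addressing mode."""
    pointer_address = x + regs["R1"]
    regs["R2"] = memory[memory[pointer_address]]

def load_simple(memory, regs, x):
    """The same work as three instructions with simple addressing modes."""
    regs["R2"] = x + regs["R1"]        # Add  #X, R1, R2
    regs["R2"] = memory[regs["R2"]]    # Load (R2), R2
    regs["R2"] = memory[regs["R2"]]    # Load (R2), R2

# Hypothetical memory image: address 108 holds a pointer to address 500.
memory = {108: 500, 500: 42}

regs_a = {"R1": 8, "R2": 0}
regs_b = {"R1": 8, "R2": 0}
load_complex(memory, regs_a, x=100)
load_simple(memory, regs_b, x=100)
print(regs_a["R2"], regs_b["R2"])  # 42 42
```

The simple sequence exposes each step as its own instruction, so the pipeline and the compiler can schedule around them, while the complex mode hides the same steps inside one instruction.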


Instruction Set Influence

Condition Codes

• Flexibility in reordering – as few instructions as possible change condition codes

• Compiler knows which instructions can change condition codes


Superscalar Operation

• Issue two instructions at once

  – Requires multiple resources

  – reorder buffer

  – commitment unit

• Deadlocks