Department of Computer and IT Engineering, University of Kurdistan
Computer Architecture: Pipeline Processing
By: Dr. Alireza Abdollahpouri


The concept of pipelined processing

• Example: doing laundry
• Ali, Bahram, Cathy, and Dara (A, B, C, D) each have a load of clothes to wash, dry, and iron.
• Washing takes 30 minutes.
• Drying takes 40 minutes.
• Ironing takes 20 minutes.

Sequential laundry

• Done sequentially, the four loads take 6 hours in total.

[Figure: timeline from 6 PM to midnight with tasks A to D in order; each load takes 30 + 40 + 20 minutes, one load after another.]

Pipelined laundry

• Done as a pipeline, the four loads take 3.5 hours in total.

[Figure: timeline starting at 6 PM with tasks A to D overlapped; the segments along the time axis are 30, 40, 40, 40, 40, 20 minutes.]
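The two totals in the laundry example (6 hours versus 3.5 hours) can be checked with a small simulation. This is only a sketch: the function names are made up for illustration, and it assumes each machine (washer, dryer, iron) handles one load at a time while a finished load can wait for the next machine.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = {"wash": 30, "dry": 40, "iron": 20}
LOADS = 4  # Ali, Bahram, Cathy, Dara


def sequential_minutes(stages, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stages.values())


def pipelined_minutes(stages, loads):
    """A load enters a stage once it has left the previous stage
    and the stage itself is free."""
    durations = list(stages.values())
    stage_free_at = [0] * len(durations)  # time each machine becomes free
    finish = 0
    for _ in range(loads):
        ready = 0  # time this load is ready for its next stage
        for i, duration in enumerate(durations):
            start = max(ready, stage_free_at[i])
            ready = start + duration
            stage_free_at[i] = ready
        finish = ready
    return finish


print(sequential_minutes(STAGES, LOADS) / 60)  # 6.0 hours
print(pipelined_minutes(STAGES, LOADS) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: the 40-minute dryer is the bottleneck, so one load completes every 40 minutes once the pipeline is full.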

Basic concept

• Pipeline: several instructions are in execution at the same time.
• The pipeline is divided into stages (segments).
• Machine cycle:
  – The time needed to pass through one stage.
  – The machine cycle is set by the slowest pipeline stage.
  – Usually, machine cycle = one clock cycle.

Pipelining

• Suppose there are n tasks, each taking time $t_n$ without pipelining (total time for all tasks = $n \cdot t_n$). Assume the pipeline has k segments and each segment completes in $t_p$ (clock period = $t_p$):
• The first task completes after k clock cycles ($k \cdot t_p$).
• Each remaining task completes one clock cycle after the previous one, so the other (n - 1) tasks need $(n - 1) \cdot t_p$. The pipelined total is therefore $(k + n - 1) \cdot t_p$.
• The speedup of pipelined over non-pipelined processing is:

$$S = \frac{n \, t_n}{(k + n - 1) \, t_p}$$
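The speedup formula above is easy to evaluate numerically. The sketch below is illustrative only; the function name and the sample values (k = 4 segments, t_p = 20, t_n = 80) are assumptions, not figures from the slides.

```python
def pipeline_speedup(n, t_n, k, t_p):
    """S = n*t_n / ((k + n - 1) * t_p) for n tasks on a k-segment pipeline."""
    unpipelined = n * t_n
    pipelined = (k + n - 1) * t_p  # k cycles for the first task, 1 for each other
    return unpipelined / pipelined


# Hypothetical numbers: 4 segments of 20 time units, 80 units per task unpipelined.
for n in (1, 10, 100, 1_000_000):
    print(n, round(pipeline_speedup(n, t_n=80, k=4, t_p=20), 3))
```

With t_n = k · t_p, a single task gains nothing (S = 1), and S approaches k as n grows, because the k - 1 fill cycles are amortized over more tasks.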

Notes on pipelined processing

• Pipelining does not make an individual task faster; it improves overall throughput.
• The pipeline's rate is limited by its slowest stage.
• Several tasks run at the same time, each using a different resource.
• In the ideal case, the speedup equals the number of pipeline stages.
• Unbalanced stages (unequal execution times) reduce the pipeline's speedup and efficiency.
• The time spent filling and draining the pipeline also reduces the speedup.

[Figure: pipelined laundry timeline starting at 6 PM with tasks A to D; segments of 30, 40, 40, 40, 40, 20 minutes.]
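The effect of unbalanced stages and of fill/drain time can be illustrated with a short calculation. This is a sketch under a simplifying assumption: every stage is clocked at the duration of the slowest stage. The function names are hypothetical.

```python
def steady_state_speedup(stage_times):
    """Upper bound on speedup once the pipeline is full: one task
    completes every max(stage_times), instead of every sum(stage_times)."""
    return sum(stage_times) / max(stage_times)


def speedup_for_n_tasks(stage_times, n):
    """Speedup for n tasks including fill and drain time, with all
    stages clocked at the slowest stage's duration."""
    k = len(stage_times)
    cycle = max(stage_times)
    unpipelined = n * sum(stage_times)
    pipelined = (k + n - 1) * cycle
    return unpipelined / pipelined


print(steady_state_speedup([30, 30, 30]))       # 3.0  (balanced: equals stage count)
print(steady_state_speedup([30, 40, 20]))       # 2.25 (unbalanced laundry stages)
print(speedup_for_n_tasks([30, 30, 30], 4))     # 2.0  (fill/drain cost with few tasks)
```

Even a perfectly balanced three-stage pipeline reaches only 2.0 with four tasks; the ideal value of 3 is approached only when the number of tasks is large.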

The five stages of the instruction cycle

• Ifetch: Instruction Fetch
  – Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Read the data from the Data Memory
• Wr: Write the data back to the register file

        Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
lw      Ifetch    Reg/Dec   Exec      Mem       Wr

Five-stage datapath of the DLX processor

[Figure: DLX datapath divided into five stages: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled components: Next PC, Next SEQ PC, Adder (+4), Address, instruction Memory, Inst, Reg File (RS1, RS2, RD, Imm), Sign Extend, MUXes, Zero?, ALU, Data Memory, LMD, WB Data.]

Five-stage datapath of the DLX processor

[Figure: the same DLX datapath with the pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB placed between the stages; Next SEQ PC and RD are carried forward through the pipeline registers.]

Visualizing the pipeline

[Figure: instructions in program order (vertical axis) against time in clock cycles, Cycle 1 to Cycle 7 (horizontal axis); each instruction passes through Ifetch, Reg, ALU, DMem, Reg and starts one cycle after the previous instruction.]
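A text version of this kind of pipeline diagram can be generated with a few lines of code. This is a minimal sketch: the function names are hypothetical, and it assumes an ideal pipeline with no stalls, where instruction i enters the first stage in cycle i + 1.

```python
STAGES = ["Ifetch", "Reg", "ALU", "DMem", "Reg"]


def pipeline_diagram(num_instructions, stages=STAGES):
    """Return a grid: one row per instruction, one column per clock cycle.
    Instruction i (0-based) occupies stage s in column i + s."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows, width=7):
    """Format the grid as text with a header of cycle numbers."""
    header = "".join(f"C{c + 1}".ljust(width) for c in range(len(rows[0])))
    lines = [" " * 9 + header]
    for i, row in enumerate(rows):
        cells = "".join(cell.ljust(width) for cell in row)
        lines.append(f"Instr {i + 1}".ljust(9) + cells)
    return "\n".join(lines)


print(render(pipeline_diagram(4)))
```

Four instructions on five stages occupy 4 + 5 - 1 = 8 cycles in total, and in the middle cycles all five stages are busy with different instructions.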

Problems that arise in pipelined processing

• Limits to pipelining: hazards prevent the next instruction from executing during its designated clock cycle
  – Structural hazards: the hardware cannot support this combination of instructions
  – Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  – Control hazards: caused by the delay between fetching instructions and deciding on changes in control flow (branches and jumps)

One Memory Port / Structural Hazards

[Figure: Load, Instr 1, Instr 2, Instr 3, Instr 4 in program order against clock cycles 1 to 7, each passing through Ifetch, Reg, ALU, DMem, Reg. With a single memory port, the Load's DMem access in cycle 4 collides with the Ifetch of Instr 3.]

One Memory Port / Structural Hazards

[Figure: Load, Instr 1, Instr 2, Stall, Instr 3 in program order against clock cycles 1 to 7. A bubble takes the place of the fetch that would collide with the Load's DMem access and moves through all five stages, so Instr 3 is fetched one cycle later.]
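The stall in the one-memory-port case can be reproduced with a small scheduling sketch. It assumes a single port shared by Ifetch and DMem, that a data-memory access always wins the port, and that only loads/stores use DMem; the function name and the stage offset constant are illustrative.

```python
MEM_STAGE_OFFSET = 3  # Ifetch=0, Reg=1, ALU=2, DMem=3, Reg (write)=4


def schedule_fetches(uses_data_memory):
    """Given, per instruction, whether it accesses data memory, return the
    1-based cycle in which each instruction is fetched when Ifetch and
    DMem share one memory port."""
    fetch_cycles = []
    busy = set()  # cycles in which a DMem access occupies the port
    cycle = 1
    for uses_mem in uses_data_memory:
        while cycle in busy:  # port is taken: insert a bubble
            cycle += 1
        fetch_cycles.append(cycle)
        if uses_mem:
            busy.add(cycle + MEM_STAGE_OFFSET)
        cycle += 1
    return fetch_cycles


# Load, Instr 1, Instr 2, Instr 3 (only the Load touches data memory).
print(schedule_fetches([True, False, False, False]))  # [1, 2, 3, 5]
```

The Load is fetched in cycle 1 and reaches DMem in cycle 4, so the fetch that would have happened in cycle 4 is pushed to cycle 5, matching the single bubble in the figure.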

Data Hazard on r1

add r1, r2, r3
sub r4, r1, r3
and r6, r1, r7
or  r8, r1, r9
xor r10, r1, r11

Dependencies backward in time

Data Hazard on r1:

[Figure: the five instructions below in program order against time in clock cycles, each passing through IF (Im), ID/RF (Reg), EX (ALU), MEM (Dm), WB (Reg). add writes r1 only in its WB stage, while the instructions right after it read r1 in earlier cycles, so those dependences point backward in time.]

add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11

Forwarding: a method for resolving data hazards

• Forward the result of a stage to where it is needed as soon as it is ready.

[Figure: the same five instructions (add r1,r2,r3; sub r4,r1,r3; and r6,r1,r7; or r8,r1,r9; xor r10,r1,r11) against time in clock cycles through IF, ID/RF, EX, MEM, WB, with the value of r1 forwarded from add to the instructions that follow.]

Hardware changes to support forwarding

[Figure: multiplexers (mux) at the ALU inputs choose among the register values held in ID/EX, NextPC, the Immediate, and results fed back from the EX/MEM and MEM/WR pipeline registers; Data Memory follows the ALU.]
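The select logic for the forwarding multiplexers can be sketched as follows. This is a common textbook formulation, not necessarily the exact control used in the slides' figure; the `PipelineReg` type, the field names, and the returned labels are all hypothetical, and a hardwired zero register is ignored.

```python
from typing import NamedTuple, Optional


class PipelineReg(NamedTuple):
    """Hypothetical snapshot of the fields a forwarding unit inspects."""
    writes_reg: bool      # does the instruction in this register write a result?
    rd: Optional[str]     # its destination register, if any


def forward_select(src: str, ex_mem: PipelineReg, mem_wb: PipelineReg) -> str:
    """Choose where the ALU should take the value of register `src` from."""
    if ex_mem.writes_reg and ex_mem.rd == src:
        return "EX/MEM"   # newest value: result of the previous instruction
    if mem_wb.writes_reg and mem_wb.rd == src:
        return "MEM/WB"   # result of the instruction two positions back
    return "REGFILE"      # no hazard: use the value read in decode


# sub r4,r1,r3 in EX right after add r1,r2,r3 (now in EX/MEM):
print(forward_select("r1", PipelineReg(True, "r1"), PipelineReg(False, None)))  # EX/MEM
```

Checking EX/MEM before MEM/WB matters when both hold a write to the same register: the more recent result must win.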

Three types of data hazard

• Read After Write (RAW): InstrJ tries to read an operand before InstrI writes it

  I: add r1,r2,r3
  J: sub r4,r1,r3

• Caused by a “dependence” (in compiler nomenclature). This hazard results from an actual need for communication.

Three types of data hazard

• Write After Read (WAR): InstrJ writes an operand before InstrI reads it

  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “anti-dependence” by compiler writers. This results from reuse of the name “r1”.

Three types of data hazard

• Write After Write (WAW): InstrJ writes an operand before InstrI writes it

  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

• Called an “output dependence” by compiler writers. This also results from the reuse of the name “r1”.
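The three dependence types can be told apart mechanically from the destination and source registers of two instructions in program order. The sketch below uses a hypothetical `(dest, sources)` tuple representation; whether a dependence turns into an actual hazard depends on the pipeline in question.

```python
def classify_hazards(first, second):
    """first and second are (dest, sources) tuples in program order.
    Returns the set of dependence kinds between them."""
    d1, s1 = first
    d2, s2 = second
    kinds = set()
    if d1 is not None and d1 in s2:
        kinds.add("RAW")  # second reads what first writes
    if d2 is not None and d2 in s1:
        kinds.add("WAR")  # second writes what first reads
    if d1 is not None and d1 == d2:
        kinds.add("WAW")  # both write the same register
    return kinds


add = ("r1", ("r2", "r3"))       # add r1,r2,r3
sub_raw = ("r4", ("r1", "r3"))   # sub r4,r1,r3
sub_waw = ("r1", ("r4", "r3"))   # sub r1,r4,r3

print(classify_hazards(add, sub_raw))   # {'RAW'}
print(classify_hazards(sub_raw, add))   # {'WAR'}
print(classify_hazards(sub_waw, add))   # {'WAW'}
```

The three calls correspond to the I/J pairs in the RAW, WAR, and WAW examples.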

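Data hazard even with forwarding

[Figure: the instructions below in program order against time in clock cycles, each passing through Ifetch, Reg, ALU, DMem, Reg. The value loaded by lw is available only at the end of its DMem stage, which is too late for the ALU stage of the sub that immediately follows.]

lw  r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or  r8,r1,r9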

Data hazard even with forwarding

[Figure: the same sequence with a bubble inserted: sub, and, and or are each delayed by one cycle, so that sub's ALU stage comes after lw's DMem stage.]

lw  r1, 0(r2)
sub r4,r1,r6
and r6,r1,r7
or  r8,r1,r9

Software Scheduling to Avoid Load Hazards

Try producing fast code for

a = b + c;
d = e - f;

assuming a, b, c, d, e, and f are in memory.

Slow code:

LW  Rb,b
LW  Rc,c
ADD Ra,Rb,Rc
SW  a,Ra
LW  Re,e
LW  Rf,f
SUB Rd,Re,Rf
SW  d,Rd

Fast code:

LW  Rb,b
LW  Rc,c
LW  Re,e
ADD Ra,Rb,Rc
LW  Rf,f
SW  a,Ra
SUB Rd,Re,Rf
SW  d,Rd
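The benefit of the reordering can be checked by counting load-use stalls in both sequences. This is a rough sketch: it assumes full forwarding, so the only stall is a one-cycle bubble when an instruction reads a register loaded by the immediately preceding LW. The parser and the operand conventions it assumes (LW Rd,addr; SW addr,Rs; OP Rd,Rs1,Rs2) are illustrative.

```python
def parse(line):
    """Parse 'OP a,b,c' into (op, dest, sources).
    Assumed conventions: LW Rd,addr ; SW addr,Rs ; OP Rd,Rs1,Rs2."""
    op, operands = line.split(None, 1)
    args = [a.strip() for a in operands.split(",")]
    if op == "LW":
        return op, args[0], []
    if op == "SW":
        return op, None, [args[1]]
    return op, args[0], args[1:]


def count_load_use_stalls(program):
    """Count one stall for each instruction that reads a register
    loaded by the LW directly before it."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, sources = parse(line)
        if prev_op == "LW" and prev_dest in sources:
            stalls += 1
        prev_op, prev_dest = op, dest
    return stalls


SLOW = ["LW Rb,b", "LW Rc,c", "ADD Ra,Rb,Rc", "SW a,Ra",
        "LW Re,e", "LW Rf,f", "SUB Rd,Re,Rf", "SW d,Rd"]
FAST = ["LW Rb,b", "LW Rc,c", "LW Re,e", "ADD Ra,Rb,Rc",
        "LW Rf,f", "SW a,Ra", "SUB Rd,Re,Rf", "SW d,Rd"]

print(count_load_use_stalls(SLOW))  # 2
print(count_load_use_stalls(FAST))  # 0
```

In the slow sequence, ADD directly follows the load of Rc and SUB directly follows the load of Rf; the fast sequence places an independent instruction in each of those slots.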

Control Hazard on Branches - Three Stage Stall

10: beq r1,r3,36
14: and r2,r3,r5
18: or  r6,r1,r7
22: add r8,r1,r9
36: xor r10,r1,r11

[Figure: the five instructions against time in clock cycles, each passing through Ifetch, Reg, ALU, DMem, Reg; the three instructions after the beq enter the pipeline before the branch outcome is known.]

Branch Stall Impact

• If CPI = 1, 30% of instructions are branches, and each branch stalls 3 cycles, then new CPI = 1 + 0.3 × 3 = 1.9!
• Two-part solution:
  – Determine whether the branch is taken or not sooner, AND
  – Compute the taken-branch address earlier
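The CPI figure follows from adding the average stall cycles per instruction to the base CPI. A minimal sketch, with a hypothetical function name; the one-cycle case is an illustrative assumption showing why resolving branches earlier helps.

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Effective CPI when every branch stalls the pipeline for stall_cycles."""
    return base_cpi + branch_fraction * stall_cycles


# Numbers from the slide: CPI = 1, 30% branches, 3-cycle stall.
print(round(cpi_with_branch_stalls(1.0, 0.30, 3), 2))  # 1.9

# Hypothetical: if the branch were resolved early enough to stall only 1 cycle.
print(round(cpi_with_branch_stalls(1.0, 0.30, 1), 2))  # 1.3
```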

Four Branch Hazard Alternatives

#1: Stall until branch direction is clear
#2: Predict Branch Not Taken
#3: Predict Branch Taken
#4: Delayed Branch

Superscalar processing

• Uses multiple pipelines in parallel.

Summary: Control and Pipelining

• Just overlap tasks; easy if tasks are independent
• Speedup ≤ pipeline depth; if ideal CPI is 1, then:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

• Hazards limit performance on computers:
  – Structural: need more HW resources
  – Data (RAW, WAR, WAW): need forwarding, compiler scheduling
  – Control: delayed branch, prediction
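The summary speedup relation (pipeline depth divided by 1 plus the pipeline stall CPI, times the ratio of unpipelined to pipelined cycle time) can be evaluated directly. The function name and the sample values are assumptions for illustration; the cycle-time ratio defaults to 1.

```python
def speedup_with_stalls(depth, stall_cpi, cycle_unpipelined=1.0, cycle_pipelined=1.0):
    """Speedup = depth / (1 + stall CPI) * (unpipelined cycle / pipelined cycle),
    assuming an ideal CPI of 1."""
    return depth / (1.0 + stall_cpi) * (cycle_unpipelined / cycle_pipelined)


print(speedup_with_stalls(5, 0.0))            # 5.0: no stalls, speedup equals depth
print(round(speedup_with_stalls(5, 0.9), 2))  # 2.63: 0.9 stall cycles per instruction
```

A five-stage pipeline that loses 0.9 cycles per instruction to stalls delivers roughly half of its ideal speedup.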