Where are we, and where do we go from here?

1 / 52


Computer Structure

[Figure: Processor-Memory Performance Gap. Performance (log scale, 1 to 1000) vs. Time (1980 to 2000). CPU (µProc, “Moore’s Law”): 60%/yr. (2X/1.5 yr). DRAM: 9%/yr. (2X/10 yrs). The processor-memory performance gap grows 50% / year.]

[Figure: Booth multiplier datapath: 34-bit ALU, HI register (16x2 bits), LO register (16x2 bits), Multiplicand Register, Booth Encoder (ENC[0], ENC[1], ENC[2]), Control Logic, 34x2 MUX, 32=>34 sign extension of the input multiplicand, 32-bit Result[HI] and Result[LO].]

• Arithmetic
• Single/multicycle Datapaths
• Pipelining (IFetch, Dcd, Exec, Mem, WB)
• Memory Systems
• I/O

2 / 52

What will we learn?

• How a computer is built.
• How to analyze a computer's performance.
• Topics that affect new processors (cache, pipeline).

Textbook: Computer Organization & Design: The Hardware/Software Interface, David A. Patterson and John L. Hennessy, Third Edition, 2005.

4 / 52

5 / 52

Levels of Representation

High Level Language Program
• Easy to program
• No one-to-one mapping to machine language
• Depends on the compiler and optimizer
• Portable

    temp = v[k];
    v[k] = v[k+1];
    v[k+1] = temp;

Compiler

Assembly Language Program
• 1:1 mapping to machine language
• More readable than machine language

    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)

Assembler

Machine Language Program

    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111

Machine Interpretation

Control Signal Specification

    ALUOP[0:3] <= InstReg[9:11] & MASK

6 / 52

MIPS Instruction Set

Use the MIPS ISA document as the final word on the ISA.

The MIPS ISA document is available on the course web site.

7 / 52

Instruction Sets: A Thin Interface

[Figure: the Instruction Set Architecture as the interface between the software and hardware layers.]

Software: Application (iTunes), Operating System (Mac OS X), Compiler, Assembler
Instruction Set Architecture
Hardware: Processor, Memory, I/O system
Datapath & Control
Digital Design
Circuit Design
Transistors

Syntax: ADD $8 $9 $10    Semantics: $8 = $9 + $10

“R-Format”
Bitfield:       opcode  rs      rt      rd      shamt   funct
Field size:     6 bits  5 bits  5 bits  5 bits  5 bits  6 bits
Binary:         000000  01001   01010   01000   00000   100000
In hexadecimal: 012A4020
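The R-format layout can be checked mechanically. The following is a minimal Python sketch, not part of the original slides; the helper name `encode_r_type` is an assumption made for illustration. It packs the six fields into a 32-bit word and prints the result for ADD $8 $9 $10, which can be compared with the hexadecimal value on the slide.

```python
def encode_r_type(opcode, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields (6, 5, 5, 5, 5, 6 bits) into a 32-bit word."""
    widths = (6, 5, 5, 5, 5, 6)
    word = 0
    for value, width in zip((opcode, rs, rt, rd, shamt, funct), widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"field value {value} does not fit in {width} bits")
        # Shift the word left to make room, then append the next field.
        word = (word << width) | value
    return word


# ADD $8 $9 $10: rd = 8, rs = 9, rt = 10, opcode = 0, funct = 32 (binary 100000)
add_word = encode_r_type(opcode=0, rs=9, rt=10, rd=8, shamt=0, funct=32)
print(f"{add_word:08X}")
```

Note the field order: the destination register appears first in the assembly syntax but third among the register fields of the encoding.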

8 / 52

Instruction Set Architecture

• The various machine languages are very similar to one another.
• We start by learning the machine language of the MIPS processor, developed in the 1980s (used by Silicon Graphics, NEC, Sony).
• RISC v. CISC
– Reduced Instruction Set Computer - MIPS
– Complex Instruction Set Computer - 8086
• The motto is “less is more”: a smaller instruction set, in which the instructions themselves are simpler, enables better performance. The hardware needed to execute these instructions is simpler, so speed increases and the required silicon area (= cost) shrinks.

9 / 52

Hardware implements semantics...

Syntax: ADD $8 $9 $10    Semantics: $8 = $9 + $10

Instruction Fetch: Fetch next inst from memory: 012A4020
Instruction Decode: Decode fields (opcode rs rt rd shamt funct) to get: ADD $8 $9 $10
Operand Fetch: “Retrieve” register values: $9 $10
Execute: Add $9 to $10
Result Store: Place this sum in $8
Next Instruction: Prepare to fetch the instruction that follows the ADD in the program.

10 / 52

Memory Instructions: LW $1,32($2)

Instruction Fetch: Fetch the load inst from memory
Instruction Decode: Decode fields (opcode rs rt offset, “I-Format”) to get: LW $1, 32($2)
Operand Fetch: “Retrieve” register value: $2
Execute: Compute memory address: 32 + $2
Result Store: Load memory address contents into: $1
Next Instruction: Prepare to fetch the instr that follows the LW in the program. Depending on load semantics, the new $1 is visible to that instr, or not until the following instr (“delayed loads”).

11 / 52

Branch Instructions: BEQ $1,$2,25

Instruction Fetch: Fetch branch inst from memory
Instruction Decode: Decode fields (opcode rs rt offset, “I-Format”) to get: BEQ $1, $2, 25
Operand Fetch: “Retrieve” register values: $1, $2
Execute: Compute if we take branch: $1 == $2 ?
Next Instruction: ALWAYS prepare to fetch the instr that follows the BEQ in the program (“delayed branch”). IF we take the branch, the instr we fetch AFTER that instruction is at PC + 4 + 100.

PC == “Program Counter”
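The value 100 in PC + 4 + 100 is the offset field 25 counted in words, i.e. 25 * 4 bytes. A minimal Python sketch of that arithmetic follows; the function name `branch_target` and the starting address 0x1000 are assumptions chosen for illustration, and the sketch computes only the target address, without modeling the delay slot.

```python
WORD_SIZE = 4  # bytes per MIPS instruction


def branch_target(pc, offset_words, taken):
    """Address selected by a BEQ-style branch located at byte address `pc`.

    `offset_words` is the 16-bit offset field, counted in words relative to
    the instruction that follows the branch (PC + 4).
    """
    fall_through = pc + WORD_SIZE
    if not taken:
        return fall_through
    return fall_through + offset_words * WORD_SIZE


# BEQ $1,$2,25 placed at the hypothetical address 0x1000
print(hex(branch_target(0x1000, 25, taken=True)))
print(hex(branch_target(0x1000, 25, taken=False)))
```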

12 / 52

Why is the ISA important?

• Code size
– Long instructions may take more time to fetch
– Requires larger memory (important in small devices, e.g., cell phones)
• Number of instructions (IC)
– Reducing IC reduces execution time (assuming the same CPI and frequency)
• Code “simplicity”
– Simple HW implementation, which leads to higher frequency and lower power
– Code optimization can be applied better to “simple code”
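The link between IC, CPI and frequency is the standard relation: execution time = IC * CPI / frequency. A small Python sketch of it follows; the function name and all numbers are hypothetical and chosen only to show the effect of reducing IC while CPI and frequency stay fixed.

```python
def execution_time(instruction_count, cpi, frequency_hz):
    """Execution time in seconds: IC * CPI / frequency."""
    return instruction_count * cpi / frequency_hz


# Hypothetical program and machine, for illustration only.
baseline = execution_time(instruction_count=1_000_000, cpi=1.5, frequency_hz=500e6)
fewer_instructions = execution_time(instruction_count=800_000, cpi=1.5, frequency_hz=500e6)

# With CPI and frequency unchanged, time scales linearly with IC.
print(baseline, fewer_instructions)
```

The same relation shows why a smaller IC is not automatically a win: if the shorter program needs a higher CPI or a lower clock frequency, the product can still grow.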

13 / 52

The impact of the ISA

RISC vs CISC

14 / 52

CISC Processors

• CISC - Complex Instruction Set Computer
• The idea: a high level machine language
• Characteristics
– Many instruction types, with many addressing modes
– Some of the instructions are complex:
    • Perform complex tasks
    • Require many cycles
– ALU operations directly on memory
    • Usually uses a limited number of registers
– Variable length instructions
    • Common instructions get short codes to save code length
• Example: x86

15 / 52

CISC Drawbacks

• Compilers do not take advantage of the complex instructions and the complex indexing methods
• Implementing complex instructions and complex addressing modes
    => complicates the processor
    => slows down the simple, common instructions
    => contradicts the corollary of Amdahl’s law: Make The Common Case Fast
• Variable length instructions are a real pain in the neck:
– It is difficult to decode a few instructions in parallel
    • As long as an instruction is not decoded, its length is unknown
        => It is unknown where the instruction ends
        => It is unknown where the next instruction starts
– An instruction may not fit the “right behavior” of the memory hierarchy (discussed in the next lectures)
• Examples: VAX, x86 (!?!)

16 / 52

RISC Processors

• RISC - Reduced Instruction Set Computer
• The idea: simple instructions enable fast hardware
• Characteristics
– A small instruction set, with only a few instruction formats
– Simple instructions
    • Execute simple tasks
    • Require a single cycle (with pipeline)
– A few indexing methods
– ALU operations on registers only
    • Memory is accessed using Load and Store instructions only
    • Many orthogonal registers
    • Three address machine: Add dst, src1, src2
– Fixed length instructions
• Examples: MIPS™, SPARC™, Alpha™, PowerPC™

17 / 52

RISC Processors (Cont.)

• Simple architecture => simple micro-architecture
– Simple, small and fast control logic
– Simpler to design and validate
– Room for on-die caches: instruction cache + data cache
    • Parallelize data and instruction access
– Shorter time-to-market
• Using a smart compiler
– Better pipeline usage
– Better register allocation
• Existing RISC processors are not “pure” RISC
– e.g., they support division, which takes many cycles

18 / 52

RISC v. CISC

I - number of instructions in the program

T - time to execute an instruction

RISC:

I * T

CISC:

I * T

19 / 52

So, which is better, RISC or CISC?

• Today CISC architectures (x86) run as fast as RISC (or even faster)
• The main reasons are:
– CISC instructions are translated into RISC instructions (ucode)
– CISC architectures use a “RISC-like engine”
• We will discuss these kinds of solutions later in this course.

20 / 52

Design principle 1: Simplicity favors regularity

Arithmetic operations

• MIPS
    addi a,b,100    # a=b+100
    add a,b,c       # a=b+c
• 8086
    ADD EAX,B       # EAX=EAX+B

We prefer a simple mechanism with minimal instructions such as R3 = R1 op R2 over a machine that is easier to program, with several operands per instruction, for example R5 = (R1 op1 R2) op2 (R3 op3 R4), but is very hard to design and implement.

21 / 52

Design principle 2: Smaller is faster

• Arithmetic operations are allowed on registers only.
• The operands can be registers or (one) constant.
• There are 32 registers in total (beyond that: spilling).
• Register = word = 32 bits = 4 bytes
• Conventions
    $1, $2 …
    $s0, $s1 ... - C variables
    $t1, $t2 … - temporary variables

The registers are denoted by $0 - $31, or by names related to their roles in the program. There is an agreed convention on the roles of these registers in all programs, for example programs written in C.

Example: f=(g+h)-(k+j)    # $s0=f, $s1=g, $s2=h, $s3=k, $s4=j

add $t0,$s1,$s2
add $t1,$s3,$s4
sub $s0,$t0,$t1

The example shows how the statement f=(g+h)-(k+j) is expressed with assembly instructions.

The registers are the heart of the processor. Access to them is faster than access to memory. Three registers are accessed simultaneously: two are read and the third is written.

p. 110

22 / 52

Policy of Use Conventions

Name       Register number   Usage
$zero      0                 the constant value 0
$v0-$v1    2-3               values for results and expression evaluation
$a0-$a3    4-7               arguments
$t0-$t7    8-15              temporaries
$s0-$s7    16-23             saved
$t8-$t9    24-25             more temporaries
$gp        28                global pointer
$sp        29                stack pointer
$fp        30                frame pointer
$ra        31                return address

Additional registers: $at = $1, reserved for the assembler, and $k0, $k1 = $26, $27, reserved for the operating system.

23 / 52

Memory

• Memory: a large array.
• Memory address: an index into the array.
• Byte addressing: the index is in bytes.
• Maximum memory size: 2^30 words = 2^32 bytes

Address   Contents
0         8 bits of data
1         8 bits of data
2         8 bits of data
3         8 bits of data
4         8 bits of data
5         8 bits of data
6         8 bits of data
...

24 / 52

Memory access

• Load and Store instructions
• lw loads a word, but the memory address is in bytes

lw $s1,100($s2)    # $s1 = Memory[$s2+100]
sw $s1,100($s2)    # Memory[$s2+100] = $s1

• Another example: save is an array of words

muli $9,$19,4      # Temporary reg $9 := i*4
lw $8,save($9)     # Temporary reg $8 := save[i]

base register = pointer to the array
offset = position within the array
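Because addresses are in bytes while the array holds words, element i of save lives at save + 4*i; that is why the index is multiplied by 4 before the load. The Python sketch below models this under stated assumptions: the base address 1000, the array contents, and the helper names are all hypothetical.

```python
WORD_SIZE = 4  # bytes per MIPS word


def element_address(base, index):
    """Byte address of element `index` in an array of words starting at `base`."""
    return base + index * WORD_SIZE


def load_word(memory, base, offset):
    """Model of lw: read the word stored at byte address base + offset."""
    return memory[base + offset]


# Hypothetical array `save` of 4 words placed at byte address 1000.
SAVE = 1000
memory = {element_address(SAVE, i): value for i, value in enumerate([7, 8, 9, 10])}

i = 2
offset = i * WORD_SIZE                   # what muli $9,$19,4 computes
value = load_word(memory, SAVE, offset)  # what lw $8,save($9) reads
print(element_address(SAVE, i), value)
```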

25 / 52

Reading a byte

• There are also instructions such as lb (load byte) and sb (store byte).
• Useful for reading a char: in ASCII (American Standard Code for Information Interchange) a char is one byte.
• In Unicode, the size of a char is 2 bytes.

27 / 52

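Memory access

• Example: A is an array of words. The address of A is in $s3 and h is in $s2.

C code: A[2] = h + A[2];

MIPS code:
lw $t0, 8($s3)       # $t0 = Memory[$s3+8]
add $t0, $s2, $t0    # $t0 = $s2 + $t0
sw $t0, 8($s3)       # Memory[$s3+8] = $t0

[Figure: memory drawn as 32-bit words at byte addresses 0, 4, 8, 12, 16; A points to A[0], followed by A[1] and A[2]; the bytes within a word are numbered 0 1 2 3.]

INTEL = Little endian
MIPS = Big endian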
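The difference between the two byte orders can be seen by laying out one 32-bit word in memory both ways. The sketch below uses Python's standard `struct` module; the example value 0x00010203 is an assumption chosen so that each byte shows its own position.

```python
import struct

word = 0x00010203  # the most significant byte is 0x00, the least significant is 0x03

big = struct.pack(">I", word)     # big endian: most significant byte at the lowest address
little = struct.pack("<I", word)  # little endian: least significant byte at the lowest address

print(list(big))
print(list(little))

# Reading the little-endian bytes back with the wrong byte order gives a different word.
misread = struct.unpack(">I", little)[0]
print(hex(misread))
```

The same four bytes are stored in both cases; only the mapping between byte address and significance changes, which matters when data written with lb/sb is read back as a word or exchanged between machines.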

28 / 52

Machine language

• In MIPS, all instructions have the same size: 32 bits.
• In the 8086, instructions vary in size from 1 to 17 bytes.

Example: add $s1,$s2,$s3    # $s1=$17, $s2=$18, $s3=$19

R-type instruction format (bit 31 on the left, bit 0 on the right):

Field:     op      rs     rt     rd     shamt  funct
Bits:      6       5      5      5      5      6
Decimal:   0       18     19     17     0      32
Binary:    000000  10010  10011  10001  00000  100000

op - opcode                     rs - register source
rt - register source no. 2      rd - register destination
shamt - shift amount            funct - function
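Going the other way, the fields of an R-type word can be recovered with shifts and masks. This is a minimal Python sketch; the function name `decode_r_type` is an assumption made for illustration, and the input is the binary pattern shown for add $s1,$s2,$s3.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "op": (word >> 26) & 0x3F,     # bits 31..26
        "rs": (word >> 21) & 0x1F,     # bits 25..21
        "rt": (word >> 16) & 0x1F,     # bits 20..16
        "rd": (word >> 11) & 0x1F,     # bits 15..11
        "shamt": (word >> 6) & 0x1F,   # bits 10..6
        "funct": word & 0x3F,          # bits 5..0
    }


# add $s1,$s2,$s3 as the binary fields op, rs, rt, rd, shamt, funct
word = int("000000" "10010" "10011" "10001" "00000" "100000", 2)
fields = decode_r_type(word)
print(fields)
```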

29 / 52

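Design principle 3: Good design sometimes demands compromises

• Reduce the number of different instruction types; keep the instructions similar to one another.
• Example: lw $s1, 32($s2)    # $s1=$17, $s2=$18

    op    rs    rt    16 bit number
    35    18    17    32

Instruction formats (field widths in bits):

R:  op (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6)
I:  op (6) | rs (5) | rt (5) | 16 bit address
J:  op (6) | 26 bit address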
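The I format keeps op, rs and rt in the same positions as the R format and uses the remaining 16 bits for a constant. A minimal Python sketch of that packing follows; the helper name `encode_i_type` is hypothetical, and the example is the lw instruction from the slide.

```python
def encode_i_type(opcode, rs, rt, immediate):
    """Pack an I-format instruction: op (6) | rs (5) | rt (5) | immediate (16)."""
    if not -(1 << 15) <= immediate < (1 << 15):
        raise ValueError("immediate does not fit in 16 signed bits")
    # Mask the immediate so that negative offsets are stored in two's complement.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


# lw $s1, 32($s2): op = 35, rs = 18 ($s2), rt = 17 ($s1), offset = 32
lw_word = encode_i_type(opcode=35, rs=18, rt=17, immediate=32)
print(f"{lw_word:08X}")
```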

30 / 52

The program in memory

• The program is stored in memory exactly like data.

Program execution:
• A special register, the PC (Program Counter), holds the address of the instruction.
• A whole word is read from memory.
• The PC is advanced.

[Figure: Processor connected to Memory; memory for data, programs, compilers, editors, etc.]

31 / 52

Branch vs Jump

• Jump - an absolute, unconditional jump

    j label

• Branch - a relative, conditional jump

    bne $1,$2,label    # if $1 != $2 go to label

• Example ($s3=h, $s4=i, $s5=j):

    if (i!=j)               beq $s4, $s5, Lab1
        h=i+j;              add $s3, $s4, $s5
    else                    j Lab2
        h=i-j;        Lab1: sub $s3, $s4, $s5
                      Lab2: ...

32 / 52

Addresses in Branches and Jumps

• Instructions:

bne $t4,$t5,Label    Next instruction is at Label if $t4 != $t5
beq $t4,$t5,Label    Next instruction is at Label if $t4 == $t5
j Label              Next instruction is at Label

• Formats:

I:  op | rs | rt | 16 bit address
J:  op | 26 bit address

• Hence a branch is a relative jump, within a range of 2^16 words.
• Assumption: most branches are local jumps.
• beq $s1,$s2,25    # if ($s1 == $s2) go to PC + 4 + 25*4
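For a jump, the 26-bit field holds a word address: it is shifted left by 2 and combined with the upper four bits of PC + 4 to form the 32-bit target, as drawn in the single-cycle datapath figure at the end of the deck. The Python sketch below shows this; the function name `jump_target` and the sample PC values are assumptions made for illustration.

```python
def jump_target(pc, address_field):
    """32-bit target of a j instruction located at byte address `pc`.

    The upper 4 bits come from PC + 4; the lower 28 bits are the 26-bit
    address field shifted left by 2 (a word address turned into a byte address).
    """
    upper = (pc + 4) & 0xF0000000
    lower = (address_field & 0x03FFFFFF) << 2
    return upper | lower


print(jump_target(0, 2500))                # j 2500 from low memory
print(hex(jump_target(0x10000000, 2500)))  # same field, different upper bits
```

A jump therefore reaches any instruction within the current 2^28-byte region, while a branch reaches only a window of 2^16 words around PC + 4.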

33 / 52

Encoding example

Loop: lw $8, save($19)     # $8=save[i]
      bne $8, $21, Exit    # Goto Exit if save[i]<>k
      add $19, $19, $20    # i:=i+j
      j Loop               # Goto Loop
Exit:

SAVE = 1000

Address   Instruction fields
80,000    35   19   8    1000
80,004    5    8    21   2
80,008    0    19   20   19   0   32
80,012    2    20,000
80,016    ...
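The two address fields in the table can be recomputed from the byte addresses: the branch stores a word offset relative to the instruction after it, and the jump stores the target's word address. This is a Python sketch of that arithmetic; the helper names are hypothetical, and the addresses are the ones given in the table.

```python
WORD_SIZE = 4

# Byte addresses taken from the encoding table.
LOOP = 80000         # address of the lw at label Loop
BNE_ADDRESS = 80004  # address of the bne instruction
EXIT = 80016         # address of label Exit


def branch_offset(branch_address, target_address):
    """Word offset stored in a branch, relative to the instruction after it."""
    return (target_address - (branch_address + WORD_SIZE)) // WORD_SIZE


def jump_field(target_address):
    """Word address stored in the 26-bit field of a j instruction."""
    return (target_address // WORD_SIZE) & 0x03FFFFFF


print(branch_offset(BNE_ADDRESS, EXIT))  # offset field of bne $8,$21,Exit
print(jump_field(LOOP))                  # address field of j Loop
```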

34 / 52

A rule worth remembering

In MIPS assembly, addresses are encoded as follows:
• a code address is in words
• a data address is in bytes

When the MIPS processor accesses memory, it requests the address in bytes.

35 / 52

Branch-if-less-than?

slt - set less than

    slt $t0, $s1, $s2    # if $s1 < $s2 then $t0 = 1 else $t0 = 0

• blt (branch less than) can be built from it:

    blt $s0,$s1,Less

  is equivalent to

    slt $at,$s0,$s1      # $at gets 1 if $s0 < $s1
    bne $at,$zero,Less   # go to Less if $at != 0

• blt is a pseudo instruction
• The assembler uses $at (= $1) for pseudo instructions
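The expansion of blt can be sketched as a small text rewrite plus a model of what the two real instructions compute. Both helpers below are hypothetical and written in Python only for illustration; they are not how any particular assembler is implemented.

```python
def expand_blt(rs, rt, label):
    """Rewrite the pseudo instruction blt rs,rt,label as two real instructions."""
    return [
        f"slt $at,{rs},{rt}",       # $at = 1 if rs < rt, else 0
        f"bne $at,$zero,{label}",   # branch when $at != 0
    ]


def blt_taken(rs_value, rt_value):
    """Model the slt + bne pair and report whether the branch is taken."""
    at = 1 if rs_value < rt_value else 0  # slt
    return at != 0                        # bne against $zero


for line in expand_blt("$s0", "$s1", "Less"):
    print(line)
print(blt_taken(3, 5), blt_taken(5, 3), blt_taken(4, 4))
```

Because the expansion overwrites $at, user code that keeps a value in $at across a pseudo instruction would lose it; this is the reason the register is reserved for the assembler.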

36 / 52

More examples of pseudo instructions

bne $8,$21,far_adrs

is equivalent to

     beq $8,$21,nxt
     j far_adrs
nxt:

move $t1,$s4

is equivalent to

add $t1,$s4,$zero

37 / 52

Design principle 4: Make the common case fast

• Many arithmetic operations are performed with small constants.

addi $29, $29, 4

Therefore, special instructions were allocated for adding, subtracting and comparing small constants: addi, slti, andi, ori, xori

• Large constants

$t0 = 1010101010101010 1111111111111111

lui $t0, 1010101010101010         # $t0 = 2^16 * (1010101010101010)_2
                                  # $t0 = 1010101010101010 0000000000000000
ori $t0, $t0, 1111111111111111    # $t0 = $t0 | (0000000000000000 1111111111111111)
                                  # $t0 = 1010101010101010 1111111111111111

The constant is built in 2 instructions, i.e., more slowly, but only one additional instruction is needed: lui
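The lui/ori pair splits a 32-bit constant into two 16-bit halves and recombines them in a register. A minimal Python sketch of the same steps follows; the function name `load_constant` is an assumption, and the constant is the bit pattern from the slide.

```python
def load_constant(value):
    """Build a 32-bit constant the way a lui + ori pair does."""
    upper = (value >> 16) & 0xFFFF   # immediate for lui
    lower = value & 0xFFFF           # immediate for ori
    t0 = (upper << 16) & 0xFFFFFFFF  # lui: upper half set, lower 16 bits zero
    t0 = t0 | lower                  # ori: fill in the lower 16 bits
    return upper, lower, t0


constant = 0b1010101010101010_1111111111111111
upper, lower, t0 = load_constant(constant)
print(f"{upper:016b} {lower:016b} {t0:032b}")
```

ori is used rather than addi because it treats the 16-bit immediate as unsigned, so a lower half with its top bit set does not disturb the upper half.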

38 / 52

Summary

MIPS operands

Name | Example | Comments
32 registers | $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at | Fast locations for data. In MIPS, data must be in registers to perform arithmetic. MIPS register $zero always equals 0. Register $at is reserved for the assembler to handle large constants.
2^30 memory words | Memory[0], Memory[4], ..., Memory[4294967292] | Accessed only by data transfer instructions. MIPS uses byte addresses, so sequential words differ by 4. Memory holds data structures, such as arrays, and spilled registers, such as those saved on procedure calls.

MIPS assembly language

Category | Instruction | Example | Meaning | Comments
Arithmetic | add | add $s1, $s2, $s3 | $s1 = $s2 + $s3 | Three operands; data in registers
Arithmetic | subtract | sub $s1, $s2, $s3 | $s1 = $s2 - $s3 | Three operands; data in registers
Arithmetic | add immediate | addi $s1, $s2, 100 | $s1 = $s2 + 100 | Used to add constants
Data transfer | load word | lw $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Word from memory to register
Data transfer | store word | sw $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Word from register to memory
Data transfer | load byte | lb $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Byte from memory to register
Data transfer | store byte | sb $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Byte from register to memory
Data transfer | load upper immediate | lui $s1, 100 | $s1 = 100 * 2^16 | Loads constant in upper 16 bits
Conditional branch | branch on equal | beq $s1, $s2, 25 | if ($s1 == $s2) go to PC + 4 + 100 | Equal test; PC-relative branch
Conditional branch | branch on not equal | bne $s1, $s2, 25 | if ($s1 != $s2) go to PC + 4 + 100 | Not equal test; PC-relative
Conditional branch | set on less than | slt $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; for beq, bne
Conditional branch | set less than immediate | slti $s1, $s2, 100 | if ($s2 < 100) $s1 = 1; else $s1 = 0 | Compare less than constant
Unconditional jump | jump | j 2500 | go to 10000 | Jump to target address
Unconditional jump | jump register | jr $ra | go to $ra | For switch, procedure return
Unconditional jump | jump and link | jal 2500 | $ra = PC + 4; go to 10000 | For procedure call

39 / 52

Exercise

Question 1: Below is a piece of code written in C, and two translations of it into MIPS assembly language:

while (save[i]!=k) do i=i+j;

save: array [0..100] of word. $19 serves as i, $20 as j, and $21 as k.

• First translation:

Loop: muli $9,$19,4      # Temporary reg $9:=i*4
      lw $8,save($9)     # Temporary reg $8:=save[i]
      beq $8,$21,Exit    # Goto Exit if save[i] = k
      add $19,$19,$20    # i:=i+j
      j Loop             # Goto Loop
Exit:

40 / 52

Exercise (continued)

• Second translation:



beq $8,$21,Exit # Goto Exit if save[i] = k

Loop: add $19,$19,$20 # i:=i+j



bne $8,$21,Loop # Goto Loop if save[i]!=k

Exit:

• Question: assuming the loop executes 10 times, how many instructions are executed in each of the translations?

41 / 52

Compiler

A.asm -> compiler -> A.obj
B.asm -> compiler -> B.obj
A.obj, B.obj, C.lib (c.obj) -> linker -> P.exe
P.exe -> loader -> Memory

42 / 52

The compilation process

43 / 52

Example

A.asm:
m: .word 2
   sw $7,m($3)

B.asm:
s: .word 3,4
   j k
   lw $1,s($2)
k: add $1,$2,$3

A.asm -> compiler -> A.obj:
   2
   sw $7,0($3)

B.asm -> compiler -> B.obj:
   3 4
   j 2
   lw $1,0($2)
   add $1,$2,$3

A.obj, B.obj -> linker -> P.exe:
   2
   3
   4
   sw $7,0($3)
   j 3
   lw $1,4($2)
   add $1,$2,$3
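The change from j 2 to j 3 and from lw $1,0($2) to lw $1,4($2) is relocation: the linker shifts each object's addresses by the place where its data and code end up in P.exe. The Python sketch below reproduces the example under loud assumptions: the data structures and the `link` function are hypothetical, the object holding `m` is placed first (as the P.exe layout implies), jump targets are counted in instructions and data offsets in bytes, as on the slide, and every lw/sw offset is treated as a data label that needs relocation.

```python
def link(objects):
    """Concatenate object files and relocate their jump targets and data offsets."""
    data, text = [], []
    for obj in objects:
        data_base = 4 * len(data)  # byte offset of this object's data in the output
        text_base = len(text)      # instruction index of this object's code in the output
        for op, *args in obj["text"]:
            if op == "j":
                (target,) = args
                text.append(("j", target + text_base))
            elif op in ("lw", "sw"):
                reg, offset, base = args
                text.append((op, reg, offset + data_base, base))
            else:
                text.append((op, *args))
        data.extend(obj["data"])
    return {"data": data, "text": text}


# Object file holding m: data word 2, one store.
m_obj = {"data": [2], "text": [("sw", "$7", 0, "$3")]}
# Object file holding s: data words 3 and 4, a jump to its own third instruction.
s_obj = {"data": [3, 4],
         "text": [("j", 2), ("lw", "$1", 0, "$2"), ("add", "$1", "$2", "$3")]}

p_exe = link([m_obj, s_obj])
print(p_exe["data"])
for instruction in p_exe["text"]:
    print(instruction)
```

A real linker does not guess which operands to patch; it reads that from the relocation information and symbol table stored in each object file.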

44 / 52

Object file in Unix

• Object file header

• text segment

• data segment

• relocation information

• symbol table

• debugging information

45 / 52

Memory layout in a MIPS computer

46 / 52

47 / 52

Preprocessing

• Macro:

• Code:

• After preprocessing:

48 / 52

49 / 52

50 / 52

Calling procedure:

51 / 52

Sub1:

Callee procedure:

52 / 52

Single Cycle

[Figure: single-cycle datapath with control. Components: PC, Instruction memory (Read address, Instruction [31-0]), Registers (Read register 1, Read register 2, Write register, Write data, Read data 1, Read data 2), Sign extend (16 to 32), ALU (Zero, ALU result), ALU control, Data memory (Address, Write data, Read data), two adders (PC + 4 and branch target), Shift left 2, multiplexers. Control signals: RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, RegWrite. Jump address [31-0] is formed from PC + 4 [31-28] and Instruction [25-0] shifted left 2 (26 to 28 bits).]
