What Is Computer Structure?

Computer Structure 0368-2159, Lecture 1: Introduction. Nathan Intrator and Yehuda Afek. Teaching assistants: Hillel Avni, Noa Ben-Amos. What is computer structure? Hardware: transistors, logic circuits, computer architecture. What we will discuss today: Introduction: Computer Architecture, Administrative Matters, History.

1/ 75

Computer Structure 0368-2159

Lecture 1: Introduction

Nathan Intrator, Ran Canetti

Teaching assistant: Kiril Solovey

2/ 75

What is computer structure?

Hardware: transistors

Logic circuits

Computer architecture

3/ 77

What we will discuss today:

Introduction : Computer Architecture

Administrative Matters

History

From conductors and electricity to basic binary operations in a computer

• Electrical voltage

• Conductors

• Semiconductor: silicon

• Transistor

• Binary operations in electronic components

4/ 77

Computing Devices Then…

EDSAC, University of Cambridge, UK, 1949

5/ 77

Computing Devices Now


Robots

Supercomputers

Automobiles

Laptops

Set-top boxes

Games

Smart phones

Servers

Media Players

Sensor Nets

Routers

Cameras

6/ 77

So, what is computer structure?

7/ 77

8/ 77

Mother board

9/ 75

First Pacemaker, 1957

10/ 77

11/ 77

12/ 77

The paradigm (Patterson)

Every Computer Scientist should master the “AAA”

Architecture, Algorithms, Applications

13/ 77

Computer Architecture: GOAL

The goal of Computer Architecture: to build “cost-effective systems”

• How do we calculate the cost of a system?

• How do we evaluate the effectiveness of the system?

To optimize the system

• What are the optimization points?

Fact: most computer systems still use the von Neumann principle of operation, even though internally they are very different from the computers of that time.

Fast, Effective and Cheap

14/ 77

Anatomy: 5 components of any Computer (since 1946)

Computer (example: a personal computer)

Processor

• Control (“brain”)

• Datapath (“brawn”)

Memory (where programs and data live when running)

Devices

• Input: keyboard, mouse

• Output: display, printer

• Disk (where programs and data live when not running)

15/ 77

Computer System Structure

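[Figure: block diagram of a computer system. Labels in the diagram: CPU, cache, CPU bus, bridge, memory bus, memory, I/O bus, LAN adapter, LAN, USB hub, keyboard, mouse, scanner, graphics adapter, video buffer, SCSI/IDE adapter, SCSI bus, hard disk.]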

16/ 77

The Instruction Set: a Critical Interface

instruction set

software

hardware

17/ 77

What is “Computer Architecture”?

Computer Architecture =

Instruction Set Architecture +

Machine Organization + …

= Architecture + Engineering

18/ 77

Computer Structure

What are “Machine Structures”?

Coordination of many levels (layers) of abstraction:

Software

• Application (ex: browser)

• Compiler, Assembler

• Operating System (Linux, Win, ...)

Instruction Set Architecture

Hardware

• Processor, Memory, I/O system

• Datapath & Control

• Digital Design

• Circuit Design

• Transistors

• Physics

19/ 77

Levels of Representation

High Level Language Program

temp = v[k];
v[k] = v[k+1];
v[k+1] = temp;

↓ Compiler

Assembly Language Program

lw $15, 0($2)
lw $16, 4($2)
sw $16, 0($2)
sw $15, 4($2)

↓ Assembler

Machine Language Program

0000 1001 1100 0110 1010 1111 0101 1000
1010 1111 0101 1000 0000 1001 1100 0110
1100 0110 1010 1111 0101 1000 0000 1001
0101 1000 0000 1001 1100 0110 1010 1111

↓ Machine Interpretation

Control Signal Specification

ALUOP[0:3] <= InstReg[9:11] & MASK
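To see how the assembly level implements the high-level swap, the four instructions from the slide can be traced on a toy memory. This is only a sketch: the register numbers and instruction order come from the slide, while the function name `run_swap`, the base address 100 and the array contents are made-up illustration values.

```python
# Minimal sketch: execute the slide's four MIPS instructions on a toy memory.
# The memory layout and values below are assumptions chosen for illustration.

def run_swap(memory, regs):
    """Run: lw $15,0($2); lw $16,4($2); sw $16,0($2); sw $15,4($2)."""
    base = regs[2]                  # $2 holds the address of v[k]
    regs[15] = memory[base + 0]     # lw $15, 0($2)  -> $15 = v[k]
    regs[16] = memory[base + 4]     # lw $16, 4($2)  -> $16 = v[k+1]
    memory[base + 0] = regs[16]     # sw $16, 0($2)  -> v[k]   = old v[k+1]
    memory[base + 4] = regs[15]     # sw $15, 4($2)  -> v[k+1] = old v[k]


# Byte-addressed memory holding 4-byte words: v = [10, 20, 30] at address 100.
memory = {100: 10, 104: 20, 108: 30}
regs = {2: 104, 15: 0, 16: 0}       # $2 points at v[1], so k = 1

run_swap(memory, regs)
print(memory)                       # v[1] and v[2] have been exchanged
```

Note that the assembly version needs two registers where the high-level code uses one `temp`: both values are loaded before either is stored.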

20/ 77

Computer Architecture’s Changing Definition

1950s to 1960s Computer Architecture Course

• Computer Arithmetic

1970s to mid 1980s Computer Architecture Course

• Instruction Set Design, especially ISA appropriate for compilers

1990s Computer Architecture Course

• Design of CPU, memory system, I/O system, Multi-processors, Networks

2000s Computer Architecture Course

• Special purpose architectures, Functionally reconfigurable, Special considerations for low power/mobile processing

2005 – future (?): Multi-processors, Parallelism

• Synchronization, Speed-up, How to program???

21/ 77

Forces on Computer Architecture

Forces acting on Computer Architecture:

• Technology

• Programming Languages

• Operating Systems

• History

• Applications

• Cleverness

22/ 77

Computers in the News: Sony Playstation 2000

As reported in Microprocessor Report, Vol 13, No. 5:

• Emotion Engine: 6.2 GFLOPS, 75 million polygons per second

• Graphics Synthesizer: 2.4 Billion pixels per second

• Claim: Toy Story realism brought to games!

The Playstation 3 will deliver nearly 2 teraflops overall performance, said Ken Kutaragi, president and group CEO of Sony Computer Entertainment

23/ 77

24/ 77

http://singules-atarityhub.com/2010/01/25/kurzweil-discusses-the-future-of-brain-computer-interfac-x-prize-lab-video/

Ray Kurzweil: By 2029 reverse engineer the Human Brain

25/ 77

New Wrist Computers/Health Monitors

26/ 77

Where are We Going??

Computer Structure

[Figure: processor-memory performance gap, performance (log scale, 1 to 1000) vs. time, 1980-2000. CPU (µProc): 60%/yr (2X/1.5 yr, “Moore’s Law”). DRAM: 9%/yr (2X/10 yrs). The processor-memory performance gap grows 50% per year.]

[Figure: course topics, illustrated with a Booth-encoded multiplier datapath and a pipeline diagram (IFetch, Dcd, Exec, Mem, WB).]

Arithmetic

Single/multicycle Datapaths

Pipelining

Memory Systems

I/O

27/ 77

A slide from one of the lectures toward the end of the semester

28/ 77

Course Administration

Instructors:

Nathan Intrator (nin@post.tau.ac.il)

TA: Kiril Solovey (kirilsolo@gmail.com)

http://cs.tau.ac.il/~nin/Courses/CompStruct/CompStruct.htm

http://virtual.tau.ac.il

Books:

1. V. C. Hamacher, Z. G. Vranesic, S. G. Zaky, Computer Organization. McGraw-Hill, 1982

2. H. Taub, Digital Circuits and Microprocessors. McGraw-Hill, 1982

3. Digital Systems (מערכות ספרתיות), published by the Open University

4. Hennessy and Patterson, Computer Organization and Design: The Hardware/Software Interface. Morgan Kaufmann, 1998

29/ 77

Grading:

Final exam: 80%

Exercises: 20%

6-7 exercises

30/ 77

Architecture & Microarchitecture Elements

Architecture:

• Register data width (8/16/32/64)

• Instruction set

• Addressing modes

• Addressing methods (segmentation, paging, etc.)

Microarchitecture (µArch):

• Physical memory size

• Cache size and structure

• Number of execution units, number of execution pipelines

• Branch prediction

• TLB

Timing is considered µArch (though it is user visible!)

Processors with the same architecture may have different µArch

31/ 77

Compatibility

Backward compatibility

– New hardware can run existing software

– Example: Pentium 4 can run software originally written for Pentium III, Pentium II, Pentium, 486, 386, 286

Forward compatibility

– New software can run on existing (old) hardware

– Example: new software written with MMX™ must still run on older Pentium processors which do not support MMX™

– Less important than backward compatibility

New ideas: architecture independence

– JIT (just-in-time) compiler: Java and .NET

– Binary translation

32/ 77

How to compare between different systems?

33/ 77

Benchmarks – Programs for Evaluating Processor Performance

Toy Benchmarks

– 10-100 line programs

– e.g., sieve, puzzle, quicksort

Synthetic Benchmarks

– Attempt to match average frequencies of real workloads

– e.g., Whetstone, Dhrystone

Real programs

– e.g., gcc, spice

SPEC: System Performance Evaluation Cooperative

– SPECint (8 integer programs)

– SPECfp (10 floating point programs)

34/ 77

CPI – to compare systems with the same instruction set architecture (ISA)

The CPU is synchronous: it works according to a clock signal.

• Clock cycle is measured in nsec ($10^{-9}$ of a second).

• Clock rate (= 1/clock cycle) is measured in MHz ($10^{6}$ cycles/second).

CPI: cycles per instruction

• Average number of cycles per instruction (in a given program):

$$\mathrm{CPI} = \frac{\#\text{cycles required to execute the program}}{\#\text{instructions executed in the program}}$$

• IPC (= 1/CPI): instructions per cycle

Clock rate is mainly affected by technology, CPI by the architecture

CPI breakdown: how many cycles (on average) the program spends on different causes, e.g., executing, memory, I/O, etc.
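The CPI, IPC and clock-rate definitions above can be checked with a few lines of arithmetic. This is a minimal sketch: the function names and the cycle counts in the breakdown are hypothetical numbers chosen for illustration, not measurements from the lecture.

```python
# Sketch of the CPI / IPC / clock-rate definitions; all numbers are made up.

def cpi(total_cycles, instructions_executed):
    """Average cycles per instruction for a given program."""
    return total_cycles / instructions_executed


def ipc(total_cycles, instructions_executed):
    """Instructions per cycle, the reciprocal of CPI."""
    return instructions_executed / total_cycles


def clock_rate_mhz(clock_cycle_ns):
    """Clock rate in MHz for a clock cycle given in nanoseconds."""
    return 1000.0 / clock_cycle_ns   # 1/ns is GHz, times 1000 gives MHz


# Hypothetical CPI breakdown: cycles spent on each cause.
cycles_by_cause = {"executing": 2_000_000, "memory": 800_000, "io": 200_000}
instructions = 2_000_000
total_cycles = sum(cycles_by_cause.values())

print("CPI:", cpi(total_cycles, instructions))
print("IPC:", ipc(total_cycles, instructions))
for cause, cycles in cycles_by_cause.items():
    print(f"  CPI contribution of {cause}: {cycles / instructions}")
print("Clock rate for a 2 ns cycle:", clock_rate_mhz(2.0), "MHz")
```

The per-cause contributions add up to the overall CPI, which is what makes a CPI breakdown useful for finding where cycles are lost.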

36/ 77

CPU Time

CPU Time: the time required by the CPU to execute a given program:

$$\text{CPU Time} = \text{clock cycle} \times \#\text{cycles} = \text{clock cycle} \times \mathrm{CPI} \times \mathrm{IC}$$

where IC is the instruction count (the number of instructions executed).

Our goal: minimize CPU Time

– Minimize clock cycle: more MHz (process, circuit, µArch)

– Minimize CPI: µArch (e.g., more execution units)

– Minimize IC: architecture (e.g., MMX™ technology)
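As a rough illustration of the CPU Time product (clock cycle, CPI, instruction count), the sketch below compares a baseline with a variant that halves the CPI. The function name and all input values are assumptions made up for the example.

```python
# Sketch: CPU Time = clock cycle x CPI x IC, with hypothetical inputs.

def cpu_time_seconds(clock_cycle_ns, cpi, instruction_count):
    """CPU time in seconds for a clock cycle given in nanoseconds."""
    return clock_cycle_ns * 1e-9 * cpi * instruction_count


# Hypothetical program: one billion instructions on a 500 MHz (2 ns) clock.
baseline = cpu_time_seconds(clock_cycle_ns=2.0, cpi=1.5, instruction_count=1_000_000_000)
improved = cpu_time_seconds(clock_cycle_ns=2.0, cpi=0.75, instruction_count=1_000_000_000)

print(f"baseline: {baseline:.2f} s")
print(f"improved: {improved:.2f} s")
print(f"speedup:  {baseline / improved:.2f}x")
```

Because the three factors multiply, halving any one of them (cycle time, CPI or IC) halves CPU Time, provided the other two stay unchanged.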

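Speedup due to enhancement E:

$$\text{Speedup}_E = \frac{\text{ExTime}_{\text{w/o } E}}{\text{ExTime}_{\text{w/ } E}} = \frac{\text{Performance}_{\text{w/ } E}}{\text{Performance}_{\text{w/o } E}}$$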

37/ 77

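Amdahl’s Law

Suppose that enhancement E accelerates a fraction F of the task by a factor S, and the remainder of the task is unaffected. Then:

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times \left[ (1 - \text{Fraction}_{enhanced}) + \frac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}} \right]$$

$$\text{Speedup}_{overall} = \frac{\text{ExTime}_{old}}{\text{ExTime}_{new}} = \frac{1}{(1 - \text{Fraction}_{enhanced}) + \dfrac{\text{Fraction}_{enhanced}}{\text{Speedup}_{enhanced}}}$$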

38/ 77

Amdahl’s Law: Example

• Floating point instructions are improved to run 2X faster, but only 10% of the executed instructions are FP

$$\text{ExTime}_{new} = \text{ExTime}_{old} \times (0.9 + 0.1/2) = 0.95 \times \text{ExTime}_{old}$$

$$\text{Speedup}_{overall} = \frac{1}{0.95} = 1.053$$

Corollary: Make The Common Case Fast
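The floating point example can be reproduced directly from Amdahl’s Law. The sketch below is a plain transcription of the formula; the function name `amdahl_speedup` is a made-up helper.

```python
# Sketch of Amdahl's Law; amdahl_speedup is an illustrative helper name.

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when a fraction of the task is sped up by a given factor."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


# The slide's example: 10% of the work is floating point, made 2X faster.
print(f"{amdahl_speedup(0.10, 2.0):.3f}")

# Even an (almost) infinitely fast FP unit cannot beat 1 / 0.9.
print(f"{amdahl_speedup(0.10, 1e12):.3f}")
```

The second call shows why the corollary matters: speeding up a rare case, however dramatically, is bounded by the share of the task that is left untouched.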

39/ 77

instruction set

software

hardware

Instruction Set Design

The ISA is what the user and the compiler see

The ISA is what the hardware needs to implement

40/ 77

Why is the ISA important?

Code size

• Long instructions may take more time to fetch

• Larger code requires more memory (important in small devices, e.g., cell phones)

Number of instructions (IC)

• Reducing IC reduces execution time (assuming the same CPI and frequency)

Code “simplicity”

• Simple HW implementation, which leads to higher frequency and lower power

• Code optimization is easier to apply to “simple code”

41/ 77

The impact of the ISA

RISC vs CISC

42/ 77

CISC Processors

CISC - Complex Instruction Set Computer

The idea: a high-level machine language

Characteristics

• Many instruction types, with many addressing modes

• Some of the instructions are complex:

- Perform complex tasks

- Require many cycles

• ALU operations directly on memory

- Usually uses a limited number of registers

• Variable-length instructions

- Common instructions get short codes, which saves code length

Example: x86

43/ 77

CISC Drawbacks

Compilers do not take advantage of the complex instructions and the complex indexing methods

Implementing complex instructions and complex addressing modes complicates the processor, which slows down the simple, common instructions. This contradicts the corollary of Amdahl’s law: Make The Common Case Fast

Variable-length instructions are a real pain in the neck:

• It is difficult to decode several instructions in parallel

- Until an instruction is decoded, its length is unknown

- So it is unknown where the instruction ends

- And it is unknown where the next instruction starts

• An instruction may not fit the “right behavior” of the memory hierarchy (discussed in later lectures)

Examples: VAX, x86 (!?!)
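The decoding problem with variable-length instructions can be sketched as follows. The toy encoding, in which the first byte of each instruction holds its total length, is an assumption invented for this illustration; it is not the x86 or VAX format, and both function names are hypothetical.

```python
# Sketch: finding instruction boundaries with fixed vs. variable length.
# The "first byte = length" encoding is a made-up toy format.

def fixed_length_starts(code_size, width=4):
    """All start addresses are known up front, so decoders could work in parallel."""
    return list(range(0, code_size, width))


def variable_length_starts(code):
    """Each start is known only after the previous instruction has been decoded."""
    starts = []
    pc = 0
    while pc < len(code):
        length = code[pc]            # decode step: read this instruction's length
        if length == 0:
            raise ValueError("toy encoding requires a non-zero length byte")
        starts.append(pc)
        pc += length                 # only now is the next start address known
    return starts


code = bytes([1, 3, 0, 0, 2, 0, 1])  # instructions of length 1, 3, 2 and 1
print(fixed_length_starts(16))        # computed without looking at the code at all
print(variable_length_starts(code))   # requires a sequential walk over the bytes
```

The fixed-length case never reads the instruction bytes, which is why several decoders can start at once; the variable-length walk has a serial dependency from one instruction to the next.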

44/ 77

RISC Processors

RISC - Reduced Instruction Set Computer

The idea: simple instructions enable fast hardware

Characteristics

• A small instruction set, with only a few instruction formats

• Simple instructions

- Execute simple tasks

- Require a single cycle (with pipeline)

• A few indexing methods

• ALU operations on registers only

- Memory is accessed using Load and Store instructions only

- Many orthogonal registers

- Three-address machine: Add dst, src1, src2

• Fixed-length instructions

Examples: MIPS™, SPARC™, Alpha™, PowerPC™

45/ 77

RISC Processors (Cont.)

Simple architecture ⇒ simple micro-architecture

• Simple, small and fast control logic

• Simpler to design and validate

• Room for on-die caches: instruction cache + data cache

- Parallelize data and instruction access

• Shorter time-to-market

Using a smart compiler

• Better pipeline usage

• Better register allocation

Existing RISC processors are not “pure” RISC

• e.g., they support division, which takes many cycles

46/ 77

RISC and Amdahl’s Law (Example)

In comparison to the CISC architecture:

• 10% of the static code, which accounts for 90% of the dynamic instructions, has the same CPI

• 90% of the static code, which accounts for only 10% of the dynamic instructions, has a CPI that increases by 60%

• The number of instructions executed increases by 50%

• The speed of the processor is doubled (this was true at the time the RISC processors were invented)

We get:

$$\frac{\mathrm{CPI}_{new}}{\mathrm{CPI}_{old}} = 0.9 \times 1 + 0.1 \times 1.6 = 1.06$$

And then:

$$\text{Speedup}_{overall} = \frac{\text{CPU Time}_{old}}{\text{CPU Time}_{new}} = \frac{\text{clock cycle}_{old}}{\text{clock cycle}_{new}} \times \frac{\mathrm{CPI}_{old}}{\mathrm{CPI}_{new}} \times \frac{\mathrm{IC}_{old}}{\mathrm{IC}_{new}} = \frac{2}{1.06 \times 1.5} = 1.26$$
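The arithmetic of this example can be re-derived step by step. The sketch below uses only the four figures stated on the slide; the function and variable names are made up for illustration.

```python
# Sketch: re-deriving the RISC vs. CISC speedup from the slide's four figures.

def risc_vs_cisc_speedup():
    # 90% of dynamic instructions keep their CPI; 10% see a 60% higher CPI.
    cpi_ratio = 0.9 * 1.0 + 0.1 * 1.6      # CPI_new / CPI_old
    ic_ratio = 1.5                          # IC_new / IC_old (50% more instructions)
    clock_cycle_ratio = 2.0                 # clock cycle_old / clock cycle_new
    # Speedup = (cycle_old/cycle_new) * (CPI_old/CPI_new) * (IC_old/IC_new)
    speedup = clock_cycle_ratio / (cpi_ratio * ic_ratio)
    return cpi_ratio, speedup


cpi_ratio, speedup = risc_vs_cisc_speedup()
print(f"CPI_new / CPI_old = {cpi_ratio:.2f}")
print(f"overall speedup   = {speedup:.2f}")
```

The doubled clock rate is what carries the result: without it, the higher CPI and instruction count would make the RISC version slower.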

47/ 77

So, which is better, RISC or CISC?

Today CISC architectures (x86) run as fast as RISC (or even faster)

The main reasons:

• CISC instructions are translated into RISC-like instructions (ucode)

• CISC architectures use a “RISC-like engine”

We will discuss these kinds of solutions later in the course.

48/ 77

Technology Trends: Microprocessor Complexity

[Figure: transistors per chip (log scale, 1,000 to 100,000,000) vs. year (1970-2000), with points for i4004, i8080, i8086, i80286, i80386, i80486 and Pentium.]

Moore’s Law: 2X transistors per chip every 1.5 years

Transistor counts:

• Alpha 21264: 15 million

• Pentium Pro: 5.5 million

• PowerPC 620: 6.9 million

• Alpha 21164: 9.3 million

• Sparc Ultra: 5.2 million

• Athlon (K7): 22 million

• Itanium 2: 410 million

49/ 77

50/ 77

51/ 77

Technology Trends: Processor Performance

[Figure: processor performance measure (0 to 900) vs. year (1987-1997), growing about 1.54X per year. Labeled points: IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600. Annotation: Intel P4, 2000 MHz (Fall 2001).]

52/ 77

Technology Trends: Memory Capacity (Single-Chip DRAM)

[Figure: DRAM chip size in bits (log scale, 1,000 to 1,000,000,000) vs. year (1970-2000).]

Year    Size (Mbit)
1980    0.0625
1983    0.25
1986    1
1989    4
1992    16
1996    64
1998    128
2000    256
2002    512

• Now 1.4X/yr, or 2X every 2 years.

• 8000X since 1980!

53/ 77

Technology Trends Imply Dramatic Change

Processor

• Logic capacity: about 30% per year

• Clock rate: about 20% per year

Memory

• DRAM capacity: about 60% per year (4x every 3 years)

• Memory speed: about 10% per year

• Cost per bit: improves about 25% per year

Disk

• Capacity: about 60% per year

• Total data use: 100% per 9 months!

Network Bandwidth

• Bandwidth increasing more than 100% per year!
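The slides quote growth both as a percentage per year and as “N times every M years”. A small conversion helper makes it easy to move between the two forms; the function names are made up, and the slide figures are rounded, so the conversions agree only approximately.

```python
# Sketch: converting between "factor every N years" and annual growth rate.
import math


def annual_growth(factor, years):
    """Annual growth rate (0.6 means 60%/yr) for 'factor' growth over 'years'."""
    return factor ** (1.0 / years) - 1.0


def doubling_time_years(annual_rate):
    """Years needed to double at a given annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


print(f"2X every 1.5 years -> {annual_growth(2, 1.5):.1%} per year")
print(f"4X every 3 years   -> {annual_growth(4, 3):.1%} per year")
print(f"2X every 2 years   -> {annual_growth(2, 2):.1%} per year")
print(f"60% per year       -> doubles in {doubling_time_years(0.60):.2f} years")
print(f"9% per year        -> doubles in {doubling_time_years(0.09):.2f} years")
```

Compounding is the key point: “4x every 3 years” and “2x every 1.5 years” are the same rate, both close to the 60% per year quoted on the slides.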

54/ 77

1980-2003, CPU-DRAM Speed Gap

[Figure: performance (1/latency, log scale, 10 to 10000) vs. year (1980-2005). CPU: 60% per year (2X in 1.5 years). DRAM: 9% per year (2X in 10 years). The gap grew 50% per year. Annotation: “The power wall” (2005).]

Q. How do architects address this gap?

A. Put smaller, faster “cache” memories between CPU and DRAM.

55/ 77

Dimensions

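[Figure: logarithmic length scale from 1 cm down to 1 Å (1 cm, 1 mm, 0.1 mm, 10 µm, 1 µm, 0.1 µm, 10 nm, 1 nm, 1 Å), with these reference points:]

• Chip size: 1 cm

• Diameter of a human hair: 25 µm

• 1996 devices: 0.35 µm

• Deep UV wavelength: 0.248 µm

• 2001 devices: 0.18 µm

• 2007 devices: 0.01 µm

• X-ray wavelength: 0.6 nm

• Silicon atom radius: 1.17 Å

2005: $0.12 \times 10^{-6}$ m = $1.2 \times 10^{-7}$ m

2006: $0.04 \times 10^{-6}$ m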

Demo

56/ 77

Computer architecture in the coming years

Energy / power consumption: in the past, a non-issue.

Today: Power Wall. Power is expensive; transistors are free.

Performance: in the past, improved through instruction-level parallelism, smart compilers, and single-CPU architectures (pipelining, superscalar, out-of-order execution, speculation).

Today: ILP Wall. Further hardware enhancements to improve performance no longer pay off.

In the past: multiplication was slow, memory access was fast.

Today: Memory Wall. Multiplication is fast, memory accesses are slow (200 clock cycles for DRAM, 4 cycles for a multiply).

Single-processor performance: in the past, 2X every 1.5 years.

Today, perhaps because of all of the above: 2X every 5 years??

But the number of processors (cores) per chip doubles every two years. Today: 4 to 40 cores per processor.

57/ 77

Physics / Transistor’s History

Audion (triode), 1906: Lee De Forest

First point-contact transistor (germanium), 1947: John Bardeen and Walter Brattain, Bell Laboratories

58/ 77

History

First integrated circuit (germanium), 1958: Jack S. Kilby, Texas Instruments. Contained five components of three types: transistors, resistors and capacitors

Intel Pentium II, 1997: clock 233 MHz, number of transistors 7.5 M, gate length 0.35 µm

59/ 77

Annual Sales

$10^{18}$ transistors manufactured in 2003 alone

• 100 million for every human on the planet

[Figure: global semiconductor billings (billions of US$, 0 to 200) vs. year (1982-2002).]

60/ 77

61/ 77

62/ 77

63/ 77

Integrated Circuits (2003 state-of-the-art)

Primarily Crystalline Silicon

1mm - 25mm on a side

2003 - feature size ~ 0.13 µm = $0.13 \times 10^{-6}$ m

100 - 400M transistors

(25 - 100M “logic gates")

3 - 10 conductive layers

“CMOS” (complementary metal oxide semiconductor) - most common.

Package provides:

• spreading of chip-level signal paths to board-level

• heat dissipation.

Ceramic or plastic with gold wires.

Chip in Package

Bare Die

64/ 77

Printed Circuit Boards

fiberglass or ceramic

1-20 conductive layers

1-20in on a side

IC packages are soldered down.

65/ 77

nMOS Transistor

Four terminals: gate, source, drain, body

Gate – oxide – body stack looks like a capacitor

• Gate and body are conductors

• SiO2 (oxide) is a very good insulator

• Called metal – oxide – semiconductor (MOS) capacitor, even though the gate is no longer made of metal

[Figure: nMOS cross-section and symbol. n+ source and drain regions in a p-type body (bulk Si), with a polysilicon gate over an SiO2 gate oxide (good insulator, $\varepsilon_{ox} = 3.9\,\varepsilon_0$); channel width W, channel length L, oxide thickness t_ox.]

66/ 77

nMOS Operation

Body is commonly tied to ground (0 V)

When the gate is at a low voltage:

• P-type body is at low voltage

• Source-body and drain-body diodes are OFF

• No current flows, transistor is OFF

[Figure: nMOS cross-section with the gate at 0: n+ source (S) and drain (D) in a p-type body (bulk Si), polysilicon gate over SiO2; transistor off.]

67/ 77

nMOS Operation Cont.

When the gate is at a high voltage:

• Positive charge on gate of MOS capacitor

• Negative charge attracted to body

• Inverts a channel under gate to n-type

• Now current can flow through n-type silicon from source through channel to drain, transistor is ON

[Figure: nMOS cross-section with the gate at 1: n+ source (S) and drain (D) in a p-type body (bulk Si), polysilicon gate over SiO2; transistor on.]

68/ 77

pMOS Transistor

Similar, but doping and voltages reversed

• Body tied to high voltage (VDD)

• Gate low: transistor ON

• Gate high: transistor OFF

• Bubble indicates inverted behavior

[Figure: pMOS cross-section: p+ source and drain regions in an n-type body (bulk Si), with a polysilicon gate over SiO2.]

69/ 77

70/ 77

Example: Inverter

71/ 77

Example: NAND3

Horizontal N-diffusion and p-diffusion strips

Vertical polysilicon gates

Metal1 VDD rail at top

Metal1 GND rail at bottom

32 by 40

72/ 77

73/ 77

74/ 77

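CMOS Inverter

A | Y
0 | 1
1 | 0

A = 0: pMOS is ON, nMOS is OFF, so Y is connected to VDD (Y = 1)

A = 1: pMOS is OFF, nMOS is ON, so Y is connected to GND (Y = 0)

[Figure: CMOS inverter schematic and symbol: a pMOS transistor between VDD and Y, an nMOS transistor between Y and GND, both gates driven by A.]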
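The behavior of CMOS gates can be sketched with a switch-level model: an nMOS transistor conducts when its gate is 1, a pMOS transistor conducts when its gate is 0. The code below is an idealized sketch with made-up function names. The NAND3 structure used here (three nMOS in series to GND, three pMOS in parallel to VDD) is the standard CMOS arrangement and is stated as an assumption, since the slides show only its layout.

```python
# Switch-level sketch of CMOS gates; ideal switches, no electrical detail.

def nmos_conducts(gate):
    """nMOS is ON when its gate is high."""
    return gate == 1


def pmos_conducts(gate):
    """pMOS is ON when its gate is low."""
    return gate == 0


def cmos_inverter(a):
    pull_up = pmos_conducts(a)       # pMOS connects Y to VDD
    pull_down = nmos_conducts(a)     # nMOS connects Y to GND
    assert pull_up != pull_down      # exactly one network conducts
    return 1 if pull_up else 0


def cmos_nand3(a, b, c):
    # Assumed standard structure: series nMOS pull-down, parallel pMOS pull-up.
    pull_down = nmos_conducts(a) and nmos_conducts(b) and nmos_conducts(c)
    pull_up = pmos_conducts(a) or pmos_conducts(b) or pmos_conducts(c)
    assert pull_up != pull_down
    return 1 if pull_up else 0


for a in (0, 1):
    print(f"A={a} Y={cmos_inverter(a)}")
print("NAND3(1,1,1) =", cmos_nand3(1, 1, 1))
print("NAND3(1,0,1) =", cmos_nand3(1, 0, 1))
```

The internal assertions check the property that makes static CMOS work: for every input, either the pull-up or the pull-down network conducts, never both and never neither.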

77/ 77

78/ 77

79/ 77

Multiplexers

2:1 multiplexer chooses between two inputs

S  D1  D0 | Y
0  X   0  | 0
0  X   1  | 1
1  0   X  | 0
1  1   X  | 1

[Figure: 2:1 multiplexer symbol with data inputs D0 (selected when S = 0) and D1 (selected when S = 1), select input S, and output Y.]
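The multiplexer truth table corresponds to the Boolean expression Y = (S AND D1) OR (NOT S AND D0). A minimal sketch, with the hypothetical function name `mux2`, that prints the full table:

```python
# Sketch of a 2:1 multiplexer as a Boolean expression over 0/1 values.

def mux2(s, d0, d1):
    """Return d1 when s is 1, d0 when s is 0."""
    return (s & d1) | ((1 - s) & d0)


print("S D1 D0 | Y")
for s in (0, 1):
    for d1 in (0, 1):
        for d0 in (0, 1):
            print(f"{s} {d1}  {d0}  | {mux2(s, d0, d1)}")
```

The printed table has eight rows rather than four because the slide’s “X” (don’t care) entries are expanded into both values; the output in those rows does not depend on the unselected input.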

81/ 77

Transmission Gate Mux

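Nonrestoring mux uses two transmission gates

• Only 4 transistors

[Figure: two transmission gates, one passing D0 and one passing D1 to the output Y, controlled by S and its complement.]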

82/ 77


83/ 77

What we learned today

Computer Architecture: integrates several levels, from programming languages to logic design.

Instruction Set Architecture (ISA)

Amdahl’s law

Moore’s law

Processor (CPU) --- Memory speed gap

History

Transistors. What, and how.

From transistors to logic design
