Sungkyunkwan University, College of Software · Dongkun Shin

Computer Architecture Chapter 4-1 The Processor: Datapath and Control (Part 1): Single-Cycle Implementation

Computer Architecture Chapter 4-1 The Processor: Datapath and …nyx.skku.ac.kr/wp-content/uploads/2017/08/CA-lec4-1.pdf · 2017-10-09 · Sungkyunkwan University, College of Software, Dongkun Shin




Single-Cycle Implementation Outline

• The Big Picture

– We're ready to look at an implementation of the MIPS architecture

• MIPS ISA Subset

• Clocking Methodology

• Datapath Components

• We will examine two MIPS implementations

– A simplified version

– A more realistic pipelined version

• Single-Cycle Implementation

– Assembling the Datapath

– Controlling the machine

– Advantages and Disadvantages


Performance Impact

• Performance of a machine is determined by

– Instruction count

– Clock cycle time

– Clock cycles per instruction (CPI)

• Instruction count

– Determined by ISA and compiler

• Processor design (datapath and control) determines

– Clock cycle time

– CPI (for fixed instruction mix)

• In this part: Single-cycle implementation

– Advantage

• Only one clock cycle per instruction

– Disadvantages

• Long cycle time

• Inefficient utilization of memory and functional units
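The three factors listed on this slide combine in the standard CPU performance equation. The wording of the terms below is shorthand chosen here, not notation taken from the slides.

```latex
\text{CPU time} = \text{Instruction count} \times \text{CPI} \times \text{Clock cycle time}
```

In a single-cycle implementation CPI is fixed at 1, so the clock cycle time alone determines the time per instruction.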


MIPS Instruction Formats (Review)

Three Instruction Formats

R-type: op [31:26] (6 bits), rs [25:21] (5 bits), rt [20:16] (5 bits), rd [15:11] (5 bits), shamt [10:6] (5 bits), funct [5:0] (6 bits)

I-type: op [31:26] (6 bits), rs [25:21] (5 bits), rt [20:16] (5 bits), immediate [15:0] (16 bits)

J-type: op [31:26] (6 bits), target address [25:0] (26 bits)

• op: operation of the instruction

• rs, rt, rd: source/destination register specifiers

• shamt: shift amount

• funct: selects variant of operation in op field

• address/immediate: address offset or immediate value

• target address: target address of jump instruction
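As a minimal sketch of how the fields in the three formats can be pulled out of a 32-bit instruction word, the hypothetical helper `decode` below extracts every field with shifts and masks. The function name and the dictionary layout are illustrative assumptions, not part of the lecture.

```python
def decode(instr: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields.

    All fields are extracted regardless of format; the caller uses
    the ones that apply (R-type, I-type, or J-type).
    """
    return {
        "op": (instr >> 26) & 0x3F,      # bits 31:26
        "rs": (instr >> 21) & 0x1F,      # bits 25:21
        "rt": (instr >> 16) & 0x1F,      # bits 20:16
        "rd": (instr >> 11) & 0x1F,      # bits 15:11 (R-type)
        "shamt": (instr >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct": instr & 0x3F,           # bits 5:0   (R-type)
        "immediate": instr & 0xFFFF,     # bits 15:0  (I-type)
        "target": instr & 0x3FFFFFF,     # bits 25:0  (J-type)
    }


# 0x02324020 is the encoding of add $8, $17, $18
fields = decode(0x02324020)
```

Here `fields["rs"]`, `fields["rt"]`, and `fields["rd"]` come out as 17, 18, and 8, matching the register operands of the `add`.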


MIPS Subset

• load word (lw) and store word (sw)

• add, sub, and, or & slt

• beq & j


Implementation Overview

• Data “flows” through memory and functional units

1. Use the program counter (PC) to supply instruction address

2. Get the instruction from memory

3. Read registers

4. Use the instruction to decide exactly what to do


Multiplexers

• Can’t just join wires together

• Use multiplexers


Control


Clocking Methodology

• Combinational logic transforms data during clock cycles

– Between clock edges

– Input from state elements, output to state element

– Longest delay determines clock period


Datapath Combinational Logic Elements

• Adder: 32-bit inputs A and B plus CarryIn; 32-bit Sum and Carry outputs

• MUX: 32-bit inputs A and B plus Select; 32-bit output Y

• ALU: 32-bit inputs A and B plus OP; 32-bit Result output

Combinational logic: does not use a clock


Sequential Elements

• Register: stores data in a circuit

– Uses a clock signal to determine when to update the stored value

– Edge-triggered: update when Clk changes from 0 to 1

[Figure: register with inputs D and Clk and output Q, with waveforms for Clk, D, and Q]


Sequential Elements

• Register with write control

– Only updates on clock edge when write control input is 1

– Used when stored value is required later

[Figure: register with inputs D, Clk, and Write and output Q, with waveforms for Write, D, Q, and Clk]
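The behavior of a register with write control can be sketched in software as follows. The class name `Register` and the method `clock_edge` are hypothetical names chosen for illustration; each call to `clock_edge` stands for one rising clock edge.

```python
class Register:
    """Edge-triggered register with a write-control input (behavioral model)."""

    def __init__(self, value: int = 0):
        self.q = value  # stored value, visible on output Q

    def clock_edge(self, d: int, write: bool = True) -> int:
        """Model one rising clock edge.

        Q takes the value of D only when the write control is 1;
        otherwise the stored value is held.
        """
        if write:
            self.q = d
        return self.q


reg = Register()
held = reg.clock_edge(5, write=False)    # write control is 0: value is held
updated = reg.clock_edge(5, write=True)  # write control is 1: value is updated
```

Calling `clock_edge` with `write` left at its default models the plain register from the previous slide, which updates on every edge.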


Storage Element: Register File

• Register file consists of 32 registers:

– Two 32-bit output buses

– One 32-bit input bus

– Register 0 hard-wired to value 0

• Register is selected by:

– Read register number 1

– Read register number 2

– Write register

[Figure: register file with inputs Read register number 1, Read register number 2, Write register, Write data, and Write; outputs Read data 1 and Read data 2]


Two Read Ports

[Figure: two read ports. Each n-bit read register number (1 and 2) is the select input of a multiplexer over Register 0, Register 1, ..., Register 2^n − 2, Register 2^n − 1; the multiplexer outputs are Read data 1 and Read data 2]


Write Port

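[Figure: write port. The Register number feeds an n-to-2^n decoder with outputs 0, 1, ..., 2^n − 2, 2^n − 1; together with Write, the decoder outputs drive the C input of Register 0, Register 1, ..., Register 2^n − 2, Register 2^n − 1; Register data feeds the D input of every register]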
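A behavioral sketch of the register file with two read ports and one write port is given below. The class and method names are hypothetical; the list indexing stands in for the read multiplexers, and the `if` in `clock_edge` stands in for the decoder combined with the Write signal.

```python
class RegisterFile:
    """Register file with two read ports and one write port (behavioral model)."""

    def __init__(self, n_bits: int = 5):
        # An n-bit register number selects one of 2^n registers (32 for MIPS).
        self.regs = [0] * (1 << n_bits)

    def read(self, reg1: int, reg2: int) -> tuple:
        """Combinational read: two register numbers in, two data values out."""
        return self.regs[reg1], self.regs[reg2]

    def clock_edge(self, write_reg: int, write_data: int, write: bool) -> None:
        """Model one rising clock edge on the write port.

        Only the register selected by write_reg is updated, and only when
        write is asserted. Register 0 stays hard-wired to 0.
        """
        if write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=8, write_data=42, write=True)
rf.clock_edge(write_reg=0, write_data=99, write=True)   # ignored: register 0
rf.clock_edge(write_reg=9, write_data=7, write=False)   # ignored: Write is 0
data1, data2 = rf.read(8, 0)
```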


Storage Element: Idealized Memory

• Memory (idealized)

– One input bus: Data In

– One output bus: Data Out

• Memory word is selected by:

– Address selects the word to put on Data Out

– Write Enable = 1: address selects the memory word to be written via the Data In bus

• Clock input (CLK)

– The CLK input is a factor only for write operations (e.g. SDRAM)

[Figure: memory block with 32-bit Address and Data In inputs, Write Enable and Clk inputs, and a 32-bit Data Out output]


Instruction Fetch Unit

• Common RTL operations

– Fetch the Instruction: mem[PC]

– Update the program counter:

• Sequential code: PC ← PC + 4

• Branch and jump: PC ← "something else"

[Figure: a. Instruction memory (Instruction address in, Instruction out); b. Program counter (PC); c. Adder (Add, Sum)]


Instruction Fetch Unit

• PC: 32-bit register

• Increment by 4 for next instruction
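The fetch step for sequential code can be sketched as below. The function `fetch` and the use of a dictionary keyed by byte address as the instruction memory are illustrative assumptions; branches and jumps, which replace the PC with something else, are not modeled here.

```python
def fetch(instruction_memory: dict, pc: int) -> tuple:
    """Fetch mem[PC] and compute the sequential next PC.

    instruction_memory maps a byte address to a 32-bit instruction word.
    Returns (instruction, next_pc).
    """
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # PC is a 32-bit register
    return instruction, next_pc


# Two-instruction program: add $8, $17, $18 followed by lw $1, 100($2)
imem = {0: 0x02324020, 4: 0x8C410064}
first, pc = fetch(imem, 0)
second, pc = fetch(imem, pc)
```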


R-format ALU Operations

• Read two register operands

• Perform arithmetic/logical operation

• Write register result


R-format ALU Operations

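[Figure: the Instruction supplies Read register 1, Read register 2, and Write register to the Registers block (with Write data and RegWrite inputs); Read data 1 and Read data 2 feed the ALU, which has a 4-bit ALU operation input and produces Zero and ALU result]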


Load & Store

• Read register operands

• Calculate address using 16-bit offset

– Use ALU, but sign-extend offset

• Load: Read memory and update register

• Store: Write register value to memory


Load & Store

[Figure: the R-format datapath (Registers with RegWrite, ALU with 4-bit ALU operation, Zero, and ALU result) extended with a Sign extend unit (16 bits in, 32 bits out) and a Data memory block (Address, Write data, Read data, MemRead, MemWrite)]
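The address calculation for loads and stores, a base register plus a sign-extended 16-bit offset, can be sketched as below. The helper names `sign_extend16` and `effective_address` are hypothetical and chosen only for this illustration.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base: int, offset16: int) -> int:
    """Address used by lw/sw: base register plus sign-extended offset (32-bit)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# lw $1, 100($2) with $2 = 0x1000 accesses address 0x1064
addr_pos = effective_address(0x1000, 100)
# An offset field of 0xFFFC encodes -4
addr_neg = effective_address(0x1000, 0xFFFC)
```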


Branch

• Shift left 2: just re-routes wires

• Sign extend: sign-bit wire replicated
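For beq, the branch target in MIPS is PC + 4 plus the sign-extended 16-bit offset shifted left by 2. A minimal sketch follows; the function name `branch_target` is hypothetical.

```python
def branch_target(pc: int, offset16: int) -> int:
    """Compute the MIPS branch target: (PC + 4) + (sign_extend(offset) << 2)."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:          # replicate the sign bit
        offset -= 0x10000
    # Shifting left by 2 converts a word offset into a byte offset.
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


forward = branch_target(0x100, 3)        # three instructions past PC + 4
backward = branch_target(0x100, 0xFFFF)  # offset -1: branch to itself
```

In hardware neither step needs logic gates: the shift is a fixed rewiring of the bits, and the sign extension copies one wire sixteen times.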


Simple Implementation

• Execute all instructions in one clock cycle

• Use multiplexors to switch them together

Memory Instructions & R-type Instructions


Full Datapath


Control

• Selects the operations to perform (ALU, read/write, etc.)

• Controls the flow of data (multiplexor inputs)

• Information comes from the 32 bits of the instruction

• Example:

add $8, $17, $18 Instruction Format:

op = 000000 | rs = 10001 | rt = 10010 | rd = 01000 | shamt = 00000 | funct = 100000

• ALU's operation based on instruction type and function code


ALU Control

• Q: What should the ALU do with this instruction?

• Example: lw $1, 100($2)

• 4-bit ALU control input

ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
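The six ALU control codes can be illustrated with a small behavioral model. The function `alu` and the helper `to_signed` are hypothetical names; the model returns the 32-bit result together with the Zero output.

```python
MASK = 0xFFFFFFFF  # keep values to 32 bits


def to_signed(x: int) -> int:
    """Interpret a 32-bit value as a signed two's-complement number."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control: int, a: int, b: int) -> tuple:
    """Return (result, zero) for the given 4-bit ALU control code."""
    a &= MASK
    b &= MASK
    if control == 0b0000:
        result = a & b                       # AND
    elif control == 0b0001:
        result = a | b                       # OR
    elif control == 0b0010:
        result = (a + b) & MASK              # add
    elif control == 0b0110:
        result = (a - b) & MASK              # subtract
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0  # set-on-less-than
    elif control == 0b1100:
        result = ~(a | b) & MASK             # NOR
    else:
        raise ValueError(f"unsupported ALU control code: {control:04b}")
    return result, result == 0


# For lw $1, 100($2) the ALU adds the base register and the offset.
address, zero = alu(0b0010, 0x1000, 100)
```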


ALU Control

• Must describe hardware to compute the 4-bit ALU control input

• Assume a 2-bit ALUOp derived from the opcode (computed from instruction type):

– 00 = lw, sw

– 01 = beq

– 10 = arithmetic

• Combinational logic derives the ALU control from:

– the instruction type (ALUOp)

– the function code, for arithmetic instructions

• Describe it using a truth table (can turn into gates)
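A minimal sketch of the ALU control logic, assuming the 2-bit ALUOp encoding from this slide and the standard MIPS function codes for add, sub, and, or, and slt (32, 34, 36, 37, 42), which the slide itself does not list. The names `FUNCT_TO_ALU` and `alu_control` are hypothetical, and a table lookup stands in for the gate-level truth table.

```python
# Standard MIPS funct codes (assumed, not shown on the slide) -> ALU control
FUNCT_TO_ALU = {
    32: 0b0010,  # add
    34: 0b0110,  # sub
    36: 0b0000,  # and
    37: 0b0001,  # or
    42: 0b0111,  # slt
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Derive the 4-bit ALU control input from ALUOp and the funct field."""
    if alu_op == 0b00:       # lw, sw: add base and offset
        return 0b0010
    if alu_op == 0b01:       # beq: subtract and test the Zero output
        return 0b0110
    if alu_op == 0b10:       # arithmetic: the funct field decides
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


lw_code = alu_control(0b00)
beq_code = alu_control(0b01)
slt_code = alu_control(0b10, funct=42)
```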


The Main Control Unit

• Control signals derived from instruction

Format       31:26      25:21  20:16  15:11  10:6   5:0
R-type       0          rs     rt     rd     shamt  funct
Load/Store   35 or 43   rs     rt     address (15:0)
Branch       4          rs     rt     address (15:0)

• Bits 31:26: opcode

• rs: always read

• rt: read, except for load

• rd (R-type) or rt (load): write for R-type and load

• address: sign-extend and add


Simple Datapath with Control Unit


R-Type Instruction


Load Instruction


Branch-on-Equal Instruction


Control Signals

• RegDst: rt (bits 20-16) vs. rd (bits 15-11)

• RegWrite: enables writing the register file

• ALUSrc: Read data 2 vs. sign-extended 16 bits

• PCSrc: PC+4 vs. branch target

• MemRead: enables reading the data memory

• MemWrite: enables writing the data memory

• MemtoReg: from the ALU vs. from the data memory

Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
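The control-signal table can be sketched as a lookup keyed by opcode. The opcode values 0, 35, 43, and 4 are the ones shown for R-type, load/store, and branch in the main control unit slide, with lw = 35 and sw = 43; the names `SIGNALS`, `CONTROL`, and `main_control` are hypothetical, and `None` stands for a don't-care (X) entry.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp1", "ALUOp0")

X = None  # don't-care entry in the table

# opcode -> control signal values, one row per instruction class
CONTROL = {
    0:  (1, 0, 0, 1, 0, 0, 0, 1, 0),   # R-format
    35: (0, 1, 1, 1, 1, 0, 0, 0, 0),   # lw
    43: (X, 1, X, 0, 0, 1, 0, 0, 0),   # sw
    4:  (X, 0, X, 0, 0, 0, 1, 0, 1),   # beq
}


def main_control(opcode: int) -> dict:
    """Return the control signals for an opcode as a name -> value mapping."""
    return dict(zip(SIGNALS, CONTROL[opcode]))


lw_signals = main_control(35)
sw_signals = main_control(43)
```

A hardware control unit would implement the same mapping as combinational logic on the six opcode bits rather than as a table in memory.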


Implementation of Combinational Control Unit

• Simple combinational logic (truth tables)


PLA Implementation


Jump


Jump


Performance of Single Cycle Implementation

• Calculate cycle time assuming negligible delays except:

– memory (2ns), ALU and adders (2ns), register file access (1ns)


1 Clock Cycle of Fixed Length vs. Variable Length

• Loads 24%, Stores 12%, R-format 44%

• Branches 18%, Jumps 2%


1 Clock Cycle of Fixed Length vs. Variable Length

• CPU clock cycle (fixed) = 8 ns

• CPU clock cycle (variable) = 8 × 24% + 7 × 12% + 6 × 44% + 5 × 18% + 2 × 2% = 6.3 ns

Implementing a variable-speed clock for each instruction class is not practical.

The alternative is to use a shorter clock cycle and vary the number of clock cycles for different instruction classes.

CPU performance (variable clock) / CPU performance (fixed clock) = 8 / 6.3 = 1.27
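The cycle-time comparison can be reproduced with a short calculation. The unit delays and the instruction mix are the ones given in the slides; the per-class list of units on the critical path (`PATHS`) is an assumption chosen here because it yields the 8, 7, 6, 5, and 2 ns latencies used in the weighted sum.

```python
# Unit delays in ns, from the slides
DELAY_NS = {"memory": 2, "alu": 2, "register": 1}

# Assumed units on the critical path of each instruction class
PATHS = {
    "Load":     ["memory", "register", "alu", "memory", "register"],
    "Store":    ["memory", "register", "alu", "memory"],
    "R-format": ["memory", "register", "alu", "register"],
    "Branch":   ["memory", "register", "alu"],
    "Jump":     ["memory"],
}

# Instruction mix, from the slides
MIX = {"Load": 0.24, "Store": 0.12, "R-format": 0.44,
       "Branch": 0.18, "Jump": 0.02}

# Latency of each class is the sum of the delays along its path.
latency = {cls: sum(DELAY_NS[u] for u in units) for cls, units in PATHS.items()}

fixed_cycle = max(latency.values())                      # worst case sets the clock
variable_cycle = sum(MIX[c] * latency[c] for c in MIX)   # mix-weighted average
speedup = fixed_cycle / variable_cycle

print(latency, fixed_cycle, round(variable_cycle, 2), round(speedup, 2))
```

The unrounded weighted average is 6.34 ns; the slide rounds it to 6.3 ns, which gives the quoted ratio of 1.27.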


Disadvantages of a single-cycle machine

• Worst case delay for all instructions

– violates “Make the common case fast”

• Inefficient Usage of Functional Units

– functional units are used only once per cycle