Sungkyunkwan University, College of Software, Dongkun Shin
Computer Architecture
Chapter 4-1
The Processor: Datapath and Control (Part 1)
Single-Cycle Implementation
Single-Cycle Implementation Outline
• The Big Picture
– We're ready to look at an implementation of the MIPS
• MIPS ISA Subset
• Clocking Methodology
• Datapath Components
• We will examine two MIPS implementations
– A simplified version
– A more realistic pipelined version
• Single-Cycle Implementation
– Assembling the Datapath
– Controlling the machine
– Advantages and Disadvantages
Performance Impact
• Performance of a machine is determined by
– Instruction count
– Clock cycle time
– Clock cycles per instruction (CPI)
• Instruction count
– Determined by ISA and compiler
• Processor design (datapath and control) determines
– Clock cycle time
– CPI (for fixed instruction mix)
• In this part: Single-cycle implementation
– Advantage
  • Only one clock cycle per instruction
– Disadvantages
  • Long cycle time
  • Inefficient utilization of memory and functional units
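The three factors listed above combine as CPU time = instruction count × CPI × clock cycle time. A minimal sketch of that product follows; the function name and the program size are assumptions chosen only for illustration.

```python
def cpu_time_ns(instruction_count, cpi, cycle_time_ns):
    """CPU time = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_ns


# Hypothetical program: 1,000,000 instructions on a single-cycle design
# (CPI = 1) with an assumed 8 ns clock period.
single_cycle_time = cpu_time_ns(1_000_000, 1, 8)
print(single_cycle_time, "ns")
```

In a single-cycle design CPI is fixed at 1, so the only lever the processor design has left is the clock cycle time.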
MIPS Instruction Formats (Review)
Three Instruction Formats
R-type:  op [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
         6 bits     | 5 bits     | 5 bits     | 5 bits     | 5 bits       | 6 bits
I-type:  op [31:26] | rs [25:21] | rt [20:16] | immediate [15:0]
         6 bits     | 5 bits     | 5 bits     | 16 bits
J-type:  op [31:26] | target address [25:0]
         6 bits     | 26 bits
• op: operation of the instruction
• rs, rt, rd: source/destination register specifiers
• shamt: shift amount
• funct: selects variant of operation in op field
• address/immediate: address offset or immediate value
• target address: target address of jump instruction
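As a rough illustration of the field layout, the sketch below slices a 32-bit instruction word with shifts and masks. The function name `decode` is hypothetical, and the sample word 0x8C410064 is the standard MIPS encoding of lw $1, 100($2), assuming opcode 35 for lw.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into every field of the three formats."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31:26, all formats
        "rs": (word >> 21) & 0x1F,      # bits 25:21, R-type and I-type
        "rt": (word >> 16) & 0x1F,      # bits 20:16, R-type and I-type
        "rd": (word >> 11) & 0x1F,      # bits 15:11, R-type only
        "shamt": (word >> 6) & 0x1F,    # bits 10:6, R-type only
        "funct": word & 0x3F,           # bits 5:0, R-type only
        "immediate": word & 0xFFFF,     # bits 15:0, I-type only
        "target": word & 0x3FFFFFF,     # bits 25:0, J-type only
    }


# lw $1, 100($2): op = 35, rs = 2, rt = 1, immediate = 100
fields = decode(0x8C410064)
print(fields["op"], fields["rs"], fields["rt"], fields["immediate"])
```

Which of the returned fields are meaningful depends on the format selected by op; the hardware extracts all of them in parallel and lets control decide which to use.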
MIPS Subset
• load word (lw) and store word (sw)
• add, sub, and, or & slt
• beq & j
Implementation Overview
• Data “flows” through memory and functional units
1. Use the program counter (PC) to supply instruction address
2. Get the instruction from memory
3. Read registers
4. Use the instruction to decide exactly what to do
Multiplexers
• Can't just join wires together
• Use multiplexers
Control
Clocking Methodology
• Combinational logic transforms data during clock cycles
– Between clock edges
– Input from state elements, output to state element
– Longest delay determines clock period
Datapath Combinational Logic Elements
• Adder
• MUX
• ALU
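[Figure: Adder (32-bit inputs A and B, CarryIn; 32-bit Sum and Carry outputs), MUX (32-bit inputs A and B, Select; 32-bit output Y), ALU (32-bit inputs A and B, 3-bit OP; 32-bit Result)]
• Combinational logic: does not use a clock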
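Because combinational elements hold no state, each can be modelled as a pure function of its inputs. The sketch below does this for the adder and the multiplexer; the function names and the convention that Select = 0 picks input A are assumptions.

```python
MASK32 = 0xFFFFFFFF


def adder(a, b, carry_in=0):
    """32-bit adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def mux(select, a, b):
    """2-to-1 multiplexer: assumed convention is select = 0 -> A, select = 1 -> B."""
    return b if select else a


print(adder(0xFFFFFFFF, 1))   # overflow wraps the sum and raises the carry
print(mux(0, 5, 7), mux(1, 5, 7))
```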
Sequential Elements
• Register: stores data in a circuit
– Uses a clock signal to determine when to update the stored value
– Edge-triggered: update when Clk changes from 0 to 1
[Figure: register symbol (inputs D and Clk, output Q) and timing diagram of Clk, D, and Q]
Sequential Elements
• Register with write control
– Only updates on clock edge when write control input is 1
– Used when stored value is required later
[Figure: register with write control (inputs D, Clk, and Write; output Q) and timing diagram of Clk, Write, D, and Q]
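One way to picture an edge-triggered register with write control is as an object whose stored value changes only inside a method that stands for the rising clock edge. The class and method names below are hypothetical.

```python
class Register:
    """Edge-triggered register with a write-control input."""

    def __init__(self, value=0):
        self.q = value  # stored value, visible on output Q

    def clock_edge(self, d, write=1):
        """Model one rising clock edge: capture D only when Write is 1."""
        if write:
            self.q = d
        return self.q


reg = Register()
reg.clock_edge(5, write=0)   # Write = 0: the old value is kept
held = reg.q
reg.clock_edge(5, write=1)   # Write = 1: D is captured
print(held, reg.q)
```

Between calls to `clock_edge`, Q stays constant no matter how D changes, which is what lets combinational logic settle during the cycle.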
Storage Element: Register File
• Register File consists of 32 registers:
– Two 32-bit output busses
– One 32-bit input bus
– Register 0 hard-wired to value 0
• Register is selected by:
– Read register number 1
– Read register number 2
– Write register
[Figure: register file with inputs Read register number 1, Read register number 2, Write register, Write data, and Write; outputs Read data 1 and Read data 2]
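The register file can be sketched as 32 words with two combinational read ports and one clocked write port. The class below is an illustrative model only; its names are assumptions, and writes to register 0 are simply ignored to mimic the hard-wired zero.

```python
class RegisterFile:
    """32 x 32-bit register file: two read ports, one write port."""

    def __init__(self):
        self.regs = [0] * 32

    def read(self, read_reg1, read_reg2):
        """Combinational read: returns (Read data 1, Read data 2)."""
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, write_reg, write_data, reg_write):
        """Clocked write: happens only when the Write control is 1."""
        if reg_write and write_reg != 0:   # register 0 stays 0
            self.regs[write_reg] = write_data & 0xFFFFFFFF


rf = RegisterFile()
rf.clock_edge(write_reg=8, write_data=42, reg_write=1)
rf.clock_edge(write_reg=0, write_data=99, reg_write=1)   # ignored
print(rf.read(8, 0))
```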
Two Read Ports
[Figure: two read ports; each Read register number (n bits) drives a Mux that selects one of Register 0 … Register 2^n − 1 as Read data 1 or Read data 2]
Write Port
[Figure: write port; Register number feeds an n-to-2^n decoder whose outputs 0 … 2^n − 1, gated by Write, drive the clock input C of Register 0 … Register 2^n − 1; Register data feeds every D input]
Storage Element: Idealized Memory
• Memory (idealized)
– One input bus: Data In
– One output bus: Data Out
• Memory word is selected by:
– Address selects the word to put on Data Out
– Write Enable = 1: Address selects the memory word to be written via the Data In bus
• Clock input (CLK)
– The CLK input is a factor only for write operation (e.g. SDRAM)
[Figure: idealized memory with 32-bit Address and 32-bit Data In inputs, Write Enable, Clk, and a 32-bit Data Out output]
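The idealized memory behaves like a lookup that reads combinationally and writes only on a clock edge with Write Enable set. The following sketch assumes a dictionary keyed by address and that unwritten words read as 0; both choices are illustrative, not part of the slides.

```python
class IdealMemory:
    """Idealized memory: combinational read, clocked write."""

    def __init__(self):
        self.words = {}   # address -> 32-bit word (assumed storage model)

    def read(self, address):
        """Address selects the word placed on Data Out."""
        return self.words.get(address, 0)

    def clock_edge(self, address, data_in, write_enable):
        """On a clock edge, write Data In to the addressed word if enabled."""
        if write_enable:
            self.words[address] = data_in & 0xFFFFFFFF


mem = IdealMemory()
mem.clock_edge(address=0x100, data_in=1234, write_enable=1)
mem.clock_edge(address=0x104, data_in=5678, write_enable=0)   # no write
print(mem.read(0x100), mem.read(0x104))
```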
Instruction Fetch Unit
• Common RTL operations
– Fetch the Instruction: mem[PC]
– Update the program counter:
• Sequential code: PC ← PC + 4
• Branch and jump: PC ← "something else"
[Figure: a. Instruction memory (Instruction address in, Instruction out); b. Program counter (PC); c. Adder (Add, Sum)]
Instruction Fetch Unit
• PC is a 32-bit register
• Adder increments by 4 for the next instruction
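The fetch step for sequential code can be sketched as a lookup at mem[PC] followed by PC ← PC + 4. The function name and the tiny instruction memory below are hypothetical; the two stored words are the standard encodings of add $8, $17, $18 and lw $1, 100($2).

```python
def fetch(instruction_memory, pc):
    """Return (instruction at mem[PC], PC + 4) for sequential execution."""
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF   # the PC is a 32-bit register
    return instruction, next_pc


# Hypothetical two-instruction program, keyed by byte address.
program = {0: 0x02324020, 4: 0x8C410064}

pc = 0
first, pc = fetch(program, pc)
second, pc = fetch(program, pc)
print(hex(first), hex(second), pc)
```

Branches and jumps replace the `pc + 4` value with a different next address; that selection is outside this sketch.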
R-format ALU Operations
• Read two register operands
• Perform arithmetic/logical operation
• Write register result
R-format ALU Operations
[Figure: Registers (Read register 1, Read register 2, Write register, Write data, RegWrite; Read data 1, Read data 2) feeding the ALU (4-bit ALU operation; Zero and ALU result outputs), with register numbers taken from the Instruction]
Load & Store
• Read register operands
• Calculate address using 16-bit offset
– Use ALU, but sign-extend offset
• Load: Read memory and update register
• Store: Write register value to memory
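The address calculation for lw and sw adds the base register to the sign-extended 16-bit offset. A minimal sketch, with hypothetical function names and made-up register contents:

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(base, offset16):
    """Address = base register + sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


# lw $1, 100($2) with an assumed value of 1000 in $2
print(effective_address(1000, 100))
# A negative offset of -4 is encoded as 0xFFFC
print(effective_address(1000, 0xFFFC))
```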
Load & Store
[Figure: Registers and ALU as for R-format operations, plus Data memory (Address, Write data, MemRead, MemWrite; Read data) and a Sign extend unit (16 bits to 32 bits)]
Branch
• Shift left 2: just re-routes wires
• Sign extend: sign-bit wire replicated
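In the standard MIPS definition of beq, the branch target is PC + 4 plus the sign-extended 16-bit offset shifted left by 2. The sketch below assumes that definition; the function name and the sample addresses are illustrative.

```python
def branch_target(pc, offset16):
    """Target = (PC + 4) + (sign-extended offset << 2), wrapped to 32 bits."""
    offset = offset16 & 0xFFFF
    if offset & 0x8000:            # replicate the sign bit
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


print(hex(branch_target(0x100, 3)))        # forward by 3 instructions
print(hex(branch_target(0x100, 0xFFFF)))   # offset -1: back to the branch itself
```

The shift by 2 costs no logic because instructions are word aligned, so the hardware only wires each offset bit two positions higher.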
Simple Implementation
• Execute all instructions in one clock cycle
• Use multiplexors to switch them together
Memory Instructions & R-type Instructions
Full Datapath
Control
• Selects the operations to perform (ALU, read/write, etc.)
• Controls the flow of data (multiplexor inputs)
• Information comes from the 32 bits of the instruction
• Example: add $8, $17, $18
– Instruction format: R-type (op, rs, rt, rd, shamt, funct)
• ALU's operation based on instruction type and function code
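To make the example concrete, the sketch below packs the fields of add $8, $17, $18 into a 32-bit word. It assumes the standard MIPS encoding (op = 0 and funct = 32 for add, with rs = 17, rt = 18, rd = 8); the function name is hypothetical.

```python
def encode_r_type(rs, rt, rd, shamt, funct, op=0):
    """Pack R-type fields into a 32-bit instruction word."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $8, $17, $18: destination rd = 8, sources rs = 17 and rt = 18
word = encode_r_type(rs=17, rt=18, rd=8, shamt=0, funct=0b100000)
print(f"{word:032b}")
print(hex(word))
```

Control reads the op field (and, for R-type, the funct field) of this word to decide what the datapath should do.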
ALU Control
• Q: What should the ALU do with this instruction?
• Example: lw $1, 100($2)
• 4-bit ALU control input
ALU control   Function
0000          AND
0001          OR
0010          add
0110          subtract
0111          set-on-less-than
1100          NOR
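The table can be read as a function from the 4-bit control code and two 32-bit operands to a result and a Zero flag. The sketch below is one possible model; it assumes set-on-less-than compares the operands as signed values, as the MIPS slt instruction does, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(control, a, b):
    """Return (result, zero) for the 4-bit ALU control codes in the table."""
    if control == 0b0000:
        result = a & b
    elif control == 0b0001:
        result = a | b
    elif control == 0b0010:
        result = a + b
    elif control == 0b0110:
        result = a - b
    elif control == 0b0111:
        result = 1 if to_signed(a) < to_signed(b) else 0
    elif control == 0b1100:
        result = ~(a | b)
    else:
        raise ValueError(f"unsupported ALU control: {control:04b}")
    result &= MASK32
    return result, int(result == 0)


print(alu(0b0010, 2, 3))   # add
print(alu(0b0110, 7, 7))   # subtract: equal operands set Zero
```

For lw $1, 100($2) the ALU adds, since it computes the memory address from the base register and the offset.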
ALU Control
• Must describe hardware to compute 4-bit ALU control input
• Assume 2-bit ALUOp derived from opcode (ALUOp is computed from instruction type)
– Combinational logic derives ALU control, given:
  • instruction type: 00 = lw, sw; 01 = beq; 10 = arithmetic
  • function code for arithmetic
• Describe it using a truth table (can turn into gates)
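The mapping from ALUOp and function code to the 4-bit ALU control can be sketched as a small lookup. The funct values below are the standard MIPS encodings for add, sub, and, or, and slt, which are an assumption here because the slides do not list them; the names are hypothetical.

```python
# Standard MIPS funct codes (assumed) -> 4-bit ALU control
FUNCT_TO_ALU_CONTROL = {
    0b100000: 0b0010,   # add
    0b100010: 0b0110,   # sub
    0b100100: 0b0000,   # and
    0b100101: 0b0001,   # or
    0b101010: 0b0111,   # slt
}


def alu_control(alu_op, funct):
    """Derive the 4-bit ALU control from 2-bit ALUOp and the funct field."""
    if alu_op == 0b00:      # lw, sw: add to form the address
        return 0b0010
    if alu_op == 0b01:      # beq: subtract to compare
        return 0b0110
    if alu_op == 0b10:      # arithmetic: funct decides
        return FUNCT_TO_ALU_CONTROL[funct]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")


print(f"{alu_control(0b00, 0):04b}")
print(f"{alu_control(0b10, 0b101010):04b}")
```

For lw and sw the funct argument is ignored, which corresponds to the don't-care entries a truth table would have for those rows.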
The Main Control Unit
• Control signals derived from instruction
R-type:      0 [31:26]        | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0]
Load/Store:  35 or 43 [31:26] | rs [25:21] | rt [20:16] | address [15:0]
Branch:      4 [31:26]        | rs [25:21] | rt [20:16] | address [15:0]
– Bits 31:26: opcode
– rs: always read
– rt: read, except for load
– rd (R-type) or rt (load): write for R-type and load
– address: sign-extend and add
Simple Datapath with Control Unit
R-Type Instruction
Load Instruction
Branch-on-Equal Instruction
Control Signals
• RegDst: rt (bits 20-16) vs. rd (bits 15-11)
• RegWrite:
• ALUSrc: Read data 2 vs. sign-extended 16 bits
• PCSrc: PC+4 vs. branch target
• MemRead:
• MemWrite:
• MemtoReg: from the ALU vs. from the data memory
Instruction  RegDst  ALUSrc  MemtoReg  RegWrite  MemRead  MemWrite  Branch  ALUOp1  ALUOp0
R-format     1       0       0         1         0        0         0       1       0
lw           0       1       1         1         1        0         0       0       0
sw           X       1       X         0         0        1         0       0       0
beq          X       0       X         0         0        0         1       0       1
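The control table can be expressed as a lookup keyed by opcode. The sketch below assumes opcode 0 for R-format, 35 for lw, 43 for sw, and 4 for beq, represents don't-care entries (X) as None, and folds ALUOp1 and ALUOp0 into one 2-bit value; the names are hypothetical.

```python
X = None   # don't care

CONTROL_TABLE = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),   # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),   # lw
    43: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),   # sw
    4:  dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),   # beq
}


def main_control(opcode):
    """Return the control signals for a supported opcode."""
    return CONTROL_TABLE[opcode]


signals = main_control(35)
print(signals["MemRead"], signals["RegWrite"], signals["ALUOp"])
```

A hardware implementation would assign a fixed value to each don't-care entry, whichever simplifies the logic.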
Implementation of Combinational Control Unit
• Simple combinational logic (truth tables)
PLA Implementation
Jump
Jump
Performance of Single Cycle Implementation
• Calculate cycle time assuming negligible delays except:
– memory (2ns), ALU and adders (2ns), register file access (1ns)
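With these delays, the required time for each instruction class is the sum of the units on its path, and the single-cycle clock must cover the slowest class. The sketch below assumes the usual textbook paths (for example, lw uses instruction memory, register read, ALU, data memory, and register write); the path lists are an assumption, not taken from the slides.

```python
MEM, ALU, REG = 2, 2, 1   # delays in ns, as stated above

# Assumed sequence of units used by each instruction class.
CRITICAL_PATH = {
    "lw":       [MEM, REG, ALU, MEM, REG],
    "sw":       [MEM, REG, ALU, MEM],
    "R-format": [MEM, REG, ALU, REG],
    "beq":      [MEM, REG, ALU],
    "j":        [MEM],
}

latency_ns = {name: sum(path) for name, path in CRITICAL_PATH.items()}
cycle_time_ns = max(latency_ns.values())   # the clock must fit the slowest class

print(latency_ns)
print(cycle_time_ns)
```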
1 Clock Cycle of Fixed Length vs. Variable Length
• Loads 24%, Stores 12%, R-formats 44%
• Branches 18%, Jumps 2%
1 Clock Cycle of Fixed Length vs. Variable Length
• CPU clock cycle (fixed) = 8 ns
• CPU clock cycle (variable) = 8 × 24% + 7 × 12% + 6 × 44% + 5 × 18% + 2 × 2% = 6.3 ns
• CPU performance (variable clock) / CPU performance (fixed clock) = 8 / 6.3 = 1.27
• Implementing a variable-speed clock for each instruction class is not practical.
• The alternative is to use a shorter clock cycle and vary the number of clock cycles for different instruction classes.
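The comparison can be reproduced as a weighted average of the per-class times. In the sketch below the variable names are hypothetical; note that the unrounded average is 6.34 ns, which gives a ratio of about 1.26, slightly below the 1.27 obtained when the average is first rounded to 6.3 ns.

```python
LATENCY_NS = {"lw": 8, "sw": 7, "R-format": 6, "beq": 5, "j": 2}
MIX_PERCENT = {"lw": 24, "sw": 12, "R-format": 44, "beq": 18, "j": 2}

# Fixed clock: every instruction takes as long as the slowest class.
fixed_cycle_ns = max(LATENCY_NS.values())

# Variable clock: average time weighted by the instruction mix.
variable_cycle_ns = sum(LATENCY_NS[k] * MIX_PERCENT[k] for k in LATENCY_NS) / 100

speedup = fixed_cycle_ns / variable_cycle_ns

print(fixed_cycle_ns, variable_cycle_ns, round(speedup, 2))
```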
Disadvantages of a single-cycle machine
• Worst case delay for all instructions
– violates “Make the common case fast”
• Inefficient Usage of Functional Units
– functional units are used only once per cycle