Sungkyunkwan University, College of Software, Dongkun Shin (신동군)
Computer Architecture
Chapter 4-2
The Processor: Enhancing Performance with Pipelining
Pipelining is Natural!
• Laundry Example
• Ann, Brian, Cathy and Don each have one load of clothes to wash, dry, fold, and store
• Washer takes 30 minutes
• Dryer takes 30 minutes
• Folder takes 30 minutes
• Storer takes 30 minutes to put clothes into drawers
Sequential Laundry
• Sequential laundry takes 8 hours for 4 loads
• If they learned pipelining, how long would laundry take?
Pipelined Laundry: Start work ASAP
• Pipelined laundry takes 3.5 hours for 4 loads!
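Both totals follow from simple counting. A minimal sketch, assuming four 30-minute stages as on the slides and that every load advances one stage per 30-minute slot (the function names are illustrative, not from the lecture):

```python
STAGE_MINUTES = 30   # washer, dryer, folder, storer each take 30 minutes
NUM_STAGES = 4
NUM_LOADS = 4        # Ann, Brian, Cathy, Don


def sequential_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """Each load finishes all stages before the next load starts."""
    return loads * stages * stage_minutes


def pipelined_minutes(loads: int, stages: int, stage_minutes: int) -> int:
    """A new load starts every slot; the last load still needs `stages` slots."""
    return (loads + stages - 1) * stage_minutes


seq = sequential_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
pipe = pipelined_minutes(NUM_LOADS, NUM_STAGES, STAGE_MINUTES)
print(seq / 60, "hours sequential")   # 8.0 hours sequential
print(pipe / 60, "hours pipelined")   # 3.5 hours pipelined
```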
Pipelining Lessons
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Multiple tasks operating simultaneously using different resources
• Potential speedup = number of pipeline stages
• Pipeline rate limited by slowest pipeline stage
• Unbalanced lengths of pipe stages reduce speedup
• Time to fill the pipeline and time to drain it reduce speedup
• Stalls for dependences
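The effect of unbalanced stages and of fill/drain time can be seen with a small calculation. This is a sketch that assumes a synchronous pipeline in which every stage advances at the pace of the slowest one; the `unbalanced` stage times are hypothetical and chosen only to keep the total work equal.

```python
def speedup(stage_times, n_tasks):
    """Speedup of a pipeline over running each task start-to-finish.

    Assumes all stages advance together, so the slowest stage sets the pace.
    """
    n_stages = len(stage_times)
    sequential = n_tasks * sum(stage_times)
    pipelined = (n_tasks + n_stages - 1) * max(stage_times)
    return sequential / pipelined


balanced = [30, 30, 30, 30]
unbalanced = [30, 60, 15, 15]   # hypothetical: same total work, one slow stage

for n in (4, 100, 10_000):
    print(n, round(speedup(balanced, n), 2), round(speedup(unbalanced, n), 2))
```

With few tasks, fill and drain time dominate; with many tasks the balanced pipeline approaches a speedup of 4 (the number of stages), while the unbalanced one cannot exceed 2.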
Pipelining
Cafeteria Assembly line
The Five Stages of Load
• Ifetch: Instruction Fetch
– Fetch the instruction from the Instruction Memory
• Reg/Dec: Registers Fetch and Instruction Decode
• Exec: Calculate the memory address
• Mem: Read the data from the Data Memory
• Wr: Write the data back to the register file
Pipelining
• Improve performance by increasing instruction throughput
Single-cycle (Tc= 800ps)
Pipelined (Tc= 200ps)
Ideal Speedup
• Ideal speedup is the number of stages in the pipeline.
• Q: Do we achieve this?
– Not quite: imperfect stage balance and pipeline overhead
• For 1003 instructions:
– non-pipelined: 1003*800ps = 802,400ps
– pipelined: 1003*200ps+4*200ps = 201,400ps
Pipelining improves performance by increasing instruction throughput, NOT by decreasing the execution time of an individual instruction.
$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$
$$\frac{8024}{2014} \approx \frac{8}{2} = 4$$
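The 1003-instruction totals generalize as follows. This is a sketch using symbols introduced here for illustration: $n$ instructions, $k$ pipeline stages, and clock periods $T_{\text{single}}$ and $T_{\text{pipe}}$.

$$T_{\text{nonpipelined}} = n \, T_{\text{single}}, \qquad T_{\text{pipelined}} = (n + k - 1) \, T_{\text{pipe}}$$

$$\text{Speedup} = \frac{n \, T_{\text{single}}}{(n + k - 1) \, T_{\text{pipe}}} \;\longrightarrow\; \frac{T_{\text{single}}}{T_{\text{pipe}}} \quad \text{as } n \to \infty$$

With $n = 1003$, $k = 5$, $T_{\text{single}} = 800\,\text{ps}$ and $T_{\text{pipe}} = 200\,\text{ps}$:

$$\text{Speedup} = \frac{1003 \times 800}{(1003 + 4) \times 200} = \frac{802{,}400}{201{,}400} \approx 3.98$$

The limit is 4 rather than the ideal 5 because the five stages are not perfectly balanced: the pipelined clock must fit the slowest stage, so $k \, T_{\text{pipe}} = 1000\,\text{ps}$ exceeds $T_{\text{single}} = 800\,\text{ps}$.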
Instruction Set for Pipelining
• MIPS is made for pipelining!
– All instructions are the same length:
• Helps the Ifetch & Decode stages
– A few instruction formats & fixed source operand fields
• Can read operands & decode opcode at the same time
– Memory operands only appear in loads/stores
• Calculate the memory address in Execute stage
– Operands must be aligned in memory
• One data memory access for a single data transfer instruction
Can pipelining get us into trouble?
• Yes: Pipeline Hazards
– structural hazards: attempt to use the same resource two different ways at the same time
– data hazards: attempt to use item before it is ready
• instruction depends on result of prior instruction still in the pipeline
– control hazards: attempt to make a decision before condition is evaluated
• branch instructions
• Can always resolve hazards by waiting
– pipeline control must detect the hazard
– take action (or delay action) to resolve hazards
Computer Pipelining Hazards
• Structural Hazards
– Conflict for use of a resource
– With a single memory, Instruction fetch would have to stall
– Cause a pipeline “bubble”
– Require separate instruction/data memories (or caches)
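The single-memory conflict can be located by checking which cycles contain both a fetch and a data access. The sketch below assumes the five-stage pipeline from the earlier slides, one instruction issued per cycle with no stalls, and a hypothetical instruction mix.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def memory_conflicts(ops):
    """Return (cycle, fetching_index, mem_index) for each single-memory clash.

    Instruction i (0-based) is in stage s during cycle i + s + 1, assuming
    one instruction issues per cycle with no stalls.
    """
    conflicts = []
    for i, op in enumerate(ops):
        if op not in ("lw", "sw"):
            continue                      # only loads/stores access memory in MEM
        mem_cycle = i + STAGES.index("MEM") + 1
        fetch_index = mem_cycle - 1       # instruction whose IF falls in that cycle
        if fetch_index < len(ops):
            conflicts.append((mem_cycle, fetch_index, i))
    return conflicts


program = ["lw", "add", "sub", "and", "or"]   # hypothetical instruction mix
print(memory_conflicts(program))              # [(4, 3, 0)]
```

In cycle 4 the `lw` reads data memory while the fourth instruction is being fetched, so a machine with one memory would have to delay the fetch.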
Computer Pipelining Hazards
• Data Hazards
Forwarding/Bypassing
• Observation:
– Don’t need to wait for the instruction to complete.
– As soon as the ALU computes the sum for the add, we can supply it as an input for the subtract
Forwarding/Bypassing
• Load-Use Data Hazard
• Can’t always avoid stalls by forwarding
Stall even with forwarding
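The difference between an ALU result and a loaded value can be expressed as a small stall calculation. This is a sketch under stated assumptions: a five-stage pipeline, a register file that is written in the first half of a cycle and read in the second half, and a helper name (`stall_cycles`) made up for illustration.

```python
def stall_cycles(producer: str, distance: int, forwarding: bool) -> int:
    """Bubbles needed before a consumer `distance` instructions after `producer`.

    producer: "alu" (result ready at end of EX) or "load" (ready at end of MEM).
    """
    if distance < 1:
        raise ValueError("distance must be at least 1")
    if not forwarding:
        # The consumer's ID can be no earlier than the producer's WB cycle.
        return max(0, 3 - distance)
    if producer == "load":
        # Loaded data is only available after MEM, one cycle too late
        # for an instruction that immediately follows the load.
        return max(0, 2 - distance)
    # An ALU result is forwarded directly into the next instruction's EX.
    return 0


print(stall_cycles("alu", 1, forwarding=False))   # 2
print(stall_cycles("alu", 1, forwarding=True))    # 0
print(stall_cycles("load", 1, forwarding=True))   # 1
```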
Pipeline Scheduling
• Compiler can re-schedule instructions to avoid stalls
Assumption: No memory-to-memory forwarding
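The benefit of rescheduling can be counted directly. The slide's own code figure is not preserved here, so the sequence below is an assumed illustration of the usual two-loads-then-add pattern; the tuple encoding and helper names are hypothetical. Forwarding is assumed for everything except a use immediately after a load.

```python
def load_use_stalls(program):
    """Count one-cycle stalls where an instruction uses the result of the
    load immediately before it (forwarding assumed for everything else)."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls


def total_cycles(program, n_stages=5):
    """Instructions + pipeline fill time + load-use bubbles."""
    return len(program) + (n_stages - 1) + load_use_stalls(program)


# (opcode, destination register or None, source registers)
original = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),   # uses $t2 right after its load
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),   # uses $t4 right after its load
    ("sw", None, ("$t5", "$t0")),
]
# Hoist the third load (assumed not to alias the first store) so that
# neither add directly follows the load it depends on.
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(total_cycles(original))    # 13
print(total_cycles(reordered))   # 11
```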
Computer Pipelining Hazards
• Control Hazards: Bubble or Pipeline Stall on Branch
Assumption: In the second stage, the branch is resolved, the target address computed, and the PC updated
Computer Pipelining Hazards
• Control Hazards: Branch prediction
Prediction correct
Prediction incorrect
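The cost of branches with and without prediction can be estimated with an average-CPI calculation. This is a sketch: the one-cycle penalty follows the assumption that branches resolve in the second stage, while the branch fraction and prediction accuracy are hypothetical numbers.

```python
def effective_cpi(branch_fraction, penalty_cycles, prediction_accuracy=0.0):
    """Base CPI of 1 plus the average bubbles per instruction caused by branches.

    prediction_accuracy = 0.0 models always stalling on a branch.
    """
    penalized = branch_fraction * (1.0 - prediction_accuracy)
    return 1.0 + penalized * penalty_cycles


BRANCH_FRACTION = 0.2   # hypothetical: 20% of instructions are branches
PENALTY = 1             # one bubble when the branch resolves in stage 2

print(round(effective_cpi(BRANCH_FRACTION, PENALTY), 3))                           # 1.2
print(round(effective_cpi(BRANCH_FRACTION, PENALTY, prediction_accuracy=0.9), 3))  # 1.02
```

A correct prediction costs nothing, so only the mispredicted fraction of branches pays the penalty.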
Computer Pipelining Hazards
• Control Hazards: Delayed branch
– Solution used in MIPS
– If you run SPIM in a bare mode, you must pay attention to delayed branches
– Compiler fills delayed branch slots
Single-Cycle Datapath
• What do we need to add to actually split the datapath into stages?
Pipelined Datapath
LW: IF Stage
LW: ID Stage
LW: EX Stage
LW: MEM Stage
LW: WB Stage
Can you find a BUG here?
Corrected Version for Load
EX for Store
MEM for Store
WB for Store
Graphically Representing Pipelines
• Can help with answering questions like:
– how many cycles does it take to execute this code?
– what is the ALU doing during cycle 4?
– use this representation to help understand datapaths
Conventional Pipelined Execution Representation
Single-Cycle Pipeline Diagram
Single Cycle, Multiple Cycle, vs. Pipeline
Why Pipeline?
• Suppose we execute 100 instructions
• Single Cycle Machine
– 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
• Multicycle Machine
– 10 ns/cycle x 4.6 CPI (due to inst mix) x 100 inst = 4600 ns
• Ideal pipelined machine
– 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
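The three totals can be reproduced directly from the figures on the slide. A minimal sketch; the variable names are illustrative.

```python
N_INST = 100

single_cycle_ns = 45 * 1 * N_INST        # 45 ns/cycle, CPI = 1
multicycle_ns = 10 * 4.6 * N_INST        # 10 ns/cycle, CPI = 4.6 (instruction mix)
pipelined_ns = 10 * (1 * N_INST + 4)     # 10 ns/cycle, CPI = 1, 4 cycles to drain

print(f"single-cycle: {single_cycle_ns:.0f} ns")
print(f"multicycle:   {multicycle_ns:.0f} ns")
print(f"pipelined:    {pipelined_ns:.0f} ns")
print(f"pipelined vs. single-cycle: {single_cycle_ns / pipelined_ns:.2f}x faster")
```

The pipelined machine keeps the short 10 ns cycle of the multicycle design and the CPI of 1 of the single-cycle design, which is where the roughly 4.3x gain comes from.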
Why Pipeline? Because the resources are there!
Pipelined Control
• Divide control lines into five groups according to pipeline stages
• What needs to be controlled in each stage?
– Instruction Fetch and PC Increment
– Instruction Decode / Register Fetch
– Execution
– Memory Stage
– Write Back
Pipelined Control
• Pass control signals along just like the data
Pipelined Control
Control signals for the last three stages are created in ID
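How the control bits travel with an instruction can be sketched as follows. The signal names and values are assumed from the standard textbook MIPS datapath (the slide's control table is not preserved here), and `id_stage` and `advance` are hypothetical helpers.

```python
# None marks a don't-care value.
CONTROL = {
    "R-type": {"EX": {"RegDst": 1, "ALUOp": 0b10, "ALUSrc": 0},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 0}},
    "lw":     {"EX": {"RegDst": 0, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
               "WB": {"RegWrite": 1, "MemtoReg": 1}},
    "sw":     {"EX": {"RegDst": None, "ALUOp": 0b00, "ALUSrc": 1},
               "MEM": {"Branch": 0, "MemRead": 0, "MemWrite": 1},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
    "beq":    {"EX": {"RegDst": None, "ALUOp": 0b01, "ALUSrc": 0},
               "MEM": {"Branch": 1, "MemRead": 0, "MemWrite": 0},
               "WB": {"RegWrite": 0, "MemtoReg": None}},
}


def id_stage(opcode):
    """ID generates every group; they are latched into the ID/EX register."""
    return dict(CONTROL[opcode])


def advance(signals, consumed_stage):
    """Drop the group used by the stage just completed; pass the rest along."""
    return {stage: bits for stage, bits in signals.items() if stage != consumed_stage}


id_ex = id_stage("lw")            # holds the EX, MEM and WB groups
ex_mem = advance(id_ex, "EX")     # holds the MEM and WB groups
mem_wb = advance(ex_mem, "MEM")   # holds only the WB group
print(list(id_ex), list(ex_mem), list(mem_wb))
```

Each pipeline register carries only the control bits that later stages still need, so the set shrinks as the instruction moves toward write-back.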
Pipelined Execution Example
lw $10, 20($1)
sub $11, $2, $3
and $12, $4, $5
or $13, $6, $7
add $14, $8, $9
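The clock-by-clock walk through this sequence can be tabulated with a short script. This is a sketch that assumes the five-stage pipeline and no stalls (the five instructions are independent of one another); `stage_occupancy` is an illustrative helper.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "and $12, $4, $5",
    "or $13, $6, $7",
    "add $14, $8, $9",
]


def stage_occupancy(program, clock):
    """Map each stage to the instruction occupying it in `clock` (1-based)."""
    occupancy = {}
    for s, stage in enumerate(STAGES):
        i = clock - 1 - s   # instruction i enters IF in clock i + 1
        occupancy[stage] = program[i] if 0 <= i < len(program) else None
    return occupancy


total_clocks = len(PROGRAM) + len(STAGES) - 1   # 5 + 5 - 1 = 9
for clock in range(1, total_clocks + 1):
    row = stage_occupancy(PROGRAM, clock)
    cells = "  ".join(f"{stage}: {row[stage] or '-'}" for stage in STAGES)
    print(f"Clock {clock}: {cells}")
```

In clock 5 every stage is busy, with `lw` in WB and `add` in IF; by clock 9 only `add` remains, in WB.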
Clock 1
Clock 2
Clock 3
Clock 4
Clock 5
Clock 6
Clock 7
Clock 8
Clock 9