View
102
Download
0
Category
Preview:
Citation preview
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-1
Chapter SixPipelining
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-2
Pipelining: analogia com linha de produção
• Tempo de fabricação de um carro: C+M+C+P+A• Taxa de produção (carros/h): medido na saída
– throughput: depende do número de estágios e do balanceamento
• Objetivo do pipeline:– distribuir e balancear o tempo em cada estágio– otimizar o uso de HW dos estágios (taxa de ocupação)– resumo: aumentar a velocidade e diminuir o hardware
carro1 Chassi Mec Carroc. Pint. Acab.
carro2 Chassi Mec Carroc. Pint. Acab.
carro3 Chassi Mec Carroc. Pint. Acab.
tempo
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-3
Pipelining• Improve perfomance by increasing instruction throughput
• tempo de execução de uma instrução: 8 ns e 9 ns (com ou sem pipe)• throughput: 1 instr / 8ns (sem pipeline) ou 1 instr / 2 ns (com)• # de estágios = 5; ganho de 4:1; para obter 5:1 ??
Instructionfetch Reg ALU Data
access Reg
8 ns Instructionfetch Reg ALU Data
access Reg
8 ns Instructionfetch
8 ns
Time
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
2 4 6 8 10 12 14 16 18
2 4 6 8 10 12 14
. . .
Programexecutionorder(in instructions)
Instructionfetch Reg ALU Data
access Reg
Time
lw $1, 100($0)
lw $2, 200($0)
lw $3, 300($0)
2 ns Instructionfetch Reg ALU Data
access Reg
2 ns Instructionfetch Reg ALU Data
access Reg
2 ns 2 ns 2 ns 2 ns 2 ns
Programexecutionorder(in instructions)
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-4
Pipelining
• What makes it easy– all instructions are the same length– just a few instruction formats– memory operands appear only in loads and stores
• What makes it hard?– structural hazards: suppose we had only one memory– control hazards: need to worry about branch instructions– data hazards: an instruction depends on a previous instruction
• We’ll build a simple pipeline and look at these issues
• We’ll talk about modern processors and what really makes it hard:– exception handling– trying to improve performance with out-of-order execution, etc.
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-5
Basic Idea
• What do we need to add to actually split the datapath into stages?
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Instruction
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
WriteregisterWritedata
ReaddataAddress
Datamemory
1
ALUresultM
ux
ALUZero
IF: Instruction fetch ID: Instruction decode /register file read
EX: Execute/address calculation
MEM: Mem ory access WB: Write back
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-6
IM Reg DM RegALU
IM Reg DM RegALU
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6 CC 7
Time (in clock cycles)
lw $2, 200($0)
lw $3, 300($0)
Programexecutionorder(in instructions)
lw $1, 100($0) IM Reg DM RegALU
Uma representação para o pipeline
• Hachurado representa “atividade”
• Representação alternativa: imaginando que cada instrução tenha a sua própria via de dados
• Outra representação: foto no tempo
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-7
Pipelined Datapath
• O que é produzido em cada estágio é armazenado em um registrador de pipeline para uso pelo próximo estágio
• Problemas com o endereço do registrador de escrita?? Caminhando para esquerda?
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Instru
ction
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1Readregister 2
16 Signextend
WriteregisterWritedata
Readdata
1
ALUresultM
ux
ALUZero
ID/EX
Datamemory
Address
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-8
Corrected Datapath
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Instru
ction
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0
Address
Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1Readregister 2
16 Signextend
WriteregisterWritedata
Readdata
Datamemory
1
ALUresultM
ux
ALUZero
ID/EX
Exemplo com sub e lw (1)
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction decode
lw $10, 20($1)
Instruction fetch
sub $11, $2, $3
Instructionmemory
Address
4
32
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Instruction fetch
lw $10, 20($1)
Address
Datamemory
Address
Datamemory
Clock 1
Clock 2
Instructionmemory
Address
4
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
3216Sign
extend
Writeregister
Writedata
Memory
lw $10, 20($1)
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Execution
sub $11, $2, $3
Instructionmemory
Address
4
0
Add Addresult
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEM MEM/WB
Mux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
Writeregister
Writedata
Readdata
1
ALUresult
Mux
ALUZero
ID/EX
Execution
lw $10, 20($1)
Instruction decode
sub $11, $2, $3
3216Sign
extend
Address
Datamemory
Datamemory
Address
Clock 3
Clock 4
Exemplo com sub e lw (2)
Exemplo com sub e lw (3) Instruction
memory
Address
4
32
0
Add Addresult
1
ALUresult
Zero
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEMID/EX MEM/WB
Write backMux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Mux
ALUReaddata
Writeregister
Writedata
lw $10, 20($1)
Instructionmemory
Address
4
32
0
Add Addresult
1
ALUresult
Zero
Shiftleft 2
Inst
ruct
ion
IF/ID EX/MEMID/EX MEM/WB
Write backMux
0
1
Add
PC
0Writedata
Mux
1Registers
Readdata 1
Readdata 2
Readregister 1
Readregister 2
16Sign
extend
Mux
ALUReaddata
Writeregister
Writedata
sub $11, $2, $3
Memory
sub $11, $2, $3
Address
Datamemory
Address
Datamemory
Clock 6
Clock 5
1998 Morgan Kaufmann PublishersMario Côrtes - MO401 - IC/Unicamp- 2002s1 Ch6-12
Graphically Representing Pipelines
• Can help with answering questions like:– how many cycles does it take to execute this code?– what is the ALU doing during cycle 4?– use this representation to help understand datapaths
IM Reg DM Reg
IM Reg DM Reg
CC 1 CC 2 CC 3 CC 4 CC 5 CC 6
Time (in clock cycles)
lw $10, 20($1)
Programexecutionorder(in instructions)
sub $11, $2, $3
ALU
ALU
Recommended