7/28/2019 Lecture ASIP 5
VLSI Architecture :: MEL G642
MEL G642
Dr. A. Amalin Prince
BITS Pilani K.K. Birla Goa Campus
Department of Electrical, Electronics and Instrumentation Engineering
System on a chip
What is a DSP core?
What is a DSP processor?
What is a DSP subsystem?
MCU is the task controller: it executes tasks without real-time requirements.
[Figure: SoC design hierarchy. An MCU subsystem (MCU core) and a DSP subsystem (DSP processor core with RF, control path, AGU, ALU, and MAC; DM, PM, interrupt controller, timer, DMA, and MMU) are connected through a bus with arbitration to the main memories and the customer IF.]
[Figure: DSP subsystem architecture. Nested blocks: the DSP core (RF, control path, AGU, ALU, MAC, DSP accelerator) inside the DSP processor (DMs, PM, DMA), inside the DSP subsystem (interrupt controller, timer, other peripherals, MMU, bus and arbitration, main memories, chip interface).]
Architecture and microarchitecture
The processor architecture is the hardware organization of the core and its peripherals, including the memory bus architecture. The architecture represents the relations between modules.
The microarchitecture design is the specification of the functional modules.
ASIP microarchitecture design is the implementation of an ISA specification in hardware modules.
Inside a core
The core can be divided into three parts: the datapath, the control path, and the address generation unit (AGU). The core components are organized around two data buses:
The memory bus is distributed between the core and the memory subsystem.
The register bus connects the register file to all units in the core.
Memory subsystem in a DSP subsystem
The memory subsystem consists of data memories (DM), program (code) memory (PM), AGU, DMA, and MMU.
Peripherals in a DSP subsystem
Timers for counting clock cycles and events
Interrupt controller for handling interrupts
DMA (Direct Memory Access) controller for handling data transfers to/from main memory and between other memories/ports
MMU (Memory Management Unit) for reliable and efficient memory (address space) usage
DSP memory architecture
History of DSP memory architectures
[Figure: (a) Von Neumann architecture: the control unit and the arithmetic unit share one memory, with in-out. (b) Harvard architecture: the control unit has a program memory and the arithmetic unit has a data memory, with in-out.]
History of DSP memory architectures
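[Figure: Three arrangements, (a) to (c), of the datapath (DP) with data memory (DM) and the control path (CP) with program memory (PM); (b) and (c) add a MUX between the memories and the datapath.]
Notes from the figure: one tap of convolution requires multiple clock cycles; fetch coefficients instead of instructions during CONV; dual-port/multi-port memory required; used up to the 1980s.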
A typical DSP bus architecture
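[Figure: A typical DSP bus architecture. The datapath (register file, ALU, MAC, with operands OPA and OPB) sits on the register bus. The control path (CP) and the addressing path (AGU) drive the memory bus (D1-address and D1-data to DM1, D2-address and D2-data to DM2) and the PM bus (P-address and program to PM).]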
Control flow of DSP ASIP
[Figure: Instruction flow FSM. After reset: calculate the PC; send the PC to PM (request an instruction); get code from PM (receive an instruction); decode the instruction and send control signals to the DP; generate operand addresses and send them to the storage units; receive states (flags) from the DP; then calculate the next PC.]
Data flow of DSP ASIP
[Figure: Data flow FSM. Receive the instruction (from PM); receive the operand address (from the address generator) and send it to the storage HW; fetch operands; execute the instruction; send the result to the storage HW and store it; return states (flags) to the PC FSM.]
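The two FSMs can be sketched in Python as a single fetch-decode-execute loop. This is a minimal, hypothetical model, not the course's design: the tuple encoding and the opcodes `load`, `store`, `add`, `dec`, `jne`, and `halt` are invented for illustration, and a real ASIP overlaps these steps in a pipeline instead of running them one after another.

```python
# Hypothetical toy ISA: tuple-encoded instructions and a tiny opcode set,
# invented for illustration. Each loop iteration walks through both FSMs.

def set_flags(value):
    """Flags the datapath returns to the PC FSM (only Z and N are modelled)."""
    return {"Z": int(value == 0), "N": int(value < 0)}


def run(pm, dm, num_regs=16, max_steps=10_000):
    """Execute program memory `pm` against data memory `dm` until 'halt'."""
    rf = [0] * num_regs            # register file
    flags = {"Z": 0, "N": 0}       # latest flags returned by the datapath
    pc = 0                         # reset
    for _ in range(max_steps):
        op, *args = pm[pc]         # send PC to PM, receive and decode the instruction
        next_pc = pc + 1           # default next PC
        if op == "load":           # operand address -> DM, data -> RF
            rd, da = args
            rf[rd] = dm[da]
        elif op == "store":        # send the result to the storage HW
            da, rs = args
            dm[da] = rf[rs]
        elif op == "add":          # execute in the datapath and return flags
            rd, ra, rb = args
            rf[rd] = rf[ra] + rf[rb]
            flags = set_flags(rf[rd])
        elif op == "dec":
            (rd,) = args
            rf[rd] -= 1
            flags = set_flags(rf[rd])
        elif op == "jne":          # the PC FSM consumes the flags
            (target,) = args
            if flags["Z"] == 0:
                next_pc = target
        elif op == "halt":
            return rf, dm
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc = next_pc
    raise RuntimeError("step limit reached")


if __name__ == "__main__":
    # DM(2) <- DM(0) * DM(1), computed by repeated addition in a loop.
    program = [
        ("load", 1, 0),     # R1 <- DM(0)
        ("load", 4, 1),     # R4 <- DM(1), loop counter (must be >= 1)
        ("add", 5, 5, 1),   # R5 <- R5 + R1
        ("dec", 4),         # R4 <- R4 - 1, flags go back to the PC FSM
        ("jne", 2),         # jump back while R4 != 0
        ("store", 2, 5),    # DM(2) <- R5
        ("halt",),
    ]
    _, memory = run(program, [4, 3, 0])
    print(memory[2])        # prints 12
```

The example multiplies by repeated addition only to exercise the path from the datapath flags back to the PC FSM.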
[Figure: A complete view of a DSP processor. Control path: PC FSM, program memory (PM), program address, instruction decoder, configuration and status, program flow control. Datapath: RF, execution unit (ALU/MAC), operand and result control, operation control, MEM control, results. Legend: data bus, control signals, memory bus, internal signals in control path.]
Modules in a core
Modules in a DSP core
Datapath
Register file
ALU MAC
AGU
Control path
Differences between DSP and MPU design
MPU designers aim for ultimate performance and ultimate flexibility, as well as a compiler-friendly instruction set. ASIP DSP designers think of the application and cost first, and the challenge is to be efficient.
The goal of an ASIP design is to reach the highest performance per silicon area, the highest performance per unit of power consumption, and the highest performance per unit of design cost.
Is a DSP CISC or RISC?
A DSP is like a RISC in having:
More general-purpose registers.
Mostly simple instructions.
Instruction decoding by a decoding logic circuit instead of microcode.
Regular instruction pipelining.
A DSP is like a CISC in having:
One execution cycle for the ALU and multiple cycles for iteration.
Complicated data memory addressing modes and circuits.
Special-purpose registers (accumulator registers).
Strong instructions for accelerating certain tasks.
Is a DSP CISC or RISC?
Emphasis: DSP on hardware and software; RISC on software; CISC on hardware.
Instructions: DSP has single- and multi-clock complex instructions; RISC has single-clock, reduced instructions only; CISC includes multi-clock complex instructions.
Operands: DSP takes operands from registers and also from data memories; RISC takes operands only from registers, and LOAD and STORE move data between memory and register variables; CISC bases arithmetic computing on memory-to-memory operations.
Code size: DSP small; RISC large; CISC small.
Silicon area: DSP and RISC use most silicon area for program and data storage; CISC might use silicon for storing complex instructions (microcode).
Design instruction set
Instruction set design flow
Source code profiling: coverage and 10-90% locality
Design of general RISC instructions
Design of CISC accelerated instructions
Design of miscellaneous instructions
Instruction set simulator and assembler
Benchmarking performance and coverage
Satisfied? No: revisit the instruction design. Yes: release the instruction set architecture.
Instruction coding and release manual
Release an instruction set
Application profiling -> design of the assembly instruction set -> instruction set benchmarking, iterated until the benchmarking result is equivalent to the requirements; then the instruction set is released.
We need to identify problems
How is an instruction set designed, and why is it designed that way?
In which circumstances should a function be implemented using an instruction instead of a subroutine?
Why are ASIP DSP instructions not really RISC?
Why is my benchmarking result not satisfactory?
What is the starting point?
Let us start by implementing C functions with an assembly instruction set:
A typical architecture with two DMs in parallel
Instructions including move-load-store, ALU/MAC, and program flow control
Classify the Instruction set
Columns: instruction group/type, operands, operations, mathematical description, flags, CC.
Load, store, and move: operands are register names and memory addressing; operations are data transfer and addressing modes; mathematical description: DST ← (ADR).
Move-load-store instructions
A RISC (load-store) processor keeps the architecture simple:
Data and parameters of a subroutine are first loaded into the register file.
Operands come from the register file or are immediate data carried by an instruction.
Results in the register file need to be moved back to the data memory.
Move-load-store instructions
Mnem   Operand   Description                         Operation          CC
Load   Rd, DA    Load data from memory 0/1           Rd ← DM0/1(DA)     1
Store  DA, Rs    Store data to memory 0/1            DM0/1(DA) ← Rs     1
move   Rd, Rs    Move between two registers          Rd ← Rs            1
move   Rd, K     Move immediate data to a register   Rd ← K             1
Addressing for data memory access
Memory addressing is the address-calculation algorithm carried by an assembly instruction.
It specifies how to calculate the unique location of data in a data memory for a read or a write.
The addressing algorithm is implicit in C and explicit in assembly.
Addressing for data memory access
Name                      DA    DA code cost (b)  Memory  Algorithm                                             CC
Direct                    D     16                DM0/1   16-bit constant as the direct memory address          1
Register indirect         R     5                 DM0/1   A register contains the memory address                1
Register post-increment   R++   -                 -       R gives the address, then R = R + 1                   -
Register pre-decrement    --R   5                 DM0/1   R = R - 1 before addressing, then R gives the address 1
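A short Python sketch of these addressing modes, assuming DM0/1 are plain lists and the address registers live in a dictionary (both hypothetical). The post-increment mode (R++) is an assumption based on the partly legible increment row. The sketch also shows why C hides the addressing algorithm: `x = *p++;` reads through `p` and then advances it, whereas the assembly instruction has to name this mode explicitly.

```python
def effective_address(mode, operand, regs):
    """Compute the data-memory address for one access, updating regs if needed.

    mode: "direct" (D), "indirect" (R), "post_inc" (R++), or "pre_dec" (--R).
    operand: a constant for "direct", otherwise the name of an address register.
    """
    if mode == "direct":
        return operand & 0xFFFF          # 16-bit constant carried by the instruction
    if mode == "indirect":
        return regs[operand]             # the register holds the address
    if mode == "post_inc":
        addr = regs[operand]             # R gives the address ...
        regs[operand] = addr + 1         # ... then R = R + 1 (assumed mode)
        return addr
    if mode == "pre_dec":
        regs[operand] -= 1               # R = R - 1 before addressing ...
        return regs[operand]             # ... then R gives the address
    raise ValueError(f"unknown addressing mode {mode!r}")


def load(dm, mode, operand, regs):
    """Load Rd, DA: return DM(DA) so the caller can place it in Rd."""
    return dm[effective_address(mode, operand, regs)]


def store(dm, mode, operand, regs, value):
    """Store DA, Rs: DM(DA) <- value of Rs."""
    dm[effective_address(mode, operand, regs)] = value
```

For instance, the C loop body `sum += *p++;` maps to a load with post-increment on the register that holds `p`, so the pointer update costs no extra instruction.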
Arithmetic logic instructions
Basic arithmetic operations in C are +, -, *, /, and %.
The modulo operation % is not used very often in DSP arithmetic computing, so it is implemented using a subroutine. The division operation / is not easy to implement in hardware.
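One way to provide / and % as a subroutine is restoring (shift-and-subtract) division, which needs only shifts, subtraction, and comparison. The sketch below assumes unsigned 16-bit operands; the lecture does not say which algorithm its subroutine uses, so treat this as one common choice rather than the course's implementation.

```python
def udiv16(dividend, divisor):
    """Restoring shift-and-subtract division of two unsigned 16-bit integers.

    Returns (quotient, remainder), i.e. what C's / and % give for
    non-negative operands. One quotient bit is produced per iteration,
    so a software loop needs 16 iterations.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    dividend &= 0xFFFF
    divisor &= 0xFFFF
    quotient = 0
    remainder = 0
    for bit in range(15, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # Trial subtraction: keep it only if the result is non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder
```

At roughly a handful of instructions per iteration, a 16-iteration loop shows why a rarely used operation is left to a subroutine instead of dedicated hardware.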
Basic Arithmetic Instructions
Mnem  Operand     Description                      Operation                 Flags  CC
ADD   Rd, Ra, Rb  Add                              Rd ← Ra + Rb              Z,N,V  1
SUB   Rd, Ra, Rb  Subtract                         Rd ← Ra - Rb              Z,N,V  1
ABS   Rd, Ra      Absolute operation               Rd ← ABS(Ra)              Z,N,V  1
INC   Rd          Increment                        Rd ← Rd + 1               Z,N,V  1
DEC   Rd          Decrement                        Rd ← Rd - 1               Z,N,V  1
MPL   A, Ra, Rb   Multiplication                   A ← Ra × Rb               Z,N,V  1
MAC   A, Ra, Rb   Multiplication and accumulation  A ← A + Ra × Rb           Z,N,V  2
RND   Rd, A       Round, saturate, and truncate    Rd ← Saturate(Round(A))   Z,N,V  1
CAC   A           Clear an accumulator             A ← 0                     Z,N,V  1
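The interplay of MPL/MAC, the wide accumulator, and RND can be sketched as follows. Assumptions not stated in the lecture: registers hold Q1.15 fractions, the accumulator has guard bits (modelled with Python's unbounded integers instead of a fixed width), and Round means round to nearest with ties rounded up. The function names mirror the mnemonics but are hypothetical.

```python
# Assumptions: 16-bit registers hold Q1.15 fractions; accumulator guard bits
# are modelled by Python's unbounded int rather than a fixed hardware width.
FRAC_BITS = 15
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def flags(value, overflow=False):
    return {"Z": int(value == 0), "N": int(value < 0), "V": int(overflow)}


def mpl(ra, rb):
    """MPL A, Ra, Rb: A <- Ra * Rb (full-precision Q2.30 product)."""
    return ra * rb


def mac(acc, ra, rb):
    """MAC A, Ra, Rb: A <- A + Ra * Rb, with no rounding inside the loop."""
    return acc + ra * rb


def rnd(acc):
    """RND Rd, A: Rd <- Saturate(Round(A)); returns (Rd, flags)."""
    rounded = (acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS  # round half up, drop fraction bits
    result = max(INT16_MIN, min(INT16_MAX, rounded))       # saturate to 16 bits
    return result, flags(result, overflow=(result != rounded))


def dot(xs, hs):
    """CAC, then one MAC per element pair, then a single RND at the end."""
    acc = 0                      # CAC A
    for x, h in zip(xs, hs):
        acc = mac(acc, x, h)     # MAC A, Rx, Rh
    return rnd(acc)              # RND Rd, A
```

For example, `dot([0x4000] * 4, [0x4000] * 4)` accumulates 4 × 0.25 = 1.0 without loss in the accumulator, and only the final RND saturates it to 0x7FFF with V set. This is why rounding happens once, when long data leaves the accumulator.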
Logic and Shift Operations
Logic and shift operations in C:
& (and), | (or), ~ (not), ^ (xor), << (left shift), and >> (right shift).
Here "and" operates on each bit of operands A and B; that is, C[0] = A[0] & B[0], C[1] = A[1] & B[1], ..., C[15] = A[15] & B[15].
Logic and Shift Operations
Mnem  Operand     Description        Operation                          Flags  CC
AND   Rd, Ra, Rb  A logic-and B      Rd ← Ra and Rb                     C, Z   1
OR    Rd, Ra, Rb  A logic-or B       Rd ← Ra or Rb                      C, Z   1
NOT   Rd, Ra      Invert A           Rd ← INV(Ra)                       C, Z   1
XOR   Rd, Ra, Rb  A logic-xor B      Rd ← Ra xor Rb                     C, Z   1
LS    Rd, Ra, Rb  Logic left shift   Rd ← Ra left-shifted by Rb[3:0]    C, Z   1
RS    Rd, Ra, Rb  Logic right shift  Rd ← Ra right-shifted by Rb[3:0]   C, Z   1
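A sketch of these operations on 16-bit values. Two points are assumptions because the table does not define them: the bitwise operations clear C, and for the shifts C receives the last bit shifted out. The shift distance uses only Rb[3:0], so it lies between 0 and 15.

```python
MASK16 = 0xFFFF   # assumed 16-bit registers


def _result(value, carry=0):
    """Pack a 16-bit result with its C and Z flags."""
    value &= MASK16
    return value, {"C": carry, "Z": int(value == 0)}


# Bitwise operations: bit i of the result depends only on bit i of the inputs.
# Assumption: they clear C (the table lists C as affected but not how).
def and_(ra, rb):
    return _result(ra & rb)


def or_(ra, rb):
    return _result(ra | rb)


def xor_(ra, rb):
    return _result(ra ^ rb)


def not_(ra):
    return _result(~ra)


def ls(ra, rb):
    """LS: logic left shift of Ra by Rb[3:0], filling with zeros."""
    n = rb & 0xF                          # only the four LSBs of Rb are used
    wide = (ra & MASK16) << n
    carry = (wide >> 16) & 1 if n else 0  # assumption: C = last bit shifted out
    return _result(wide, carry)


def rs(ra, rb):
    """RS: logic right shift of Ra by Rb[3:0], filling with zeros."""
    n = rb & 0xF
    ra &= MASK16
    carry = (ra >> (n - 1)) & 1 if n else 0  # assumption: C = last bit shifted out
    return _result(ra >> n, carry)
```

Masking the shift distance to four bits keeps the shifter a fixed 16-position circuit, which matches the Rb[3:0] notation in the table.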
Logic Operators in C
Condition symbol  Condition
<                 Less than
<=                Less than or equal to
>=                Greater than or equal to
>                 Greater than
==                Equal to
!=                Not equal to
&&                Boolean AND
||                Boolean OR
!                 Boolean NOT
Program flow control in C
Conditional and unconditional controls in C:
Unconditional: GOTO operations.
Conditional: condition test and jump are integrated in C, for example, if A then B else C.
In an assembly language, the condition test and the conditional jump are separated:
the first instruction performs the condition test and flag computation;
the second instruction is the conditional jump.
Program flow control instructions
Mnem    Description                             Condition  Flags met    CC
JLT     Jump when less than                     <          N=1          3/1
JLE     Jump when less than or equal to         <=         N=1 or Z=1   3/1
JNE     Jump when not equal to                  !=         Z=0          3/1
JUMP    Unconditional jump                                              3
CALL    Jump, push return address onto stack                            3
Return  Return to the stacked address                                   3
CC 3/1: 3 cycles if the jump is taken, 1 cycle if it is not.
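The split between flag computation and conditional jump can be sketched as below: `compare` computes Z, N, and V from a 16-bit subtraction, and `branch` evaluates the jump conditions and returns the 3/1 cycle cost. The function names are hypothetical, and JLE is evaluated as N=1 or Z=1, the usual flag condition for less-than-or-equal. The N-only test for "less than" is exact only when the subtraction does not overflow; `signed_less` shows the usual overflow-safe form (N differs from V) for comparison.

```python
def to_signed16(x):
    """Interpret the low 16 bits of x as a two's-complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def compare(ra, rb):
    """Flag computation for Ra - Rb on 16-bit two's-complement operands."""
    exact = to_signed16(ra) - to_signed16(rb)   # mathematically exact difference
    result = to_signed16(exact)                 # what a 16-bit ALU produces
    return {"Z": int(result == 0), "N": int(result < 0), "V": int(result != exact)}


# Jump conditions evaluated on the flags produced by the previous instruction.
CONDITIONS = {
    "JLT": lambda f: f["N"] == 1,
    "JLE": lambda f: f["N"] == 1 or f["Z"] == 1,
    "JNE": lambda f: f["Z"] == 0,
    "JUMP": lambda f: True,
}


def branch(mnemonic, flags, pc, target):
    """Return (next_pc, cycles): 3 cycles if the jump is taken, 1 if not."""
    if CONDITIONS[mnemonic](flags):
        return target, 3
    return pc + 1, 1


def signed_less(flags):
    """Overflow-safe signed 'less than': N differs from V."""
    return flags["N"] != flags["V"]
```

A C statement such as `if (a < b) goto L;` thus becomes a compare that sets the flags, followed by JLT.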
Target addressing for jumping
TA        Algorithm
Absolute  16-bit constant
Relative  In a general register
Assembly Instruction Set Summary
Benchmarking the instruction set
What is benchmarking?
DSP benchmarking measures the cycle cost and code size of a DSP algorithm running on single-precision data.
Convention of DSP benchmarking: rounding is required before moving long data from an accumulator register to a general register.
How to benchmark
BDTI benchmarking convention:
It measures the execution time (cycle cost), the code size (program memory cost), and the cost of data memories.
Cycle cost = prologue + kernel + epilogue, where
Prologue: preparing for running a program;
Kernel: the main computing part of the algorithm;
Epilogue: terminating the program.
Assumption in this discussion
Data frame size: 40 samples.
Number of FIR taps: 16.
Cycle cost: 1 cycle per normal instruction, 3 cycles for a taken jump.
MAC takes one cycle if the following instruction does not use the data in an accumulator register.
TSMD: a typical single-MAC DSP processor available as a COTS (commercial off-the-shelf) product.
Example: Block Transfer
C-code: DM1 (SEG: 0 to 39) -> DM1 (SEG: 0 to 39)
Assembly code
Example: Block Transfer
Processor     Algorithm  Total cycle cost  Pro-/epilogue cycle cost  Kernel cycle cost  Total code cost  Code for pro-/epilogue  DM cost
Basic (ours)  BT         242               4                         238                8                4                       84
TSMD          BT         47                4                         43                 7                4                       84
The loop: the DEC of the loop counter plus the taken jump consume four extra clock cycles per iteration. A HW loop may eliminate this cost.
Load and store can be merged into a memory-to-memory move instruction.
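The four-cycle loop overhead can be checked with a small cycle model. It assumes (hypothetically, since the assembly code is not reproduced here) a loop body of Load, Store, DEC, and a conditional jump back, plus 4 prologue/epilogue cycles, and applies the rules from the assumptions: 1 cycle per normal instruction, 3 for a taken jump, and 1 for a jump not taken.

```python
# Cycle rules from the benchmarking assumptions: one cycle per normal
# instruction, three cycles for a taken jump, one for a jump not taken.
NORMAL, JUMP_TAKEN, JUMP_NOT_TAKEN = 1, 3, 1


def block_transfer_cycles(n_samples, prologue_epilogue=4):
    """Estimate the cycles of a software-looped block transfer.

    Assumed loop body (hypothetical): Load, Store, DEC loop counter,
    conditional jump back to the top of the loop.
    """
    data_moves = n_samples * 2 * NORMAL                    # Load + Store per sample
    loop_back = (n_samples - 1) * (NORMAL + JUMP_TAKEN)    # DEC + taken jump
    loop_exit = NORMAL + JUMP_NOT_TAKEN                    # DEC + fall-through
    kernel = data_moves + loop_back + loop_exit
    return {
        "kernel": kernel,
        "loop_overhead": loop_back + loop_exit,
        "total": kernel + prologue_epilogue,
    }
```

For 40 samples this gives a 238-cycle kernel and 242 cycles in total, matching the basic processor's figures above, and 158 of the 238 kernel cycles are loop control, which is the cost a HW loop would remove.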
Example: Single sample FIR
Modulo addressing
FIFO emulated in a data memory
Can be accelerated in hardware as a memory addressing mode (for accelerated instructions)
Example: consider a 7-tap FIR filter
Example: Single sample FIR
Assembly code
Example: Single sample FIR: FIFO behavior
[Figure: The procedure of a FIFO getting a new data sample, Steps 0 to 3 (before getting new data, then after getting new data 1, 2, and 3). The FIFO buffer occupies the data memory space from BAR + 0 (MIN address) to BAR + 4 (TAR, MAX address) and holds X(n) to X(n-4). DAR moves through the buffer as each new sample X(n) overwrites the oldest one, wrapping from TAR back to BAR.]
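The FIFO procedure can be sketched with modulo address arithmetic. The class and method names are hypothetical, and the register roles (buffer from BAR to TAR, DAR as the next write position) are assumptions modelled on the figure. On the basic instruction set, each wrap costs extra compare-and-jump or subtract instructions, which is what HW modulo addressing removes.

```python
class ModuloFifo:
    """FIFO emulated in a data memory with modulo (circular) addressing.

    Register roles are assumed from the figure: the buffer spans BAR..TAR,
    and DAR is where the next new sample is written (the oldest sample).
    """

    def __init__(self, dm, bar, length):
        self.dm = dm                      # the whole data memory (a list)
        self.bar = bar                    # lowest buffer address
        self.tar = bar + length - 1       # highest buffer address
        self.dar = bar                    # next write position

    def _wrap(self, addr, step):
        """Modulo address arithmetic: wrap from TAR back to BAR and vice versa."""
        length = self.tar - self.bar + 1
        return self.bar + (addr - self.bar + step) % length

    def push(self, sample):
        """Overwrite the oldest sample with the new one and advance DAR."""
        self.dm[self.dar] = sample
        self.dar = self._wrap(self.dar, 1)

    def tap(self, k):
        """Return X(n-k); the newest sample X(n) sits one slot below DAR."""
        return self.dm[self._wrap(self.dar, -1 - k)]
```

With a 5-word buffer at BAR = 3, pushing samples 1 to 7 leaves `tap(0)` to `tap(4)` returning 7, 6, 5, 4, 3, while memory outside addresses 3 to 7 is untouched.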
Example: N-sample FIR (single sample in loop)
Example: Single sample FIR
Processor  Algorithm    Total cycle cost  Kernel cycle cost  Total code cost
Basic      16-tap FIR   192               173                26
TSMD       16-tap FIR   31                16                 15
The total cycle cost of the basic instruction set is about 6 times higher than the benchmark of a TSMD (192 versus 31 cycles). Opportunities for improvement are:
The cost of the SW-emulated circular buffer and modulo addressing is high, so HW circular buffer and modulo addressing are essential.
Data and coefficient loading, MAC, and the loop control can be merged into one instruction, convolution, which is one of the most frequently used instructions in DSP:
CONV N DM0(AP0++M) DM1(AP1++)
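A behavioural sketch of what a CONV instruction does over its N iterations. The semantics are an assumption read from the operand syntax: AP0++M is taken as post-increment with modulo wrap inside the circular sample buffer, and AP1++ as plain post-increment over the coefficients. In hardware the two reads, the MAC, both address updates, and the loop count proceed in parallel, ideally one tap per cycle; the Python loop only shows the data flow. The function names are hypothetical.

```python
def conv(n, dm0, ap0, bar0, length0, dm1, ap1, acc=0):
    """Hypothetical semantics of CONV N DM0(AP0++M) DM1(AP1++).

    Each iteration reads one sample from DM0 and one coefficient from DM1,
    performs one MAC, post-increments AP0 with modulo wrap inside the
    circular buffer [bar0, bar0 + length0), and post-increments AP1.
    Returns the accumulator and the updated address pointers.
    """
    for _ in range(n):
        acc += dm0[ap0] * dm1[ap1]                 # MAC
        ap0 = bar0 + (ap0 - bar0 + 1) % length0    # AP0++ with modulo wrap
        ap1 += 1                                   # AP1++
    return acc, ap0, ap1


def fir_output(dm0, dar, bar0, length0, h):
    """y(n) = sum_k h(k) x(n-k) for one output sample.

    Assumes the buffer holds exactly len(h) samples and DAR points at the
    oldest one, so the coefficients are stored in DM1 in reverse order.
    """
    coeffs = list(reversed(h))                     # DM1: h(N-1), ..., h(0)
    acc, _, _ = conv(len(h), dm0, dar, bar0, length0, coeffs, 0)
    return acc
```

After a full pass, AP0 is back at its starting address, so the next sample can be written there and the same CONV reissued, which is the single-sample-in-loop structure of the N-sample FIR.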
Example:
FIR filtering
Autocorrelation: used for finding regularities or periodic features of a signal.
Cross-correlation: used for measuring the similarity of a signal with a known signal pattern.
What is the difference?
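One way to see the difference is to write all three as the same multiply-accumulate loop. The definitions below are common unnormalized forms, chosen here as an assumption (texts differ on scaling and lag conventions): FIR filtering indexes the input backwards, x(n-k), and yields one output per sample; correlation indexes forwards, x(n+lag), and yields one value per lag; autocorrelation is the cross-correlation of a signal with itself. Because the inner loop is a MAC in every case, a single CONV-style instruction can serve all three.

```python
def fir(x, h):
    """FIR filtering: y(n) = sum_k h(k) x(n-k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def cross_correlation(x, p, max_lag):
    """r_xp(lag) = sum_n x(n+lag) p(n): how well pattern p matches x at each lag."""
    return [sum(x[n + lag] * p[n] for n in range(len(p)) if n + lag < len(x))
            for lag in range(max_lag + 1)]


def autocorrelation(x, max_lag):
    """r_xx(lag) = sum_n x(n+lag) x(n): the signal correlated with itself."""
    return cross_correlation(x, x, max_lag)
```

For the periodic signal 1, 0, -1, 0, 1, 0, -1, 0 the autocorrelation up to lag 4 is [4, 0, -3, 0, 2], with the next positive peak at the period of 4 samples.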
Analyses of identified problems
Lessons Learned
C does not expose parallel features.
Convolution is one of the most used DSP operations. Very high efficiency is achieved by carrying out the memory addressing, arithmetic computing, result storing, and program flow control in parallel in one instruction.
This is possible because the parallel hardware can be organized in a pipeline.
Other frequently used iterative DSP operations can also be specified as single instructions.
Research work: why? Identify the requirement and benchmark it.
Conclusion
An assembly language instruction set must be more efficient.
Accelerations are implemented at the arithmetic and algorithmic levels.
Addressing and memory accesses should be executed in parallel with arithmetic computing.
Program flow control, such as loops or conditional execution, should also be accelerated.
ASIP microarchitecture design flow
Proposed assembly language manual
Further expose all micro operations of each assembly instruction
Partition micro operations into DP, CP, and AP
Propose pipeline steps
Schedule micro operations into each pipeline step
Design for HW multiplexing in DP and AP
Specify microarchitecture and micro operations for CP
Release micro architecture documents
The End :: Thank you for your attention
Questions?