nirneya-gupta
7/29/2019 DSP_30_08_2013
BITS Pilani
Pilani | Dubai | Goa | Hyderabad
Date : 30/08/2013
Digital Signal Processing
Previous class:
Computation required in DSP
Evolution of DSP architecture
Today's class:
Evolution of DSP architecture
Numeric Representation used in DSP
Fixed point
Floating point
Analysis of computation required for FIR filter
Expression for an 8-tap FIR filter:
Y[n] = a0 X[n] + a1 X[n-1] + a2 X[n-2] + ... + a7 X[n-7]
The most frequently recurring computation is multiplication followed by accumulation (MAC).
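The 8-tap expression above can be sketched as a loop of MAC steps. This is a minimal illustration, not a DSP implementation: the function name `fir_output`, the moving-average coefficients, and the rule X[m] = 0 for m < 0 are all assumptions made for the example.

```python
def fir_output(coeffs, samples, n):
    """Compute Y[n] = sum over k of coeffs[k] * samples[n-k], one MAC per tap."""
    acc = 0.0
    for k, a_k in enumerate(coeffs):
        idx = n - k
        # Assumption: samples before the start of the signal are zero.
        x = samples[idx] if idx >= 0 else 0.0
        acc += a_k * x  # one multiply-accumulate (MAC)
    return acc

# Hypothetical data: an 8-tap moving average applied to a constant input.
coeffs = [1 / 8] * 8
samples = [8.0] * 16
y = [fir_output(coeffs, samples, n) for n in range(len(samples))]
print(y[0], y[7])
```

Each output sample costs eight MACs, which is why the per-MAC cycle count matters so much for throughput.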
DSP vs. GPP (general-purpose processor)
DSP:
Real-time throughput is required.
Used in embedded applications.
Special features are provided to support DSP computations such as FFT and convolution.
Has a MAC unit.
GPP:
Real-time throughput is not needed.
Used in desktop computing.
No special features.
What is the most suitable architecture for DSP?
Architectural evolution:
Von Neumann architecture
Designed by: John von Neumann, a Hungarian-American mathematician.
A single memory is shared by both program instructions and data.
Most computers today are of the Von Neumann design.
How many cycles are needed for a MAC instruction on two numbers that reside in external memory?
1. Get the opcode of the instruction.
2. Get data1.
3. Get data2.
4. Multiply and accumulate, and store the result.
(Assume that CPU computation takes very little time in comparison to memory access.)
So four cycles are needed.
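The count above can be written as a small model. This is only a sketch under the slide's assumptions: one shared bus, one step per cycle, and no overlap between consecutive instructions. The function name `von_neumann_mac_cycles` is hypothetical.

```python
def von_neumann_mac_cycles(num_macs):
    """Cycles for num_macs MAC instructions on a single shared memory bus."""
    # With one bus, each step below occupies its own cycle.
    steps_per_mac = [
        "fetch opcode",
        "fetch data1",
        "fetch data2",
        "multiply-accumulate and store result",
    ]
    return num_macs * len(steps_per_mac)

cycles_one = von_neumann_mac_cycles(1)   # a single MAC
cycles_fir8 = von_neumann_mac_cycles(8)  # one output of an 8-tap FIR filter
print(cycles_one, cycles_fir8)
```

Under this model, one output sample of an 8-tap FIR filter costs 32 cycles, which motivates architectures that can fetch instructions and data in parallel.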
Harvard architecture
Developed at Harvard University (1940s).
Program and data are held in separate memories with separate buses, so program instructions and data can be fetched at the same time.
This increases overall processing speed.
Most present-day DSPs use this dual-bus architecture.
Ex: ADSP-21xx and AT&T's DSP16xx.
Cycles needed for a MAC instruction in Harvard architecture
(DM = data memory, PM = program memory)
1. Fetch instruction 1.
2. Decode instruction 1; get data1 from DM and the coefficient from PM.
3. Perform the MAC operation and store the result in DM; at the same time, fetch instruction 2 from PM.
4. Decode instruction 2; get its data from DM and its coefficient from PM.
5. Perform the MAC operation and store the result in DM (for instruction 2); at the same time, fetch instruction 3 from PM.
So a single MAC operation needs 3 cycles.
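The step list for the Harvard case can be laid out as a cycle-by-cycle schedule. This sketch assumes exactly the overlap described in the steps (the next instruction fetch shares a cycle with the current MAC) and that DM and PM can each be accessed once per cycle. The function name `harvard_mac_schedule` is hypothetical.

```python
def harvard_mac_schedule(num_macs):
    """Return one list of activities per cycle for num_macs MAC instructions."""
    schedule = [["fetch I1 (PM)"]]
    for i in range(1, num_macs + 1):
        # Decode cycle: data comes from DM, the coefficient from PM, in parallel.
        schedule.append([f"decode I{i}", f"data for I{i} (DM)",
                         f"coefficient for I{i} (PM)"])
        # Execute cycle: the MAC and store overlap with the next fetch.
        execute = [f"MAC I{i}, store result (DM)"]
        if i < num_macs:
            execute.append(f"fetch I{i + 1} (PM)")
        schedule.append(execute)
    return schedule

schedule = harvard_mac_schedule(2)
for cycle, work in enumerate(schedule, start=1):
    print(cycle, "; ".join(work))
```

In this model the first MAC completes in cycle 3 and each further MAC adds two cycles, so N MACs take 2N + 1 cycles.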
Modified Harvard architecture
Three memory banks.
Allows three independent memory accesses per instruction cycle.
Processors based on a three-bank modified Harvard architecture include the Zilog Z893xx and the Motorola DSP5600x and DSP563xx.
Multiple-Access Memories
Using fast memories that support multiple, sequential accesses per instruction cycle over a single set of buses
OR
Using multi-ported memories that allow multiple concurrent memory accesses over two or more independent sets of buses.
This arrangement provides one program memory access and two data memory accesses per instruction word.
Ex: Motorola DSP561xx processors.
Super Harvard Architecture (SHARC DSP)
Part of program memory is used as data memory.
An instruction cache is included in the CPU.
The first time through a loop, operation is slower because the instructions must be fetched from program memory.
Subsequent executions of the loop are faster because the instructions come from the cache.
This means that all of the memory-to-CPU information transfers can be accomplished in a single cycle.
Ex: ADSP-2106x and the newer ADSP-211xx.
Enhanced DSP architectures:
Very Long Instruction Word (VLIW) architecture:
VLIW CPUs have four to eight execution units.
One VLIW instruction encodes multiple operations.
Ex: if a VLIW device has four execution units, then a VLIW instruction for that device would have four operation fields.
VLIW instructions are usually at least 64 bits wide.
VLIW CPUs use software (the compiler) to decide which operations can run in parallel.
Hardware complexity for instruction scheduling is reduced.
Ex: TMS320C6xx
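To illustrate what "the compiler decides which operations can run in parallel" can mean, here is a toy greedy packer that groups operations into VLIW instructions with four operation fields. It is a sketch, not how any real compiler or the TMS320C6xx toolchain works. The function `pack_vliw`, the register names, and the rule that a result is usable only in a later instruction are assumptions; only read-after-write dependencies are tracked, and each destination is assumed to be written once.

```python
def pack_vliw(ops, num_units=4):
    """Greedily pack (dest, sources) operations into VLIW instructions."""
    bundles = []   # each bundle is one VLIW instruction (list of dest names)
    ready_at = {}  # dest name -> index of first bundle that may read it
    for dest, sources in ops:
        # An operation cannot issue before all of its inputs are ready.
        idx = max((ready_at.get(s, 0) for s in sources), default=0)
        # Skip bundles whose operation fields are all occupied.
        while idx < len(bundles) and len(bundles[idx]) >= num_units:
            idx += 1
        while len(bundles) <= idx:
            bundles.append([])
        bundles[idx].append(dest)
        ready_at[dest] = idx + 1
    return bundles

# Hypothetical 4-tap sum of products: four multiplies, then an adder tree.
ops = [
    ("p0", ["a0", "x0"]), ("p1", ["a1", "x1"]),
    ("p2", ["a2", "x2"]), ("p3", ["a3", "x3"]),
    ("s0", ["p0", "p1"]), ("s1", ["p2", "p3"]),
    ("y", ["s0", "s1"]),
]
bundles = pack_vliw(ops)
for i, bundle in enumerate(bundles):
    print(i, bundle)
```

The seven operations fit into three VLIW instructions here; the scheduling work is done once in software rather than by hardware at run time.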
Endians:
Big endian (MSB in first location)
Little endian (LSB in first location)
How will 12345678 be stored in four locations starting from 4000 in each case?
TI DSP: Little endian
Motorola DSP: Big endian
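The question above can be worked through with a short sketch. It assumes that 12345678 is a 32-bit hexadecimal value, that each location holds one byte, and that the addresses are 4000 to 4003. The helper name `store_word` is hypothetical.

```python
def store_word(value, base_address, byteorder):
    """Map each byte of a 32-bit word to its address for the given byte order."""
    raw = value.to_bytes(4, byteorder)  # byteorder is "big" or "little"
    return {base_address + i: f"{b:02X}" for i, b in enumerate(raw)}

big = store_word(0x12345678, 4000, "big")
little = store_word(0x12345678, 4000, "little")

print("addr  big  little")
for addr in range(4000, 4004):
    print(addr, big[addr], little[addr])
```

Under these assumptions, big endian places 12, 34, 56, 78 at locations 4000 to 4003, and little endian places 78, 56, 34, 12 at the same locations.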