DSP_30_08_2013

7/29/2019 DSP_30_08_2013

1/14

BITS Pilani

Pilani | Dubai | Goa | Hyderabad

Date : 30/08/2013

Digital Signal Processing


Previous class:

Computation required in DSP

Evolution of DSP architecture


Today's class:

Evolution of DSP architecture

Numeric Representation used in DSP

Fixed point

Floating point

Analysis of computation required for FIR filter

Expression for an 8-tap FIR filter:

Y[n] = a0 X[n] + a1 X[n-1] + a2 X[n-2] + ... + a7 X[n-7]

The most frequently recurring computation is multiplication followed by accumulation (MAC).
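
The expression above can be sketched in a few lines of Python to make the MAC pattern visible. This is only an illustrative sketch: the function name `fir_filter`, the moving-average coefficients, and the constant input are assumptions chosen for the example, and samples before X[0] are assumed to be zero.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: Y[n] = sum over k of coeffs[k] * X[n-k]."""
    outputs = []
    for n in range(len(samples)):
        acc = 0.0  # accumulator, cleared for each output sample
        for k, a_k in enumerate(coeffs):
            if n - k >= 0:  # samples before X[0] are treated as zero
                acc += a_k * samples[n - k]  # one multiply-accumulate (MAC)
        outputs.append(acc)
    return outputs


# Hypothetical example data: an 8-tap moving average (all coefficients 1/8)
# applied to a constant input.
coeffs = [0.125] * 8
samples = [8.0] * 10
y = fir_filter(coeffs, samples)
print(y)
```

Once the filter has eight samples available, every output costs eight MAC operations, which is why the speed of a single MAC dominates the throughput of the whole filter.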

DSP vs. GPP

DSP:

Real-time throughput requirement.

Used in embedded applications.

Special features are provided to support DSP computations such as FFT and convolution.

Has a MAC unit.

GPP:

No real-time throughput requirement.

Desktop computing.

No special features.

What is the most suitable architecture for DSP?

Architectural evolution:

Von Neumann architecture

Named after John von Neumann, a Hungarian-American mathematician.

A single memory is shared by both the program instructions and the data.

Most computers today are of the Von Neumann design.

How many cycles are needed for a MAC instruction on two numbers that reside in external memory?

1. Get the opcode of the instruction.

2. Get data1.

3. Get data2.

4. Multiply and accumulate, and store the result.

(Assume that CPU computation takes very little time in comparison to a memory access.)

So four cycles are needed.
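
The count above can be written down as a small model. This is a sketch under the slide's assumptions (one cycle per step, a single shared bus, so no two steps overlap); the names `VON_NEUMANN_MAC_STEPS` and `von_neumann_cycles` are hypothetical.

```python
# Each step occupies the single shared memory bus (or the ALU) for one cycle,
# so the steps of one MAC instruction cannot overlap.
VON_NEUMANN_MAC_STEPS = [
    "fetch opcode",
    "fetch data1",
    "fetch data2",
    "multiply-accumulate and store result",
]


def von_neumann_cycles(num_macs):
    """Total cycles for num_macs MAC instructions executed back to back."""
    return num_macs * len(VON_NEUMANN_MAC_STEPS)


print(von_neumann_cycles(1))  # one MAC
print(von_neumann_cycles(8))  # one output sample of an 8-tap FIR filter
```

Under this model an 8-tap FIR output needs 32 cycles, which motivates the architectures that follow.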

Harvard architecture

Developed at Harvard University (1940s).

Program instructions and data are held in separate memories with separate buses, so they can be fetched at the same time, which increases overall processing speed.

Most present-day DSPs use this dual-bus architecture.

Ex: ADSP-21xx and AT&T's DSP16xx.

Cycles needed for a MAC instruction in Harvard architecture

1. Instruction 1 is fetched.

2. Instruction 1 is decoded; data1 is fetched from DM and the coefficient from PM.

3. The MAC operation is performed and the result stored in DM, while Instruction 2 is fetched from PM.

4. Instruction 2 is decoded; data1 is fetched from DM and the coefficient from PM.

5. The MAC operation for Instruction 2 is performed and the result stored in DM, while Instruction 3 is fetched from PM.

So a single MAC operation needs 3 cycles.
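
The Harvard sequence can be sketched as a cycle-by-cycle timeline. This is a toy model that follows the slide's steps and assumes one cycle per step; the function name `harvard_mac_timeline` is hypothetical. It shows that the fetch of the next instruction overlaps with the MAC of the current one, so only the very first fetch costs a cycle of its own.

```python
def harvard_mac_timeline(num_macs):
    """List the activity in each cycle for num_macs consecutive MAC instructions."""
    timeline = ["fetch instruction 1 from PM"]
    for i in range(1, num_macs + 1):
        timeline.append(
            f"decode instruction {i}; get data from DM and coefficient from PM"
        )
        step = f"MAC for instruction {i}; store result in DM"
        if i < num_macs:
            # The next instruction fetch overlaps with this MAC step.
            step += f"; fetch instruction {i + 1} from PM"
        timeline.append(step)
    return timeline


for cycle, activity in enumerate(harvard_mac_timeline(2), start=1):
    print(cycle, activity)
```

In this model a single MAC takes 3 cycles and each additional MAC adds 2, so eight MACs take 17 cycles instead of the 32 of the single-bus case.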

Modified Harvard architecture

Three memory banks.

Allows three independent memory accesses per instruction cycle.

Processors based on a three-bank modified Harvard architecture include the Zilog Z893xx and the Motorola DSP5600x and DSP563xx.

Multiple-Access Memories

Using fast memories that support multiple, sequential accesses per instruction cycle over a single set of buses

OR

Using multi-ported memories that allow multiple concurrent memory accesses over two or more independent sets of buses.

This arrangement provides one program memory access and two data memory accesses per instruction word.

Ex: Motorola DSP561xx processors.

Super Harvard Architecture (SHARC DSP)

Part of program memory is used as data memory.

An instruction cache is included in the CPU.

The first time through a loop, operation is slower.

Subsequent executions of the loop are faster, because the instructions come from the cache and the program memory bus is free to carry data.

This means that all of the memory-to-CPU information transfers can be accomplished in a single cycle.

Ex: ADSP-2106x and the newer ADSP-211xx.
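
The effect of the instruction cache on a loop can be illustrated with a toy model. It is not actual SHARC timing: it assumes a one-instruction loop body, one coefficient read from program memory per iteration, a program memory bus that carries one transfer per cycle, and a data memory access that always overlaps. The function name `loop_cycles` is hypothetical.

```python
def loop_cycles(iterations, instruction_cache):
    """Count cycles for a one-instruction MAC loop in a simplified bus model."""
    cached = False
    cycles = 0
    for _ in range(iterations):
        pm_transfers = 1  # the coefficient comes from program memory
        if not cached:
            pm_transfers += 1  # the instruction also travels over the PM bus
            if instruction_cache:
                cached = True  # later iterations read it from the cache
        # The PM bus handles one transfer per cycle; the DM access overlaps.
        cycles += pm_transfers
    return cycles


print(loop_cycles(8, instruction_cache=False))
print(loop_cycles(8, instruction_cache=True))
```

In this model only the first pass through the loop pays for the instruction fetch, which matches the slower first iteration and faster later iterations described above.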

Enhanced DSP architectures:

Very Long Instruction Word (VLIW) architecture:

VLIW CPUs have four to eight execution units.

One VLIW instruction encodes multiple operations.

Ex: if a VLIW device has four execution units, then a VLIW instruction for that device would have four operation fields.

VLIW instructions are usually at least 64 bits in width.

VLIW CPUs use software (the compiler) to decide which operations can run in parallel.

Hardware complexity for instruction scheduling is reduced.

Ex: TMS320C6xx
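
The compiler's role can be sketched with a toy packer that groups operations into instruction words of four fields. This is a simplified illustration, not how any real VLIW compiler schedules code: operations are hypothetical (destination, source, source) register triples, they are packed greedily in program order, and an operation starts a new word whenever the current word is full or it touches a register written earlier in the same word.

```python
SLOTS = 4  # assumed number of execution units, i.e. operation fields per word


def pack_vliw(ops, slots=SLOTS):
    """Greedily pack (dest, src1, src2) operations into VLIW instruction words."""
    words = []
    current = []
    written = set()  # registers written by operations in the current word
    for dest, src1, src2 in ops:
        conflict = src1 in written or src2 in written or dest in written
        if current and (len(current) == slots or conflict):
            words.append(current)  # close the current word and start a new one
            current, written = [], set()
        current.append((dest, src1, src2))
        written.add(dest)
    if current:
        words.append(current)
    return words


# Hypothetical program: four independent multiplies, then a tree of additions.
ops = [
    ("p0", "a0", "x0"),
    ("p1", "a1", "x1"),
    ("p2", "a2", "x2"),
    ("p3", "a3", "x3"),
    ("s0", "p0", "p1"),
    ("s1", "p2", "p3"),
    ("y", "s0", "s1"),
]
words = pack_vliw(ops)
for i, word in enumerate(words, start=1):
    print(i, word)
```

The seven operations fit into three instruction words. The grouping is decided entirely before execution, which is why the hardware needs no scheduling logic of its own.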

Endianness:

Big endian (most significant byte in the first location)

Little endian (least significant byte in the first location)

How will 12345678 be stored in four locations starting from 4000 in each case?

TI DSP: Little endian

Motorola DSP: Big endian
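
The storage question can be checked with a short Python sketch. It assumes that 12345678 is a 32-bit hexadecimal word and that locations 4000 to 4003 each hold one byte; the helper name `store` is hypothetical.

```python
value = 0x12345678  # assumption: the word is hexadecimal and 32 bits wide
base = 4000         # assumption: four consecutive byte-wide locations


def store(value, base, byteorder):
    """Map each address to the byte stored there, as a two-digit hex string."""
    data = value.to_bytes(4, byteorder)
    return {base + i: f"{b:02X}" for i, b in enumerate(data)}


big = store(value, base, "big")
little = store(value, base, "little")
print("big   :", big)
print("little:", little)
```

Big endian places 12 at location 4000 and 78 at 4003, while little endian places 78 at location 4000 and 12 at 4003.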
