Digital Signal Processors (DSPs)

DSP

• Advanced signal processor circuits
• MAC (Multiply and Accumulate) unit(s): provide fast multiplication of two operands and accumulation of the results at a single address.
• A MAC unit quickly computes an expression such as the summation $y_n = \sum_{i=0}^{N-1} a_i \, x_{n-i}$. Here i, n and N are integers, $a_i$ is a coefficient, $x_j$ is an independent variable (an input element) and $y_k$ is a dependent variable (an output element).
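The summation above can be sketched in software as a loop that performs one multiply and one accumulate per coefficient, which is the operation a hardware MAC unit carries out in a single step. This is a minimal illustration, not DSP code: the function name `mac_fir`, the sample values, and the choice to treat inputs before the start of the sequence as zero are assumptions made for the example.

```python
def mac_fir(coeffs, samples, n):
    """Compute y[n] = sum over i of coeffs[i] * samples[n - i].

    Each loop iteration is one multiply-accumulate (MAC) step.
    Samples with an index outside the list are treated as zero
    (an assumption made for this sketch).
    """
    acc = 0.0  # the accumulator that a MAC unit would hold
    for i, a in enumerate(coeffs):
        j = n - i
        if 0 <= j < len(samples):
            acc += a * samples[j]  # multiply, then accumulate
    return acc


# Hypothetical coefficients a_0..a_2 and input samples x_0..x_3.
coeffs = [0.5, 0.25, 0.25]
samples = [4.0, 8.0, 12.0, 16.0]

y3 = mac_fir(coeffs, samples, 3)  # 0.5*16 + 0.25*12 + 0.25*8
y0 = mac_fir(coeffs, samples, 0)  # only the a_0 * x_0 term contributes
```

With N coefficients the loop needs N MAC steps per output sample, which is why DSPs provide dedicated, and often parallel, MAC units.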

DSP

• DSP processors invariably have Harvard architecture.
• Caches are organized in Harvard architecture: separate I-cache and D-cache.
• Basic units: memory, internal bus, data bus, address bus, control bus, bus interface unit, instruction fetch register, instruction decoder, control unit, instruction cache, data cache, multistage pipeline processing, multi-line superscalar processing (for a processing speed higher than one instruction per clock cycle), program counter.

DSP special structural units

• Packed data processing
• Parallel-execution MAC units
• Special instructions
• Instruction packing unit

ADVANCED PROCESSOR ARCHITECTURES AND MEMORY ORGANISATION

SHARC and TigerSHARC

SHARC

• Processor from Analog Devices.
• SHARC stands for Super Harvard Architecture Single-Chip Computer.
• Super Harvard architecture means more than one set of address and data buses for data.
• For example, set J of address and data buses for data-memory space, set K of address and data buses for data-memory space, and set I of address and data buses for program-memory space.

SHARC functions

• Program memory is configurable as program and data memory parts (Princeton architecture).
• SHARC functions as a VLIW (very large instruction word) processor.
• Used in a large number of DSP applications.
• Controlled power dissipation in the floating-point ALU.
• Different SHARCs can be linked by serial communication between them.

SHARC features

• Instruction word of 48 bits.
• 32-bit data word for integer and floating-point operations.
• 40-bit extended floating point.
• Smaller 16-bit or 8-bit values must also be stored as full 32-bit data. Therefore, big-endian or little-endian data alignment does not need to be considered during processing, with the carry also extended for rotating (RRX).
• 1 MB of on-chip program memory and data memory (Harvard architecture).
• External off-chip memory.
• Off-chip as well as on-chip memory can be configured for 32-bit or 48-bit words.

Parallel Processing features

• Supports instruction-level parallelism as well as memory-access parallelism.
• Multiple data accesses in a single instruction.

Addressing Features

• SHARC has a 32-bit address space for accesses.
• It accesses 16 GB, 20 GB or 24 GB, depending on the word size (32-bit, 40-bit or 48-bit) configured in the memory for each address.
• When the word size is 32 bits, the addressable external memory space is $2^{32} \times 4$ bytes = 16 GB.
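The three sizes quoted above follow directly from multiplying the number of addresses by the bytes stored at each address. The short sketch below reproduces that arithmetic; the helper name `addressable_gb` is hypothetical, and GB is taken as $2^{30}$ bytes so that the results match the figures in the text.

```python
ADDRESS_BITS = 32  # width of the SHARC address, as stated in the text


def addressable_gb(word_bits):
    """Return the addressable space in GB (2**30 bytes) for a given word size."""
    bytes_per_word = word_bits // 8          # 32 -> 4, 40 -> 5, 48 -> 6
    total_bytes = (2 ** ADDRESS_BITS) * bytes_per_word
    return total_bytes // 2 ** 30


sizes = {bits: addressable_gb(bits) for bits in (32, 40, 48)}
```

Each address selects one whole word, so widening the word from 4 to 5 or 6 bytes scales the total from 16 GB to 20 GB or 24 GB without changing the number of addresses.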

Word size features

• Provides for two word sizes: 32-bit and 48-bit.
• SHARC has two full sets of 16 general-purpose registers, which allows fast context switching.
• Allows a multitasking OS and multithreading.
• Registers are called R0 to R15 or F0 to F15, depending on whether they are used for integer or floating-point operations.
• Registers are 32 bits wide. A few special registers are 48 bits wide and may also be accessed as a pair of 16-bit and 32-bit registers.

TigerSHARC

• Highest performance density family of processors from Analog Devices.
• Precision high-performance integrated circuits used in analog and digital signal processing applications.
• Designed for multiprocessing applications and for peak performance greater than a billion floating-point operations per second (BFLOPS).

Example

• The ADSP-TS203SABP-050 processor runs on a 250 MHz clock, has 6 Mbits of on-chip memory, and operates at 1.2 V/3.3 V.
• The low-voltage design helps in processing with little power dissipation.
• Analog Devices claims the highest performance per watt.

TigerSHARC

• Multiple TigerSHARCs can connect by serial communication at 1 GBps.
• One TigerSHARC version has 24 Mbits of on-chip memory.
• Two ALUs and two sets of address and data buses for data memory.
• One set of address and data buses for program memory.
• TigerSHARC is also available as an IP core, so that new applications and enhancements can be developed.

Arithmetic

Operation             Description
Fn = Fx + Fy          Add
Fn = Fx - Fy          Subtract
Fn = ABS(Fx + Fy)     Absolute value of sum
Fn = ABS(Fx - Fy)     Absolute value of difference
Fn = (Fx + Fy)/2      Average
COMP(Fx, Fy)          Compare
Fn = -Fx              Negate
Rn = Rx + Ry + CI     Add with carry

Arithmetic

Operation               Description
Fn = ABS Fx             Absolute value
Fn = PASS Fx            Copy Fx to Fn
Fn = RND Fx             Round
Fn = SCALB Fx by Ry     Scale exponent of Fx by Ry
Rn = MANT Fx            Extract mantissa of Fx
Rn = LOGB Fx            Convert exponent of Fx to integer
Rn = FIX Fx             Convert floating-point to integer
Fn = FLOAT Rx by Ry     Convert integer to floating-point

Arithmetic

Operation               Description
Fn = RECIPS Fx          Create seed for reciprocal
Fn = RSQRTS Fx          Create seed for reciprocal square root
Fn = Fx COPYSIGN Fy     Copy sign of Fy to Fx
Fn = MIN(Fx, Fy)        Minimum of Fx, Fy
Fn = MAX(Fx, Fy)        Maximum of Fx, Fy
Fn = CLIP Fx by Fy      Clip Fx within range [-Fy, Fy]
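The behaviour described for COPYSIGN, MIN, MAX and CLIP can be modelled in a few lines of Python. This is only a sketch of the descriptions in the table, not of the SHARC hardware: the function names are hypothetical, Python floats are 64-bit rather than 32-bit or 40-bit, and `clip_op` assumes Fy is non-negative.

```python
import math


def copysign_op(fx, fy):
    """Model of Fn = Fx COPYSIGN Fy: magnitude of fx with the sign of fy."""
    return math.copysign(fx, fy)


def clip_op(fx, fy):
    """Model of Fn = CLIP Fx by Fy: limit fx to [-fy, fy] (assumes fy >= 0)."""
    return max(-fy, min(fx, fy))


results = {
    "copysign": copysign_op(3.0, -1.0),   # sign of -1.0 applied to 3.0
    "min": min(2.5, -4.0),                # model of Fn = MIN(Fx, Fy)
    "max": max(2.5, -4.0),                # model of Fn = MAX(Fx, Fy)
    "clip_high": clip_op(5.0, 2.0),       # above the range, limited to 2.0
    "clip_low": clip_op(-5.0, 2.0),       # below the range, limited to -2.0
    "clip_inside": clip_op(1.0, 2.0),     # inside the range, unchanged
}
```

Clipping of this kind is commonly used to saturate a signal to a symmetric range instead of letting it overflow.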

Logical

Operation                      Description
Rn = LSHIFT Rx by Ry           Logical shift, distance Ry
Rn = Rn OR LSHIFT Rx by Ry     Logical shift and logical OR
Rn = ASHIFT Rx by Ry           Arithmetic shift
Rn = Rn OR ASHIFT Rx by Ry     Arithmetic shift and logical OR
Rn = ROT Rx by Ry              Rotate, distance Ry
Rn = BCLR Rx by Ry             Clear one bit in Rx
Rn = BSET Rx by Ry             Set one bit in Rx
Rn = BTGL Rx by Ry             Toggle one bit in Rx
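To make the table concrete, the sketch below models a few of these operations on 32-bit values in Python. It is an illustration under stated assumptions rather than a description of the SHARC shifter: the function names are hypothetical, registers are treated as unsigned 32-bit integers, the distance Ry is assumed to be non-negative and to move bits to the left, and for the bit operations Ry is taken to be the bit position.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def lshift(rx, ry):
    """Logical shift left by ry bits; bits shifted out of bit 31 are lost."""
    return (rx << ry) & MASK32


def rot(rx, ry):
    """Rotate left by ry bits; bits leaving bit 31 re-enter at bit 0."""
    ry %= 32
    return ((rx << ry) | (rx >> (32 - ry))) & MASK32


def bclr(rx, ry):
    """Clear bit number ry of rx."""
    return rx & ~(1 << ry) & MASK32


def bset(rx, ry):
    """Set bit number ry of rx."""
    return (rx | (1 << ry)) & MASK32


def btgl(rx, ry):
    """Toggle bit number ry of rx."""
    return (rx ^ (1 << ry)) & MASK32


shifted = lshift(0x80000001, 1)  # top bit is discarded
rotated = rot(0x80000001, 1)     # top bit wraps around to bit 0
```

The difference between the two results shows why shift and rotate are separate instructions: the shift loses the bit that leaves the register, while the rotate preserves it.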

Logical

Operation                         Description
BTST Rx by Ry                     Test one bit in Rx
Rn = FDEP Rx by Ry                Deposit field from Rx into Rn
Rn = Rn OR FDEP Rx by Ry          Deposit field from Rx using OR
Rn = FDEP Rx by Ry (SE)           Deposit and sign extend field from Rx
Rn = Rn OR FDEP Rx by Ry (SE)     Deposit and sign extend using OR
Rn = FEXT Rx by Ry                Extract field from Rx
Rn = FEXT Rx by Ry (SE)           Extract and sign extend field from Rx
Rn = EXP Rx                       Extract exponent field

Operations

Operation            Description
Rn = EXP Rx (EX)     Extract exponent field from ALU
Rn = LEFTZ Rx        Extract number of leading 0s
Rn = LEFTO Rx        Extract number of leading 1s
Fn = FPACK Fx        Convert 32-bit floating-point to 16-bit floating-point
Fx = FUNPACK Rn      Convert 16-bit floating-point to 32-bit floating-point

Load Value

• R0 = DM(0x2000000);
• R0 = DM(_a); loads R0 with the contents of the variable a
• DM(_a) = R0; stores R0 into the memory location of a

Little Endian and Big Endian in a Memory Organization

Little Endian: The least significant byte (8 bits) of a 32-bit word is written into the lowest address byte, and the other bytes are written in increasing order of significance.

Word (32 bit): 0x90ABCDEF, word address: 0x1000

Address   0x1000  0x1001  0x1002  0x1003
Little    EF      CD      AB      90

Big Endian: The byte order is reversed. The most significant byte (8 bits) of a 32-bit word is written into the lowest address byte, and the other bytes are written in decreasing order of significance.

Word (32 bit): 0x90ABCDEF, word address: 0x1000

Address   0x1000  0x1001  0x1002  0x1003
Big       90      AB      CD      EF
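The two byte layouts above can be reproduced with Python's built-in `int.to_bytes`, which serialises an integer in either byte order. This is only a check of the worked example; the helper `hex_bytes` is a name introduced for the illustration.

```python
def hex_bytes(data):
    """Format a bytes object as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


word = 0x90ABCDEF  # the 32-bit word used in the example above

# Index 0 of each result corresponds to the lowest address (0x1000).
little = word.to_bytes(4, byteorder="little")
big = word.to_bytes(4, byteorder="big")

little_layout = hex_bytes(little)  # bytes at 0x1000..0x1003, little endian
big_layout = hex_bytes(big)        # bytes at 0x1000..0x1003, big endian

# Reading the bytes back with the matching byte order recovers the word.
restored = int.from_bytes(little, byteorder="little")
```

Reading the little-endian bytes back with the wrong byte order would give 0xEFCDAB90 instead, which is the typical symptom of an endianness mismatch between two systems.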

Princeton Architecture

• 80x86 processors and ARM7 have Princeton architecture for main memory. (8051-family microcontrollers have Harvard architecture.)
• In Princeton memory architecture, vectors and pointers, variables, program segments, and memory blocks for data and stacks share a single address space and are placed at different addresses within it.

Harvard architecture

• The address spaces for data and for program are distinct.
• Suited to handling streams of data that must be accessed by single-instruction multiple-data (SIMD) instructions and DSP instructions.
• Separate data buses ensure simultaneous accesses for instructions and data.
• Program segments and memory blocks for data and stacks have separate sets of addresses in Harvard architecture.
• Control signals and read-write instructions are also separate for accessing the program memory and the data memory.
