DSP_30_08_2013

  • 7/29/2019 DSP_30_08_2013

    BITS Pilani

    Pilani | Dubai | Goa | Hyderabad

    Date : 30/08/2013

    Digital Signal Processing

    Previous class:

    Computation required in DSP

    Evolution of DSP architecture

    Today's class:

    Evolution of DSP architecture

    Numeric Representation used in DSP

    Fixed point

    Floating point

    Analysis of the computation required for an FIR filter

    Expression for an 8-tap FIR filter:

    Y[n] = a0 X[n] + a1 X[n-1] + a2 X[n-2] + ... + a7 X[n-7]

    The most frequently recurring computation is multiplication followed by accumulation (MAC).
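
The 8-tap expression above can be sketched as a loop of MAC operations. This is a minimal illustration, not a production filter: the function name `fir_filter`, the moving-average coefficients, and the convention that samples before n = 0 are zero are all assumptions made for the example.

```python
def fir_filter(coeffs, samples):
    """Direct-form FIR: Y[n] = sum over k of a_k * X[n-k], with X[n] = 0 for n < 0."""
    output = []
    for n in range(len(samples)):
        acc = 0.0  # accumulator, cleared for each output sample
        for k, a_k in enumerate(coeffs):
            if n - k >= 0:
                acc += a_k * samples[n - k]  # one multiply-accumulate (MAC)
        output.append(acc)
    return output


# Hypothetical coefficients a0..a7 (an 8-point moving average), for illustration only.
coeffs = [0.125] * 8
impulse = [1.0] + [0.0] * 9  # unit impulse, 10 samples long
response = fir_filter(coeffs, impulse)
print(response)
```

Each output sample costs up to eight MACs, which is why the inner line of the loop is the operation a DSP is built to speed up.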

    DSP vs. GPP (general-purpose processor)

    DSP:

    Real-time throughput is required.

    Used in embedded applications.

    Special features are provided to support DSP computations such as FFT and convolution.

    Has a MAC unit.

    GPP:

    Real-time throughput is not required.

    Used in desktop computing.

    No special features.

    What is the most suitable architecture for DSP?

    Architectural evolution:

    Von Neumann architecture

    Designed by John von Neumann, a Hungarian-American mathematician.

    A single memory is shared by both the program instructions and the data.

    Most computers today are of the von Neumann design.

    How many cycles does a MAC instruction need in the von Neumann architecture when its two operands reside in external memory?

    1. Get the opcode of the instruction.

    2. Get data1.

    3. Get data2.

    4. Multiply and accumulate, and store the result.

    (Assume that CPU computation takes very little time in comparison to memory access.)

    So four cycles are needed.
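
The four steps above can be turned into a toy cycle count. The sketch assumes one cycle per listed step and strictly sequential execution over the single shared bus; the names `VON_NEUMANN_MAC_STEPS` and `von_neumann_cycles` are made up for this example.

```python
# Toy model: one shared bus, so every step in the list occupies its own cycle.
VON_NEUMANN_MAC_STEPS = [
    "fetch opcode",
    "fetch data1",
    "fetch data2",
    "multiply-accumulate and store",
]


def von_neumann_cycles(num_macs):
    """Cycles for num_macs MAC instructions executed one after another."""
    return num_macs * len(VON_NEUMANN_MAC_STEPS)


print(von_neumann_cycles(1))  # one MAC
print(von_neumann_cycles(8))  # the eight MACs of one 8-tap FIR output
```

Under these assumptions a single 8-tap FIR output sample costs 32 cycles, all of them serialized by the shared memory.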

    Harvard architecture

    Developed at Harvard University in the 1940s.

    Program instructions and data are held in separate memories with separate buses, so they can be fetched at the same time.

    This increases overall processing speed.

    Most present day DSPs use this dual bus architecture.

    Ex: ADSP-21xx and AT&T's DSP16xx.

    Cycles needed for a MAC instruction in the Harvard architecture

    1. Instruction 1 is fetched.

    2. Instruction 1 is decoded; data1 is read from DM and the coefficient from PM.

    3. The MAC operation is performed and the result is stored in DM; at the same time, Instruction 2 is fetched from PM.

    4. Instruction 2 is decoded; data1 is read from DM and the coefficient from PM.

    5. The MAC operation for Instruction 2 is performed and the result is stored in DM; at the same time, Instruction 3 is fetched from PM.

    So a single MAC operation needs 3 cycles.
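
The overlap in the steps above (the next instruction is fetched while the current MAC executes) can be written out as a small schedule. This is a sketch of the listed steps only, assuming one cycle per step; `harvard_schedule` and the activity labels are hypothetical names.

```python
def harvard_schedule(num_macs):
    """Return {cycle: [activities]} following the numbered steps of the slide."""
    schedule = {}
    for i in range(num_macs):
        fetch = 2 * i + 1      # instruction fetch from PM
        decode = fetch + 1     # decode; data from DM, coefficient from PM
        execute = fetch + 2    # MAC and store; shares its cycle with the next fetch
        name = f"I{i + 1}"
        schedule.setdefault(fetch, []).append(f"{name} fetch")
        schedule.setdefault(decode, []).append(f"{name} decode + operands")
        schedule.setdefault(execute, []).append(f"{name} MAC + store")
    return schedule


sched = harvard_schedule(2)
for cycle in sorted(sched):
    print(cycle, sched[cycle])
total_cycles = max(sched)  # last busy cycle
```

In this model one MAC still takes 3 cycles from fetch to result, but each additional MAC adds only 2 cycles, because its fetch is hidden behind the previous execute step.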

    Modified Harvard architecture

    Three memory banks.

    Allows three independent memory accesses per instruction cycle.

    Processors based on a three-bank modified Harvard architecture include the Zilog Z893xx and the Motorola DSP5600x and DSP563xx.

    Multiple-Access Memories

    Using fast memories that support multiple sequential accesses per instruction cycle over a single set of buses

    OR

    Using multi-ported memories that allow multiple concurrent memory accesses over two or more independent sets of buses.

    This arrangement provides one program memory access

    and two data memory accesses per instruction word.

    Ex: Motorola DSP561xx processors.

    Super Harvard Architecture (SHARC DSP)

    Part of the program memory is used as data memory.

    An instruction cache is included in the CPU.

    The first pass through a loop runs slower, because the instructions must be fetched from program memory.

    Later passes through the loop run faster, because the instructions come from the cache.

    This means that all of the memory-to-CPU information transfers can be accomplished in a single cycle.

    Ex: ADSP-2106x and the newer ADSP-211xx
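
The effect of the instruction cache on a loop can be illustrated with a rough cycle estimate. The costs are assumptions chosen for illustration, not figures from a datasheet: on the first pass each instruction is charged 2 cycles (its fetch and a data access take turns on the program memory bus), and on later passes 1 cycle (the instruction comes from the cache). The name `loop_cycles` is hypothetical.

```python
def loop_cycles(body_len, passes, first_pass_cost=2, cached_cost=1):
    """Estimated cycles for a loop of body_len instructions run `passes` times.

    first_pass_cost and cached_cost are assumed per-instruction costs.
    """
    if passes <= 0:
        return 0
    first = body_len * first_pass_cost           # cache is still empty
    later = body_len * cached_cost * (passes - 1)  # instructions served from cache
    return first + later


print(loop_cycles(8, 1))    # a single pass pays the full fetch cost
print(loop_cycles(8, 100))  # repeated passes approach 1 cycle per instruction
```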

    Enhanced DSP architectures:

    Very Long Instruction Word (VLIW) architecture:

    VLIW CPUs have four to eight execution units.

    One VLIW instruction encodes multiple operations.

    Ex: if a VLIW device has four execution units, then a VLIW instruction for that device has four operation fields.

    VLIW instructions are usually at least 64 bits wide.

    VLIW CPUs use software (the compiler) to decide which operations can run in parallel.

    Hardware complexity for instruction scheduling is reduced.

    EX: TMS320 C6xx
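
The compiler's role described above (deciding which operations can share one instruction word) can be sketched with a greedy packer. This is a simplified illustration under stated assumptions, not how any real VLIW compiler works: every operation is assumed to fit any execution unit and to finish in one cycle, and `pack_vliw` and the operation names are hypothetical.

```python
def pack_vliw(ops, num_units=4):
    """Greedily pack operations into VLIW instructions (bundles).

    ops: list of (name, set_of_names_it_depends_on) in program order.
    An operation joins a bundle only if all of its dependencies finished
    in an earlier bundle and a slot is still free.
    """
    bundles = []
    done = set()
    remaining = list(ops)
    while remaining:
        bundle = [name for name, deps in remaining if deps <= done][:num_units]
        if not bundle:
            raise ValueError("unsatisfiable dependencies")
        done.update(bundle)
        remaining = [(n, d) for n, d in remaining if n not in done]
        bundles.append(bundle)
    return bundles


# One MAC step written as five primitive operations (illustrative names).
ops = [
    ("load_x", set()),
    ("load_a", set()),
    ("mul", {"load_x", "load_a"}),
    ("load_y", set()),
    ("add", {"mul", "load_y"}),
]
bundles = pack_vliw(ops)
for i, bundle in enumerate(bundles):
    print(f"instruction {i}: {bundle}")
```

The three independent loads end up in one instruction word, while the multiply and the add each wait for their inputs. All of this is decided before the program runs, which is what keeps the scheduling hardware simple.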

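    Endianness:

    Big endian: the most significant byte (MSB) is stored in the first location.

    Little endian: the least significant byte is stored in the first location.

    How will 12345678 be stored in four locations starting from 4000 in each case?

    TI DSP: Little endian

    Motorola DSP: Big endian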
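
The storage question above can be checked with a short script. It assumes that 12345678 is a 32-bit hexadecimal value, that 4000 is a hexadecimal address, and that memory is byte-addressable; the helper `memory_layout` is a name invented for this example.

```python
value = 0x12345678  # assumed: the slide's 12345678 is hexadecimal
base = 0x4000       # assumed: 4000 is a hexadecimal byte address


def memory_layout(value, base, byteorder):
    """Map each of the four byte addresses to the byte stored there."""
    data = value.to_bytes(4, byteorder)
    return {base + i: b for i, b in enumerate(data)}


for order in ("big", "little"):
    layout = memory_layout(value, base, order)
    print(order, {f"{addr:X}": f"{byte:02X}" for addr, byte in layout.items()})
```

Big endian places 12 at address 4000 and 78 at 4003; little endian reverses the order, placing 78 at 4000 and 12 at 4003.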