Programmable Dsp Lecture4


  • 8/11/2019 Programmable Dsp Lecture4


    P.D. Sawaant

    Contents:

    Memory space of TMS320C67XX

    Program Control.

    Interrupts of TMS320C67XX processors.

    Pipeline Operation of TMS320C67XX Processors.

    On-Chip peripherals.


    Memory Map of TMS320C67xx Processor.

    The TMS320C6713 uses a two-level cache-based architecture.

    The Level 1 program cache (L1P) is a 4K-byte direct-mapped cache, and the Level 1 data cache (L1D) is a 4K-byte 2-way set-associative cache.
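    To make "direct-mapped" and "2-way set-associative" concrete, the C sketch below computes how an address splits into a line offset, a set index, and a tag. The line sizes used in the usage note (64 bytes for L1P, 32 bytes for L1D) are assumptions for illustration and should be checked against the device data sheet; CacheGeom and the helper functions are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Geometry of a set-associative cache; a direct-mapped cache has ways == 1. */
typedef struct {
    uint32_t size_bytes;  /* total capacity, e.g. 4096 for a 4K-byte cache */
    uint32_t ways;        /* associativity */
    uint32_t line_bytes;  /* line size (an assumed value in this sketch) */
} CacheGeom;

/* log2 of an exact power of two. */
static unsigned log2u(uint32_t x) {
    unsigned n = 0;
    assert(x != 0u && (x & (x - 1u)) == 0u);
    while (x > 1u) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Number of sets = capacity / (ways * line size). */
uint32_t cache_sets(CacheGeom g) {
    return g.size_bytes / (g.ways * g.line_bytes);
}

/* Set index: the address bits just above the line offset
   (assumes the number of sets is a power of two). */
uint32_t cache_set_index(CacheGeom g, uint32_t addr) {
    return (addr >> log2u(g.line_bytes)) & (cache_sets(g) - 1u);
}

/* Tag: the remaining upper address bits stored alongside the line. */
uint32_t cache_tag(CacheGeom g, uint32_t addr) {
    return addr >> (log2u(g.line_bytes) + log2u(cache_sets(g)));
}
```

    With CacheGeom values of {4096, 1, 64} for L1P and {4096, 2, 32} for L1D, both caches have 64 sets. In the direct-mapped L1P, addresses 0x0000 and 0x1000 map to the same line and evict each other, whereas the 2-way L1D can hold two different lines that map to the same set.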

    The Level 2 memory/cache (L2) consists of a 256K-byte memory space that is shared between program and data space.

    64K bytes of the 256K bytes in L2 memory can be configured as mapped memory, cache, or a combination of the two.

    The remaining 192K bytes in L2 serve as mapped SRAM.
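    A small C helper can show how the 256K-byte L2 is divided between cache and mapped SRAM. It assumes the configurable 64K bytes are assigned to cache in 16K-byte steps; that granularity is an assumption of this sketch, and l2_sram_kb is a hypothetical name.

```c
#include <assert.h>

#define L2_TOTAL_KB        256u /* total L2 memory */
#define L2_CONFIGURABLE_KB  64u /* portion that may be used as cache */
#define L2_STEP_KB          16u /* assumed allocation granularity for cache */

/* Returns the number of KB left as mapped SRAM when cache_kb of the
   configurable 64K bytes is used as cache, or -1 for an unsupported value. */
int l2_sram_kb(unsigned cache_kb) {
    if (cache_kb > L2_CONFIGURABLE_KB || cache_kb % L2_STEP_KB != 0u)
        return -1;
    return (int)(L2_TOTAL_KB - cache_kb);
}
```

    For example, l2_sram_kb(64) returns 192 (all of the configurable 64K bytes used as cache), and l2_sram_kb(0) returns 256 (all of L2 used as mapped SRAM).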

    Figure: Memory Map of TMS320C67xx Processor.

    Figure: L2 Memory Configuration.


    On-Chip Memory and Peripherals of TMS320C67xx Processors

    Processor | Data memory | Program memory | L2 memory | Peripherals and external memory interface
    TMS320C6701 | 16K x 32 | 16K x 32 | - | 4-channel DMA, 16-bit HPI, 2 BSPs, 2 timers, 32-bit EMIF
    TMS320C6711 | 32K-bit (4K-byte) L1 cache | 32K-bit (4K-byte) L1 cache | 512K-bit (64K-byte) unified L2 cache | 16-channel enhanced DMA, 16-bit HPI, 2 BSPs, 2 timers, 32-bit EMIF
    TMS320C6712 | 32K-bit (4K-byte) L1 cache | 32K-bit (4K-byte) L1 cache | 512K-bit (64K-byte) unified L2 cache | 16-channel enhanced DMA, 16-bit HPI, 2 serial ports, 2 timers, 16-bit EMIF
    TMS320C6713 | 4K-byte L1 cache | 4K-byte L1 cache | 64K-byte L2 cache and 192K-byte L2 SRAM | 16-channel enhanced DMA, 16-bit HPI, 2 McBSPs, 2 timers, 32-bit EMIF


    Interrupts of TMS320C67xx Processors:

    While the CPU is executing a program, a peripheral device may require service from the CPU. In such a situation, the peripheral generates a signal that interrupts the main program. The processor suspends the main program and executes another program, called an interrupt service routine (ISR), to service the peripheral device. On completion of the ISR, the processor returns to the main program and continues from where it left off.

    An interrupt may be generated by an internal or an external device. It may also be generated by software.

    Not all interrupts are serviced when they occur. Only nonmaskable interrupts are serviced whenever they occur.

    Maskable interrupts are serviced only if they are enabled.

    A priority scheme determines which interrupt is serviced first if more than one interrupt occurs simultaneously.

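    The CPU of the TMS320C67xx devices has 16 interrupts. In priority order they are: RESET (highest), the nonmaskable interrupt NMI, two reserved interrupts (2 and 3), and the twelve maskable interrupts INT4 to INT15 (lowest). The internal and external events that can be routed to INT4-INT15, and their number, vary from device to device.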
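    The selection rule can be illustrated with a simplified C model. It assumes a layout in the spirit of the CPU's interrupt flag and enable registers (IFR, IER): bit n of a pending mask and of an enable mask stands for interrupt n (0 = RESET, 1 = NMI, 2 and 3 reserved, 4 to 15 = INT4 to INT15), and a lower number means higher priority. next_interrupt is a hypothetical name, not a TI API, and the real CPU adds conditions this sketch ignores, such as an NMI enable bit.

```c
#include <assert.h>
#include <stdint.h>

#define INT_RESET      0  /* highest priority, cannot be masked */
#define INT_NMI        1  /* nonmaskable: serviced whenever it occurs */
#define FIRST_MASKABLE 4  /* INT4 */
#define LAST_MASKABLE  15 /* INT15, lowest priority */

/* Returns the interrupt number to service next, or -1 if none.
   pending: one flag per interrupt that has occurred
   enabled: one enable bit per maskable interrupt
   gie:     global enable for all maskable interrupts */
int next_interrupt(uint16_t pending, uint16_t enabled, int gie) {
    if (pending & (1u << INT_RESET))
        return INT_RESET;
    if (pending & (1u << INT_NMI))
        return INT_NMI;
    if (!gie)
        return -1; /* maskable interrupts globally disabled */
    /* Scan from highest to lowest priority; service only enabled ones. */
    for (int n = FIRST_MASKABLE; n <= LAST_MASKABLE; n++) {
        if ((unsigned)(pending & enabled) & (1u << n))
            return n;
    }
    return -1;
}
```

    If INT6 and INT9 are both pending and enabled, the model picks INT6; a pending but disabled interrupt stays pending until it is enabled.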


    Pipeline Operation of TMS320C67XX Processors

    The CPU of C67xx devices has a 16-level-deep instruction pipeline (4 fetch, 2 decode, and up to 10 execute phases).

    FETCH Phase: Includes

    1) PG (Program-Address-Generate phase): Computes the next sequential fetch-packet address or the branch address.

    2) PS (Program-Address-Send phase): Sends the program address to memory.

    3) PW (Program-Access-Ready-Wait phase): Waits until the memory access is completed.

    4) PR (Program-Fetch-Packet-Receive phase): Receives the fetch packet from memory.

    DECODE Phase: Includes

    5) DP (Instruction-Dispatch phase): Separates fetch packets into execute packets.

    6) DC (Instruction-Decode phase): Decodes the source registers, destination registers, and associated paths.


    EXECUTE Phase:

    For fixed-point instructions it is divided into phases 7-11 (E1-E5).

    Different instructions require different numbers of execute phases.

    The CPU executes instructions in eight functional units.

    Most instructions require only one execute phase (E1) and have no delay.

    Multiply instructions such as MPY and SMPY require two execute phases, E1 and E2.

    This means a multiply instruction has a latency of 2 instruction cycles and a delay of 1 instruction cycle.

    Latency: The number of cycles from the start of an instruction's execution (E1) until its result can be used, i.e. the number of execute phases it needs.

    Delay: The number of delay slots, i.e. the cycles after E1 before the result is ready (latency minus 1). (TI documentation separately defines functional-unit latency as the number of cycles before the same functional unit can accept a new instruction; for MPY and loads this is 1 cycle.)

    E.g. LDB and LDH require E1 to E5, so their latency is 5 instruction cycles and their delay is 4 instruction cycles.

    E.g. a branch instruction (B) executes in E1, but the branch takes effect only after 5 delay slots, so a branch has a latency of 6 instruction cycles.
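    The latency/delay arithmetic above can be checked with a few lines of C. The table holds only the delay-slot counts quoted in this lecture (ADD stands in for a typical single-phase instruction); nops_needed assumes one execute packet issues per cycle with no stalls. All names are hypothetical, and nops_needed applies to data results, not to branches.

```c
#include <assert.h>
#include <string.h>

/* Delay slots per instruction, as quoted in this lecture. */
typedef struct {
    const char *mnemonic;
    int delay_slots;
} InstrTiming;

static const InstrTiming timing_table[] = {
    {"ADD", 0},  /* E1 only */
    {"MPY", 1},  /* E1-E2 */
    {"SMPY", 1}, /* E1-E2 */
    {"LDB", 4},  /* E1-E5 */
    {"LDH", 4},  /* E1-E5 */
    {"B", 5},    /* executes in E1, takes effect after 5 delay slots */
};

/* Delay slots for a mnemonic, or -1 if it is not in the table. */
int delay_slots_of(const char *mnemonic) {
    for (size_t i = 0; i < sizeof timing_table / sizeof timing_table[0]; i++) {
        if (strcmp(timing_table[i].mnemonic, mnemonic) == 0)
            return timing_table[i].delay_slots;
    }
    return -1;
}

/* Latency as used in this lecture: delay slots + 1. */
int latency_of(int delay_slots) {
    return delay_slots + 1;
}

/* NOP cycles needed so that a consumer issued `distance` execute packets
   after the producer (1 = the very next packet) sees the producer's result. */
int nops_needed(int delay_slots, int distance) {
    assert(distance >= 1);
    int n = delay_slots - (distance - 1);
    return n > 0 ? n : 0;
}
```

    For instance, an instruction that uses the result of LDH in the very next execute packet needs nops_needed(4, 1) = 4 empty cycles in between; placing two independent instructions between them reduces this to 2.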


    Some floating-point instructions require additional delay slots: their execution continues through phases E2-E10, which provide the additional delay after the E1 stage of the pipeline.

    Figure: TMS320C67xx Pipeline Phases.


    Figure: Progression of instruction cycles in the pipeline.
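    Since the diagram itself is not reproduced, the following C sketch prints an equivalent chart. It assumes one fetch packet enters PG every cycle, each fetch packet holds a single execute packet, and nothing stalls; only the fixed-point phases E1-E5 are listed (the array could be extended to E10 for floating-point instructions). The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static const char *const stage_names[] = {
    "PG", "PS", "PW", "PR",      /* fetch */
    "DP", "DC",                  /* decode */
    "E1", "E2", "E3", "E4", "E5" /* execute (fixed-point) */
};
#define NUM_STAGES ((int)(sizeof stage_names / sizeof stage_names[0]))

/* Stage occupied by fetch packet `fp` (0-based) in cycle `cycle`
   (0-based), or NULL if the packet is not in the pipeline then. */
const char *stage_at(int fp, int cycle) {
    int s = cycle - fp; /* packet fp enters PG in cycle fp */
    return (s >= 0 && s < NUM_STAGES) ? stage_names[s] : NULL;
}

/* Prints one row per fetch packet and one column per clock cycle. */
void print_pipeline_chart(int packets, int cycles) {
    printf("      ");
    for (int c = 0; c < cycles; c++)
        printf("%4d", c + 1);
    printf("\n");
    for (int fp = 0; fp < packets; fp++) {
        printf("FP%-4d", fp + 1);
        for (int c = 0; c < cycles; c++) {
            const char *s = stage_at(fp, c);
            printf("%4s", s ? s : ".");
        }
        printf("\n");
    }
}
```

    Calling print_pipeline_chart(11, 21) shows the pipeline full in cycle 11, when each of the eleven stages holds a different fetch packet.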


    Parallel Operations:

    The instruction word for each functional unit is 32 bits long.

    Instructions are fetched 8 at a time, consisting of 8 × 32 = 256 bits.

    This group is called a Fetch Packet.

    Fetch packets must start at an address that is a multiple of eight 32-bit words (256 bits, i.e. 32 bytes).

    Up to 8 instructions can be executed in parallel.

    Each must use a different functional unit.

    Each group of parallel instructions is called an Execute Packet.
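    On C62x/C67x devices the grouping of a fetch packet into execute packets is encoded in bit 0 of each instruction word, the p-bit: p = 1 means the next instruction executes in parallel with this one. The sketch below splits one fetch packet accordingly and checks the 32-byte alignment; the function names are hypothetical, and the different-functional-unit rule is not checked.

```c
#include <assert.h>
#include <stdint.h>

#define FP_WORDS 8u              /* instructions per fetch packet */
#define FP_BYTES (FP_WORDS * 4u) /* 8 x 32 bits = 256 bits = 32 bytes */

/* A fetch packet must start on a 32-byte boundary. */
int fetch_packet_aligned(uint32_t addr) {
    return (addr % FP_BYTES) == 0u;
}

/* Splits one fetch packet into execute packets using the p-bit (bit 0).
   Writes each execute packet's size into sizes[] and returns the count. */
int split_execute_packets(const uint32_t fp[FP_WORDS], int sizes[FP_WORDS]) {
    int count = 0;
    int current = 0;
    for (unsigned i = 0; i < FP_WORDS; i++) {
        current++;
        int p = (int)(fp[i] & 1u);
        /* p == 0 ends the execute packet; the last word always ends it,
           since here an execute packet does not cross a fetch packet. */
        if (p == 0 || i == FP_WORDS - 1u) {
            sizes[count++] = current;
            current = 0;
        }
    }
    return count;
}
```

    With p-bits 1,1,0,0,1,0,1,0 the fetch packet yields four execute packets of 3, 1, 2, and 2 instructions, taking four cycles to issue; with every p-bit clear it yields eight single-instruction execute packets.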


    Peripherals of TMS320C6713

    The TMS320C67x devices contain peripherals for communication with off-chip memory, co-processors, host processors, and serial devices.

    The following subsections discuss the peripherals of the C6713 processor:

    Enhanced DMA (EDMA)

    Host-Port Interface (HPI)

    External Memory Interface (EMIF)

    Multichannel Buffered Serial Port (McBSP)

    Timers

    Multichannel Audio Serial Ports (McASP)

    Power Down Logic


    Enhanced DMA (EDMA):

    The EDMA has the following features:

    Background operation: The EDMA operates independently of the CPU.

    High throughput: Elements can be transferred at the CPU clock rate.

    Sixteen channels: The EDMA has 16 independently programmable channels and can keep track of the contexts of sixteen independent transfers.

    Split operation: A single channel may be used to perform both receive and transmit element transfers simultaneously, to or from two peripherals and memory.

    Programmable priority: Each channel has an independently programmable priority versus the CPU.


    Each channel's source and destination address registers can have configurable indexes for each read and write transfer. The address may remain constant, increment, decrement, or be adjusted by a programmable value.

    Programmable-width transfers: Each channel can be independently configured to transfer bytes, 16-bit half-words, or 32-bit words.

    Autoinitialization: Once a block transfer is complete, an EDMA channel may automatically reinitialize itself for the next block transfer.

    Linking: Each EDMA channel can be linked to a subsequent transfer to perform after completion.

    Event synchronization: Each channel is initiated by a specific event. Transfers may be synchronized either by element or by frame.
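    The address-update options (constant, increment, decrement, or a programmable index) combined with the element width can be modelled as below. AddrGen and element_address are hypothetical and do not reflect the EDMA's actual parameter-RAM layout or register fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address-update modes for one side (source or destination). */
typedef enum { ADDR_FIXED, ADDR_INC, ADDR_DEC, ADDR_INDEXED } AddrMode;

typedef struct {
    uint32_t start;      /* address of the first element */
    AddrMode mode;
    uint32_t elem_bytes; /* 1 = byte, 2 = half-word, 4 = word */
    int32_t index;       /* step in bytes, used only by ADDR_INDEXED */
} AddrGen;

/* Address of element k (0-based) of the transfer. */
uint32_t element_address(const AddrGen *g, uint32_t k) {
    assert(g->elem_bytes == 1u || g->elem_bytes == 2u || g->elem_bytes == 4u);
    switch (g->mode) {
    case ADDR_FIXED:
        return g->start;                     /* e.g. a peripheral data register */
    case ADDR_INC:
        return g->start + k * g->elem_bytes;
    case ADDR_DEC:
        return g->start - k * g->elem_bytes;
    case ADDR_INDEXED:
        /* Unsigned wrap-around makes a negative index step backwards. */
        return g->start + k * (uint32_t)g->index;
    }
    return g->start;
}
```

    A typical pairing is a fixed source (a serial-port receive register) with an incrementing destination (a buffer in memory), so each synchronization event moves one element into the next buffer slot.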


    Host-Port Interface (HPI):

    The HPI is a 16-bit-wide parallel port through which a host processor can directly access the CPU's memory space.

    The host device functions as a master to the interface, which increases ease of access.

    The host and CPU can exchange information via internal or external memory.

    The host also has direct access to memory-mapped peripherals.

    The HPI is connected to the internal memory via a set of registers.


    Either the host or the CPU may use the HPI control register (HPIC) to configure the interface.

    The host uses the HPI address register (HPIA) and the HPI data register (HPID) to access the internal memory space of the device.

    The host accesses these registers using external data and interface control signals.

    The HPIC is a memory-mapped register, which allows the CPU to access it.

    The data transactions are performed within the EDMA and are invisible to the user.
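    The register-based access path (write HPIA, then read or write HPID) can be sketched from the host's side as follows. HpiModel simply stands in for DSP memory, the functions are hypothetical rather than a driver API, and the sketch ignores the fact that on a 16-bit HPI each 32-bit word actually moves as two 16-bit halves.

```c
#include <assert.h>
#include <stdint.h>

#define DSP_WORDS 256u /* size of the modelled DSP memory, in 32-bit words */

typedef struct {
    uint32_t hpia;               /* host address register (byte address) */
    uint32_t dsp_mem[DSP_WORDS]; /* stands in for DSP memory behind the port */
} HpiModel;

/* Host selects the DSP address to access. */
void hpi_write_hpia(HpiModel *h, uint32_t byte_addr) {
    h->hpia = byte_addr;
}

/* Host writes the word at the address held in HPIA. */
void hpi_write_hpid(HpiModel *h, uint32_t value) {
    assert(h->hpia / 4u < DSP_WORDS);
    h->dsp_mem[h->hpia / 4u] = value;
}

/* Host reads the word at the address held in HPIA. */
uint32_t hpi_read_hpid(const HpiModel *h) {
    assert(h->hpia / 4u < DSP_WORDS);
    return h->dsp_mem[h->hpia / 4u];
}

/* Host downloads a block: set the address, then write the data, per word. */
void host_write_block(HpiModel *h, uint32_t addr, const uint32_t *src, unsigned n) {
    for (unsigned i = 0; i < n; i++) {
        hpi_write_hpia(h, addr + 4u * i);
        hpi_write_hpid(h, src[i]);
    }
}
```

    This is the pattern a host would use to download code or data into the DSP's internal memory before starting it.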


    Multichannel Buffered Serial Port (McBSP):

    Based on the standard serial port interface, the McBSP provides:

    Full-duplex communication

    Double-buffered data registers, which allow a continuous data stream

    Independent framing and clocking for reception and transmission

    Direct interface to industry-standard codecs, analog interface chips (AICs), and other serially connected A/D and D/A devices

    An external shift clock or an internal shift clock of programmable frequency


    Multichannel transmission and reception of up to 128 channels.

    8-bit data transfers with LSB or MSB first.

    Programmable polarity for both frame synchronization and data clocks.

    Highly programmable internal clock and frame generation.
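    The LSB-first/MSB-first option can be illustrated by serializing and reassembling an 8-bit word. The function names are hypothetical; the sketch models only the bit order, not the McBSP's framing or clocking.

```c
#include <assert.h>
#include <stdint.h>

/* Bit sent at position i (0 = first bit on the wire) of an 8-bit word. */
int serial_bit(uint8_t word, int i, int lsb_first) {
    assert(i >= 0 && i < 8);
    int bit = lsb_first ? i : 7 - i;
    return (word >> bit) & 1;
}

/* Receiver: rebuild the word from the bits in arrival order. */
uint8_t deserialize(const int bits[8], int lsb_first) {
    uint8_t w = 0;
    for (int i = 0; i < 8; i++) {
        int bit = lsb_first ? i : 7 - i;
        w |= (uint8_t)((bits[i] & 1) << bit);
    }
    return w;
}
```

    Transmitter and receiver must agree on the order: a word sent LSB first but received as MSB first arrives bit-reversed, so 0x01 is read as 0x80.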


    Timers:

    The C62x/C67x devices have two 32-bit general-purpose timers that can be used to:

    Time events

    Count events

    Generate pulses

    Interrupt the CPU

    Send synchronization events to the DMA controller
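    A common timer use is generating periodic events. Assuming a counter that advances once per input-clock tick and signals an event every `period` ticks, the period value for a desired event rate is roughly f_in / f_event. This model and timer_period are hypothetical; whether the hardware period register should hold N or N - 1, and which clock drives the timer, must be taken from the device documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Period (in input-clock ticks) for the requested event rate, rounded to
   the nearest integer; returns 0 if the rate is 0 or the result does not
   fit in a 32-bit counter. */
uint32_t timer_period(uint64_t f_in_hz, uint64_t f_event_hz) {
    if (f_event_hz == 0u)
        return 0u;
    uint64_t p = (f_in_hz + f_event_hz / 2u) / f_event_hz;
    if (p == 0u || p > UINT32_MAX)
        return 0u;
    return (uint32_t)p;
}
```

    For example, with an assumed 50 MHz timer input clock, a 1 kHz interrupt needs a period of 50000 ticks.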


    Multichannel Audio Serial Port:

    The C6713 processor includes two Multichannel Audio Serial Ports

    (McASP).

    The McASP interface modules each support one transmit and one receive clock zone. Each McASP has eight serial data pins, which can be individually allocated to either of the two zones.

    The serial port supports time-division multiplexing on each pin from 2 to 32 time slots.

    The McASP also provides extensive error-checking and recovery features, such as a bad-clock detection circuit for each high-frequency master clock, which verifies that the master clock is within a programmed frequency range.
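    For TDM on a McASP data pin, the bit clock must carry every slot of every frame, so bit clock = sample rate × slots × bits per slot. The sketch below computes that and applies a simple minimum/maximum window check in the spirit of the bad-clock detection; both functions are hypothetical and are not the McASP's register interface.

```c
#include <assert.h>
#include <stdint.h>

/* Bit clock needed for TDM on one pin; returns 0 if the slot count is
   outside the 2..32 range supported by the serial port. */
uint64_t tdm_bit_clock_hz(uint32_t fs_hz, uint32_t slots, uint32_t bits_per_slot) {
    if (slots < 2u || slots > 32u)
        return 0u;
    return (uint64_t)fs_hz * slots * bits_per_slot;
}

/* Simplified clock check: accept a measured clock only if it lies
   within the programmed [min_hz, max_hz] window. */
int clock_in_range(uint64_t measured_hz, uint64_t min_hz, uint64_t max_hz) {
    return measured_hz >= min_hz && measured_hz <= max_hz;
}
```

    For example, 8 slots of 32 bits at a 48 kHz frame rate require a 12.288 MHz bit clock.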


    Power Down Logic:

    Most of the operating power of CMOS logic is dissipated during circuit switching, from one logic state to another.

    By preventing some or all of the chip's logic from switching, significant power savings can be realized without losing any data or operational context.

    Power-down mode PD1 blocks the internal clock inputs at the boundary of the CPU, preventing most of its logic from switching and effectively shutting down the CPU.

    Additional power savings are accomplished in power-down mode PD2, in which the entire on-chip clock structure (including multiple buffers) is halted at the output of the PLL.

    Power-down mode PD3 shuts down the entire internal clock tree (like PD2) and also disconnects the external clock source (CLKIN) from reaching the PLL. Wake-up from PD3 takes longer than wake-up from PD2 because the PLL needs to be relocked, just as it does following power-up.