Programmable Dsp Lecture4

8/11/2019

P.D. Sawaant

Contents:

Memory space of TMS320C67XX

Program Control

Interrupts of TMS320C67XX processors

Pipeline Operation of TMS320C67XX Processors

On-Chip peripherals


Memory Map of TMS320C67xx Processor.

The processor uses a two-level cache-based architecture.

The Level 1 program cache (L1P) is a 4K-byte direct-mapped cache, and the Level 1 data cache (L1D) is a 4K-byte 2-way set-associative cache.
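To make the difference between a direct-mapped and a 2-way set-associative cache concrete, the sketch below splits a 32-bit address into tag, index and offset fields for each L1 cache. It is a minimal C sketch, assuming 64-byte L1P lines and 32-byte L1D lines (these line sizes are assumptions, not stated in these notes); the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* The 4K-byte sizes and the associativity come from the text;
   the line sizes are assumptions for this sketch. */
#define L1_SIZE_BYTES  4096u
#define L1P_LINE_BYTES   64u  /* assumed L1P line size */
#define L1D_LINE_BYTES   32u  /* assumed L1D line size */
#define L1D_WAYS          2u

typedef struct {
    uint32_t tag;     /* upper address bits stored with the cached line */
    uint32_t index;   /* line number (L1P) or set number (L1D) */
    uint32_t offset;  /* byte position inside the line */
} CacheAddr;

/* Split `addr` for a cache of `size` bytes with `line`-byte lines and `ways` ways.
   The resulting number of sets must be a power of two. */
static CacheAddr split_address(uint32_t addr, uint32_t size, uint32_t line, uint32_t ways)
{
    uint32_t sets = size / (line * ways);
    assert(sets > 0u && (sets & (sets - 1u)) == 0u);

    CacheAddr a;
    a.offset = addr % line;
    a.index  = (addr / line) % sets;
    a.tag    = addr / (line * sets);
    return a;
}

/* L1P is direct-mapped: each address can live in exactly one line. */
static CacheAddr l1p_split(uint32_t addr)
{
    return split_address(addr, L1_SIZE_BYTES, L1P_LINE_BYTES, 1u);
}

/* L1D is 2-way set-associative: each address can live in either way of its set. */
static CacheAddr l1d_split(uint32_t addr)
{
    return split_address(addr, L1_SIZE_BYTES, L1D_LINE_BYTES, L1D_WAYS);
}
```

Under these assumptions, addresses 4K bytes apart (for example 0x0040 and 0x1040) map to the same L1P line and evict each other, whereas in L1D they map to the same set but can occupy its two ways at the same time.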

The Level 2 memory/cache (L2) consists of a 256K-byte memory space that is shared between program and data space.

64K bytes of the 256K bytes in L2 memory can be configured as mapped memory, cache, or a combination of the two.

The remaining 192K bytes in L2 serve as mapped SRAM.
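The split between L2 cache and mapped SRAM is simple arithmetic. The sketch assumes, beyond what the slide states, that the 64K-byte configurable portion is switched to cache in 16K-byte steps (one cache way at a time); `l2_sram_kb` is a hypothetical helper, not a TI API.

```c
#include <assert.h>

#define L2_TOTAL_KB        256u  /* total L2 (from the text) */
#define L2_CONFIGURABLE_KB  64u  /* part that can act as cache (from the text) */
#define L2_STEP_KB          16u  /* assumed granularity: one 16K-byte cache way */

/* Mapped SRAM (in K bytes) left when `cache_kb` of L2 is configured as cache.
   Returns -1 for sizes that are not supported under the assumptions above. */
static int l2_sram_kb(unsigned cache_kb)
{
    if (cache_kb > L2_CONFIGURABLE_KB || cache_kb % L2_STEP_KB != 0u)
        return -1;
    return (int)(L2_TOTAL_KB - cache_kb);
}
```

For example, l2_sram_kb(0) gives 256 (all of L2 used as mapped SRAM), l2_sram_kb(64) gives 192 (the split described above), and an unsupported request such as l2_sram_kb(20) returns -1.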


Figure: L2 Memory Configuration


On-Chip Memory and Peripherals of TMS320C67xx Processors

Processor | Data memory | Program memory | L2 memory | Peripherals and external memory interface
TMS320C6701 | 16K x 32 | 16K x 32 | - | A 4-channel DMA, a 16-bit HPI, 2 BSPs, 2 timers, and a 32-bit EMIF
TMS320C6711 | 32K bits L1 cache | 32K bits L1 cache | 512K bits unified L2 cache | A 16-channel enhanced DMA, a 16-bit HPI, 2 BSPs, 2 timers, and a 32-bit external memory interface
TMS320C6712 | 32K bits L1 cache | 32K bits L1 cache | 512K bits unified L2 cache | HPI, 2 serial ports, 2 timers, and a 16-bit EMIF
TMS320C6713 | 4K bytes L1 cache | 4K bytes L1 cache | 64K bytes L2 cache and 192K bytes L2 SRAM | HPI, 2 McBSPs, 2 timers, and a 32-bit EMIF
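The table quotes sizes in words, bits and bytes. The helper functions below (hypothetical names) convert both notations to bytes, taking K as 1024.

```c
#include <assert.h>

/* Bytes in a memory quoted as "<k_words>K x <width_bits>" (K = 1024 words). */
static unsigned long kwords_to_bytes(unsigned long k_words, unsigned width_bits)
{
    assert(width_bits % 8u == 0u);  /* whole bytes per word */
    return k_words * 1024ul * (width_bits / 8u);
}

/* Bytes in a memory quoted as "<k_bits>K bits" (K = 1024 bits). */
static unsigned long kbits_to_bytes(unsigned long k_bits)
{
    return k_bits * 1024ul / 8ul;
}
```

For example, kbits_to_bytes(32) is 4096, so the 32K-bit L1 caches of the C6711 and C6712 hold 4K bytes each, the same as the C6713; kbits_to_bytes(512) is 65536 (64K bytes of L2); and kwords_to_bytes(16, 32) is 65536, so each C6701 memory block is 64K bytes.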


Interrupts of TMS320C67xx Processors:

Often, while the CPU is executing a program, a peripheral device requires service from the CPU. In such a situation, the main program can be interrupted by a signal generated by the peripheral device. The processor then suspends the main program and executes another program, called an interrupt service routine (ISR), to service the peripheral device. On completion of the ISR, the processor returns to the main program and continues from where it left off.

An interrupt may be generated by an internal or an external device. It may also be generated by software.

Not all interrupts are serviced when they occur. Only nonmaskable interrupts are serviced whenever they occur.

Other interrupts, called maskable interrupts, are serviced only if they are enabled.

A priority scheme also determines which interrupt is serviced first if more than one interrupt occurs simultaneously.

The C67xx CPU has 16 interrupt vectors: RESET, the nonmaskable interrupt (NMI), two reserved vectors, and twelve maskable interrupts (INT4 to INT15). However, the peripheral interrupt sources mapped onto INT4 to INT15, and their number, vary from device to device.
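The servicing rules above (nonmaskable interrupts always, maskable ones only when enabled, simultaneous requests resolved by priority) can be modelled in a few lines. This is a minimal C sketch, not register-level code: it assumes a C6000-style numbering in which bit 0 is RESET, bit 1 is NMI, bits 4 to 15 are the maskable INT4 to INT15, and a lower number means a higher priority. The `global_enable` flag stands in for the CPU's global interrupt enable; the function and macro names are hypothetical, and details such as the NMI enable bit are ignored.

```c
#include <assert.h>
#include <stdint.h>

#define INT_RESET           0
#define INT_NMI             1
#define INT_FIRST_MASKABLE  4
#define INT_LAST_MASKABLE  15
#define NO_INTERRUPT      (-1)

/* Pick the interrupt that would be serviced next.
   pending:       bit n set if interrupt n has occurred.
   enabled:       bit n set if maskable interrupt n is enabled.
   global_enable: nonzero if maskable interrupts are globally enabled.
   Returns the interrupt number, or NO_INTERRUPT if none can be serviced. */
static int next_interrupt(uint16_t pending, uint16_t enabled, int global_enable)
{
    /* Nonmaskable sources are serviced whenever they occur. */
    if (pending & (1u << INT_RESET)) return INT_RESET;
    if (pending & (1u << INT_NMI))   return INT_NMI;

    if (!global_enable) return NO_INTERRUPT;

    /* Maskable sources: only enabled ones count; lowest number wins. */
    unsigned candidates = (unsigned)pending & (unsigned)enabled;
    for (int n = INT_FIRST_MASKABLE; n <= INT_LAST_MASKABLE; ++n)
        if (candidates & (1u << n))
            return n;
    return NO_INTERRUPT;  /* reserved bits 2 and 3 are ignored */
}
```

For example, if INT6 and INT9 are both pending and enabled, the sketch returns 6; if INT6 is pending but only INT9 is enabled, nothing is serviced.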


Pipeline Operation of TMS320C67XX Processors

The CPU of C67xx devices has a 16-phase instruction pipeline: four fetch phases, two decode phases, and up to ten execute phases.

FETCH Phase: Includes

1) PG (Program-Address-Generate phase): Computes the next sequential fetch-packet address or a branch address.

2) PS (Program-Address-Send phase): Sends the program address to memory.

3) PW (Program-Access-Ready-Wait phase): Waits until the memory access is completed.

4) PR (Program-Fetch-Packet-Receive phase): Receives the fetch packet from memory.

DECODE Phase: Includes

5) DP (Instruction-Dispatch phase): Separates fetch packets into execute packets and assigns each instruction to a functional unit.

6) DC (Instruction-Decode phase): Decodes the source registers, destination registers, and associated data paths.


EXECUTE Phase:

For fixed-point instructions it is divided into phases 7 to 11 (E1 to E5).

Different instructions require different numbers of execute phases.

The CPU executes each instruction on one of its 8 functional units.

Most instructions require only one execute phase (E1) and have no delay.

Multiply instructions such as MPY and SMPY require two execute phases, E1 and E2.

This means that a multiply instruction has a latency of 2 instruction cycles and a delay of 1 instruction cycle.

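Latency:

In these notes, latency is the number of instruction cycles from the start of E1 until the result can be used by another instruction (that is, the delay plus 1). It should not be confused with functional-unit latency, the number of cycles before the same functional unit can accept another instruction, which is 1 cycle for most C67xx instructions.

Delay:

Delay (the number of delay slots) is the number of instruction cycles after E1 before the result is available.

Eg. LDB and LDH occupy E1 to E5, so their latency is 5 instruction cycles and their delay is 4 instruction cycles.

Eg. A branch instruction (B) executes in E1, but its target instruction reaches E1 only 5 cycles later. A branch therefore has a delay of 5 and a latency of 6 instruction cycles.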


Some floating-point instructions require additional execute phases (E2 to E10), which provide the additional delay after the E1 stage of the pipeline.
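The latency and delay figures can be turned into a small hazard check: if an instruction enters E1 in a given cycle, from which cycle can a dependent instruction enter E1 and see the result? This is a hedged C sketch. The delay-slot counts in the table (0 for single-cycle instructions such as ADD, 1 for MPY/SMPY, 4 for the loads LDB/LDH, 5 for B, and 3 for the single-precision floating-point multiply MPYSP) are assumptions to be checked against TI's C6000 instruction set reference; the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *mnemonic; int delay_slots; } InstrTiming;

/* Assumed delay-slot counts (see the lead-in). */
static const InstrTiming timing_table[] = {
    { "ADD", 0 }, { "MPY", 1 }, { "SMPY", 1 },
    { "LDB", 4 }, { "LDH", 4 }, { "B",    5 },
    { "MPYSP", 3 },
};

/* Delay slots for `mnemonic`, or -1 if it is not in the table. */
static int delay_slots(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof timing_table / sizeof timing_table[0]; ++i)
        if (strcmp(timing_table[i].mnemonic, mnemonic) == 0)
            return timing_table[i].delay_slots;
    return -1;
}

/* An instruction entering E1 in cycle `issue` delivers its result to any
   instruction entering E1 in cycle issue + delay + 1 or later
   (i.e. issue + latency, using the latency definition above). */
static int result_ready_cycle(const char *mnemonic, int issue)
{
    int d = delay_slots(mnemonic);
    assert(d >= 0);  /* unknown mnemonic */
    return issue + d + 1;
}
```

For example, an MPY entering E1 in cycle 0 gives result_ready_cycle("MPY", 0) == 2, matching its latency of 2, and a branch issued in cycle 0 reaches its target in cycle 6. An instruction placed in the intervening delay slots still reads the old register value, which is why compilers fill those slots with independent work.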

Figure: TMS320C67xx Pipeline Phases


The diagram shows how instructions progress through the pipeline phases, cycle by cycle.
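A pipeline chart like the one in the diagram can be generated under simplifying assumptions: one new fetch packet enters PG every cycle, there are no stalls, and every packet is drawn through all 16 phases. In reality most fixed-point instructions finish by E5, and a fetch packet containing several execute packets stalls the fetch phases. The sketch is illustrative only; the names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* The 16 C67x pipeline phases in order: 4 fetch, 2 decode, 10 execute. */
static const char *const PHASES[] = {
    "PG", "PS", "PW", "PR",            /* fetch   */
    "DP", "DC",                         /* decode  */
    "E1", "E2", "E3", "E4", "E5",
    "E6", "E7", "E8", "E9", "E10"       /* execute */
};
#define NUM_PHASES ((int)(sizeof PHASES / sizeof PHASES[0]))

/* Phase occupied by fetch packet `packet` (0-based) during cycle `cycle`,
   assuming packet k enters PG in cycle k and never stalls.
   Returns NULL if the packet is not in the pipeline in that cycle. */
static const char *phase_of(int packet, int cycle)
{
    int stage = cycle - packet;
    return (stage >= 0 && stage < NUM_PHASES) ? PHASES[stage] : NULL;
}

/* Print one row per fetch packet and one column per cycle. */
static void print_pipeline(int packets)
{
    for (int p = 0; p < packets; ++p) {
        printf("FP%-2d", p + 1);
        for (int c = 0; c < packets + NUM_PHASES - 1; ++c) {
            const char *ph = phase_of(p, c);
            printf(" %-3s", ph ? ph : ".");
        }
        putchar('\n');
    }
}
```

Calling print_pipeline(3) prints one row per fetch packet, each shifted one cycle to the right, so that in cycle 6 the first packet is in E1 while the second and third are in DC and DP.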


Parallel Operations:

The instruction word for each functional unit is 32 bits long.

Instructions are fetched 8 at a time, so each fetch consists of 8 × 32 = 256 bits.

This group is called a Fetch Packet.

Fetch packets must start at an address that is a multiple of eight 32-bit words (32 bytes).

Up to 8 instructions can be executed in parallel.

Each must use a different functional unit.

Each group of parallel instructions is called an Execute Packet.
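How a fetch packet breaks into execute packets can be sketched in C. The sketch assumes the C6000 encoding in which bit 0 of each 32-bit instruction is the parallel (p) bit: p = 1 means the next instruction executes in parallel with this one, and on the C67x an execute packet does not extend past the end of its fetch packet. The function name is hypothetical, and the instruction words are inspected only for their p-bits.

```c
#include <assert.h>
#include <stdint.h>

#define FETCH_PACKET_WORDS 8

/* Split one fetch packet (8 instruction words) into execute packets.
   sizes[k] receives the number of instructions in execute packet k.
   Returns the number of execute packets (1 to 8). */
static int split_execute_packets(const uint32_t fp[FETCH_PACKET_WORDS],
                                 int sizes[FETCH_PACKET_WORDS])
{
    int count = 0, current = 0;
    for (int i = 0; i < FETCH_PACKET_WORDS; ++i) {
        ++current;
        int parallel_with_next = (int)(fp[i] & 1u);  /* assumed p-bit */
        /* An execute packet ends at p = 0 or at the end of the fetch packet. */
        if (!parallel_with_next || i == FETCH_PACKET_WORDS - 1) {
            sizes[count++] = current;
            current = 0;
        }
    }
    return count;
}
```

A fetch packet whose p-bits are 1, 1, 0, 1, 0, 0, 1, 0 splits into execute packets of 3, 2, 1 and 2 instructions, which would be dispatched in four successive cycles.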


Peripherals of TMS320C6713

The TMS320C67x devices contain peripherals for communication with off-chip memory, co-processors, host processors, and serial devices.

The following subsections discuss the peripherals of the C6713 processor:

Enhanced DMA (EDMA)

Host-Port Interface (HPI)

External Memory Interface (EMIF)

Multichannel Buffered Serial Port (McBSP)

Timers

Multichannel Audio Serial Ports (McASP)

Power Down Logic


Enhanced DMA (EDMA):

The EDMA has the following features:

Background operation: The EDMA operates independently of the CPU. The EDMA has 16 independently programmable channels.

High throughput: Elements can be transferred at the CPU clock rate.

Sixteen channels: The EDMA can keep track of the contexts of sixteen independent transfers.

Split operation: A single channel can simultaneously perform both receive and transmit element transfers to or from two peripherals and memory.

Programmable priority: Each channel has independently programmable priorities versus the CPU.


The EDMA has the following features (continued):

Each channel's source and destination address registers can have configurable indexes for each read and write transfer. The address may remain constant, increment, decrement, or be adjusted by a programmable value.

Programmable-width transfers: Each channel can be independently configured to transfer bytes, 16-bit half words, or 32-bit words.

Autoinitialization: Once a block transfer is complete, an EDMA channel may automatically reinitialize itself for the next block transfer.

Linking: Each EDMA channel can be linked to a subsequent transfer to perform after completion.

Event synchronization: Each channel is initiated by a specific event. Transfers may be synchronized either by element or by frame.
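The address-update options can be illustrated by listing the addresses one channel would touch, element by element. This is a behavioural sketch with hypothetical names (`AddrMode`, `element_addresses`); it does not model how the EDMA's transfer parameters are actually programmed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for the address-update options listed above. */
typedef enum { ADDR_FIXED, ADDR_INCREMENT, ADDR_DECREMENT, ADDR_INDEX } AddrMode;

/* Fill out[0..count-1] with the address used for each element of a transfer.
   elem_bytes: 1, 2 or 4 (byte, half-word or word transfers).
   step: programmable adjustment in bytes, used only with ADDR_INDEX. */
static void element_addresses(uint32_t start, AddrMode mode, uint32_t elem_bytes,
                              int32_t step, uint32_t *out, int count)
{
    assert(elem_bytes == 1u || elem_bytes == 2u || elem_bytes == 4u);
    uint32_t addr = start;
    for (int i = 0; i < count; ++i) {
        out[i] = addr;
        switch (mode) {
        case ADDR_FIXED:     break;                    /* e.g. a peripheral data register */
        case ADDR_INCREMENT: addr += elem_bytes; break;
        case ADDR_DECREMENT: addr -= elem_bytes; break;
        case ADDR_INDEX:     addr += (uint32_t)step; break;  /* wraps modulo 2^32 */
        }
    }
}
```

A half-word transfer from 0x80000000 in increment mode touches 0x80000000, 0x80000002, 0x80000004 and so on, while fixed mode suits a peripheral data register that must be accessed at the same address every time.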


Host Port Interface (HPI):

The HPI is a 16-bit-wide parallel port through which a host processor can directly access the CPU's memory space.

The host device functions as a master to the interface, which increases ease of access.

The host and CPU can exchange information via internal or external memory.

The host also has direct access to memory-mapped peripherals.

The HPI is connected to the internal memory via a set of registers.


Host Port Interface (HPI), continued:

Either the host or the CPU may use the HPI control register (HPIC) to configure the interface.

The host uses the host address register (HPIA) and the host data register (HPID) to access the internal memory space of the device.

The host accesses these registers using external data and interface control signals.

The HPIC is a memory-mapped register, so the CPU can also access it.

The data transfers between the HPI and memory are performed by the EDMA and are invisible to the user.
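The host's view of the HPI (write an address to HPIA, then read or write data through HPID) can be modelled in plain C. Everything here is a hypothetical software model: the struct, the function names and the tiny memory array are illustrative, the autoincrement option is assumed to advance HPIA by one 32-bit word, and the fact that a 16-bit host moves each 32-bit word as two half-word accesses is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define DSP_MEM_WORDS 64  /* tiny stand-in for the DSP's internal memory */

/* Hypothetical model of what the host sees through the HPI. */
typedef struct {
    uint32_t hpia;                /* address register: byte address in DSP memory */
    uint32_t mem[DSP_MEM_WORDS];  /* DSP memory reached through HPID */
} HpiModel;

/* Host loads HPIA with a word-aligned byte address. */
static void hpi_write_address(HpiModel *h, uint32_t byte_addr)
{
    assert(byte_addr % 4u == 0u && byte_addr / 4u < DSP_MEM_WORDS);
    h->hpia = byte_addr;
}

/* Host writes through HPID; with autoincrement, HPIA advances one word. */
static void hpi_write_data(HpiModel *h, uint32_t value, int autoincrement)
{
    assert(h->hpia / 4u < DSP_MEM_WORDS);
    h->mem[h->hpia / 4u] = value;
    if (autoincrement) h->hpia += 4u;
}

/* Host reads through HPID; with autoincrement, HPIA advances one word. */
static uint32_t hpi_read_data(HpiModel *h, int autoincrement)
{
    assert(h->hpia / 4u < DSP_MEM_WORDS);
    uint32_t v = h->mem[h->hpia / 4u];
    if (autoincrement) h->hpia += 4u;
    return v;
}
```

A host loading a block into DSP memory sets HPIA once and then writes successive words through HPID with autoincrement, instead of rewriting the address before every word.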


Multichannel Buffered Serial Port (McBSP):

The McBSP's standard serial port interface provides:

Full-duplex communication

Double-buffered data registers, which allow a continuous data stream

Independent framing and clocking for reception and transmission

Direct interface to industry-standard codecs, analog interface chips (AICs), and other serially connected A/D and D/A devices

An external shift clock or an internal shift clock with programmable frequency

Multichannel Buffered Serial Port (McBSP), continued:

Multichannel transmission and reception of up to 128 channels.

8-bit data transfers with LSB or MSB first.

Programmable polarity for both frame synchronization and data clocks.

Highly programmable internal clock and frame generation.
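The LSB-first/MSB-first option only changes the order in which the bits of a word are shifted out. The sketch (hypothetical function names) serializes an 8-bit word into the bit sequence that would appear on the data pin, one bit per shift clock, and rebuilds it on the receive side.

```c
#include <assert.h>
#include <stdint.h>

/* bits[0] is the first bit shifted out; msb_first selects the bit order. */
static void serialize_byte(uint8_t word, int msb_first, uint8_t bits[8])
{
    for (int i = 0; i < 8; ++i) {
        int pos = msb_first ? 7 - i : i;
        bits[i] = (uint8_t)((word >> pos) & 1u);
    }
}

/* Receiver side: rebuild the word from bits in arrival order. */
static uint8_t deserialize_byte(const uint8_t bits[8], int msb_first)
{
    uint8_t word = 0;
    for (int i = 0; i < 8; ++i) {
        int pos = msb_first ? 7 - i : i;
        word = (uint8_t)(word | ((bits[i] & 1u) << pos));
    }
    return word;
}
```

For 0xA1 (binary 10100001), MSB-first order sends 1, 0, 1, 0, 0, 0, 0, 1 and LSB-first order sends 1, 0, 0, 0, 0, 1, 0, 1, so transmitter and receiver must be configured for the same order.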



Timers:

The C62x/C67x has two 32-bit general-purpose timers that can be used to:

Time events

Count events

Generate pulses

Interrupt the CPU

Send synchronization events to the DMA controller
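Using a timer to time events or interrupt the CPU comes down to choosing how many timer clock ticks separate two events. The sketch computes that count for a requested event rate. The timer input clock is a parameter because its source depends on the device and its configuration; the usage example assumes a 225 MHz CPU clock divided by 4, which is an assumption, not a figure from these notes. How the count maps onto the timer's period register, including any off-by-one convention, should be checked in the device documentation; `timer_ticks` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Timer input-clock ticks between events for a desired event rate,
   rounded to the nearest tick. timer_clock_hz is an assumed input. */
static uint32_t timer_ticks(uint64_t timer_clock_hz, uint32_t event_rate_hz)
{
    assert(event_rate_hz > 0u);
    uint64_t ticks = (timer_clock_hz + event_rate_hz / 2u) / event_rate_hz;
    assert(ticks > 0u && ticks <= UINT32_MAX);  /* must fit a 32-bit counter */
    return (uint32_t)ticks;
}
```

With those assumptions, timer_ticks(225000000 / 4, 1000) returns 56250, i.e. a 1 kHz interrupt from a 56.25 MHz timer clock.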



Multichannel Audio Serial Port:

The C6713 processor includes two Multichannel Audio Serial Ports (McASPs).

The McASP interface modules each support one transmit and one receive clock zone. Each McASP has eight serial data pins, which can be individually allocated to either of the two zones. The serial port supports time-division multiplexing on each pin, from 2 to 32 time slots.

The McASP also provides extensive error-checking and recovery features, such as a bad-clock detection circuit for each high-frequency master clock, which verifies that the master clock is within a programmed frequency range.
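Time-division multiplexing fixes the bit clock a McASP data pin needs: every frame (one sample period) must carry all of its time slots. The sketch computes that rate; the limits of 2 to 32 slots come from the text, while the function name is hypothetical and clock-divider programming is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define MCASP_MIN_SLOTS  2
#define MCASP_MAX_SLOTS 32

/* Bit-clock rate for one pin carrying `slots` time slots of `bits_per_slot`
   bits each, with one frame per sample period at `frame_rate_hz`. */
static uint64_t tdm_bit_clock_hz(uint32_t frame_rate_hz, int slots, int bits_per_slot)
{
    assert(slots >= MCASP_MIN_SLOTS && slots <= MCASP_MAX_SLOTS);
    assert(bits_per_slot > 0);
    return (uint64_t)frame_rate_hz * (uint64_t)slots * (uint64_t)bits_per_slot;
}
```

For example, 2-slot stereo with 32-bit slots at 48 kHz needs 48000 × 2 × 32 = 3.072 MHz, and 8 slots of 24 bits at 44.1 kHz need 8.4672 MHz.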



Power Down Logic:

Most of the operating power of CMOS logic is dissipated during circuit switching, from one logic state to another.

By preventing some or all of the chip's logic from switching, significant power savings can be realized without losing any data or operational context.
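The standard first-order model of CMOS switching power (not taken from these notes) shows why stopping the clocks saves power: dynamic power is proportional to how often the circuit nodes switch.

```latex
P_{\text{dyn}} = \alpha \, C_L \, V_{DD}^{2} \, f
```

Here \alpha is the switching activity factor, C_L the switched load capacitance, V_{DD} the supply voltage, and f the clock frequency. Blocking the clock to a block of logic drives its \alpha toward zero, so its dynamic power falls to almost nothing while its state is retained; only static leakage power remains.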

Power-down mode PD1 blocks the internal clock inputs at the boundary of the CPU, preventing most of its logic from switching and effectively shutting down the CPU.

Additional power savings are accomplished in power-down mode PD2, in which the entire on-chip clock structure (including multiple buffers) is halted at the output of the PLL.

Power-down mode PD3 shuts down the entire internal clock tree (like PD2) and also disconnects the external clock source (CLKIN) from the PLL. Wake-up from PD3 takes longer than wake-up from PD2 because the PLL needs to be relocked, just as it does following power-up.
