
Module 6: Programmable Components in SoC II
Chan-Ho Lee (Soongsil University, School of Information, Communication and Electronic Engineering)



Copyright ⓒ 2003

SoC Architecture 6. Programmable processor components in SoC II

Contents
1. DSP Processor Introduction
   1.1 Introduction
   1.2 Fast Multipliers
   1.3 Multiple Execution Units
   1.4 Efficient Memory Access
   1.5 High Memory Bandwidth Requirement
   1.6 Data Format
   1.7 Efficient Zero Overhead Looping
   1.8 Streamlined I/O
   1.9 Specialized Instruction Sets
2. Piccolo for ARM
   2.1 Overview of Piccolo
   2.2 Organization
   2.3 Input & Output Buffer
   2.4 Register Bank
   2.5 Process


Contents (continued)
3. v5TE for ARM
   3.1 Overview of v5TE
   3.2 Multiplication Instruction
   3.3 Addition/Subtraction Instruction
4. TeakLite & Teak
   4.1 Overview of TeakLite & Teak
   4.2 Features
   4.3 CEVA-TeakLite Core Block Diagram
   4.4 CEVA-Teak Core Block Diagram
   4.5 CBU
   4.6 DAAU
   4.7 PCU
   4.8 Memory Organization
   4.9 Power Management Modes
   4.10 CEVA-Teak Performance


1. DSP Processor Introduction


1.1 Introduction (1/2)
- Communication system [1]
  - Human interface: analog signal
  - Signal processing: digital signal
  - A/D and D/A conversion


1.1 Introduction (2/2) [2]
- Digital signal processor
  - Type of microprocessor optimized for digital signal processing
  - Fast and powerful
  - Used in communication, medical, military and industrial products
- Advantages: speed, cost and energy efficiency
- Key features
  - Fast multipliers
  - Multiple execution units
  - Efficient memory access
  - Data format
  - Efficient zero-overhead looping
  - Streamlined I/O
  - Specialized instruction sets


1.2 Fast Multipliers
- The most common operation in signal processing
  - y = Σ x·h: multiplication and accumulation
  - Convolution, IIR filtering, Fourier transforms, etc.
- Need for fast multiplication and addition operations
  - Shift, multiplication and addition in a loop
  - Each requires one or more cycles
- Need to develop special hardware for multiplication
  - In 1982, Texas Instruments introduced the TMS32010, which multiplies in a single clock cycle
- All modern DSP processors include at least
  - One or more dedicated, single-cycle multipliers
  - A combined multiply-and-accumulate (MAC) unit
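As a rough illustration of the y = Σ x·h operation above, the following Python sketch spells out the multiply-accumulate loop that a hardware MAC unit executes at one tap per cycle. The function name `fir_mac` and the sample values are illustrative assumptions, not part of the original slides.

```python
def fir_mac(samples, coeffs):
    """Compute one output y = sum(x[k] * h[k]) with an explicit MAC loop."""
    acc = 0  # accumulator
    for x, h in zip(samples, coeffs):
        acc += x * h  # one multiply-accumulate per filter tap
    return acc


# Hypothetical data: four samples and four coefficients.
y = fir_mac([1, 2, 3, 4], [4, 3, 2, 1])
print(y)  # 1*4 + 2*3 + 3*2 + 4*1 = 20
```

On a general-purpose processor each iteration costs several instructions; a DSP with a single-cycle MAC collapses the loop body into one cycle.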


1.3 Multiple Execution Units
- Need to perform computationally intensive tasks
  - Real-time operation
  - Ex) Filtering signals at 10-100 kHz sampling rates in real time
- Several independent execution units are required
  - Parallel operation
  - Ex) Arithmetic logic unit (ALU) and shifter in parallel with MAC units
  - Pipelining
  - Single instruction, multiple data (SIMD)


1.4 Efficient Memory Access
- Executing a MAC in a single cycle means
  - Fetching the MAC instruction in a single cycle
  - Fetching the data sample in a single cycle
  - Fetching the filter coefficient in a single cycle
- Good performance requires high memory bandwidth
- Approach: use two or more separate memory banks
  - Each has its own bus
  - Each can be read or written during every cycle


1.5 High Memory Bandwidth Requirement
- Dedicated hardware for calculating memory addresses
  - Address generation units
- Memory access is very predictable in DSP
  - Ex) FIR filter: coefficients accessed sequentially
- Register-indirect addressing with post-increment
  - Increments the address pointer where repetitive computations are performed on a series of data
- Circular addressing
  - Allows the processor to access data sequentially and then automatically wrap around to the beginning address
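Post-increment and circular addressing can be modeled in software as a pointer that advances after each access and wraps at the end of the buffer. The sketch below is only a behavioral model in Python; the class name `CircularBuffer` and the buffer size are assumptions chosen for illustration, and a real address generation unit performs the wrap in hardware at no cycle cost.

```python
class CircularBuffer:
    """Software model of post-increment addressing with wrap-around."""

    def __init__(self, size):
        self.data = [0] * size
        self.ptr = 0  # address pointer

    def push(self, sample):
        self.data[self.ptr] = sample
        # Post-increment, then wrap to the beginning address.
        self.ptr = (self.ptr + 1) % len(self.data)

    def newest_first(self):
        """Return the stored samples from most recent to oldest."""
        n = len(self.data)
        return [self.data[(self.ptr - 1 - k) % n] for k in range(n)]


buf = CircularBuffer(4)
for s in [10, 20, 30, 40, 50]:
    buf.push(s)
print(buf.data)            # [50, 20, 30, 40]
print(buf.newest_first())  # [50, 40, 30, 20]
```

The fifth sample overwrites the oldest one, which is the behavior a FIR delay line needs.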


1.6 Data Format
- DSP algorithms generally use floating-point formats
  - More complex hardware
  - Better accuracy than fixed-point processors
- Fixed-point processors
  - Cheaper and consume less power
  - 16-bit data words: sufficient for many applications
  - 20-, 24- or 32-bit data words for better accuracy
- Choose the shortest data word width that gives adequate accuracy
  - Considering cost and energy consumption
- Accumulator registers
  - Wider than other registers
  - Provide extra guard bits to avoid overflow
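To see why guard bits matter, the sketch below accumulates worst-case products of two 16-bit operands in a two's-complement register of a given width. The 32-bit and 40-bit widths are assumptions chosen for illustration (40 bits corresponds to 8 guard bits above a 32-bit product), and the helper names are hypothetical.

```python
def wrap_signed(value, bits):
    """Wrap an integer into a two's-complement register of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def accumulate(products, acc_bits):
    """Sum products in an accumulator that wraps at acc_bits."""
    acc = 0
    for p in products:
        acc = wrap_signed(acc + p, acc_bits)
    return acc


# Worst-case product of two 16-bit operands: (-32768) * (-32768) = 2**30.
products = [(-32768) * (-32768)] * 4  # the true sum is 2**32

print(accumulate(products, 32))  # 0: the 32-bit accumulator wrapped around
print(accumulate(products, 40))  # 4294967296: guard bits preserve the sum
```

With g guard bits, roughly 2**g worst-case products can be accumulated before overflow becomes possible.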


1.7 Efficient Zero Overhead Looping
- DSP algorithms have many loops
  - Efficient looping is required
- Special loop hardware: zero-overhead looping
  - No instruction cycles spent updating and testing a loop counter
  - No instruction cycles spent branching back to the top of the loop


1.8 Streamlined I/O
- Specialized serial or parallel I/O interfaces
- Streamlined I/O handling mechanisms
  - Ex) Low-overhead interrupts
  - Ex) Direct memory access (DMA)


1.9 Specialized Instruction Sets (1/2)
- Two goals in instruction set design
  - To make maximum use of the hardware and increase efficiency
    - Programmers can specify parallel operations in a single instruction
  - To minimize memory space by keeping instructions short
    - Use mode bits rather than encoding options in every instruction
    - Restrict operations to specific registers
    - Restrict the operation combinations allowed in an instruction
- This makes DSP instruction sets complicated


1.9 Specialized Instruction Sets (2/2)
- DSPs aren't usually programmed in high-level languages such as C or C++
- Program optimization is essential
  - Programmers should optimize code at the assembly level
- The easier the instruction set, the more desirable it is for programmers


2. Piccolo for ARM


2.1 Overview of Piccolo [3]
- Sophisticated 16-bit signal processor
- Designed to assist the ARM7
  - Can't be used as a stand-alone processor
- Licensable core
- Up to 70 MHz at 3 V
- Supports zero-overhead single- and multi-instruction hardware loops
- Single instruction cycle
- Applications: digital cellular handsets, modems, pagers, multimedia


2.2 Organization (1/2)


2.2 Organization (2/2)
- Input buffer
- Output buffer
- Register bank
- 16-bit multiplier
- 32-bit barrel shifter
- 32-bit ALU


2.3 Input & Output Buffer
- To move many 16-bit values in a single instruction
  - Transfer in pairs
  - Full usage of the 32-bit bus
- Input buffer
  - Reorder buffer
- Output buffer
  - First-in, first-out (FIFO) buffer
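Transferring 16-bit values in pairs amounts to packing two samples into each 32-bit bus word. The Python sketch below shows one way to do this; the helper names and the choice of placing the first sample in the lower half are assumptions for illustration, not a statement of Piccolo's actual bus format.

```python
def pack_pair(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack_pair(word):
    """Recover the two signed 16-bit values from a 32-bit word."""
    def to_signed16(v):
        return v - 0x10000 if v & 0x8000 else v

    return to_signed16(word & 0xFFFF), to_signed16((word >> 16) & 0xFFFF)


word = pack_pair(-2, 3)
print(hex(word))          # 0x3fffe
print(unpack_pair(word))  # (-2, 3)
```

Each bus transfer then carries two samples, so a block of N samples needs only N/2 transfers.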


2.4 Register Bank
- Holds operands: 16-bit, 32-bit, 48-bit
- 16 general-purpose registers
  - 12 registers: 32 bits
  - 4 registers: 48 bits (for accumulators)


2.5 Process
- The ARM7 and Piccolo are separate processors
  - RISC-based instruction sets
  - Each executes programs
- When DSP functionality is required
  - The ARM7 issues an instruction
  - The Piccolo processor begins execution at a specified address in memory


3. v5TE for ARM


3.1 Overview of v5TE [3] (1/2)
- In 1999, ARM introduced the v5TE architecture: ARM DSP instruction set extensions
  - Enhanced 32-bit arithmetic capabilities in a single general-purpose CPU
  - Improved performance and flexibility
- Included in the E series (ARM9E, ARM10E, ...)
- Up to a 70% increase in speed for audio DSP applications


3.1 Overview of v5TE (2/2)
- First implemented on the ARM9E-S synthesizable core
- A very different approach to the problem from that used in the design of Piccolo
- Q flag: sticky overflow flag
  - Remains set until explicitly reset by an MSR instruction
  - A series of instructions may be executed and the Q flag inspected only once (using an MRS instruction)

[ ARM v5TE PSR format ]

Bits:   31 30 29 28 | 27 | 26 ... 8 | 7 | 6 | 5 | 4 ... 0
Field:  N  Z  C  V  | Q  | unused   | I | F | T | mode
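The PSR fields can be pulled out of a 32-bit value with shifts and masks. The Python sketch below assumes the standard ARM v5TE bit positions (N, Z, C, V in bits 31 to 28, Q in bit 27, I, F, T in bits 7, 6, 5 and the mode in bits 4 to 0); the function name `decode_psr` and the example value are hypothetical.

```python
def decode_psr(psr):
    """Split a 32-bit PSR value into its flag and mode fields."""
    return {
        "N": (psr >> 31) & 1,
        "Z": (psr >> 30) & 1,
        "C": (psr >> 29) & 1,
        "V": (psr >> 28) & 1,
        "Q": (psr >> 27) & 1,  # sticky overflow flag
        "I": (psr >> 7) & 1,
        "F": (psr >> 6) & 1,
        "T": (psr >> 5) & 1,
        "mode": psr & 0x1F,
    }


# Hypothetical PSR value with Q, I and F set and mode bits 0b10011.
fields = decode_psr(0x080000D3)
print(fields["Q"], fields["I"], fields["F"], fields["T"], hex(fields["mode"]))
# 1 1 1 0 0x13
```

In real code the value would come from an MRS instruction; here it is just a constant.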


3.2 Multiplication Instruction
- 16-bit data types
  - A 32-bit ARM register may hold two 16-bit values → efficient access to values
- 16x16 and 32x16 multiplication/accumulation
  - SMLAxy, SMLAWy, SMULWy, SMULxy
  - x, y = 0 (lower half) or 1 (upper half)

Instruction format:

Bits:   31-28 | 27-23 | 22-21 | 20 | 19-16   | 15-12   | 11-8 | 7 | 6 | 5 | 4 | 3-0
Field:  cond  | 00010 | mul   | 0  | Rd/RdHi | Rn/RdLo | Rs   | 1 | y | x | 0 | Rm
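The half-word selection performed by SMULxy can be modeled in a few lines of Python. This is a behavioral sketch only: it assumes that x selects the half of Rm and y the half of Rs, with 0 meaning the lower and 1 the upper 16 bits, and the function names and register values are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def select_half(register, upper):
    """Return the upper (1) or lower (0) signed 16-bit half of a register."""
    return to_signed16(register >> 16 if upper else register)


def smul_xy(rm, rs, x, y):
    """Model of a signed 16x16 multiply with half selection."""
    return select_half(rm, x) * select_half(rs, y)


rm = 0x0003FFFE  # upper half = 3, lower half = -2
rs = 0x00050004  # upper half = 5, lower half = 4

print(smul_xy(rm, rs, 0, 0))  # -8  (lower x lower)
print(smul_xy(rm, rs, 1, 1))  # 15  (upper x upper)
print(smul_xy(rm, rs, 0, 1))  # -10 (lower of rm x upper of rs)
```

Because both halves are addressable, two 16-bit samples packed into one register can each be multiplied without a separate unpacking step.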


3.3 Addition/Subtraction Instruction
- 32-bit operations using saturating arithmetic
  - On overflow, the nearest representable value is returned and the Q flag is set
- QADD, QSUB, QDADD, QDSUB
  - QDADD, QDSUB: double one of the operands before the addition or subtraction

Instruction format:

Bits:   31-28 | 27-23 | 22-21 | 20 | 19-16 | 15-12 | 11-8 | 7-4  | 3-0
Field:  cond  | 00010 | op    | 0  | Rn    | Rd    | 0000 | 0101 | Rm
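Saturating arithmetic with a sticky overflow flag can be sketched as follows. This Python model is an illustration under stated assumptions: the class and method names are hypothetical, and the doubling step in `qdadd`/`qdsub` is assumed to saturate as well before the final addition or subtraction.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


class SaturatingALU:
    """Behavioral model of 32-bit saturating add/subtract with a sticky Q flag."""

    def __init__(self):
        self.q = False  # stays set until clear_q() is called

    def _saturate(self, value):
        if value > INT32_MAX:
            self.q = True
            return INT32_MAX
        if value < INT32_MIN:
            self.q = True
            return INT32_MIN
        return value

    def qadd(self, a, b):
        return self._saturate(a + b)

    def qsub(self, a, b):
        return self._saturate(a - b)

    def qdadd(self, a, b):
        # Double b (assumed to saturate), then add with saturation.
        return self._saturate(a + self._saturate(2 * b))

    def qdsub(self, a, b):
        return self._saturate(a - self._saturate(2 * b))

    def clear_q(self):
        self.q = False


alu = SaturatingALU()
print(alu.qadd(INT32_MAX, 1))   # 2147483647 (clamped, Q set)
print(alu.qadd(1, 2), alu.q)    # 3 True (Q is sticky)
alu.clear_q()
print(alu.qdadd(10, 3), alu.q)  # 16 False
```

A sequence of operations can therefore run unchecked, with the flag inspected once at the end.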


4. Teak & TeakLite


4.1 Overview of TeakLite & Teak [4][5]
- Low power, high performance
- DSP Group 16-bit (data and program) core
- Fixed-point
- Fully synthesizable (soft core)
- Process-independent design
- Applications
  - TeakLite: 2G wireless handsets, internet audio players, magnetic and optical drives, IP phones, modems, etc.
  - Teak: cellular handsets, PDAs and smart phones, VoIP, portable audio, digital still cameras, etc.
- TeakLite is based on the OakDSPCore®


4.2 Features
- CBU: Computation and Bit Unit
  - Computation Unit (CU)
  - Bit Manipulation Unit (BMU)
- DAAU: Data Address Arithmetic Unit
- PCU: Program Control Unit
- IDU: Instruction Decode Unit
- OFU: Operand Fetch Unit


4.3 CEVA-TeakLite Core Block Diagram


4.4 CEVA-Teak Core Block Diagram


4.5 CBU (1/3)
- Computation Unit (CU) and Bit Manipulation Unit (BMU)
  - Single-cycle multiply-accumulate (MAC) instructions
  - Dual MAC instruction (Teak)
  - Single-cycle division step
  - Single-cycle exponent evaluation
  - Maximum/minimum calculation in a single cycle
  - Zero-overhead block repeat
  - Codebook search
  - Built-in Viterbi accelerator
  - Dedicated FFT accelerator (Teak)
  - Parallel instruction execution in a single cycle (Teak)


4.5 CBU (2/3)

CBU of Teak

CBU of TeakLite


4.5 CBU (3/3)

              TeakLite          Teak              Note
Transfer      2 (16 bit)        4 (16 bit)        Parallel
Multiplier    1 (16 x 16 bit)   2 (16 x 16 bit)   Complement, parallel
ALU           36 bit            40 bit            3 input
Accumulator   4 (36 bit)        4 (40 bit)        Independent operating
Shifter       36 bit            40 bit            Barrel
Note          Bit Field Operation (BFO)


4.6 DAAU (1/2)
- Addressing modes
  - Direct (TeakLite) / Indirect (Teak)
  - Short/Long Direct
  - Short/Long Index
  - Short/Long Immediate
  - Stack Pointer
  - Program Memory Indirect (TeakLite)
  - Bit-reverse (Teak)


4.6 DAAU (2/2)
- General-purpose address pointer registers
  - (6+3) x 16-bit (TeakLite)
  - 8 x 16-bit (Teak)
- Alternative bank of registers
- Four 16-bit user-defined registers
- Enables both linear and cyclic pointer modification (Teak)
- Enables four 16-bit data memory transactions in parallel (Teak)


4.7 PCU
- Program Control Unit
- Zero-overhead looping
  - Block repeat instructions
  - Repeat instruction
- Single-cycle interrupt latency
- Interrupt types
  - Three maskable
  - Non-maskable
  - Trap (software interrupt)
  - Breakpoint
  - Vector (Teak)


4.8 Memory Organization
- Program memory space
  - TeakLite: up to 64K words
  - Teak: linear space up to 256K words; total space with paging up to 4M words
- Data memory space
  - Up to 64K words, in three sections
  - X and Y: zero-wait-state transactions, memory interface
  - Z: interface to slow devices (peripherals)
  - Flexible and configurable at a 1K-word resolution


4.9 Power Management Modes
- Active mode
- Slow mode
  - Reduces clock speed and current consumption linearly
- Stop mode
  - Leakage current only


4.10 CEVA-Teak Performance
- Frequency
  - 135 MHz at 0.13 µm, worst case
- Power dissipation
  - 0.27 mA/MHz in a typical DSP application
  - 0.45 mA/MHz in a typical DSP application with DMA-intensive transfers
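Since the consumption figures are quoted per MHz, the supply current at a given clock rate follows by simple multiplication. The short Python sketch below works this out at the quoted 135 MHz; the function name is hypothetical and the linear scaling is an assumption taken from the per-MHz units.

```python
def supply_current_ma(ma_per_mhz, clock_mhz):
    """Estimate supply current in mA, assuming it scales linearly with clock."""
    return ma_per_mhz * clock_mhz


typical = supply_current_ma(0.27, 135)
dma_heavy = supply_current_ma(0.45, 135)
print(f"{typical:.2f} mA, {dma_heavy:.2f} mA")  # 36.45 mA, 60.75 mA
```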


Summary (1/2)
- DSP processors
  - Fast and powerful performance of digital signal processing operations
  - Specialized instruction set: shift, multiplication, addition
- Piccolo
  - Digital signal processing unit for the ARM7
  - Licensable core
- v5TE: signal processing instruction set for the ARM E series
- Teak & TeakLite
  - Synthesizable embedded DSP cores
  - Process-independent soft cores
- OMAP (TI): software platform with MCU and DSP cores
  - TI925T MPU & TMS320C55x DSP core


Summary (2/2)
Comparison

                    Piccolo     TeakLite     Teak                 OMAP (TMS320C55x)
ALU                 One 32bit   One 36bit    One 40bit            One 40bit, one 16bit
Barrel shifter      One 32bit   One 36bit    One 40bit            One 40bit
Multiplier          One 16bit   One 16bit    Two 16bit            Two 17bit
Accumulator         -           Four 36bit   Four 40bit           Four 40bit
Performance (MHz)   70                       135                  144, 160, 200, 300
Power (mA)                                   0.27/MHz, 0.45/MHz   0.05/MIPS


Reference
[1] http://dspvillage.ti.com/docs/catalog/dspplatform/details.jhtml?templateId=5121&path=templatedata/cm/dspdetail/data/vil_getstd_whatis
[2] http://www.cmpe.boun.edu.tr/courses/cmpe511/fall2003/dsp.ppt
[3] Steve Furber, ARM System-on-Chip Architecture, 2nd ed., Addison-Wesley
[4] http://www.ceva-dsp.com/products/cores/ceva-teak.php (CEVA-Teak Datasheet)
[5] http://www.ceva-dsp.com/products/cores/ceva-teaklite.php (CEVA-TeakLite Datasheet)
[6] http://focus.ti.com/docs/prod/folders/print/omap5910.html (OMAP5910 Dual-Core Processor (Rev. C), OMAP5910 Dual-Core Processor Silicon Errata (Rev. A))
[7] Grant Martin & Henry Chang, Winning the SoC Revolution: Experiences in Real Design, Kluwer Academic Publishers