Module 6: Programmable Components in SoC II
이찬호 (Soongsil University, School of Information and Communication Electronic Engineering)
2 Copyright 2003ⓒ
SoC Architecture 6. Programmable processor components in SoC II
Contents
1. DSP Processor Introduction
  1.1 Introduction
  1.2 Fast Multipliers
  1.3 Multiple Execution Units
  1.4 Efficient Memory Access
  1.5 High Memory Bandwidth Requirement
  1.6 Data Format
  1.7 Efficient Zero Overhead Looping
  1.8 Streamlined I/O
  1.9 Specialized Instruction Sets
2. Piccolo for ARM
  2.1 Overview of Piccolo
  2.2 Organization
  2.3 Input & Output Buffer
  2.4 Register Bank
  2.5 Process
3. v5TE for ARM
  3.1 Overview of v5TE
  3.2 Multiplication Instruction
  3.3 Addition/Subtraction Instruction
4. TeakLite & Teak
  4.1 Overview of TeakLite & Teak
  4.2 Features
  4.3 CEVA-TeakLite Core Block Diagram
  4.4 CEVA-Teak Core Block Diagram
  4.5 CBU
  4.6 DAAU
  4.7 PCU
  4.8 Memory Organization
  4.9 Power Management Modes
  4.10 CEVA-Teak Performance

1. DSP Processor Introduction

1.1 Introduction (1/2)
Communication system [1]
- Human interface: analog signals
- Signal processing: digital signals
- A/D and D/A conversion between the two

1.1 Introduction (2/2)
Digital signal processor [2]
- A type of microprocessor optimized for digital signal processing
- Fast and powerful
- Used in communication, medical, military, and industrial products
Advantages
- Speed, cost, and energy efficiency
Characteristic features
- Fast multipliers
- Multiple execution units
- Efficient memory access
- Data format
- Efficient zero-overhead looping
- Streamlined I/O
- Specialized instruction sets

1.2 Fast Multipliers
- The most common operation in signal processing is multiplication and accumulation: y = Σ x·h
  - Used in convolution, IIR filtering, Fourier transforms, etc.
- Fast multiplication and addition operations are needed
  - Without dedicated hardware: shift, multiplication, and addition in a loop
  - Each of these requires one or more cycles
- Special hardware for multiplication is needed
  - In 1982, Texas Instruments (TMS32010): multiplication in a single clock cycle
- All modern DSP processors include at least one of the following
  - A dedicated single-cycle multiplier
  - A combined multiply-and-accumulate (MAC) unit
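
As a rough illustration of the multiply-and-accumulate pattern y = Σ x·h described above, the following sketch computes one FIR filter output with an explicit loop. The function name `fir_mac` and the sample values are made up for this example; on a DSP each loop iteration would map to one MAC instruction.

```python
def fir_mac(samples, coeffs):
    """Compute y = sum(x[k] * h[k]) with one multiply-accumulate per tap."""
    acc = 0  # accumulator register
    for x, h in zip(samples, coeffs):
        acc += x * h  # the MAC step: multiply, then add into the accumulator
    return acc


# Hypothetical input samples and filter coefficients
x = [1, 2, 3, 4]
h = [4, 3, 2, 1]
y = fir_mac(x, h)
print(y)  # 1*4 + 2*3 + 3*2 + 4*1 = 20
```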

1.3 Multiple Execution Units
- Computationally heavy tasks must be performed in real time
  - Ex) Filtering signals at 10-100 kHz sampling rates in real time
- Several independent execution units are required
  - Parallel operation
    Ex) Arithmetic logic unit (ALU) and shifter in parallel with the MAC units
  - Pipelining
  - Single instruction, multiple data (SIMD)
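
To get a feel for the real-time requirement, the sketch below estimates how many MAC operations per second an FIR filter needs at the sampling rates quoted above. The 64-tap filter length is an assumption chosen only for illustration; it does not come from the slides.

```python
def required_macs_per_second(sample_rate_hz, num_taps):
    """Each output sample of an FIR filter needs one MAC per tap."""
    return sample_rate_hz * num_taps


NUM_TAPS = 64  # assumed filter length, for illustration only

for fs in (10_000, 100_000):  # 10 kHz and 100 kHz sampling rates
    print(fs, required_macs_per_second(fs, NUM_TAPS))
```

With these assumptions, the 100 kHz case needs 6.4 million MACs per second for a single filter, which is why one MAC per cycle and parallel execution units matter.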

1.4 Efficient Memory Access
- Executing a MAC in a single cycle means, within that one cycle:
  - Fetching the MAC instruction
  - Fetching a data sample
  - Fetching a filter coefficient
- Good performance therefore requires high memory bandwidth
- Approach: use two or more separate memory banks
  - Each has its own bus
  - Each can be read or written during every cycle

1.5 High Memory Bandwidth Requirement
- Dedicated hardware for calculating memory addresses: address generation units
- Memory access is very predictable in DSP
  - Ex) FIR filter: coefficients are accessed sequentially
- Register-indirect addressing with post-increment
  - The address pointer is incremented automatically where repetitive computations are performed on a series of data
- Circular addressing
  - Allows the processor to access data sequentially and then automatically wrap around to the beginning address
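
The post-increment and wrap-around behaviour of circular addressing can be modelled in a few lines. This is a software sketch only; the class name `CircularPointer` and the base address 0x100 are hypothetical, and a real address generation unit does this in hardware with no extra instructions.

```python
class CircularPointer:
    """Models a post-incrementing address pointer that wraps around."""

    def __init__(self, base, length):
        self.base = base      # start address of the circular buffer
        self.length = length  # number of words in the buffer
        self.offset = 0       # current position inside the buffer

    def post_increment(self):
        """Return the current address, then advance with wrap-around."""
        addr = self.base + self.offset
        self.offset = (self.offset + 1) % self.length
        return addr


ptr = CircularPointer(base=0x100, length=4)
addresses = [ptr.post_increment() for _ in range(6)]
print([hex(a) for a in addresses])
# ['0x100', '0x101', '0x102', '0x103', '0x100', '0x101']
```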

1.6 Data Format
- DSP algorithms generally use floating-point formats
  - Floating-point processors need more complex hardware
  - They give better accuracy than fixed-point processors
- Fixed-point processors
  - Cheaper and lower in power consumption
  - 16-bit data words: sufficient for many applications
  - 20-, 24-, or 32-bit data words for better accuracy
- Choose the shortest data word width that gives adequate accuracy
  - Considering cost and energy consumption
- Accumulator registers
  - Wider than other registers
  - Provide extra guard bits to avoid overflow
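
The role of guard bits can be checked with a small worked example. The sketch assumes 16-bit signed data, a 32-bit product, and a 40-bit accumulator (8 guard bits); these widths are chosen for illustration, and the helper name `fits_signed` is made up.

```python
def fits_signed(value, bits):
    """True if value is representable as a signed two's-complement number."""
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


x = -32768          # most negative 16-bit value
product = x * x     # 2**30, still fits in 32 bits

acc = 0
for _ in range(4):  # accumulate four worst-case products
    acc += product

print(acc == 2**32)           # True
print(fits_signed(acc, 32))   # False: a 32-bit accumulator would overflow
print(fits_signed(acc, 40))   # True: 8 guard bits absorb the growth
```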

1.7 Efficient Zero Overhead Looping
- DSP algorithms have many loops, so efficient looping is required
- Special loop hardware: zero-overhead looping
  - No cycles spent on updating and testing a loop counter
  - No cycles spent on branching back to the top of the loop

1.8 Streamlined I/O
- Specialized serial or parallel I/O interfaces
- Streamlined I/O handling mechanisms
  - Ex) Low-overhead interrupts
  - Ex) Direct memory access (DMA)

1.9 Specialized Instruction Sets (1/2)
Two goals in instruction set design
- To make maximum use of the hardware and increase efficiency
  - Programmers can specify parallel operations in a single instruction
- To minimize memory space by keeping instructions short
  - Use mode bits rather than encoding options in each instruction
  - Restrict operations to specific registers
  - Restrict operation combinations in the instruction
- This makes DSP instruction sets complicated

1.9 Specialized Instruction Sets (2/2)
- DSPs are not usually programmed in high-level languages such as C or C++
- Program optimization is essential
  - Programmers should optimize code at the assembly level
- The easier the instruction set, the more desirable it is for programmers

2. Piccolo for ARM

2.1 Overview of Piccolo [3]
- Sophisticated 16-bit signal processor
- Designed to assist the ARM7
- Cannot be used as a stand-alone processor
- Licensable core
- Up to 70 MHz at 3 V
- Supports zero-overhead single- and multi-instruction hardware loops
- Single instruction cycle
- Applications: digital cellular handsets, modems, pagers, multimedia

2.2 Organization (1/2)

2.2 Organization (2/2)
- Input buffer
- Output buffer
- Register bank
- 16-bit multiplier
- 32-bit barrel shifter
- 32-bit ALU

2.3 Input & Output Buffer
- Purpose: to move many 16-bit values in a single instruction
  - Values are transferred in pairs
  - This makes full use of the 32-bit bus
- Input buffer: reorder buffer
- Output buffer: first-in, first-out (FIFO) buffer
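
Transferring 16-bit values in pairs over a 32-bit bus amounts to packing two halfwords into one word. The sketch below shows the idea; the function names and the choice of putting the first value in the lower half are assumptions for illustration, not Piccolo's documented layout.

```python
def pack_pair(lo, hi):
    """Pack two signed 16-bit values into one 32-bit word (lo in bits 15-0)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack_pair(word):
    """Split a 32-bit word back into two signed 16-bit values (lo, hi)."""
    def signed16(v):
        return v - 0x10000 if v & 0x8000 else v

    return signed16(word & 0xFFFF), signed16((word >> 16) & 0xFFFF)


word = pack_pair(-2, 3)
print(hex(word))          # 0x3fffe
print(unpack_pair(word))  # (-2, 3)
```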

2.4 Register Bank
- Holds operands: 16-bit, 32-bit, 48-bit
- 16 general-purpose registers
  - 12 registers: 32 bits
  - 4 registers: 48 bits, for use as accumulators

2.5 Process
- The ARM7 and Piccolo are separate processors
  - RISC-based instruction set
  - Each executes programs
- When DSP functionality is required:
  -> the ARM7 issues an instruction
  -> the Piccolo processor begins execution (at a specified address in memory)

3. v5TE for ARM

3.1 Overview of v5TE [3] (1/2)
- ARM v5TE architecture (1999): ARM DSP instruction set extensions
- Enhanced 32-bit arithmetic capabilities in a single general-purpose CPU
  - Improved performance and flexibility
- Included in the E series (ARM9E, ARM10E, ...)
- Up to a 70% increase in speed for audio DSP applications

3.1 Overview of v5TE (2/2)
- First implemented on the ARM9E-S synthesizable core
- A very different approach to the problem from that used in the design of Piccolo
- Q flag: sticky overflow flag
  - Remains set until explicitly reset by an MSR instruction
  - A series of instructions may be executed and the Q flag inspected only once at the end (using an MRS instruction)

[ARM v5TE PSR format]
Bits:   31 30 29 28 | 27 | 26 ... 8 | 7 | 6 | 5 | 4 ... 0
Field:  N  Z  C  V  | Q  | unused   | I | F | T | mode

3.2 Multiplication Instruction
- 16-bit data types
  - A 32-bit ARM register may hold two 16-bit values → efficient access to values
- 16x16 and 32x16 multiplication/accumulation
  - SMLAxy, SMLAWy, SMULWy, SMULxy
  - x, y = 0 (lower half) or 1 (upper half)

Instruction encoding
Bits:   31-28 | 27-23 | 22-21 | 20 | 19-16   | 15-12   | 11-8 | 7 | 6 | 5 | 4 | 3-0
Field:  cond  | 00010 | mul   | 0  | Rd/RdHi | Rn/RdLo | Rs   | 1 | y | x | 0 | Rm
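
The half-register selection by x and y can be emulated in software. The sketch below models only the arithmetic of a 16x16 multiply such as SMULxy (x selects the half of Rm, y the half of Rs); the helper names and register values are hypothetical, and condition codes and the accumulate variants are left out.

```python
def signed16(v):
    """Interpret the low 16 bits of v as a signed two's-complement value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def half(reg, sel):
    """Select the lower (sel=0) or upper (sel=1) 16-bit half of a register."""
    return signed16(reg >> 16 if sel else reg)


def smul_xy(rm, rs, x, y):
    """Signed 16x16 -> 32-bit multiply of the selected halves."""
    return half(rm, x) * half(rs, y)


rm = 0x0003FFFE  # upper half = 3,  lower half = -2
rs = 0xFFFC0005  # upper half = -4, lower half = 5

print(smul_xy(rm, rs, 0, 0))  # -2 * 5  = -10
print(smul_xy(rm, rs, 1, 1))  #  3 * -4 = -12
print(smul_xy(rm, rs, 0, 1))  # -2 * -4 = 8
print(smul_xy(rm, rs, 1, 0))  #  3 * 5  = 15
```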

3.3 Addition/Subtraction Instruction
- 32-bit operations using saturating arithmetic
  - On overflow, the nearest representable value is returned and the Q flag is set
- QADD, QSUB, QDADD, QDSUB
  - QDADD, QDSUB: double one of the operands before the addition or subtraction

Instruction encoding
Bits:   31-28 | 27-23 | 22-21 | 20 | 19-16 | 15-12 | 11-8 | 7-4  | 3-0
Field:  cond  | 00010 | op    | 0  | Rn    | Rd    | 0000 | 0101 | Rm
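
Saturating arithmetic with a sticky overflow flag can be sketched as follows. This is a behavioural model for illustration, not ARM code: the class `SatUnit` and its method names are made up, and the doubling in `qdadd`/`qdsub` is assumed to saturate as well before the final operation.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)


class SatUnit:
    """Behavioural model of 32-bit saturating add/subtract with a sticky Q flag."""

    def __init__(self):
        self.q = False  # sticky: set on saturation, cleared only explicitly

    def _sat(self, value):
        if value > INT32_MAX:
            self.q = True
            return INT32_MAX
        if value < INT32_MIN:
            self.q = True
            return INT32_MIN
        return value

    def qadd(self, a, b):
        return self._sat(a + b)

    def qsub(self, a, b):
        return self._sat(a - b)

    def qdadd(self, a, b):
        return self._sat(a + self._sat(2 * b))  # double b, then add

    def qdsub(self, a, b):
        return self._sat(a - self._sat(2 * b))  # double b, then subtract

    def clear_q(self):
        self.q = False


unit = SatUnit()
r1 = unit.qadd(INT32_MAX, 1)  # overflows: clamps to INT32_MAX, sets Q
r2 = unit.qadd(1, 2)          # no overflow: Q stays set from before
print(r1 == INT32_MAX, r2, unit.q)  # True 3 True
```

Because the flag is sticky, a whole sequence of operations can run before the program checks once whether any of them saturated.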

4. Teak & TeakLite

4.1 Overview of TeakLite & Teak [4][5]
- Low power, high performance
- DSP Group 16-bit (data and program) core
- Fixed-point
- Fully synthesizable (soft core)
- Process-independent design
- Applications
  - TeakLite: 2G wireless handsets, Internet audio players, magnetic and optical drives, IP phones, modems, etc.
  - Teak: cellular handsets, PDAs and smartphones, VoIP, portable audio, digital still cameras, etc.
- TeakLite is based on the OakDSPCore®

4.2 Features
- CBU: Computation and Bit Unit
  - Computation Unit (CU)
  - Bit Manipulation Unit (BMU)
- DAAU: Data Address Arithmetic Unit
- PCU: Program Control Unit
- IDU: Instruction Decode Unit
- OFU: Operand Fetch Unit

4.3 CEVA-TeakLite Core Block Diagram

4.4 CEVA-Teak Core Block Diagram

4.5 CBU (1/3)
Computation Unit (CU) and Bit Manipulation Unit (BMU)
- Single-cycle multiply-accumulate (MAC) instructions
  - Dual MAC instruction (Teak)
- Single-cycle division step
- Single-cycle exponent evaluation
- Maximum/minimum calculation in a single cycle
- Zero-overhead block repeat
- Codebook search
- Built-in Viterbi accelerator
- Dedicated FFT accelerator (Teak)
- Parallel instruction execution in a single cycle (Teak)

4.5 CBU (2/3)
[Figures: CBU of Teak; CBU of TeakLite]

4.5 CBU (3/3)

| Unit        | TeakLite        | Teak            | Note                  |
|-------------|-----------------|-----------------|-----------------------|
| Transfer    | 2 (16 bit)      | 4 (16 bit)      | Parallel              |
| Multiplier  | 1 (16 x 16 bit) | 2 (16 x 16 bit) | Complement, parallel  |
| ALU         | 36 bit          | 40 bit          | 3 input               |
| Accumulator | 4 (36 bit)      | 4 (40 bit)      | Independent operating |
| Shifter     | 36 bit          | 40 bit          | Barrel                |

Note: Bit Field Operation (BFO)

4.6 DAAU (1/2)
Addressing modes
- Direct (TeakLite) / Indirect (Teak)
- Short/long direct
- Short/long index
- Short/long immediate
- Stack pointer
- Program memory indirect (TeakLite)
- Bit-reverse (Teak)
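
Bit-reverse addressing is typically used to reorder data for FFTs. The following sketch shows the index mapping in software for an 8-point transform; the function name `bit_reverse` is made up, and the slides do not describe how Teak implements the mode in hardware.

```python
def bit_reverse(index, bits):
    """Reverse the lowest `bits` bits of index."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit to the result
        index >>= 1
    return result


# Access order for an 8-point FFT (3 address bits)
order = [bit_reverse(i, 3) for i in range(8)]
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```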

4.6 DAAU (2/2)
- General-purpose address pointer registers
  - (6+3) x 16-bit (TeakLite)
  - 8 x 16-bit (Teak)
- Alternative bank of registers
- Four 16-bit user-defined registers
- Enables both linear and cyclic pointer modification (Teak)
- Enables four 16-bit data memory transactions in parallel (Teak)

4.7 PCU
Program Control Unit
- Zero-overhead looping
  - Block repeat instructions
  - Repeat instruction
- Single-cycle interrupt latency
- Interrupt types
  - Three maskable
  - Non-maskable
  - Trap (software interrupt)
  - Breakpoint
  - Vector (Teak)

4.8 Memory Organization
- Program memory space
  - TeakLite: up to 64K words
  - Teak: linear space up to 256K words; total space with paging up to 4M words
- Data memory space
  - Up to 64K words, in three sections
  - X and Y: zero-wait-state transactions, memory interface
  - Z: interface for slow devices (peripherals)
  - Flexible and configurable at a 1K-word resolution

4.9 Power Management Modes
- Active mode
- Slow mode: reduces clock speed and current consumption linearly
- Stop mode: leakage current only

4.10 CEVA-Teak Performance
- Frequency: 135 MHz at 0.13 µm, worst case
- Power dissipation
  - 0.27 mA/MHz in a typical DSP application
  - 0.45 mA/MHz in a typical DSP application with DMA-intensive transfers
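
The per-MHz figures above translate into supply current once a clock frequency is chosen. The short calculation below applies them at the quoted 135 MHz; it assumes current scales linearly with frequency, and the variable names are made up for this example.

```python
FREQ_MHZ = 135            # quoted worst-case frequency
TYPICAL_MA_PER_MHZ = 0.27
DMA_MA_PER_MHZ = 0.45

typical_ma = TYPICAL_MA_PER_MHZ * FREQ_MHZ  # about 36.45 mA
dma_ma = DMA_MA_PER_MHZ * FREQ_MHZ          # about 60.75 mA

print(round(typical_ma, 2), round(dma_ma, 2))
```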

Summary (1/2)
- DSP processors
  - Fast and powerful performance for digital signal processing operations
  - Specialized instruction set: shift, multiplication, addition
- Piccolo
  - Digital signal processing unit for the ARM7
  - Licensable core
- v5TE: signal processing instruction set for the ARM E series
- Teak & TeakLite
  - Synthesizable embedded DSP cores
  - Process-independent soft cores
- OMAP (TI): software platform with MCU and DSP core
  - TI925T MPU and TMS320C55x DSP core

Summary (2/2)
Comparison

|                   | Piccolo   | TeakLite    | Teak                 | OMAP (TMS320C55x)     |
|-------------------|-----------|-------------|----------------------|-----------------------|
| ALU               | One 32bit | One 36bit   | One 40bit            | One 40bit, One 16bit  |
| Barrel shifter    | One 32bit | One 36bit   | One 40bit            | One 40bit             |
| Multiplier        | One 16bit | One 16bit   | Two 16bit            | Two 17bit             |
| Accumulator       | -         | Four 36bit  | Four 40bit           | Four 40bit            |
| Performance (MHz) | 70        |             | 135                  | 144, 160, 200, 300    |
| Power (mA)        |           |             | 0.27/MHz, 0.45/MHz   | 0.05/MIPS             |

References
[1] http://dspvillage.ti.com/docs/catalog/dspplatform/details.jhtml?templateId=5121&path=templatedata/cm/dspdetail/data/vil_getstd_whatis
[2] http://www.cmpe.boun.edu.tr/courses/cmpe511/fall2003/dsp.ppt
[3] Steve Furber, ARM System-on-Chip Architecture, 2nd ed., Addison-Wesley
[4] http://www.ceva-dsp.com/products/cores/ceva-teak.php (CEVA-Teak Datasheet)
[5] http://www.ceva-dsp.com/products/cores/ceva-teaklite.php (CEVA-TeakLite Datasheet)
[6] http://focus.ti.com/docs/prod/folders/print/omap5910.html (OMAP5910 Dual-Core Processor (Rev. C), OMAP5910 Dual-Core Processor Silicon Errata (Rev. A))
[7] Grant Martin and Henry Chang, Winning the SoC Revolution: Experiences in Real Design, Kluwer Academic Publishers