1/26/2013
Vietnam National University, Ho Chi Minh City
Ho Chi Minh City University of Technology
Faculty of Electrical and Electronics Engineering, Department of Electronics Engineering
Ho Chi Minh City, 01/2013
DIGITAL SIGNAL PROCESSING WITH FPGA
Chapter 1: Introduction
Lecturer: Hoàng Trang
Email: [email protected]
Thanks to: lecturer Hồ Trung Mỹ
Hoàng Trang, Department of Electronics - DSP-FPGA - Chapter 1, 01/2013
Content
+ Course overview
+ Design methodology and FPGA solutions
+ Designing DSP algorithms with FPGA
Outline: how to evaluate?
1. Quiz: 10%
2. Homework (textbook): 10% (team work)
3. Project: 20% (team work)
4. Mid-term: 20%
5. Final exam: 40%
Textbook:
“VLSI Digital Signal Processing: Design and Implementation”
Keshab K. Parhi
Outline
Hardware
• DSP Systems, A/D and D/A converters
• FPGA for signal processing (Altera, Xilinx),
• Application domain specific instruction set processors
• SoC, DSP Multiprocessors
• Signal processing arithmetic units
Algorithm design and transformations
• Scheduling, Resource Allocation, Synthesis
• Finite-word length effects
• Algorithmic transformations
• FIR filter design
• FFT design
• IIR filter design
• Adaptive filter design
Course Conduct
• Course notes will be posted on the course web page
• Assignments with solutions will be provided and will not be graded
• The exam will be prepared based on lecture slides, references and assignments
Course Objectives … To
• Understand tradeoffs in implementing DSP algorithms
• Know basic DSP architectures
• Know some reduced-complexity strategies for algorithms, mainly on FPGA
• Know about commercial DSP solutions
• Know and understand system-level design tools
• Understand research topics related to algorithmic modifications and algorithm-architecture matching
Why this course?
There is a demand to derive more information per signal. “More” means
• Faster: derive more information per unit time;
– Faster hardware
– Newer algorithms with fewer operations
• Cheaper: derive information at a reduced cost in processor size, weight, power consumption, or dollars;
• Better: derive higher quality information (higher precision, finer resolution, higher SNR)
Hardware and software elements
Progress in signal processing capability is the product of progress in IC devices, architectures, algorithms and mathematics.
Moore’s Law
Predicts doubling of circuit density every 1.5 to 2 years.
Source: http://www.icknowledge.com/trends/uproc.html
What is Signal Processing?
• Ways to manipulate a signal in its original medium or in an abstract representation.
• Signals can be abstracted as functions of time or of spatial coordinates.
• Types of processing:
– Transformation
– Filtering
– Detection
– Estimation
– Recognition and classification
– Coding (compression)
– Synthesis and reproduction
– Recording, archiving
– Analyzing, modeling
Digital Signal Processing
• Signals generated by physical phenomena are analog in that
– Their amplitudes are defined over the range of real/complex numbers
– Their domains are continuous in time or space.
• Digital signal processing concerns processing signals using digital computers.
– A continuous time/space signal must be sampled to yield countable signal samples.
– The real (or complex) valued samples must be quantized to fit into the internal word length.
Digital Signal Processing applications
Signal Processing Systems
The task of digital signal processing (DSP) is to process sampled signals (from the A/D, analog-to-digital converter) and provide its output to the D/A (digital-to-analog converter) to be transformed back into physical signals.
[Figure: A/D -> Digital Signal Processing -> D/A]
Copied from [Hu04-Slides] Design and Implementation of Signal Processing Systems: An Introduction
Typical DSP Application
Stratix DSP Development Board
40-Pin Connectors for Analog Devices
Texas Instruments Connectors on Underside of Board
Mictor-Type Connectors for HP Logic Analyzers
MAX 7000 Device
Analog SMA Connectors
D/A Converters
A/D Converters
Prototyping Area
Nios Expansion Prototype Connector
[AlteraDSP]
Example DSP Applications….
DSP
• MILITARY
– Secure Communications
– Sonar Processing
– Image Processing
– Radar Processing
– Navigation, Guidance
• VOICE/SPEECH
– Speech Recognition
– Speech Processing/Vocoding
– Speech Enhancement
– Text-to-Speech
– Voice Mail
• INSTRUMENTATION
– Spectrum Analyzers
– Seismic Processors
– Digital Oscilloscopes
– Mass Spectrometers
• MEDICAL
– Patient Monitoring
– Ultrasound Equipment
– Diagnostic Tools
– Fetal Monitors
– Life Support Systems
– Image Enhancement
• INDUSTRIAL/CONTROL
– Robotics
– Numeric Control
– Power Line Monitors
– Motor/Servo Control
• CONSUMER
– Radar Detectors
– Power Tools
– Digital Audio / TV
– Music Synthesizers
– Toys / Games
– Answering Machines
– Digital Speakers
• PRO-AUDIO
– AV Editing
– Digital Mixers
– Home Theater
– Pro Audio
• COMMUNICATIONS
– Echo Cancellation
– Digital PBXs
– Line Repeaters
– Modems
– Global Positioning
– Sound/Modem/Fax Cards
– Cellular Phones
– Speaker Phones
– Video Conferencing
– ATMs
www.analog.com/dsp
Implementation of DSP Systems
• Platforms:
– Native signal processing (NSP) with general purpose processors (GPP)
• Multimedia extension (MMX) instructions
– Programmable digital signal processors (PDSP)
– Application-Specific Integrated Circuits (ASIC)
– Field-programmable gate array (FPGA)
• Requirements:
– Real time
• Processing must be done before a pre-specified deadline.
– Streamed numerical data
• Sequential processing
• Fast arithmetic processing
– High throughput
• Fast data input/output
• Fast manipulation of data
How Fast is Enough for DSP?
• Real time requirements:
– Example: data capture speed must match the sampling rate. Otherwise, data will be lost.
– Processing must be done by a specific deadline.
• Different throughput rates for processing different signals
– Throughput ∝ sampling rate.
– CD music: 44.1 kHz
– Speech: 8-22 kHz
– Video (depends on frame rate, frame size, etc.): ranges from 100s of kHz to MHz.
Example: a processor clocked at 120 MHz that can perform 120 MIPS
+ Sampling rate = 48 kHz (Digital Audio Tape, DAT): number of instructions per sample = $(120 \times 10^6)/(48 \times 10^3) = 2500$.
+ Sampling rate = 8 kHz (voice-band, telephony): number of instructions per sample = 15000.
+ Sampling rate = 75 MHz (CIF 360x288 video at 30 frames per second): number of instructions per sample = 1.6.
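The instruction budget in the example above is simply the instruction rate divided by the sampling rate. A minimal Python sketch of that arithmetic follows; the function name `instructions_per_sample` is an illustrative assumption, and the rates are the ones quoted on the slide.

```python
def instructions_per_sample(mips: float, sampling_rate_hz: float) -> float:
    """Instructions available per input sample for a processor rated at `mips` MIPS."""
    return (mips * 1e6) / sampling_rate_hz


# Sampling rates quoted on the slide, for a 120 MIPS processor.
for name, fs in [("DAT audio", 48e3), ("telephony", 8e3), ("video", 75e6)]:
    print(f"{name}: {instructions_per_sample(120, fs):.1f} instructions per sample")
```

The smaller the budget, the less likely a sequential processor can keep up, which is the motivation for the hardware platforms discussed next.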
ASIC: Application Specific ICs
• Custom or semi-custom IC chip or chip set developed for specific functions.
• Suitable for high-volume, low-cost production.
• Examples: MPEG codec, 3D graphics chip, etc.
• ASICs became popular due to the availability of IC foundry services. Fab-less design houses turn innovative designs into profitable chip sets using CAD tools.
• Design automation is a key enabling technology that facilitates a fast design cycle and a shorter time to market.
Programmable Digital Signal Processors (PDSPs)
• Micro-processors designed for signal processing applications.
• Special hardware support for:
– Multiply-and-Accumulate (MAC) ops
– Saturation arithmetic ops
– Zero-overhead loop ops
– Dedicated data I/O ports
– Complex address calculation and memory access
– Real time clock and other embedded processing supports.
• PDSPs were developed to fill a market segment between GPP and ASIC:
– GPP flexible, but slow
– ASIC fast, but inflexible
• As VLSI technology improves, role of PDSP changed over time.
– Cost: design, sales, maintenance/upgrade
– Performance
Programmable Digital Signal Processors (PDSPs): example
PDSP Market – By Company
[Figure: two pie charts of PDSP market share by company. Legend: Texas Instruments, Motorola, Agere, Analog Devices, Other. Slice labels for 2001: 40%, 12%, 16%, 8%, 24%. Slice labels for 2002: 43%, 14%, 14%, 9%, 20%.]
Ref: Forward Concepts
http://www.fwdconcepts.com/Pages/press42.htm
DSP Market – By Application
[Figure: pie chart of DSP market share by application, 2003. Legend: Wireless, Consumer, Multipurpose, Wireline, Computer, Automotive. Slice labels: 68%, 11%, 8%, 6%, 4%, 3%.]
Ref: Forward Concepts
http://www.fwdconcepts.com/Pages/press42.htm
Computing using FPGA
• FPGA (field-programmable gate array) is a derivative of PLD (programmable logic device).
• FPGAs are hardware-configurable: the same device behaves differently under different configurations.
• Slower than ASIC, but faster than PDSP.
• Once configured, an FPGA behaves like an ASIC module.
• Use of FPGA
– Rapid prototyping: runs at a fraction of ASIC speed, but without fab delay.
– Hardware accelerator: the same hardware realizes different function modules at different times, which saves hardware.
– Low-quantity system deployment
FPGA example: Stratix EP1S10
Altera Corp., Stratix Module 2: Logic Structure & MultiTrack Interconnect, 2004.
IP Cores
• Processor cores
StarCore
– 16-bit fixed-point VLIW DSP core from Lucent/Motorola (Lucent spun off its DSP business into a company called “Agere”)
– First VLIW machine to target low-power applications
– Pipeline relatively simple
– Targeting 198 mW @ 300 MHz, 1.5 V
• Hardware cores
Altera DSP cores (core: type)
– FIR Compiler
– IIR Compiler
– FFT/IFFT Compiler: Transforms
– NCO Compiler: Signal Generation
– Reed-Solomon Compiler: Error Detection / Correction
– Constellation Mapper/Demapper: Modulation / Demodulation
– Viterbi Compiler: Error Detection / Correction
SoC (System-on-Chip)
• With the continuing scaling of modern IC devices, it is now possible to incorporate
– Micro-processor cores + ASIC function blocks
– Analog + digital components
– Computation + communication functions
– I/O, memory + processor
into the same chip to form a comprehensive “system”; hence the notion of System-on-Chip (SoC).
• SoC uses intellectual properties (IPs), which are pre-designed modules.
• Designing an SoC thus becomes a task of system integration.
• Challenging issues in SoC design:
– Interfaces among IPs from different vendors
– Verification of function
– Physical design challenges
Design Issues????!!!!
• Given a DSP application, which implementation option should be chosen?
• For a particular implementation option, how can an optimal design be achieved? Optimal in terms of what criteria?
• Software design:
– NSP, PDSP
– Algorithms are implemented as programs.
• Hardware design:
– ASIC, FPGA
– Algorithms are directly implemented in hardware modules.
• S/H co-design: system-level design methodology.
A design methodology is the overall strategy to organize and solve the design tasks at the different steps of the design process.
Design methodology is viewed as the development of a sequence of models of the system, where each version is more refined than the previous one.
Design Process Model
• Design is the process that links algorithm to implementation
• Algorithm
– Operations
– Dependency between operations determines a partial ordering of execution
– Can be specified as a dependence graph
• Implementation
– Assignment: each operation can be realized with
• One or more instructions (software)
• One or more function modules (hardware)
– Scheduling: dependence relations and resource constraints lead to a schedule.
A Design Example …
Consider the algorithm:
Program:
y(0) = 0
For k = 1 to n Do
    y(k) = y(k-1) + a(k)*x(k)
End
y = y(n)
• Operations:
– Multiplication
– Addition
• Dependency
– y(k) depends on y(k-1)
– Dependence Graph:
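$y = \sum_{k=1}^{n} a(k)\,x(k)$
[Figure: dependence graph. A chain of n multiply-add stages; stage k multiplies a(k) by x(k) and adds the product to the running sum, which goes from y(0) at the input to y(n) at the output.]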
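The recurrence y(k) = y(k-1) + a(k)*x(k) can be sketched in Python as a single accumulator updated once per stage of the dependence graph. This is only an illustration; the function name `inner_product` is an assumption and the lists are indexed from 0 rather than from 1.

```python
def inner_product(a, x):
    """Accumulate y(k) = y(k-1) + a(k)*x(k), starting from y(0) = 0."""
    y = 0.0
    for a_k, x_k in zip(a, x):
        # One multiplication and one addition per stage; each stage
        # depends on the previous value of y, as in the dependence graph.
        y = y + a_k * x_k
    return y


print(inner_product([1, 2, 3], [4, 5, 6]))
```

Because every stage needs the previous sum, the additions form a chain that cannot be reordered without reformulating the algorithm.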
Design Example cont’d …
• Software Implementation:
– Map each * op. to a MUL instruction, and each + op. to an ADD instruction.
– Allocate memory space for {a(k)}, {x(k)}, and {y(k)}
– Schedule the operations by sequentially executing y(1) = a(1)*x(1), y(2) = y(1) + a(2)*x(2), etc.
– Note that each instruction is still to be implemented in hardware.
• Hardware Implementation:
– Map each * op. to a multiplier, and each + op. to an adder.
– Interconnect them according to the dependence graph:
[Figure: the dependence graph realized in hardware, with n multipliers and n adders chained from y(0) to y(n); stage k multiplies a(k) by x(k).]
Observations
• Eventually, an implementation is realized with hardware.
• However, by using the same hardware to realize different operations at different times (scheduling), we have a software program!
• Bottom line: hardware/software co-design. There is a continuum between hardware and software implementation.
• A design must explore both simultaneously to achieve the best performance/cost trade-off.
A Theme
• Matching hardware to algorithm
– Hardware architecture must match the characteristics of the algorithm.
– Example: an ASIC architecture is designed to implement a specific algorithm, and hence can achieve superior performance.
• Formulate algorithm to match hardware
– Algorithms must be formulated so that they can best exploit the potential of the architecture.
– Example: GPP and PDSP architectures are fixed. One must formulate the algorithm properly to achieve the best performance, e.g. to minimize the number of operations.
Algorithm Reformulation
• Algorithmic level equivalence
– Different filter structures implementing the same
specification
• Exploiting parallelism
– Regular iterative algorithms and loop
reformulation
• Well studied in parallel compiler technology
– Signal flow/Data flow representation
• Suitable for specification of pipelining
Mapping Algorithm to Architecture
• Scheduling and Assignment Problem
– Resources: hardware modules, and time slots
– Demands: operations (algorithm), and throughput
• Constrained optimization problem
– Minimize resources (objective function) to meet
demands (constraints)
• For regular iterative algorithms and regular
processor arrays -> algebraic mapping.
Implementation process for PDSP
Direct Mapping Techniques
FIR Filters
Transposed FIR Filter
Algorithm transform techniques:
– Pipelining and parallelism (example: in a 3-parallel FIR filter, 3 inputs are processed at the same time to produce 3 outputs)
– Retiming (a transformation technique that changes the location of delay elements in order to reduce the clock period or the number of registers)
– Unfolding (loop unrolling)
Example: One-to-one mapping and pipelining
[Figure: operations A, B, C, D shown after allocation, after assignment, and after pipelining; the pipeline registers are clocked flip-flops (ff).]
Analyse timing
• if OK then stop
• else pipelining
Example of design flow: Coware SPW Design Flow
www.coware.com -> synopsys.com
System-level design flow: Simulink-Altera
[AlteraDSP]
Arithmetic
• CORDIC
– Compute elementary functions
• Distributed arithmetic
– ROM based implementation
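As a hedged illustration of how CORDIC computes elementary functions, the sketch below evaluates sine and cosine in rotation mode. It uses floating-point Python for readability; in hardware the multiplications by 2^-i would be shifts and the angle table would be stored constants. The function name `cordic_sin_cos` and the default of 16 iterations are assumptions, and the input angle is assumed to lie within roughly ±π/2.

```python
import math


def cordic_sin_cos(angle, iterations=16):
    """Rotation-mode CORDIC: returns (sin(angle), cos(angle)) for |angle| up to about pi/2."""
    # Elementary rotation angles atan(2^-i); a hardware design would store these in a table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # The micro-rotations stretch the vector; pre-scale by the inverse of the total gain.
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate towards z = 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, c)
```

Each iteration adds roughly one bit of accuracy, so the iteration count trades latency against precision.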
Floating to fixed point analysis
• Overflow of the number range
– Large errors in the output signal occur when the available number range is exceeded (overflow).
• Round-off errors
– Rounding or truncation of products must be done in recursive loops so that the word length does not increase with each iteration.
• Coefficient errors
– Coefficients can only be represented with finite precision.
• Design for fixed-point arithmetic:
– Peak value estimation
– Word-length optimization
– Saturation arithmetic
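The effects listed above can be made concrete with a small quantizer. The sketch below converts a real value to a signed two's complement code with a given word length, either saturating or wrapping on overflow. The names `to_fixed` and `to_float`, and the 8-bit format with 7 fractional bits used in the usage lines, are illustrative assumptions; note that Python's `round` rounds halves to the nearest even integer.

```python
def to_fixed(value, frac_bits, word_bits, saturate=True):
    """Quantize `value` to a signed integer code with `frac_bits` fractional bits."""
    scaled = round(value * (1 << frac_bits))          # round-off error enters here
    lo = -(1 << (word_bits - 1))
    hi = (1 << (word_bits - 1)) - 1
    if saturate:
        return max(lo, min(hi, scaled))               # clamp to the number range
    return ((scaled - lo) % (1 << word_bits)) + lo    # two's complement wrap-around


def to_float(code, frac_bits):
    """Real value represented by an integer code."""
    return code / (1 << frac_bits)


# 8-bit word, 7 fractional bits: representable range is [-1, 1).
print(to_float(to_fixed(0.3, 7, 8), 7))                   # coefficient error
print(to_float(to_fixed(1.5, 7, 8), 7))                   # saturated
print(to_float(to_fixed(1.5, 7, 8, saturate=False), 7))   # wrapped: large error
```

Saturation limits the overflow error to the distance from the range boundary, whereas wrap-around can flip the sign of the result.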
ASIC Design Methodologies
• Full-custom design
– This approach is extremely slow and expensive
– It is only used to design very high performance systems
• Standard-cell based design
– This approach is reasonably fast and less expensive
– Most ASICs are currently designed using this method
• Gate-array based design
– This approach is fast and less expensive
– ASIC performance is relatively slow
• FPGA based design
– The design process is very fast and cost effective
– ASIC performance is slow
Full-Custom Design Methodology
Design flow: Function Partition -> Schematic Design (including transistor sizing) -> Function and Timing Verification (Pass/Fail) -> Layout Design (including placement & routing) -> Post-Layout Simulation (Pass/Fail) -> Go to fabrication -> ASIC Chips
• It is a time-consuming manual process; no pre-developed libraries are needed.
Full-Custom Design Methodology
• Design a chip from scratch.
• Custom mask layers are created in order to fabricate a full-custom IC.
• Engineers design some or all of the logic cells, circuits, and the chip layout specifically for a full-custom IC.
• Advantages: complete flexibility, high degree of optimization in performance and area.
• Disadvantages: large amount of design effort, expensive.
Standard-Cell Based Design Methodology
Design flow (step: method or library used):
1. High-level (RTL or behavioral-level) design: VHDL or Verilog coding
2. High-level verification: VHDL or Verilog simulation (Pass/Fail)
3. Logic synthesis: logic gate library
4. Gate-level verification (Pass/Fail)
5. Placement & routing: cell layout library
6. Post-layout verification (Pass/Fail)
7. Go to fabrication
• It is highly automated, but needs pre-developed libraries.
Standard-Cell Based Design Methodology
• Use pre-developed logic cells from a standard-cell library as building blocks.
• As in full-custom design, all mask layers need to be customized to fabricate a new chip.
• Advantages: saves design time and money, reduces risk compared to full-custom design.
• Disadvantages: still incurs high non-recurring engineering (NRE) cost and long manufacturing time.
[Figure: library cells A, B, C, D and a chip layout assembled from instances of these cells.]
Gate-Array Based Design Methodology
Design flow:
1. Generating schematic (netlist): the netlist can be designed using the full-custom or the standard-cell based design method
2. Placement & routing: cell layout library
3. Post-layout verification
4. Make the final connections for the pre-fabricated gate array base (the pre-fabricated gate array template contains transistors without connections)
5. ASIC chips
• This approach is faster than the standard-cell based approach because part of the fabrication process has already been completed.
Gate-Array Based Design Methodology
• Parts of the chip (transistors) are pre-fabricated, and other parts (wires) are custom fabricated for a particular customer’s circuit.
• Advantages: cost saving (the fabrication cost of a large number of identical template wafers is amortized over different customers), shorter manufacturing lead time.
• Disadvantages: performance not as good as full-custom or standard-cell-based ICs.
[Figure: Gate Array and Sea-of-Gates layouts]
FPGA Based Design Methodology
Design flow:
1. Schematic capture, or HDL coding & logic synthesis, producing a netlist
2. Implementation: technology mapping, placement & routing (using the FPGA cell library)
3. Verification: timing verification (Pass/Fail)
4. Generate FPGA bit stream
5. Download to FPGA
• This approach has an extremely fast turnaround time since the FPGA devices have already been fabricated.
Comparison of Design Methodologies

                             Full-custom   Standard-cell   Gate-array     FPGA-based
                             design        based design    based design   design
Speed                        +++           ++              +              -
Integration density          +++           ++              +              --
High-volume device cost      ++            ++              +              +
Low-volume device cost       ---           --              +              +++
Custom mask layer            All           All             Some           None
Fabrication time             ---           --              -              +++
Time to market               ---           --              ++             +++
Risk reduction               ---           --              -              +++
Future design modification   ---           --              -              +++

+ desirable; - not desirable
Why do we want FPGAs
• Advantages of using FPGAs
– Fast turnaround time
– The ability to re-program
– The capability of dynamic reconfiguration
• FPGA Applications
– Ideal platform for prototyping
– Providing fast implementation to reduce time-to-market
– Cost-effective solutions for products with small volumes on demand
– Implementing hardware systems requiring re-programming flexibility
– Implementing dynamically re-configurable systems
FPGA Market
• Total revenue is above two billion U.S. dollars
• Market share in 1998
[Figure: FPGA market share in 1998. Source: http://www.optimagic.com/]
• Current FPGA revenue is about 3.6B USD.
• Major players include: Xilinx, Altera, Actel, Lattice, Atmel, Cypress, QuickLogic, SiliconBlue
The State of the Art of FPGAs
• Various types of FPGAs are available for different applications
• Currently, FPGAs are widely used in implementing communication systems, configurable computers, and DSP applications
• Modern FPGAs are fabricated using the most advanced technology and are capable of implementing very high performance systems
– For example, the latest Xilinx Virtex-II Pro FPGAs are fabricated using 130 nm technology and contain more than one million gates. Such devices also include PowerPC microprocessors, on-chip memories, and 3.125 Gbit/s I/O interfaces.
Definitions: FPD, PLD
• Field Programmable Device (FPD): a general term that refers to any type of integrated circuit used for implementing digital hardware, where the chip can be configured by the end user to realize different designs. Programming of such a device often involves placing the chip into a special programming unit, but some chips can also be configured “in-system”. Another name for FPDs is programmable logic devices (PLDs).
Source: S. Brown and J. Rose, FPGA and CPLD Architectures: A Tutorial, IEEE Design & Test of Computers, 1996
Classifications
• PLA: a Programmable Logic Array (PLA) is a relatively small FPD that contains two levels of logic, an AND-plane and an OR-plane, where both levels are programmable
• PAL: a Programmable Array Logic (PAL) is a relatively small FPD that has a programmable AND-plane followed by a fixed OR-plane
• SPLD: refers to any type of Simple PLD, usually either a PLA or PAL
• CPLD: a more Complex PLD that consists of an arrangement of multiple SPLD-like blocks on a single chip.
• FPGA: a Field-Programmable Gate Array is an FPD featuring a general structure that allows very high logic capacity.
PLA
[Figure: un-programmed PLA with inputs X and Y, a programmable AND plane, a programmable OR plane, and outputs O1 to O4. Each programmable node can be set to connect or disconnect.]
PLA
[Figure: programmed PLA with inputs X, Y, Z. Product terms formed in the programmable AND plane: XY, YZ, XZ, XYZ. Outputs formed in the programmable OR plane: XY+YZ and XZ+XYZ, plus two outputs marked “?”.]
PAL
[Figure: PAL with inputs X and Y, a programmable AND plane, a fixed OR plane, and outputs O1 to O4.]
PAL with Logic Expanders
[Figure: PAL with a programmable AND plane, logic expanders, and a fixed OR plane; one output is marked “?”.]
PLA vs. PAL
• PLAs are more flexible than PALs since both the AND and OR planes are programmable in PLAs.
• Because both the AND and OR planes are programmable, PLAs are expensive to fabricate and have a large propagation delay.
• By using fixed OR gates, PALs are cheaper and faster than PLAs.
• Logic expanders increase the flexibility of PALs, but result in significant propagation delay.
• PALs usually contain D flip-flops connected to the outputs of the OR gates to implement sequential circuits.
• PLAs and PALs are usually referred to as SPLDs.
CPLD
• A CPLD comprises multiple PAL-like blocks on a single chip, with programmable interconnect to connect the blocks.
• CPLD Architecture
[Figure: four PAL-like blocks, each with its own I/O block, connected through a programmable interconnect.]
Altera MAX CPLD
[Figure: Altera MAX chip. Logic Array Blocks (LABs) and I/O cells are connected by a chip-wide interconnect; each LAB (Logic Array Block) contains a local array (LA) and macrocells.]
• Each LAB contains 16 macrocells
Macrocell of Altera MAX CPLD
[Figure: macrocell. The local array (114 inputs) feeds 5 product terms into a product term select block with programmable inversion, followed by a D flip-flop (clock, clear, preset, enable; system clock and system enable) driving OUT; a parallel expander connects to the next macrocell.]
MAX 9000 has 33 inputs. Can you explain why the LA has 114 inputs?
FPGA
• An FPGA consists of an array of programmable basic logic cells surrounded by programmable interconnect.
• FPGA Structure
[Figure: array of logic cells with programmable interconnect, surrounded by I/O cells.]
FPGA vs. CPLD
• Capacity

                    SPLDs     CPLDs          FPGAs
Equivalent gates    0 ~ 200   200 ~ 12,000   1000 ~ 1,000,000

• Applications
CPLDs:
1. Implement random glue logic or replace circuits previously implemented by multiple SPLDs.
2. Circuits that can exploit wide AND/OR gates, and do not need a very large number of flip-flops, are good candidates for implementation in CPLDs.
FPGAs:
1. FPGAs can be used in various applications: prototyping, FPGA-based computers, on-site hardware re-configuration, DSP, logic emulation, network components, etc.
Typical FPGA Architectures
[Figure: typical FPGA architectures, built from configurable logic blocks, configurable interconnects, and FPGA I/O cells.]
Examples of FPGA Architectures
http://www.xilinx.com
Examples of FPGA Architectures
www.latticesemi.com/images/img24483.gif
Examples of FPGA Architectures
http://www.xilinx.com
Examples of FPGA Package
• Back view of Ball Grid Array (BGA)
• An FPGA with BGA package on PCB
Sources: http://www.bluemelon.org/index.php/Projects/FPGA_design, www.altera.com
Examples of FPGA Applications
• Reconfigurable computing & hardware accelerator
http://www.fastertechnology.com/
Examples of FPGA Applications
• 40 Gbps datapath for internet connection
http://www.xilinx.com/publications/prod_mktg/pn2094.pdf
Examples of FPGA Applications
• Logic emulation
www.applistar.com/top/DN9000K10.jpg
DSP design with FPGA: Overview & review
• Motivation
– Digital Signal Processing (DSP) is one of the most active areas in VLSI applications
– Traditionally, DSP algorithms are implemented either using general purpose DSP processors (low speed, less expensive, flexible) or using ASICs (high speed, expensive, less flexible)
– FPGAs provide solutions that retain the advantages of both the approach based on DSP processors and the approach based on ASICs
• Outline
– Number Systems
• Fixed-Point Number System
• Floating-Point Number System
– VLSI Architectures for DSP Circuits
– Distributed Arithmetic Circuits
Fixed-Point Number System
• Binary number representation of fixed-point numbers
b_n b_{n-1} b_{n-2} ... b_0 . b_{-1} b_{-2} ... b_{-m}
$\text{value} = \sum_{i=0}^{n} b_i \cdot 2^{i} + \sum_{j=1}^{m} b_{-j} \cdot 2^{-j}$
• Examples (the “.” is the “binary point”):

Binary -> Decimal
101.0101 -> 5.3125
000.011 -> 0.375

Decimal -> Binary
1.25 -> 1.01
1.24 -> 1.001111010...

• If the binary number can have 8 bits for the fractional part, we can use 1.00111101 (1.23828125) to approximate 1.24
• If the binary number can have 7 bits for the fractional part, we can use 1.0011111 (1.2421875) to approximate 1.24
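The two approximations of 1.24 above can be reproduced with a short Python sketch. It assumes non-negative inputs and at least one fractional bit, and the function name `to_binary_string` and its `rounding` parameter are illustrative. The 8-bit value on the slide corresponds to truncation, while the 7-bit value corresponds to rounding to the nearest code.

```python
import math


def to_binary_string(value, frac_bits, rounding="truncate"):
    """Return (binary string, represented value) for a non-negative number."""
    scaled = value * (1 << frac_bits)
    if rounding == "truncate":
        code = math.floor(scaled)          # drop the bits beyond frac_bits
    else:
        code = math.floor(scaled + 0.5)    # round to the nearest code
    int_part = code >> frac_bits
    frac_part = code & ((1 << frac_bits) - 1)
    bits = f"{int_part:b}." + format(frac_part, f"0{frac_bits}b")
    return bits, code / (1 << frac_bits)


print(to_binary_string(1.24, 8))                     # truncation to 8 fractional bits
print(to_binary_string(1.24, 7, rounding="round"))   # rounding to 7 fractional bits
```

With 7 fractional bits, rounding happens to give a smaller error than truncating to 8 bits, which is why the second approximation on the slide is closer to 1.24.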
Arithmetic Operations for Fixed-Point Numbers
• Add/Subtract
B:       b_n b_{n-1} b_{n-2} ... b_0 . b_{-1} b_{-2} ... b_{-m}
A:  +/-  a_n a_{n-1} a_{n-2} ... a_0 . a_{-1} a_{-2} ... a_{-m}
S:    c  s_n s_{n-1} s_{n-2} ... s_0 . s_{-1} s_{-2} ... s_{-m}
[Figure: adder/subtractor block with inputs A and B and output S.]
• The addition or subtraction of two fixed-point numbers can be performed by a regular adder or subtractor if the “binary points” of the two numbers are aligned. The “binary point” stays in the same position in the result.
Arithmetic Operations for Fixed-Point Numbers
• Multiplication
[ k bits | n-k bits ] × [ l bits | m-l bits ] = [ k+l bits | n+m-k-l bits ]
(an n-bit operand times an m-bit operand gives an (n+m)-bit product; “|” marks the binary point)
• Arithmetic operation with fixed word-length
– For the convenience of hardware implementation, we prefer the product of a multiplication to keep the same length as the multiplicand or the multiplier (assuming they have the same length). To achieve this, we normally truncate the least significant bits of the product.
[Figure: an n-bit by n-bit multiplication whose product is cut back to n bits.]
Arithmetic Operations for Fixed-Point Numbers
• Normalized fixed-point numbers
– Scale all the numbers involved in the computation by a factor K such that all the numbers are within the range from 0 to 1
– Fixed-point number after normalization: n bits
• Addition/Subtraction
n bits +/- n bits = n bits
• Multiplication
n bits × n bits = n bits (kept) + n bits (truncated)
Representation of Negative Numbers
• Signed-magnitude numbers
[ S | Normalized magnitude ]
Sign bit S: 0 for a positive number and 1 for a negative number
• 2’s complement numbers
[ S | Normalized 2’s complement number ]
Sign bit S: 0 for a positive number and 1 for a negative number
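To compare the two representations, the sketch below encodes small integers in both formats. Using integer codes rather than the normalized fractions of the slide is a simplifying assumption, and the function names are illustrative.

```python
def sign_magnitude(value, bits):
    """Sign bit followed by the magnitude in the remaining bits - 1 positions."""
    magnitude = abs(value)
    if magnitude >= (1 << (bits - 1)):
        raise ValueError("magnitude does not fit")
    sign = (1 << (bits - 1)) if value < 0 else 0
    return sign | magnitude


def twos_complement(value, bits):
    """Two's complement code: negative values wrap modulo 2**bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit")
    return value & ((1 << bits) - 1)


def from_twos_complement(code, bits):
    """Decode a two's complement code back to a signed integer."""
    return code - (1 << bits) if code & (1 << (bits - 1)) else code


print(format(sign_magnitude(-3, 4), "04b"))    # sign bit 1, magnitude 011
print(format(twos_complement(-3, 4), "04b"))   # 16 - 3 = 13
```

In both formats the most significant bit is the sign bit, but only two's complement lets a plain binary adder handle negative operands directly.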
Floating-Point Numbers
• Scientific Notation
$6.02 \times 10^{23}$ (mantissa 6.02 with a decimal point; radix (base) 10)
• Binary Floating-Point Numbers
$1.0_{two} \times 2^{-1}$ (mantissa $1.0_{two}$ with a “binary point”; radix (base) 2)
Floating-Point Representation
• Normal format: $+1.xxxxxxxxxx_{two} \times 2^{yyyy_{two}}$
[ S | Exponent | Significand ]
• S represents the sign (1 for a negative number and 0 for a positive number)
• Exponent represents yyyy. It is a biased number, also called an excess-bias number. E.g. if a number A uses excess-8 coding, its real value is A-8.
• Significand represents xxxxxxxxxx
$(-1)^{S} \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - \text{Bias})}$
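The formula can be checked against a concrete format. The sketch below assumes IEEE 754 single precision (1 sign bit, 8 exponent bits with bias 127, 23 significand bits), which the slide does not name explicitly, and it handles normal numbers only. The function names `decode_float32` and `rebuild` are illustrative; `struct.pack` and `struct.unpack` are from the Python standard library.

```python
import struct


def decode_float32(value):
    """Split a number, stored as a 32-bit float, into (sign, exponent, fraction) fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", value))
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF      # biased exponent
    fraction = raw & 0x7FFFFF          # 23 stored significand bits
    return sign, exponent, fraction


def rebuild(sign, exponent, fraction, bias=127):
    """(-1)^S * (1 + Significand) * 2^(Exponent - Bias), normal numbers only."""
    significand = fraction / (1 << 23)
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - bias)


print(decode_float32(0.5))             # 1.0_two x 2^-1: exponent field 126
print(rebuild(*decode_float32(-6.5)))
```

The value 0.5 matches the binary example $1.0_{two} \times 2^{-1}$: its stored exponent is 126, and 126 - 127 = -1.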
Arithmetic Operations of Floating-Point Numbers
• Assume the numbers $X = X_M \cdot 2^{X_E}$ and $Y = Y_M \cdot 2^{Y_E}$
• Addition/Subtraction:
  $X \pm Y = (X_M \cdot 2^{X_E - Y_E} \pm Y_M) \cdot 2^{Y_E}$, where $X_E < Y_E$
  1. Compute $Y_E - X_E$, a fixed-point subtraction
  2. Right shift $X_M$ by $Y_E - X_E$ bits to obtain $X_M \cdot 2^{X_E - Y_E}$
  3. Compute $X_M \cdot 2^{X_E - Y_E} \pm Y_M$, a fixed-point addition or subtraction
• Multiplication:
  $X \cdot Y = (X_M \cdot Y_M) \cdot 2^{X_E + Y_E}$
  1. Compute $X_M \cdot Y_M$, a fixed-point multiplication
  2. Compute $X_E + Y_E$, a fixed-point addition
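The addition and multiplication steps can be traced with integer mantissas and exponents. This is a simplified sketch: the function names are hypothetical, mantissas are plain Python integers, and the normalization and rounding that a real floating-point unit performs afterwards are left out.

```python
def fp_add(xm, xe, ym, ye):
    """Add X = xm * 2**xe and Y = ym * 2**ye; return (mantissa, exponent)."""
    if xe > ye:                      # make X the operand with the smaller exponent
        xm, xe, ym, ye = ym, ye, xm, xe
    shift = ye - xe                  # step 1: fixed-point subtraction
    xm_aligned = xm >> shift         # step 2: right shift (shifted-out bits are lost)
    return xm_aligned + ym, ye       # step 3: fixed-point addition


def fp_mul(xm, xe, ym, ye):
    """Multiply X and Y; return (mantissa, exponent)."""
    return xm * ym, xe + ye          # multiply mantissas, add exponents


# X = 12 * 2^1 = 24 and Y = 5 * 2^3 = 40
m, e = fp_add(12, 1, 5, 3)
print(m, e, m * 2 ** e)
m, e = fp_mul(12, 1, 5, 3)
print(m, e, m * 2 ** e)
# Alignment can lose bits: 13 * 2^1 + 5 * 2^3 = 66, but the sketch returns 64
print(fp_add(13, 1, 5, 3))
```

Subtraction follows the same path with the sign of one mantissa inverted before step 3.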
DSP Applications
• Common DSP functions that are implemented using VLSI circuits
  • Filters (FIR, IIR)
  • Fast Fourier Transform (FFT)
  • Discrete Cosine Transform (DCT)
  • Encoder/decoder and error correction/detection functions
  • ...
• FIR (Finite Impulse Response) filter
  $y[n] = a_0 \cdot x[n] + a_1 \cdot x[n-1] + \cdots + a_k \cdot x[n-k]$
  1. y[n] is the output at the nth clock cycle; x[n] is the input at the nth clock cycle
  2. $a_0, a_1, \ldots, a_k$ are the filter coefficients
• IIR (Infinite Impulse Response) filter
  $y[n] = a_0 \cdot x[n] + \cdots + a_k \cdot x[n-k] + b_1 \cdot y[n-1] + \cdots + b_m \cdot y[n-m]$
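Both difference equations translate directly into loops. The sketch below is a plain software reference, not a hardware description; the names `fir` and `iir` are illustrative, samples before n = 0 are assumed to be zero, and the feedback list `b` holds b_1 to b_m in that order.

```python
def fir(a, x):
    """y[n] = a[0]*x[n] + a[1]*x[n-1] + ... + a[k]*x[n-k]."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(a):
            if n - i >= 0:                 # samples before n = 0 count as zero
                acc += coeff * x[n - i]
        y.append(acc)
    return y


def iir(a, b, x):
    """FIR part as above plus feedback b[0]*y[n-1] + ... + b[m-1]*y[n-m]."""
    y = []
    for n in range(len(x)):
        acc = sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
        acc += sum(b[j - 1] * y[n - j] for j in range(1, len(b) + 1) if n - j >= 0)
        y.append(acc)
    return y


impulse = [1, 0, 0, 0, 0]
print(fir([1, 2, 3, 4], impulse))   # impulse response ends after the last tap
print(iir([1], [0.5], impulse))     # impulse response keeps decaying
```

Feeding an impulse shows the difference in the names: the FIR output returns to zero once the impulse has left the delay line, while the IIR output never reaches exactly zero in the ideal case.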
FIR Filter Implementation
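• Example:
  $y[n] = a_0 \cdot x[n] + a_1 \cdot x[n-1] + a_2 \cdot x[n-2] + a_3 \cdot x[n-3]$
  Each term $a_i \cdot x[n-i]$ is a tap, so this is a 4-tap FIR filter.
• Canonic form implementation:
  [Figure: canonic-form 4-tap FIR filter. The input x[n] passes through three delay elements (D); four multipliers with coefficients a0, a1, a2, a3 and three adders produce the output y[n].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult} + 3 \cdot t_{adder}}$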
FIR Filter Implementation
• Pipelined implementation 1:
  [Figure: 4-tap FIR filter with additional pipeline delay elements (D). Input x[n], coefficients a0, a1, a2, a3, three adders, output y[n−3].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult} + t_{adder}}$
FIR Filter Implementation
• Pipelined implementation 2:
  [Figure: 4-tap FIR filter with further pipeline delay elements (D). Input x[n], coefficients a0, a1, a2, a3, three adders, output y[n−3].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult}}$ (assume $t_{mult} > t_{add}$)
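To see what the three bounds mean in practice (canonic form, pipelined implementation 1, pipelined implementation 2), they can be evaluated for sample delays. The delay values of 4 ns for a multiplier and 1 ns for an adder are assumptions chosen only for illustration.

```python
T_MULT_NS = 4.0   # assumed multiplier delay
T_ADD_NS = 1.0    # assumed adder delay


def max_clock_mhz(critical_path_ns):
    """Upper bound on the clock frequency for a given critical path."""
    return 1000.0 / critical_path_ns


bounds = {
    # one multiplier followed by three adders between registers
    "canonic form": max_clock_mhz(T_MULT_NS + 3 * T_ADD_NS),
    # one multiplier followed by one adder between registers
    "pipelined implementation 1": max_clock_mhz(T_MULT_NS + T_ADD_NS),
    # multiplier and adder in separate stages; the slower one dominates
    "pipelined implementation 2": max_clock_mhz(max(T_MULT_NS, T_ADD_NS)),
}

for name, f_mhz in bounds.items():
    print(f"{name}: {f_mhz:.1f} MHz")
```

The price of the higher clock rate is latency: both pipelined versions deliver y[n−3] instead of y[n].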
FIR Filter Implementation
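• Pipelined implementation 3 (inverted form):
  [Figure: inverted-form 4-tap FIR filter. The input x[n] feeds four multipliers with coefficients a3, a2, a1, a0; three delay elements (D) sit between the adders; the output is y[n].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult} + t_{adder}}$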
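A register-level simulation can confirm that the inverted form computes the same output as the defining equation without adding latency. The sketch below assumes the usual transposed structure, in which every multiplier sees the current input and the delay elements hold partial sums; that reading of the figure, and the function names, are assumptions.

```python
def fir_direct(a, x):
    """Reference: y[n] = sum of a[i] * x[n - i]."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(x))]


def fir_inverted(a, x):
    """Inverted (transposed) form: registers hold partial sums."""
    regs = [0] * (len(a) - 1)        # delay elements between the adders
    y = []
    for sample in x:
        products = [coeff * sample for coeff in a]   # all taps use x[n]
        y.append(products[0] + (regs[0] if regs else 0))
        # each register takes its tap product plus the next register's value
        regs = [products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0)
                for i in range(len(regs))]
    return y


coeffs = [1, 2, 3, 4]
samples = [1, 2, 3, 4, 5]
print(fir_direct(coeffs, samples))
print(fir_inverted(coeffs, samples))
```

Between two registers there is only one multiplier and one adder, which matches the bound of 1/(t_mult + t_adder) while the output is still y[n].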
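FIR Filter Implementation
• Pipelined implementation 4:
  [Figure: inverted-form 4-tap FIR filter with additional pipeline delay elements (D). Input x[n], coefficients a3, a2, a1, a0, three adders, output y[n−1].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult}}$ (assume $t_{mult} > t_{add}$)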
FIR Filter Implementation
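• Pipelined implementation 5:
  [Figure: 4-tap FIR filter with a delay line (D) on the input, four multipliers with coefficients a0, a1, a2, a3, pipeline delay elements (D) after the multipliers and between the adders, output y[n−2].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult}}$ (assume $t_{mult} > t_{add}$)
• Difficult to lay out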
FIR Filter Implementation
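• Parallel implementation 1:
  [Figure: two parallel datapaths, each with four multipliers (coefficients a0, a1, a2, a3) and adders with delay elements (D). Signals labeled x[n+3], x[n+2], x[n+1], x[n] and x[n−1] (after a delay element D) are the inputs; the outputs are y[n+1] and y[n].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult} + t_{adder}}$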
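The idea of a parallel implementation, producing two outputs in each clock cycle with two copies of the datapath, can be sketched in software. This is only a behavioral model under the assumption of a block size of 2; it does not reproduce the exact wiring or sample indexing of the figure, and the function names are hypothetical.

```python
def fir_tap_sum(a, x, n):
    """One FIR output: sum of a[i] * x[n - i], with x taken as 0 outside the list."""
    return sum(a[i] * x[n - i] for i in range(len(a)) if 0 <= n - i < len(x))


def fir_parallel2(a, x):
    """Compute two outputs per loop iteration (one iteration = one clock cycle)."""
    y = []
    cycles = 0
    for n in range(0, len(x), 2):
        first = fir_tap_sum(a, x, n)        # datapath 1
        second = fir_tap_sum(a, x, n + 1)   # datapath 2, works at the same time
        y.extend([first, second])
        cycles += 1
    return y[:len(x)], cycles


outputs, cycles = fir_parallel2([1, 2, 3, 4], [1, 2, 3, 4, 5])
print(outputs, cycles)
```

Five samples are processed in three cycles instead of five, at the cost of duplicating the multipliers and adders.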
FIR Filter Implementation
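• Parallel implementation 2:
  [Figure: the two parallel datapaths of parallel implementation 1 with additional pipeline delay elements (D). Signals labeled x[n+3], x[n+2], x[n+1], x[n] and x[n−1] (after a delay element D) are the inputs; the outputs are y[n−1] and y[n−2].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult}}$ (if $t_{mult} > t_{adder}$)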
FIR Filter Implementation
• Parallel implementation 3:
  [Figure: two parallel datapaths with delay elements (D) on the inputs, after every multiplier and between the adders. Inputs x[n+3], x[n+2], x[n+1], x[n]; coefficients a0, a1, a2, a3 in each datapath; outputs y[n−5] and y[n−6].]
FIR Filter Implementation
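• Serial implementation:
  [Figure: a single multiplier accumulator (MAC), built from one multiplier, one adder and delay elements (D). The samples x[n], x[n−1], x[n−2], x[n−3] and the coefficients a0, a1, a2, a3 are applied to it in turn to produce y[n].]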
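In the serial approach a single MAC unit is reused for every tap, so each output costs as many MAC cycles as there are taps. A behavioral sketch, with hypothetical names and a simple cycle counter added for illustration:

```python
def fir_serial_mac(a, x):
    """FIR filter computed with one multiply-accumulate operation per cycle."""
    taps = len(a)
    delay_line = [0] * taps          # holds x[n], x[n-1], ..., x[n-taps+1]
    y = []
    mac_cycles = 0
    for sample in x:
        delay_line = [sample] + delay_line[:-1]   # shift in the new sample
        acc = 0                                   # clear the accumulator
        for i in range(taps):                     # one tap per MAC cycle
            acc += a[i] * delay_line[i]
            mac_cycles += 1
        y.append(acc)
    return y, mac_cycles


outputs, cycles = fir_serial_mac([1, 2, 3, 4], [1, 0, 0, 0, 0])
print(outputs, cycles)
```

Compared with the canonic form, the hardware shrinks from four multipliers to one, but the MAC must run four times faster than the sample rate.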
FIR Filter Implementation
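• Implementation of FIR filters with a large number of taps. Example: implementation of a 16-tap FIR filter
  $y[k] = \sum_{i=0}^{15} a_i \cdot x[k-i]$
  [Figure: four multipliers, each followed by an adder and delay elements (D), with the results combined by further adders. The inputs applied to the four multipliers are:]
  Multiplier 1: x[k]·a0, x[k−4]·a4, x[k−8]·a8, x[k−12]·a12
  Multiplier 2: x[k−1]·a1, x[k−5]·a5, x[k−9]·a9, x[k−13]·a13
  Multiplier 3: x[k−2]·a2, x[k−6]·a6, x[k−10]·a10, x[k−14]·a14
  Multiplier 4: x[k−3]·a3, x[k−7]·a7, x[k−11]·a11, x[k−15]·a15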
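The 16-tap sum can be split over four multiply-accumulate units, each handling every fourth tap, and the four partial sums are then added. The sketch below assumes that grouping (taps i, i+4, i+8, i+12 per unit) and uses hypothetical function names; it checks the result against the direct sum.

```python
def fir16_direct(a, x, k):
    """y[k] = sum over i = 0..15 of a[i] * x[k - i] (x is 0 outside the list)."""
    return sum(a[i] * x[k - i] for i in range(16) if 0 <= k - i < len(x))


def fir16_four_macs(a, x, k):
    """Same output, computed as four partial sums of four taps each."""
    partial_sums = []
    for unit in range(4):                    # MAC unit 0..3
        acc = 0
        for i in range(unit, 16, 4):         # taps unit, unit+4, unit+8, unit+12
            if 0 <= k - i < len(x):
                acc += a[i] * x[k - i]
        partial_sums.append(acc)
    return sum(partial_sums)                 # final adder stage


coeffs = list(range(1, 17))                  # example coefficients a0..a15
samples = list(range(20))                    # example input samples
print(fir16_direct(coeffs, samples, 17), fir16_four_macs(coeffs, samples, 17))
```

This sits between the fully parallel and the fully serial extremes: four multipliers, four cycles per output.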
IIR Filter Implementation
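• Example:
  $y[n] = b_0 \cdot x[n] + b_1 \cdot x[n-1] + a_1 \cdot y[n-1] + a_2 \cdot y[n-2]$
• Direct implementation:
  [Figure: direct implementation of the IIR filter. One delay element (D) on the input x[n] feeds the b1 multiplier, two delay elements (D D) on the output y[n] feed the a1 and a2 multipliers; four multipliers (a1, a2, b0, b1) and three adders produce y[n].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult} + 3 \cdot t_{adder}}$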
IIR Filter Implementation
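• Pipelined implementation 1:
  [Figure: the IIR filter with additional pipeline delay elements (D). Input x[n], coefficients a1, a2, b0, b1, three adders, output y[n−3].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult}}$ (assume $t_{mult} > t_{add}$)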
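IIR Filter Implementation
• Pipelined implementation 2:
  [Figure: the IIR filter with three delay elements (D), four multipliers (a1, a2, b0, b1) and three adders. Input x[n], output y[n−1].]
  Clock frequency: $f_{clk} \le \dfrac{1}{t_{mult} + 2 \cdot t_{adder}}$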
LUT-Based Multiplier
• In many DSP circuits, multipliers have one constant input.
  [Figure: multiplier with input x[n], constant input Ci and output y[n].]
• For the above multiplier, y[n] depends only on x[n]. Thus, a look-up table (LUT) can be used to implement the multiplier.
  [Figure: look-up table with x[n] applied as the address and y[n] read out as the data.]
• For example, a 256×16-bit memory can be used to implement an 8-bit multiplier if one of its inputs is always constant.
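The 256×16-bit memory can be modeled as a Python list with 256 entries. In this sketch the constant 37 and the function names are arbitrary illustrative choices, and both operands are assumed to be unsigned 8-bit values.

```python
def build_multiplier_lut(constant):
    """Precompute constant * x for every 8-bit address x (256 words of 16 bits)."""
    return [(x * constant) & 0xFFFF for x in range(256)]


def lut_multiply(lut, x):
    """Multiply by reading the table: x[n] is the address, y[n] the stored word."""
    return lut[x & 0xFF]


lut = build_multiplier_lut(37)       # Ci = 37, an assumed constant
print(len(lut), lut_multiply(lut, 200))
```

The largest possible product, 255 × 255 = 65025, still fits in a 16-bit word, which is why a width of 16 bits is enough.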
Distributed Arithmetic
• Multiplication using the shift-and-add technique
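Shift-and-add multiplication examines the multiplier one bit at a time and adds a shifted copy of the multiplicand for every bit that is 1. A minimal sketch for unsigned operands (the function name and the default width of 8 bits are assumptions):

```python
def shift_add_multiply(multiplicand, multiplier, n_bits=8):
    """Unsigned multiplication using only shifts and additions."""
    acc = 0
    for bit in range(n_bits):                 # one multiplier bit per step
        if (multiplier >> bit) & 1:
            acc += multiplicand << bit        # add the shifted multiplicand
    return acc


print(shift_add_multiply(13, 11))             # 11 = 1011 in binary: 13 + 26 + 104
```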
Distributed Arithmetic
• Calculate A·Y0 + B·Y1 + C·Y2 + D·Y3
Distributed Arithmetic
• Serial Distributed Arithmetic (SDA) for computing A·Y0 + B·Y1 + C·Y2 + D·Y3
Distributed Arithmetic
• LUT-based SDA for computing A·Y0 + B·Y1 + C·Y2 + D·Y3
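One common way to realize LUT-based serial distributed arithmetic is sketched here; it is an assumption about the structure, not a transcription of the slide's circuit. The constants A, B, C, D are folded into a 16-entry table that stores every possible sum of them. In each clock cycle, one bit from each of Y0 to Y3 forms the 4-bit table address, and the table output is added with the weight of that bit position. The inputs are assumed to be unsigned, and all names are hypothetical.

```python
def build_da_lut(coeffs):
    """Entry addr holds the sum of the coefficients whose address bit is 1."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]


def serial_da(coeffs, inputs, n_bits):
    """Compute sum(coeffs[i] * inputs[i]) one input bit position per cycle."""
    lut = build_da_lut(coeffs)
    acc = 0
    for bit in range(n_bits):                     # least significant bit first
        addr = 0
        for i, y in enumerate(inputs):
            addr |= ((y >> bit) & 1) << i         # bit 'bit' of Y_i selects coeff i
        acc += lut[addr] << bit                   # weight by the bit position
    return acc


coeffs = [3, 5, 7, 9]            # A, B, C, D (example constants)
inputs = [10, 20, 30, 40]        # Y0, Y1, Y2, Y3 (example 8-bit inputs)
print(serial_da(coeffs, inputs, 8))
```

No multiplier is needed: an n-bit result takes n table look-ups and n additions, regardless of how many terms are summed, although the table size doubles with each extra term.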
Distributed Arithmetic
• LUT technique for Distributed Arithmetic
Distributed Arithmetic
• SDA 16-MAC circuit
Distributed Arithmetic
• SDA 16-tap FIR filter
Parallel Distributed Arithmetic
END chapter 1