ECE 408 (ΗΜΥ 408): DIGITAL DESIGN WITH FPGAs
Winter Semester 2018
LECTURES 6 - 7: Design Flow
ΧΑΡΗΣ ΘΕΟΧΑΡΙΔΗΣ
(ttheocharides@ucy.ac.cy) Some slides adapted from Digital Integrated Circuits, Rabaey et al.
ΗΜΥ408 Δ06-7 Design Flow.2 © Θεοχαρίδης, ΗΜΥ, 2018
Design Process Steps (Review)
° Definition of system requirements
• Example: ISA (instruction set architecture) for a CPU
• Includes software and hardware interfaces, including timing
• May also include cost, speed, reliability, and maintainability specifications
° Definition of system architecture
• Example: high-level HDL (hardware description language) representation (not required in ECE 408 specifically, but done in the real world)
• Useful for system validation and verification, and as a basis for lower-level design execution and validation or verification
Design Process Steps (Review)
° Refinement of system architecture
• In manual design: descent in hierarchy, designing increasingly lower-level components
• In synthesized design: transformation of high-level HDL to “synthesizable” register transfer level (RTL) HDL
° Logic design or synthesis
• In manual or synthesized design: development of the logic design in terms of library components
• Result is a logic-level schematic, a netlist representation, or a combination of both
• Both manual design and synthesis typically involve optimization of cost, area, or delay
Design Process Steps (Review)
° Implementation
• Conversion of the logic design to a physical implementation
• Involves the processes of:
- Mapping of logic to physical elements
- Placing of the resulting physical elements
- Routing of interconnections between the elements
• In the case of SRAM-based FPGAs, the result is represented by the programming bitstream, which generates the physical implementation in the form of CLBs, IOBs, and the interconnections between them
Design Process Steps (Review)
° Validation (used at a number of steps in the process)
• At architecture level: functional simulation of HDL
• At RTL level: functional simulation of RTL HDL
• At logic design or synthesis: functional simulation of the gate-level circuit (not usually done in ECE 408/664)
• At implementation: timing simulation of schematic, netlist, or HDL with implementation-based timing information (functional simulation can also be useful here)
• At programmed FPGA level: in-circuit test of function and timing
Hardware design in general
[Figure: General Hardware Design Flow / Methodologies. Logic (RTL) design, logic simulation, and logic debugging produce RTL code (Verilog), which feeds three flows:
- ASIC: logic synthesis (RTL code & target library) → netlist (EDIF) → gate-level simulation and debugging → placement & routing → netlist & gate delay (SDF), GDSII → timing simulation and timing analysis → semiconductor fabrication (GDSII & test vectors)
- FPGA: logic synthesis and placement & routing (RTL code & FPGA library) → FPGA bit-stream → FPGA board → FPGA debugging
- H/W platform: RTL compilation, logic synthesis, and placement & routing → H/W platform]
Xilinx HDL/Core Design Flow
[Figure: flow stages, in order]
1. DESIGN ENTRY: CORE GENERATION and RTL HDL EDITING
2. RTL HDL-CORE SIMULATION
3. SYNTHESIS
4. IMPLEMENTATION
5. TIMING SIMULATION
6. FPGA PROGRAMMING & IN-CIRCUIT TEST
Xilinx HDL/Core Design Flow - HDL Editing
[Figure: the HDL EDITOR produces the RTL HDL files. The DESIGN WIZARD supplies HDL module frameworks, and the LANGUAGE ASSISTANT (accessed within the HDL Editor) supplies language construct templates.]
Xilinx HDL/core Design Flow – Core Generation
[Figure: in the CORE GENERATOR, select a core and specify its input parameters. Outputs: an HDL instantiation module for core_name, an EDIF netlist for core_name, and other core_name files.]
Xilinx HDL/core Design Flow - HDL Functional Simulation
[Figure: HDL SIMULATOR steps: set up and map the work library → compile HDL files → functional simulate. Inputs: RTL HDL files, testbench HDL files, test inputs or force files, the HDL instantiation module for core_name, and EDIF netlists for core_names. Outputs: waveforms or list files.]
Xilinx HDL Design Flow - Synthesis
[Figure: XST steps: select top level → select target device → edit XST synthesis constraints → synthesize. Files shown: all HDL files, EDIF netlists for core_names, gate/primitive netlist files (EDIF or XNF), synthesis/implementation constraints, and synthesis report files.]
Xilinx HDL/core Design Flow - Implementation
[Figure: XILINX DESIGN MANAGER steps: netlist translation → map → place & route → create bitstream, plus model extraction and timing model generation. Files shown: gate/primitive netlist files (XNF or EDN), the BIT file, the Standard Delay Format file, and the HDL or EDIF for the implemented design.]
Xilinx HDL/core Design Flow - Timing Simulation
[Figure: MODELSIM steps: set up and map the work directory → compile HDL files (compiled HDL) → HDL simulate. Inputs: the Standard Delay Format file, the HDL or EDIF for the implemented design, testbench HDL files, and test inputs or force files. Outputs: waveforms or list files.]
Xilinx HDL Design Flow - Programming and In-circuit Verification
[Figure: iMPACT loads the bit file into the FPGA board through an I/O port. Other labels on the board: input byte, human inputs, outputs.]
A Few Notes on Programming: Start up Sequence
° During FPGA start-up, the device performs four operations:
1. The assertion of the DONE signal. Failure of DONE to go High may indicate that the configuration data was not loaded successfully.
2. The release of the Global Three State (GTS) signal. This activates all the I/Os.
3. The release of the Global Set Reset (GSR) signal. This allows all flip-flops to change state.
4. The assertion of the Global Write Enable (GWE) signal. This allows all RAMs and flip-flops to change state.
° By default, these operations are synchronized to the CCLK signal.
° The entire start-up sequence lasts eight cycles, called C0-C7, after which the loaded design is fully functional.
Serial Load Configuration
° There are two serial configuration modes.
° Master Serial mode
• the FPGA controls the configuration process by driving CCLK as an output.
° Slave Serial mode
• the FPGA passively receives CCLK as an input from an external agent (e.g., a microprocessor, CPLD, or second FPGA in master mode) that is controlling the configuration process.
° In both modes, the FPGA is configured by loading one bit per CCLK cycle.
° The MSB of each configuration data byte is always written to the DIN pin first.
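The "one bit per CCLK cycle, MSB first" rule can be illustrated with a small sender-side model. This is only a sketch: the module name `cfg_byte_shifter` and all of its ports are hypothetical, it models an external agent in Slave Serial mode rather than anything inside the FPGA, and setup/hold timing relative to the CCLK edge is ignored (check the device data sheet for that).

```verilog
// Hypothetical sender-side model; module and port names are illustrative only.
module cfg_byte_shifter (
    input  wire       cclk,   // configuration clock
    input  wire       rst,    // synchronous reset for this model
    input  wire       load,   // request to send a new byte
    input  wire [7:0] data,   // byte to send
    output wire       din,    // serial data toward the FPGA DIN pin
    output wire       busy    // high while bits remain to be sent
);
    reg [7:0] shreg;          // shift register; its MSB is the bit on din
    reg [3:0] remaining;      // number of bits still to present

    assign din  = shreg[7];
    assign busy = (remaining != 4'd0);

    always @(posedge cclk) begin
        if (rst) begin
            shreg     <= 8'h00;
            remaining <= 4'd0;
        end else if (load && !busy) begin
            shreg     <= data;                // data[7] appears on din first
            remaining <= 4'd8;
        end else if (busy) begin
            shreg     <= {shreg[6:0], 1'b0};  // advance one bit per CCLK cycle
            remaining <= remaining - 4'd1;
        end
    end
endmodule
```

After a load, `busy` stays high for eight CCLK cycles while `data[7]` down to `data[0]` appear on `din` in that order.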
ASIC Design Flow
° ASIC
• Application Specific Integrated Circuits
• Custom design, usually from scratch or from pre-built components
• Chip performs a particular function
• Typically NOT general purpose
° Front End → Back End
• Front End – Synthesis / Gate Level
• Back End – Layout / Mask Generation
ASIC Design Flow – Typical flow
° ASIC Design Flow Steps
• Specifications
• Early Planning
• Architecture
• Design
• Synthesis
• Pre-Layout Static Timing Analysis
• Layout
• Post-Layout Static Timing Analysis
• Pads Placement
• Sent for Manufacturing
° VERIFICATION (shown alongside the steps)
Design Flow – Commercial (Example Tools)
[Figure: example tool flow, starting from an HDL model (RTL, Register Transfer Level, in VHDL / Verilog):
- Verilog simulation: Modelsim
- Synthesis to a Verilog gate-level netlist: Synopsys Design Compiler
- Static timing analysis: Prime Time
- Standard cell placement and routing (DEF file): Cadence Silicon Ensemble
- Post-layout static timing analysis: Prime Time
- Pads placement (DEF file): Silicon Ensemble/Virtuoso]
Other Commercial Tools
° RTL Verification with Specman e
° Gate-level simulation with ModelSim
° Logic Synthesis with Synopsys Design Compiler
° Static Timing Analysis with Synopsys PrimeTime
° Placement and Routing with Cadence Silicon Ensemble
° Running Silicon Ensemble in the GUI mode
° Clock Tree Generation with Cadence CTGen
° Integrating IP Block, DesignWare and Virage SRAM
° Power Estimation with Synopsys Power Compiler
° Code Revision Control with CVS
Starting A Design
° Must understand specifications first
° Start by looking at it as a black box
° e.g. Adder
• F(X,Y) = X+Y
• Takes two inputs, produces the sum of the inputs
[Figure: black box with inputs X and Y and output F(X,Y)]
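The black-box adder can be written down directly as a module interface plus one line of behavior. This is a minimal sketch; the module name, the `WIDTH` parameter, and the choice to make the output one bit wider than the inputs are assumptions made for illustration.

```verilog
// Black-box view of an adder: F(X, Y) = X + Y.
// Names and the WIDTH parameter are illustrative assumptions.
module adder #(parameter WIDTH = 8) (
    input  wire [WIDTH-1:0] x,
    input  wire [WIDTH-1:0] y,
    output wire [WIDTH:0]   f   // one extra bit so the carry out is not lost
);
    assign f = x + y;
endmodule
```

Everything the rest of the design needs to know is in the port list; how the sum is built internally is decided later.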
Starting A Design
° SPECS → Architecture
• Block Diagram
• Brainstorming (if collaborating)
• Feedback
• I/O Specs
• Architectural Decisions
- Frequency?
- Latency?
- Power/Performance?
- Reliability?
• Architectural Optimizations
• Finalizing the initial Design
From “Architecture” to RTL
° Create Block Diagram of Design - with sub-blocks if necessary
° Create I/O Specs for each block
• e.g. adder
- Sum generator – Takes three inputs, produces one output
- Carry generator – Takes three inputs, produces one output
- Interconnected?
• Place boxes in functional order
- i.e. can’t generate the sum before the carry-in arrives!!!
• Create pipeline flow
- i.e. IF IDIXICWB
• Clocked signals/registers/latches
° Proceed then to code module by module
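The adder example above maps onto code block by block. The sketch below codes the sum generator and the carry generator as separate modules, each with three inputs and one output, and then connects them in a one-bit full adder. All module and port names are illustrative assumptions.

```verilog
// Sum generator: three inputs, one output.
module sum_gen (input wire a, input wire b, input wire cin, output wire s);
    assign s = a ^ b ^ cin;
endmodule

// Carry generator: three inputs, one output.
module carry_gen (input wire a, input wire b, input wire cin, output wire cout);
    assign cout = (a & b) | (a & cin) | (b & cin);
endmodule

// One-bit full adder built from the two sub-blocks.
module full_adder (input wire a, input wire b, input wire cin,
                   output wire s, output wire cout);
    sum_gen   u_sum   (.a(a), .b(b), .cin(cin), .s(s));
    carry_gen u_carry (.a(a), .b(b), .cin(cin), .cout(cout));
endmodule
```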
Hierarchical Design
• Multiple modules
• Multiple instances
• Top-Level Design
- Contains all sub-modules and connection information
• Sub-Modules can be hierarchically built themselves
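A small sketch of hierarchy with multiple instances: a top-level 4-bit ripple-carry adder that instantiates the same one-bit full adder four times and holds only the connection information. The module names and the 4-bit width are assumptions chosen for illustration.

```verilog
// Leaf module (instantiated several times by the top level).
module full_adder (input wire a, input wire b, input wire cin,
                   output wire s, output wire cout);
    assign s    = a ^ b ^ cin;
    assign cout = (a & b) | (a & cin) | (b & cin);
endmodule

// Top-level design: contains only instances and their interconnections.
module ripple_adder4 (input wire [3:0] a, input wire [3:0] b, input wire cin,
                      output wire [3:0] s, output wire cout);
    wire [4:0] c;               // carry chain between the instances
    assign c[0] = cin;
    assign cout = c[4];

    genvar i;
    generate
        for (i = 0; i < 4; i = i + 1) begin : g_fa
            full_adder u_fa (.a(a[i]), .b(b[i]), .cin(c[i]),
                             .s(s[i]), .cout(c[i+1]));
        end
    endgenerate
endmodule
```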
HDL
° Hardware Description Language
• Verilog, VHDL, SystemC, etc.
° High Level of Design Abstraction
• ex:
- Input A, B
- Output C
- Architecture entity of adder is C ← A + B
° Not going to talk in depth about HDL
• Refer to multiple online resources
- www.deeps.org
• Behavioral vs. Structural
• Code → Simulate, or Code → Synthesize (Compile) → Simulate
HDL - Tools
° Code programming
• Just a text editor!
• Today, fancy text editors with syntax highlighting are available for free (emacs, nedit, etc.)
° Simulation
• Multiple free HDL simulators for simple designs
• State-of-the-art simulators available at CSE
- Modelsim
- NCVHDL
- NCVerilog
• Code does not have to be synthesizable to be simulated
° Synthesis (Compilation)
• Need a target library of “standard” cells (i.e. AND, XOR, ADDER, etc.)
• Synopsys Design Compiler
HDL Simulation / Verification
° Upon coding each block / module, we can then simulate its functionality
° Use an HDL / RTL Simulator
• Event Driven
• Cycle Driven
° Simulator reads code and models code functionality based on clock cycles or events, e.g.
• C ← A+B @posedge clk
• C ← A+B after 10 ns
° Tools Available:
• Modelsim, NCVHDL, NCVerilog, etc.
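The two styles of update named above (on a clock edge, or after a delay) can be tried side by side in any Verilog simulator. The testbench below is a sketch for simulation only; its name, the signal names, the stimulus values, and the 10 ns clock period are all assumptions chosen for illustration.

```verilog
`timescale 1ns/1ps
// Simulation-only sketch; not intended for synthesis.
module sim_styles_tb;
    reg        clk = 1'b0;
    reg  [7:0] a = 8'd3, b = 8'd4;
    reg  [7:0] c_clocked;       // updated only on rising clock edges
    wire [7:0] c_delayed;       // follows a + b after a 10 ns delay

    always #5 clk = ~clk;       // 10 ns clock period

    // Cycle-based view: C <- A + B @ posedge clk
    always @(posedge clk) c_clocked <= a + b;

    // Event-based view: C <- A + B after 10 ns
    assign #10 c_delayed = a + b;

    initial begin
        $monitor("t=%0t a=%0d b=%0d c_clocked=%0d c_delayed=%0d",
                 $time, a, b, c_clocked, c_delayed);
        #22 a = 8'd10;          // change an input between clock edges
        #30 $finish;
    end
endmodule
```

Watching the `$monitor` output shows `c_clocked` changing only at clock edges, while `c_delayed` changes a fixed time after its inputs do.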
HDL → Synthesis → H/W
[Figure: stages between the HDL and the hardware: synthesis, placement, routing, timing analysis]
Why learn about Logic Synthesis?
° Logic synthesis is the core of today's CAD flows for IC and system design
• course covers many algorithms that are used in a broad range of CAD tools
• basis for other optimization techniques, e.g. embedded software
• basis for functional verification techniques
° Most algorithms are computationally hard
• covered algorithms and flows are good examples for approaching hard algorithmic problems
• course covers theory as well as implementation details
• demonstrates an engineering approach based on theoretically solid but also practical solutions
- very few research areas can offer this combination
Design of Integrated Systems
System Level
Register Transfer Level
Gate Level
Transistor Level
Layout Level
Mask Level
System Level
° Abstract algorithmic description of high-level behavior
• e.g. C-Programming language
• abstract because it does not contain any implementation details for timing or data
• efficient to get a compact execution model as first design draft
• difficult to maintain throughout project because no link to implementation
Port* compute_optimal_route_for_packet(Packet_t *packet, Channel_t *channel)
{
    static Queue_t *packet_queue;
    packet_queue = add_packet(packet_queue, packet);
    ...
}
RTL Level
° Cycle accurate model “close” to the hardware implementation
• bit-vector data types and operations as abstraction from bit-level implementation
• sequential constructs (e.g. if - then - else, while loops) to support modeling of complex control flow
module mark1;
  reg [31:0] m[0:8191];   // 8192 words, addressed by the 13-bit pc
  reg [12:0] pc;
  reg [31:0] acc;
  reg [15:0] ir;
  always
  begin
    ir = m[pc];
    if (ir[15:13] == 3'b000)
      pc = m[ir[12:0]];
    else if (ir[15:13] == 3'b010)
      acc = -m[ir[12:0]];
    ...
  end
endmodule
Gate Level
° Model on finite-state machine level
• models function in Boolean logic using registers and gates
• various delay models for gates and wires
• in this lecture we will mostly deal with gate level
[Figure: gate-level circuit annotated with delays of 1 ns, 4 ns, 3 ns, and 5 ns]
Transistor Level
° Model on CMOS transistor level
• depending on application, function is modeled as resistive switches
- used in functional equivalence checking
• or as full differential equations for circuit simulation
- used in detailed timing analysis
Layout Level
° Transistors and wires are laid out as polygons in different technology layers such as diffusion, poly-silicon, metal, etc.
Design of Integrated Systems
[Figure: relative effort versus project time, with overlapping curves for the System, RTL, Logic, and Transistor levels]
- Design phases overlap to large degrees
- Parallel changes on multiple levels, multiple teams
- Tight scheduling constraints for product
Design Challenges
° Systems are becoming huge, design schedules are getting tighter
• > 100 million gates becoming common for ASICs
• > 0.4 million lines of C code to describe system behavior
• > 5 million lines of RTL code
° Design teams are getting very large for big projects
• several hundred people
• differences in skills
• concurrent work on multiple levels
• management of design complexity and communication very difficult
° Design tools are becoming more complex but still inadequate
• typical designer has to run ~50 tools on each component
• tools have lots of bugs, interfaces do not line up etc.
Design Challenges
° Decision about design point very difficult
• compromise between performance / costs / time-to-market
• decision has to be made 2-3 years before design finished
• design points are difficult to predict without actually doing the design
• scheduling of product cycles
° Functional verification
• simulation still main vehicle for functional verification but inadequate because of size of design space
• results in bugs in released hardware that are very expensive to recover from (different in software ;-)
Design Challenges
° Fundamental tradeoffs between different modeling levels:
• modeling detail and team size to maintain model
- high-level models can be maintained by one or two people
- detailed models need to be partitioned, which results in a significant communication overhead
• modeling accuracy versus modeling compactness
- compact models omit details and give only crude estimations for implementation
- detailed models are lengthy and difficult to adapt for major changes in design points
• simulation speed versus hardware performance
- high-level models can be simulated fast but cannot be implemented efficiently with automatic means
- low-level models can be made to have a fast implementation but cannot be simulated very fast
General Design Approach
° How do engineers build a bridge?
° Divide and conquer !!!!
• partition design problem into many sub-problems which are manageable
• define mathematical model for sub-problem and find an algorithmic solution
- beware of model limitations and check them !!!!!!!
• implement algorithm in individual design tools, define and implement general interfaces between the tools
• implement checking tools for boundary conditions
• concatenate design tools to general design flows which can be managed
• see what doesn’t work and start over
Design Automation
° Design Automation is one of the most advanced areas in practical computer science
• many problems require sophisticated mathematical modeling
• many algorithms are computationally hard and require advanced and fine-tuned heuristics to work on realistic problem sizes
• boundary conditions need to be well declared and synchronized between different tools (patchwork to cover all holes)
° Two common pitfalls in CAD research
• problem is looking for a solution:
- problem scope is too big, makes modeling difficult or algorithms don’t scale
- problem scope is too small, solutions are not good enough
• solution is looking for a problem:
- model was oversimplified because real problem was too complex with too many boundary conditions
Key to Success
° Fine-tuned combination of Design Methodology and Tools
• addresses algorithmic complexity by requiring
- manual partitioning of the problem
- manual input of hints/suggestions
- manual iterations to drive tool application to best solution
• makes CAD systems and design flows very complex and difficult to manage
[Figure: the problem space and the region where tools are applicable, with the practical combination reached through design methodology]
Examples of Divide and Conquer
° RTL cycle simulation evaluates only the next-state logic of the circuit; timing is assumed to be correct
• combination of static timing analysis, formal equivalence checking, and cycle simulation allows separation of issues
• cycle simulation avoids expensive event scheduling and processing and performs significantly faster
° However:
• timing analysis is conservative with respect to the achievable clock cycle time
Examples of Divide and Conquer
° Static timing analysis assumes simple gate delay models
• complexity of static timing analysis becomes linear (simple longest and shortest paths analysis in circuit implementation)
• very efficient implementation of incremental static timing analysis which is needed in the inner loop of the technology dependent part of logic synthesis
° However:
• actual gate delay varies a lot in reality
- models often assume average fan-out rather than actual gate load
• delay model assumes ideal signals
- slew dependency ignored
Examples of Divide and Conquer
° Logic synthesis assumes ideal gates which are independent of physical environment
• standard cell place and route technology has made logic synthesis possible
- gates are heavily over-designed to be functional in a wide variety of combinations (e.g. range of fan-out gates possible, different wire loads)
- layout placement and route done in standard rows that minimize latch-up effects and optimize power and clock wiring
° However:
- layout implementation remains sub-optimal because cells are designed for worst case application and with large safety margins with respect to environment
Examples of Divide and Conquer
° Logic synthesis uses a crude model to estimate circuit area
- literal count or simple table-lookup for gate sizes allows fast comparison of different implementation choices
° However:
- actual gate size can vary to a very large degree depending on load and timing requirement
- area for wiring completely ignored
Examples of Divide and Conquer
° Formal equivalence checking assumes identical state encoding of the two designs to be compared
• reduces the general equivalence checking problem to combinational equivalence checking which is computationally less complex
• exploitation of structural similarities between designs to be compared makes tools applicable for huge (multi-million gate) designs
• automatic algorithms for identifying register correspondence compensate to some extent for limited model
° However:
• combinational verification model cannot handle sequential verification problems
Full Custom Design Flow
° Application: ultra-high performance designs
• general-purpose processors, DSPs, graphic chips, internet routers, games processors etc.
° Target: very large markets with high profit margins
• e.g. PC business
° Complexity: very complex and labor intensive
• involving large teams
• high up-front investments and relatively high risks
° Role of Logic Synthesis:
• limited to components that are not performance critical or that might change late in design cycle (due to design bugs found late)
- control logic
- non-critical data-path logic
• bulk of data-path components and fast control logic are manually crafted for optimal performance
Full Custom Design Flow
° Incomplete picture:
[Figure: ISA Specification → RTL Spec → Gate Level Netlist → Transistor Level Circuit → Layout. Transformations shown: logic synthesis, manual or semi-automatic design. Checks shown: simulation, formal equivalence checking, circuit simulation, extract & compare, design rule checker.]
ASIC Design Flow
° Application: general IC market
• peripheral chips in PCs, toys, handheld devices etc.
° Target: small to medium markets, tight design schedules
• e.g. consumer electronics
° Complexity of design: standard design style, quite predictable
• standard flows, standard off-the-shelf tools
° Role of Logic Synthesis:
• used on large fraction of design except for special blocks such as RAMs, ROMs, analog components
ASIC Design Flow
° Incomplete picture:
[Figure: Informal Specification → RTL Spec → Gate Level Netlist → Modified Gate Level Netlist → ASIC Foundry. Transformations shown: logic synthesis, manual changes to fix timing, test logic insertion. Checks shown: simulation, formal equivalence checking, static timing analysis.]
What is Logic Synthesis?
[Figure: finite-state machine with input X, output Y, the functions λ and δ, and D state registers]
Given: Finite-State Machine $F(X, Y, Z, \lambda, \delta)$ where:
• $X$: Input alphabet
• $Y$: Output alphabet
• $Z$: Set of internal states
• $\lambda: X \times Z \to Z$ (next state function)
• $\delta: X \times Z \to Y$ (output function)
Target: Circuit $C(G, W)$ where:
• $G$: set of circuit components $g \in$ {Boolean gates, flip-flops, etc.}
• $W$: set of wires connecting $G$
Objective Function for Synthesis
° Minimize area
• in terms of literal count, cell count, register count, etc.
° Minimize power
• in terms of switching activity in individual gates, deactivated circuit blocks, etc.
° Maximize performance
• in terms of maximal clock frequency of synchronous systems, throughput for asynchronous systems
° Any combination of the above
• combined with different weights
• formulated as a constraint problem
- “minimize area for a clock speed > 300MHz”
° More global objectives
• feedback from layout
- actual physical sizes, delays, placement and routing
Constraints on Synthesis
° Given implementation style:
• two-level implementation (PLA, CAMs)
• multi-level logic
• FPGAs
° Given performance requirements
• minimal clock speed requirement
• minimal latency, throughput
° Given cell library
• set of cells in standard cell library
• fan-out constraints (maximum number of gates connected to another gate)
• cell generators
Why learn HDL coding styles for FPGAs?
° HDLs contain many complex constructs that are difficult to understand at first.
° Methods and examples included in HDL manuals do not always apply to the design of FPGA devices.
° If you currently use HDLs to design ASICs, your established coding style may unnecessarily increase the number of gates or CLB levels in FPGA designs.
° HDL synthesis tools implement logic based on the coding style of your design.
Naming Convention - Restrictions
° The following FPGA resource names are reserved and should not be used to name nets or components.
• Components (Comps), Configurable Logic Blocks (CLBs), Input/Output Blocks (IOBs), Slices, basic elements (bels), clock buffers (BUFGs), tristate buffers (BUFTs), oscillators (OSC), CCLK, DP, GND, VCC, and RST
• CLB names such as AA, AB, SLICE_R1C2, SLICE_X1Y2, X1Y2, and R1C2
• Primitive names such as TD0, BSCAN, M0, M1, M2, or STARTUP
• Do not use pin names such as P1 and A4 for component names
• Do not use pad names such as PAD1 for component names
Use optional labels on flow control constructs
° Make the code structure more obvious
° Can slow execution in some simulators
/* Changing Latch into a D-Register
 * D_REGISTER.V
 */
module d_register (CLK, DATA, Q);
  input CLK;
  input DATA;
  output Q;
  reg Q;
  always @ (posedge CLK)
  begin: My_D_Reg
    Q <= DATA;
  end
endmodule
Coding for Synthesis
° Omit the Wait for XX ns Statement
• XX specifies the number of nanoseconds that must pass before a condition is executed.
• VHDL: wait for XX ns;
• Verilog: #XX;
° Omit the ...After XX ns or Delay Statement
• VHDL: (Q <=0 after XX ns)
• Verilog: assign #XX Q=0;
• This statement is usually ignored by the synthesis tool. In this case, the functionality of the simulated design does not match the functionality of the synthesized design.
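A sketch of how a delay statement leads to a simulation/synthesis mismatch. The module below is hypothetical (its name, ports, and the 10 ns value are made up for illustration): a simulator honors the `#10`, while a synthesis tool will typically ignore it, so both outputs would end up as the same plain AND gate in hardware.

```verilog
`timescale 1ns/1ps
// Hypothetical example; names and the 10 ns delay are illustrative only.
module delay_mismatch (
    input  wire a,
    input  wire b,
    output wire y_delayed,
    output wire y_plain
);
    // Simulation: y_delayed follows a & b 10 ns later.
    // Synthesis: the #10 is typically ignored, so the delay disappears.
    assign #10 y_delayed = a & b;

    // Delay-free version: behaves the same in simulation and after synthesis.
    assign y_plain = a & b;
endmodule
```

Keeping delays in the testbench, and out of the code that is synthesized, avoids this mismatch.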
Coding for Synthesis
° Omit Initial Values
• VHDL: signal sum : integer := 0;
• Verilog: initial sum = 1'b0;
° Order and Group Arithmetic Functions
• ADD = A1 + A2 + A3 + A4; cascades three adders in series.
• ADD = (A1 + A2) + (A3 + A4); evaluates two additions in parallel and combines the results with a third adder.
• RTL simulation results are the same for both statements; however, the second statement results in a faster circuit after synthesis (depending on the bit width of the input signals).
• When is the second construct preferred?
Coding for Synthesis
° The second statement generally results in a faster circuit, but not always. For example, if the A4 signal reaches the adder later than the other signals, the first statement produces a faster implementation because the cascaded structure creates fewer logic levels for A4.
° This structure allows A4 to catch up to the other signals. In this case, A1 is the fastest signal followed by A2 and A3; A4 is the slowest signal.
° Most synthesis tools can balance or restructure the arithmetic operator tree if timing constraints require it.
° However, Xilinx® recommends that you code your design for your selected structure.
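The two groupings can be written as two small modules to compare after synthesis. This is a sketch under assumed names and widths (8-bit inputs, 10-bit result so that no carry is lost); whether a given tool keeps the coded structure or rebalances it depends on the tool and its timing constraints.

```verilog
// Cascaded: ((a1 + a2) + a3) + a4. Three adders in series;
// a4 passes through only the last adder, which suits a late-arriving a4.
module add_cascaded (input wire [7:0] a1, a2, a3, a4,
                     output wire [9:0] add);
    assign add = a1 + a2 + a3 + a4;
endmodule

// Balanced: (a1 + a2) and (a3 + a4) in parallel, then one final adder.
// Every input passes through two adders.
module add_balanced (input wire [7:0] a1, a2, a3, a4,
                     output wire [9:0] add);
    assign add = (a1 + a2) + (a3 + a4);
endmodule
```

Both modules produce the same value in RTL simulation; only the implied adder structure differs.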
Comparing If Statement vs. Case Statement
° If statement generally produces priority-encoded logic
° Case statement generally creates balanced logic.
° Use the Case statement for complex decoding and use the If statement for speed critical paths.
° Make sure that all outputs are defined in all branches of an if statement.
• If not, it can create latches or long equations on the CE signal.
• Have default values for all outputs before the if statements.
° Limiting the number of input signals into an if statement can reduce the number of logic levels.
° If there are a large number of input signals, see if some of them can be pre-decoded and registered before the if statement.
° Avoid bringing the dataflow into a complex if statement.
° Only control signals should be generated in complex if-else statements.
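As a sketch of the two coding styles, here is the same 4-to-1 multiplexer written with an if-else chain and with a case statement. Module and signal names are assumptions for illustration. The if version also shows a default value assigned before the if statement, so the output is defined on every path.

```verilog
// If-else chain: describes a priority order (sel == 00 is tested first).
module mux4_if (input wire [3:0] d, input wire [1:0] sel, output reg y);
    always @(*) begin
        y = 1'b0;                          // default before the if statement
        if      (sel == 2'b00) y = d[0];
        else if (sel == 2'b01) y = d[1];
        else if (sel == 2'b10) y = d[2];
        else                   y = d[3];
    end
endmodule

// Case statement: describes parallel, balanced selection.
module mux4_case (input wire [3:0] d, input wire [1:0] sel, output reg y);
    always @(*) begin
        case (sel)
            2'b00:   y = d[0];
            2'b01:   y = d[1];
            2'b10:   y = d[2];
            default: y = d[3];
        endcase
    end
endmodule
```

Because the conditions here are mutually exclusive, some synthesis tools may reduce both versions to the same logic; the difference shows up more with overlapping or complex conditions.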
Implementation
Case vs IF
° Case implementation requires only one Virtex™ slice while the If construct requires two slices in some synthesis tools.
° In this case, design the multiplexer using the Case construct because fewer resources are used and the delay path is shorter.
Example – From XCELL
° Compares two Verilog designs that describe the same function: one uses the CASE construct, the other uses NESTED IF statements.
° The CASE construct reduces the delay by approximately 3 ns (using an XC4005E-2 part)
Source: http://www.xilinx.com/xcell/xl30/xl30_21.pdf
From IF construct
From Case construct
Implementing Latches and Registers
° Synthesizers infer latches from incomplete conditional expressions, such as an If statement without an Else clause.
° This can be problematic for FPGA designs because not all FPGA devices have latches available in the CLBs.
° In addition, you may think that a register is created, and the synthesis tool actually created a latch.
° The Spartan-II™, Spartan-3™ and Virtex™, Virtex-E™, Virtex-II™, Virtex-II Pro™ and Virtex-II Pro X™ FPGA devices do have registers that can be configured to act as latches.
° For these devices, synthesizers infer a dedicated latch from incomplete conditional expressions.
D Latch
module d_latch (GATE, DATA, Q);
input GATE;
input DATA;
output Q;
reg Q;
always @ (GATE or DATA)
begin
if (GATE == 1'b1)
Q <= DATA;
end // End Latch
endmodule
D register
module d_register (CLK, DATA, Q);
input CLK;
input DATA;
output Q;
reg Q;
always @ (posedge CLK)
begin: My_D_Reg
Q <= DATA;
end
endmodule
How to handle latches?
° With some synthesis tools you can determine the number of latches that are implemented in your design.
° You should convert all If statements without corresponding Else statements and without a clock edge to registers.
° Use the recommended register coding styles in the synthesis tool documentation to complete this conversion.
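A sketch of the conversion described above, with assumed module and signal names. The first always block has an If without an Else and no clock edge, so a latch is inferred. The second expresses the same intent as a register with a clock enable. The third shows the alternative for purely combinational logic, where every branch assigns the output.

```verilog
// Hypothetical example; all names are illustrative only.
module latch_fix (input wire clk, input wire en, input wire data,
                  output reg q_latch, output reg q_reg, output reg q_comb);

    // Incomplete if, no clock edge: a latch is inferred.
    always @(en or data) begin
        if (en) q_latch <= data;
    end

    // Converted to a register: the missing else now means "hold on the clock".
    always @(posedge clk) begin
        if (en) q_reg <= data;
    end

    // Combinational alternative: output assigned in every branch, no storage.
    always @(*) begin
        if (en) q_comb = data;
        else    q_comb = 1'b0;
    end
endmodule
```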
Resource Sharing
° Resource sharing is an optimization technique that uses a single functional block (such as an adder or comparator) to implement several operators in the HDL code.
° Use resource sharing to improve design performance by reducing the gate count and the routing congestion.
° If you do not use resource sharing, each HDL operation is built with separate circuitry.
° However, you may want to disable resource sharing for speed critical paths in your design.
Resource Sharing
module res_sharing (A1, B1, C1, D1, COND_1, Z1);
  input COND_1;
  input [7:0] A1, B1, C1, D1;
  output [7:0] Z1;
  reg [7:0] Z1;
  always @(A1 or B1 or C1 or D1 or COND_1)
  begin
    if (COND_1)
      Z1 <= A1 + B1;
    else
      Z1 <= C1 + D1;
  end
endmodule
With and Without Resource Sharing
Resource Sharing
° The following operators can be shared either with instances of the same operator or with an operator on the same line.
• *
• + –
• > >= < <=
° For example, a + operator can be shared with instances of other + operators or with – operators.
° A * operator can be shared only with other * operators.
Resource Sharing
° You can implement arithmetic functions (+, –, magnitude comparators) with gates or with your synthesis tool’s module library.
° The library functions use modules that take advantage of the carry logic in CLBs/slices.
° Resource sharing of the module library automatically occurs in most synthesis tools if the arithmetic functions are in the same process.
° Resource sharing adds additional logic levels to multiplex the inputs to implement more than one function.
• Do not use it for arithmetic functions that are part of your design’s time critical path.
Using Preset Pin or Clear Pin
° Xilinx® FPGA devices consist of CLBs that contain function generators and flip-flops. Spartan-II™, Spartan-3™, Virtex™, Virtex-E™, Virtex-II™, Virtex-II Pro™, and Virtex-II Pro X™ registers can be configured with a preset pin, a clear pin, or both.
FlipFlop
module ff_example (RESET, SET, CLOCK, ENABLE, D_IN,
                   A_Q_OUT, B_Q_OUT, C_Q_OUT, D_Q_OUT, E_Q_OUT);
  input RESET;
  input SET;
  input CLOCK;
  input ENABLE;
  input [7:0] D_IN;
  output [7:0] A_Q_OUT;
  output [7:0] B_Q_OUT;
  output [7:0] C_Q_OUT;
  output [7:0] D_Q_OUT;
  output [7:0] E_Q_OUT;
  // Outputs are assigned inside always blocks, so they must be declared reg
  reg [7:0] A_Q_OUT, B_Q_OUT, C_Q_OUT, D_Q_OUT, E_Q_OUT;
  // D flip-flop
  always @(posedge CLOCK) begin
    A_Q_OUT <= D_IN;
  end // End FF
Asynchronous Reset
  // Events in a sensitivity list are separated with "or", not "||"
  always @(posedge CLOCK or posedge RESET) begin
    if (RESET == 1'b1) B_Q_OUT <= 8'b00000000;
    else B_Q_OUT <= D_IN;
  end
Asynchronous Set
  always @(posedge CLOCK or posedge SET) begin
    if (SET == 1'b1) C_Q_OUT <= 8'b11111111;
    else C_Q_OUT <= D_IN;
  end
What is this ?
  always @(posedge CLOCK or posedge RESET) begin
    if (RESET == 1'b1) D_Q_OUT <= 8'b00000000;
    else if (ENABLE == 1'b1) D_Q_OUT <= D_IN;
  end
Answer
° Flip-flop with asynchronous reset and clock enable
Flip-flop with asynchronous reset, asynchronous set, and clock enable
  always @(posedge CLOCK or posedge RESET or posedge SET) begin
    if (RESET == 1'b1) E_Q_OUT <= 8'b00000000;
    else if (SET == 1'b1) E_Q_OUT <= 8'b11111111;
    else if (ENABLE == 1'b1) E_Q_OUT <= D_IN;
  end
endmodule // closes ff_example
Using Clock Enable Pin Instead of Gated Clocks
° Use the CLB clock enable pin instead of gated clocks in your designs. Gated clocks can introduce glitches, increased clock delay, clock skew, and other undesirable effects.
Gated Clock
module gate_clock (IN1, IN2, DATA, CLK, LOAD, OUT1);
  input IN1;
  input IN2;
  input DATA;
  input CLK;
  input LOAD;
  output OUT1;
  reg OUT1;
  wire GATECLK;
  // The clock is ANDed with logic signals before reaching the flip-flop
  assign GATECLK = (IN1 & IN2 & CLK);
  always @(posedge GATECLK) begin
    if (LOAD == 1'b1) OUT1 <= DATA;
  end
endmodule
Gated Clock
BAD IDEA !!
Clock Enable
module clock_enable (IN1, IN2, DATA, CLK, LOAD, DOUT);
  input IN1, IN2, DATA;
  input CLK, LOAD;
  output DOUT;
  wire ENABLE;
  reg DOUT;
  // The gating logic drives the clock enable; CLK itself stays ungated
  assign ENABLE = IN1 & IN2 & LOAD;
  always @(posedge CLK) begin
    if (ENABLE) DOUT <= DATA;
  end
endmodule
Clock Enable
PLACEMENT AND ROUTING
Post-Synthesis Implementation
FPGA-Based System Design: Chapter 4 Copyright 2004 Prentice Hall PTR
Placement and routing
Two critical phases of layout design:
– placement of components on the chip;
– routing of wires between components.
Placement and routing interact, but separating layout design into phases helps us understand the problem and find good solutions.
Placement metrics
° Quality metrics for layout:
• Area
• Delay
• Energy consumption
° Ideally placement and routing would be performed together
• Both problems are NP-hard
• For practical considerations placement and routing must be performed separately
° Design time may be important for FPGAs
Wire length as a quality metric
[Figure: bad placement vs. good placement]
Wire length measures
Estimate wire length by distance between components.
Possible distance measures, for components separated by x horizontally and y vertically:
– Euclidean distance ($\sqrt{x^2 + y^2}$);
– Manhattan distance ($x + y$).
Multi-point nets must be broken up into trees for good estimates.
[Figure: Euclidean vs. Manhattan distance between two components]
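The two distance measures can be sketched in a few lines of Python. The tree estimate for multi-point nets uses a minimum spanning tree built with Prim's algorithm; that choice, and the function names, are assumptions for illustration, since the slide does not say which tree construction is meant.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def manhattan(p, q):
    """Distance along horizontal and vertical wires only."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def spanning_tree_length(pins, dist=manhattan):
    """Estimate a multi-point net's wire length as the total edge
    length of a minimum spanning tree over its pins (Prim's algorithm)."""
    if not pins:
        return 0
    in_tree = [pins[0]]
    remaining = list(pins[1:])
    total = 0
    while remaining:
        # Pick the remaining pin that is closest to the tree built so far.
        nearest = min(remaining,
                      key=lambda q: min(dist(p, q) for p in in_tree))
        total += min(dist(p, nearest) for p in in_tree)
        in_tree.append(nearest)
        remaining.remove(nearest)
    return total

# Hypothetical three-pin net
net = [(0, 0), (3, 0), (3, 4)]
print(manhattan(net[0], net[2]), euclidean(net[0], net[2]))
print(spanning_tree_length(net))
```

A Steiner tree can be shorter than a spanning tree because it may add extra branch points, so the spanning-tree figure is only an upper-bound style estimate.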
Wiring trees
[Figure: wiring trees for a multi-point net, including a Steiner point]
Placement techniques
Can construct an initial solution or improve an existing solution.
Pairwise interchange is a simple improvement technique:
– Interchange a pair, keep the swap if it helps wire length.
– Heuristic determines which two components to swap.
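A minimal sketch of pairwise interchange, assuming two-pin nets, Manhattan wire length, and the simplest possible selection heuristic (try every pair, repeat until no swap helps). The component names and slot coordinates are hypothetical.

```python
from itertools import combinations

def wirelength(placement, nets):
    """Total Manhattan length of two-pin nets.
    placement maps component name -> (x, y) slot."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1])
               for a, b in nets)

def pairwise_interchange(placement, nets):
    """Swap pairs of components, keeping a swap only if it
    strictly reduces the total wire length."""
    placement = dict(placement)
    best = wirelength(placement, nets)
    improved = True
    while improved:
        improved = False
        for a, b in combinations(sorted(placement), 2):
            placement[a], placement[b] = placement[b], placement[a]
            cost = wirelength(placement, nets)
            if cost < best:
                best = cost
                improved = True
            else:
                # The swap did not help: undo it.
                placement[a], placement[b] = placement[b], placement[a]
    return placement, best

# Hypothetical example: four components in a row, two nets
initial = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
nets = [("A", "C"), ("B", "D")]
final, cost = pairwise_interchange(initial, nets)
print(wirelength(initial, nets), "->", cost)
```

Because only improving swaps are kept, this is a hill-climbing method and can stop in a local minimum.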
Placement by partitioning
Works well for components of fairly uniform size.
Partition netlist to minimize total wire length using min-cut criterion.
Partitioning may be interpreted as 1-D or 2-D layout.
Recursive partitioning
Min-cut bisecting partitioning
[Figure: components A, B, C, and D split across partition 1 and partition 2, with net counts labeled "3 nets" and "1 net"]
Min-cut bisecting partitioning, cont’d
Swapping A and B:
– B drags 1 net;
– A drags 3 nets;
– total cut increase: 3 nets.
Conclusion: probably not a good swap, but must be compared with other pairs.
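The swap evaluation can be sketched as follows. The netlist here is hypothetical and is not the one in the slide's figure; it was chosen only so that swapping A and B also increases the cut by 3 nets.

```python
def cut_size(left, nets):
    """Number of nets with pins on both sides of the cut.
    `left` is the set of components in partition 1."""
    return sum(1 for net in nets
               if 0 < len(set(net) & left) < len(set(net)))

def swap_gain(left, a, b, nets):
    """Reduction in cut size if a (in partition 1) and b (in
    partition 2) trade places. A negative gain is a worse cut."""
    new_left = (left - {a}) | {b}
    return cut_size(left, nets) - cut_size(new_left, nets)

# Hypothetical netlist
left = {"A", "C", "E"}
right = {"B", "D", "F"}
nets = [("A", "C"), ("A", "E"), ("A", "B"), ("B", "D"), ("C", "D")]

# Compare every candidate pair before committing to a swap.
gains = {(a, b): swap_gain(left, a, b, nets)
         for a in sorted(left) for b in sorted(right)}
best_pair = max(gains, key=gains.get)
print(cut_size(left, nets), gains[("A", "B")], best_pair)
```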
Before Placement: Clustering
° Need to group BLEs into clusters
° Goals:
• Minimize number of clusters
• Minimize inter-cluster wiring
• Minimize critical path (timing-driven)
° How do we do this?
• Take advantage of cluster architecture
Timing Analysis
[Figure: netlist from primary inputs PI1, PI2, PI3 to primary outputs PO1, PO2, PO3 with a delay for each gate; a second copy is annotated with the arrival time at each node]
Source: David Pan
Timing Analysis
[Figure: the same netlist annotated with arrival time/required time at each node; a second copy is annotated with the slack at each node]
slack = required time - arrival time
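The arrival time, required time, and slack computation can be sketched as below. The small netlist is hypothetical (it is not the one in the figure), and the sketch assumes every output is required at the critical-path delay, so the most critical path has zero slack.

```python
from graphlib import TopologicalSorter

def timing_analysis(delay, fanin):
    """delay: node -> gate delay; fanin: node -> list of predecessors.
    Returns (arrival, required, slack) dictionaries."""
    order = list(TopologicalSorter(fanin).static_order())

    # Forward pass: latest arrival at each node's output.
    arrival = {}
    for n in order:
        latest_input = max((arrival[p] for p in fanin.get(n, [])), default=0)
        arrival[n] = latest_input + delay[n]

    # Build fanout lists for the backward pass.
    fanout = {n: [] for n in order}
    for n, preds in fanin.items():
        for p in preds:
            fanout[p].append(n)

    # Backward pass: outputs are required at the critical-path delay.
    target = max(arrival.values())
    required = {}
    for n in reversed(order):
        required[n] = min((required[s] - delay[s] for s in fanout[n]),
                          default=target)

    slack = {n: required[n] - arrival[n] for n in order}
    return arrival, required, slack

# Hypothetical netlist: inputs a, b; gates g1, g2 feed gate g3
delay = {"a": 0, "b": 0, "g1": 3, "g2": 1, "g3": 4}
fanin = {"a": [], "b": [], "g1": ["a"], "g2": ["b"], "g3": ["g1", "g2"]}
arrival, required, slack = timing_analysis(delay, fanin)
print(arrival, required, slack)
```

Nodes with zero slack lie on the critical path; here that is a, g1, g3.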
Example with interconnect delay
[Figure: timing example between flip-flops (F), annotated with gate delays and interconnect delays]
Placement
• Placement has a set of competing goals.
• Can’t optimize locally and globally simultaneously.
• Use heuristic approaches to evaluate quality.
[Figure: signals A to F mapped onto two LUTs, LUT1 and LUT2]
Placement Algorithms
• Constructive methods: begin from netlist and generate an initial placement.
- Partitioning methods: mincut and Kernighan-Lin methods
- Clustering
• Iterative improvement
- Begin with random or constructive placement.
- Iterate to improve it.
- Hill-climbing
Iterative Placement Algorithms
• Pairwise interchange methods
• Force-directed methods
- FD relaxation
- FD pairwise exchange
• Simulated annealing
- Generates best results
- Can be time consuming
• Macro-based approaches
- Genetic algorithms
- Quad swaps
Iterative Improvement Algorithms
Force-directed: (classical mechanics)
- Force vector computed on each module corresponding to all nets
- Solve set of non-linear differential equations.
Simulated annealing: (statistical mechanics)
- Model a physical annealing process which optimizes energy.
- Similar to “quenching” metal.
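A minimal simulated-annealing placer, as a sketch only. It assumes two-pin nets, Manhattan wire length as the "energy", random pair swaps as moves, and a geometric cooling schedule; the parameter values and names are illustrative assumptions, not taken from the slides.

```python
import math
import random

def wirelength(placement, nets):
    """Total Manhattan length of two-pin nets (the cost to minimize)."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1])
               for a, b in nets)

def anneal(placement, nets, t_start=5.0, t_end=0.01, alpha=0.9,
           moves_per_t=20, seed=0):
    rng = random.Random(seed)          # fixed seed for repeatability
    current = dict(placement)
    cost = wirelength(current, nets)
    best, best_cost = dict(current), cost
    names = sorted(current)
    t = t_start
    while t > t_end:
        for _ in range(moves_per_t):
            a, b = rng.sample(names, 2)
            current[a], current[b] = current[b], current[a]
            delta = wirelength(current, nets) - cost
            # Always accept improvements; accept a worse placement with
            # probability exp(-delta / t), which shrinks as t cools.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost += delta
                if cost < best_cost:
                    best, best_cost = dict(current), cost
            else:
                current[a], current[b] = current[b], current[a]  # undo
        t *= alpha                     # geometric cooling
    return best, best_cost

# Hypothetical example
start = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (3, 0)}
nets = [("A", "C"), ("B", "D"), ("A", "D")]
best, best_cost = anneal(start, nets)
print(wirelength(start, nets), "->", best_cost)
```

Unlike pure hill-climbing, the occasional acceptance of a worse placement lets the search leave local minima while the temperature is still high.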
Timing-driven Placement
• Take both wire length and critical path into account
• Problem
- Critical path changes as I move blocks
- How do I balance the two objectives?
• How do we go about modeling routing delay during placement?
Determining Criticality
• Same basic approach as used for clustering criticality
• For each (i, j) connection from source i to sink j
- Determine arrival times (pre-order BFS)
- Determine required arrival times (post-order BFS)
- Determine slack = required_arrival_time – arrival_time
- Criticality(i, j) = 1 – slack(i, j) / (Max slack)
What is the purpose of the criticality exponent?
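A small numeric sketch of criticality. The `timing_cost` function, which weights a connection's delay by criticality raised to an exponent, is an assumption modeled on how timing-driven placers commonly use such an exponent; the slide itself only poses the question.

```python
def criticality(slack, max_slack):
    """1.0 for a zero-slack (critical) connection, 0.0 for the
    connection with the largest slack."""
    return 1.0 - slack / max_slack

def timing_cost(delay, slack, max_slack, exponent):
    """Assumed weighting: delay scaled by criticality ** exponent."""
    return delay * criticality(slack, max_slack) ** exponent

max_slack = 8
for slack in (0, 4, 8):
    print(slack,
          criticality(slack, max_slack),
          timing_cost(2.0, slack, max_slack, exponent=1),
          timing_cost(2.0, slack, max_slack, exponent=8))
```

Under this assumption, a larger exponent suppresses the cost of connections with moderate slack, so the placer concentrates on the connections closest to critical.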
Balancing Wiring and Timing Cost
• Need to determine relative changes in timing and wiring based on moves
• Idea: Use relative changes from previous calculation
- Both values less than 1
- Helps balance effect based on scaling parameter
This still doesn’t help address changes in delay
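One way to combine the two normalized changes is a weighted sum, sketched below. The exact formula did not survive in this text, so the form used here (a scaling parameter `lam` between 0 and 1 applied to the relative timing change, and `1 - lam` to the relative wiring change) is an assumption based on the form commonly used in timing-driven placers.

```python
def delta_cost(d_timing, prev_timing, d_wiring, prev_wiring, lam=0.5):
    """Combined cost change of a move. Each change is divided by the
    previous total, so both terms are relative and comparable.
    lam = 1.0 considers timing only; lam = 0.0 considers wiring only."""
    return (lam * d_timing / prev_timing
            + (1.0 - lam) * d_wiring / prev_wiring)

# Hypothetical move: timing cost improves by 10%, wiring worsens by 10%
print(delta_cost(-10.0, 100.0, 5.0, 50.0, lam=0.5))  # balanced weighting
print(delta_cost(-10.0, 100.0, 5.0, 50.0, lam=1.0))  # timing only
```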
Routing
• Problem
Given a placement and a fixed number of metal layers, find a valid pattern of horizontal and vertical wires that connect the terminals of the nets
Levels of abstraction:
o Global routing
o Detailed routing
• Objectives
Cost components:
o Area (channel width): minimizing congestion in previous levels helped
o Wire delays: timing minimization in previous levels
o Number of layers (fewer layers, less expensive)
o Additional cost components: number of bends, vias
Routing Anatomy
[Figure: top view and 3D view of a symbolic layout showing metal layers 1, 2, and 3 and a via]
Note: Colors used in this slide are not standard
Global vs. Detailed Routing
• Global routing
Input: detailed placement, with exact terminal locations
Determine “channel” (routing region) for each net
Objective: minimize area (congestion), and timing (approximate)
• Detailed routing
Input: channels and approximate routing from the global routing phase
Determine the exact route and layers for each net
Objective: valid routing, minimize area (congestion), meet timing constraints
Additional objectives: min via, power
Figs. [©Sherwani]
Channel graph
[Figure: channel graph with four logic elements (LE) surrounded by channels and switch boxes]
Routing Environment
• Routing regions
Channel
o Fixed height? (fixed number of tracks)
o Fixed terminals on top and bottom
o More constrained problem: switchbox. Terminals on four sides fixed
Area routing
o Wires can pass through any region not occupied by cells (exception: over-the-cell routing)
• Routing layers
Could be pre-assigned (e.g., M1 horizontal, M2 vertical)
Different weights might be assigned to layers
[Figure: channel routing example with numbered net terminals]
Routing Environment
• Chip architecture
Full-custom:
o No constraint on routing regions
Standard cell:
o Variable channel height?
o Feed-through cells connect channels
FPGA:
o Fixed channel height
o Limited switchbox connections
o Prefabricated wire segments have different weights
[Figure: channels with feedthroughs, and tracks showing a failed net and a failed connection]
Figs. [©Sherwani]
FPGA Programmable Switch Elements
• Used in connecting:
The I/O of functional units to the wires
A horizontal wire to a vertical wire
Two wire segments to form a longer wire segment
FPGA Routing Channels Architecture
• Note: fixed channel widths (tracks)
• Should “predict” all possible connectivity requirements when designing the FPGA chip
• Channel -> track -> segment
• Segment length?
Long: carry the signal longer, fewer “concatenation” switches, but might waste track
Short: local connections, slow for longer connections
[Figure: a channel, a track within the channel, and a segment of the track]
FPGA Switch Boxes
• Ideally, provide switches for all possible connections
• Trade-off:
Too many switches:
o Large area
o Complex to program
Too few switches:
o Cannot route signals
[Figure: Xilinx 4000 switch box, one possible solution]
FPGA Routing Architecture
° Island-style FPGA
° Row-based FPGA
° Sea-of-gates FPGA
° Hierarchical FPGA
Commercial FPGAs can be classified into four groups, based on their routing architecture.
FPGA Architecture - Layout
• Island FPGAs
Array of functional units
Horizontal and vertical routing channels connecting the functional units
Versatile switch boxes
Example: Xilinx, Altera
• Row-based FPGAs
Like standard cell design
Rows of logic blocks
Routing channels (fixed width) between rows of logic
Example: Actel FPGAs
The Four Classes of FPGA
An Island – Based FPGA
Island-Style Devices
• Two dimensional problem
• (X+Y)!/(X!Y!) possible paths
• Restricted within bounding box
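The path count is a binomial coefficient: a shortest route across a bounding box of X horizontal and Y vertical steps is determined by choosing which X of the X+Y steps are horizontal. A quick check in Python (the function name is hypothetical):

```python
import math

def num_paths(x, y):
    """Number of shortest (monotone) paths across an x-by-y bounding box,
    equal to (x + y)! / (x! * y!)."""
    return math.comb(x + y, x)

for x, y in [(1, 1), (2, 2), (3, 2), (10, 10)]:
    print(x, y, num_paths(x, y))
```

The count grows very quickly with the size of the bounding box, which is why routers search heuristically rather than enumerating every path.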
Example channel segmentation distribution
Virtex Routing Architecture
Virtex II Architecture
[Figure: Virtex II device resources and example uses. Labels: 18Kb BRAM, Multiplier, DCM, Distri RAM, Shift Registers, FIFO, CAM, DDR, DDR SDRAM, QDR SRAM, LVDS, BLVDS Backplane, PCI, PCI-X, SONET / SDH]
Virtex II Routing Hierarchy
Virtex II Clock Distribution
FPGA Routing
• Routing resources pre-fabricated
Must achieve 100% routability using existing channels
If routing fails for any net, redo placement
• FPGA architectural issues
Careful balance between number of logic blocks and routing resources (100% logic area utilization?)
Designing flexible switchboxes and channels (conflicts with high clock speeds)
• FPGA routing algorithms
Graph search algorithms
o Convert the wire segments to graph nodes, and switch elements to edges
Bin packing heuristics (nets as objects, tracks as bins)
Combination of maze routing and graph search algorithms
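The graph-search formulation can be sketched with a breadth-first search. Wire segments are nodes and programmable switches are edges, as the slide describes; the tiny routing graph, the segment names, and the `used` set (segments already taken by other nets) are hypothetical.

```python
from collections import deque

def route(graph, source, sink, used=frozenset()):
    """Breadth-first search from source to sink over free wire segments.
    Returns the list of segments on a fewest-switch path, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:   # walk parents back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent and nxt not in used:
                parent[nxt] = node
                queue.append(nxt)
    return None                       # net cannot be routed

# Hypothetical routing-resource graph: segment -> segments reachable
# through one switch
graph = {
    "out_A": ["h0", "h1"],
    "h0": ["v0"],
    "h1": ["v0", "v1"],
    "v0": ["in_B"],
    "v1": ["in_B"],
    "in_B": [],
}
print(route(graph, "out_A", "in_B"))
print(route(graph, "out_A", "in_B", used={"v0"}))
print(route(graph, "out_A", "in_B", used={"v0", "v1"}))
```

When the search returns None, the net has failed to route with the resources left, which is the case where the flow falls back to changing the placement.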
FPGA issues
Often want a fast answer. May be willing to accept lower quality result for less place/route time.
May be interested in knowing wirability without needing the final configuration.
Fast placement: constructive placement, iterative improvement through simulated annealing.
FPGA routing
Finding a route in a given interconnection network.
Global routing assigns nets to channels.
Local routing selects the programming points used to make the connections.
FPGA routing techniques
Nair: route based on congestion, not distance.
Route in two passes:
– Estimate congestion.
– Final routing.
Triptych: more gradual penalty for congestion.
Xilinx XC4000 Routing
Altera Stratix Logic Array Blocks (Clusters)
Routing Connections
Based on the switch and wire parasitics, interconnect routes can be modeled as RC networks.
[Figure: routed connection passing through switches (S)]
Other issues:
• Power
• Routability
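For an RC chain, a common first-order delay estimate is the Elmore delay: each series resistance is multiplied by all the capacitance downstream of it. The slides do not name a specific delay model, so using Elmore here is an assumption, and the element values are made up.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay to the far end of an RC chain.
    Stage i is a series resistance R[i] followed by a shunt
    capacitance C[i]; delay = sum_i R[i] * (C[i] + C[i+1] + ...)."""
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    downstream = sum(capacitances)
    for r, c in zip(resistances, capacitances):
        delay += r * downstream   # R[i] charges everything after it
        downstream -= c
    return delay

# Hypothetical route: each switch plus wire segment is one RC stage
print(elmore_delay([1.0, 1.0], [1.0, 1.0]))
print(elmore_delay([1.0] * 4, [1.0] * 4))
```

With identical stages the delay grows roughly with the square of the number of stages, which is one reason long routes through many switches are slow and why buffering matters.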
Timing-Driven Routing
• Add delay cost component to routing.
• Represent delay along path as RC chain. Buffering important here.
• Note that timing driven routing selects most distant point for first route.
- Sets upper bound on delay.
• Need for combined breadth-first congestion and timing-driven route.
Timing-Driven Routing
• Difficult to estimate remaining timing along a path
• Difficult to balance costs for each critical net
• Some routers attempt to “look-ahead” to anticipate congested or time-critical areas
• Optimal approaches have generally failed.
Combined Placement and Routing
• Used depth-first route to select initial connections
• Swap blocks and rip up attached nets
• Bias nets that span the bulk of device onto long-line resources.
• Took 16X longer than place and route
- 8% to 15% improvement.
Optimizing your FPGA design
° Pinout and Area Constraints Editor (PACE)
° Implementation (Mapping, Placing, Routing)
• Constraints Editor
• Text Editor (HDL source)
• Floorplanner – Placement
• FPGA Editor – Routing
° Timing Constraints
• Xilinx Constraints Editor
VHDL based synthesis
VHDL code
architecture RTL1 of RESOURCE is
begin
  seq : process (RSTn, CLOCK)
  begin
    if (RSTn = '0') then
      DOUT <= (others => '0');
    elsif (CLOCK'event and CLOCK = '1') then
      case SEL is
        when "00" => DOUT <= unsigned(A) - 1;
        when "01" => DOUT <= unsigned(B) - 1;
        when "10" => DOUT <= unsigned(C) - 1;
        when others => DOUT <= unsigned(D) - 1;
      end case;
    end if;
  end process;
end RTL1;
Synthesized schematic for RTL1 of RESOURCE
delay: 57 ns
area: 65
number of flip-flops: 16
4-bit Shift Register
4-bit Shift Register
HDL: Design Verification
[Figure: design flow HDL -> Synthesis -> Implementation -> Download, shown with the verification steps Behavioral Simulation, Functional Simulation, Timing Simulation, and In-Circuit Verification]
HDL: Implement your design using VHDL or Verilog
Synthesis: Design Verification
[Figure: design flow HDL -> Synthesis -> Implementation -> Download, shown with the verification steps Behavioral Simulation, Functional Simulation, Timing Simulation, and In-Circuit Verification]
Synthesize the design to create an FPGA netlist
Implementation: Design Verification
[Figure: design flow HDL -> Synthesis -> Implementation -> Download, shown with the verification steps Behavioral Simulation, Functional Simulation, Timing Simulation, and In-Circuit Verification]
Translate, place and route, and generate a bitstream to download in the FPGA
HDL: Summary
° Full VHDL/Verilog (RTL code)
• Advantages:
- Portability
- Complete control of the design implementation and tradeoffs
- Easier to debug and understand code that you own
• Disadvantages:
- Can be time consuming
- Don’t always have control over the Synthesis tool
- Need to be familiar with algorithm and how to write it
But…
What about the custom ASIC case?
Layout – Back End Tools
° Once our design meets all static timing requirements, we move on
° Next step: Layout
° Objective: Receive an HDL gate level netlist
° Create a custom cell (or a chip) using that netlist
° We use Cadence Silicon Ensemble now
A typical ASIC Design Flow
[Figure: ASIC design flowchart with stages HDL, HDL Simulation, HDL Synthesis, Netlist, Func. Sim., Physical Implementation (Floor Planning, Placement, Routing, DRC/LVS), Tim Sim & STA & DRC/ERC/LVS, and fabrication; a "Pass?" check follows each verification step, with "no" branches looping back]
A typical Layout Flow
Import Files (Design files & Libraries)
Floor Planning (Create cell rows)
Placement (Place IO & cells)
Routing
• Power Ring Generation
• Global Routing
• Detailed Routing
Timing Data Generation (RC Extraction & Delay Calculation)
Design Rules Check (Antenna, connectivity, geometry)
Output Generation (GDSII, DEF, LEF, and SDF)
Clock Tree Synthesis
Sample Design