ASIC vs FPGA

S. Mancini

Outline

Introduction
  ASICs
  FPGA
  Cost models
Design methodology
Radiation hardening
SoC
Summary

The question

FPGA or ASIC?

On what criteria should the choice be based?
What do the design methods have in common, and how do they differ?


ASIC families

ASICs (Application-Specific Integrated Circuits) fall into several families:

Full custom: the masks are drawn at transistor level.

Standard cells: the circuit is an assembly of placed and routed cells.

Gate array: a "sea" of gates is routed.

Embedded gate array: a gate array with complex macro-blocks (RAM).


Technology evolution

[Figure: "Technology Trends", UK Design Forum, 1 April 2003, slide 6. Process generation (250 nm, 180 nm, 130 nm, 100 nm, 90 nm, 65 nm) plotted against year (1997 to 2009), comparing the '99, '00 and '01 ITRS (International Technology Roadmap for Semiconductors) roadmaps with the leading foundry.]

90 nm technology

430 Kgates/mm2
SRAM: 1.6 to 1.2 mm2 per Mbit
DRAM: 0.5 mm2 per Mbit
6 to 9 metal layers
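The density figures above can be turned into a rough area estimate. The sketch below is only an illustration: the function name and the example design (1 million gates, 2 Mbit of SRAM) are assumptions, and routing overhead, pads and analog blocks are ignored.

```python
# Density figures quoted above for a 90 nm process.
GATES_PER_MM2 = 430_000          # logic density
SRAM_MM2_PER_MBIT = (1.2, 1.6)   # best case, worst case
DRAM_MM2_PER_MBIT = 0.5


def die_area_mm2(gates, sram_mbit=0.0, dram_mbit=0.0):
    """Return a (low, high) estimate of the core area in mm2."""
    logic = gates / GATES_PER_MM2
    dram = dram_mbit * DRAM_MM2_PER_MBIT
    low = logic + sram_mbit * SRAM_MM2_PER_MBIT[0] + dram
    high = logic + sram_mbit * SRAM_MM2_PER_MBIT[1] + dram
    return low, high


# Hypothetical design: 1 million gates and 2 Mbit of SRAM.
low, high = die_area_mm2(1_000_000, sram_mbit=2)
print(f"{low:.2f} to {high:.2f} mm2")  # 4.73 to 5.53 mm2
```

With these numbers the memory, not the logic, accounts for about half of the area.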


Principle

Offer generic circuits that can be reconfigured at will. They consist of arrays of reconfigurable cells and an interconnection network.

Main vendors: Actel, Altera, Atmel, Cypress, Lattice, Minc, QuickLogic, Xilinx.

The technologies differ in:

The technology used to store the configuration
The type of elementary cell


Programming technologies

The three main programming technologies are:

SRAM
  Dynamically reconfigurable
  Standard technology
  Configuration is lost at power-down

Flash
  Floating gate
  Retains its configuration
  "Stand-alone" circuit
  Non-standard technology

Antifuse
  Minimal area
  Not reprogrammable
  Specific technology

Actel (ProASIC)

Source: Actel, "ProASICPLUS Flash Family FPGAs" datasheet, v3.1.

ProASICPLUS Architecture

The proprietary ProASICPLUS architecture provides granularity comparable to gate arrays.

The ProASICPLUS device core consists of a Sea-of-Tiles™ (Figure 1). Each tile can be configured as a 3-input logic function (e.g., NAND gate, D-Flip-Flop, etc.) by programming the appropriate Flash switch interconnections (Figure 2 on page 6 and Figure 3 on page 6). Tiles and larger functions are connected with any of the four levels of routing hierarchy. Flash switches are distributed throughout the device to provide nonvolatile, reconfigurable interconnect programming. Flash switches are programmed to connect signal lines to the appropriate logic cell inputs and outputs. Dedicated high-performance lines are connected as needed for fast, low-skew global signal distribution throughout the core. Maximum core utilization is possible for virtually any design.

ProASICPLUS devices also contain embedded two-port SRAM blocks with built-in FIFO/RAM control logic. Programming options include synchronous or asynchronous operation, two-port RAM configurations, user defined depth and width, and parity generation or checking. Please see the "Embedded Memory Configurations" section on page 21 for more information.

Flash Switch

Unlike SRAM FPGAs, ProASICPLUS uses a live on power-up ISP Flash switch as its programming element.

In the ProASICPLUS Flash switch, two transistors share the floating gate, which stores the programming information. One is the sensing transistor, which is only used for writing and verification of the floating gate voltage. The other is the switching transistor. It can be used in the architecture to connect/separate routing nets or to configure logic. It is also used to erase the floating gate (Figure 2 on page 6).

Logic Tile

The logic tile cell (Figure 3 on page 6) has three inputs (any or all of which can be inverted) and one output (which can connect to both ultra-fast local and efficient long-line routing resources). Any three-input, one-output logic function (except a three-input XOR) can be configured as one tile. The tile can be configured as a latch with clear or set or as a flip-flop with clear or set. Thus, the tiles can flexibly map logic and sequential gates of a design.

Figure 1 • The ProASICPLUS Device Architecture (labels: Logic Tile, 256x9 Two-Port SRAM or FIFO Block, RAM Block, I/Os)

Routing Resources

The routing structure of ProASICPLUS devices is designed to provide high performance through a flexible four-level hierarchy of routing resources: ultra-fast local resources, efficient long-line resources, high speed very long-line resources, and high performance global networks.

The ultra-fast local resources are dedicated lines that allow the output of each tile to connect directly to every input of the eight surrounding tiles (Figure 4 on page 7).

The efficient long-line resources provide routing for longer distances and higher fanout connections. These resources vary in length (spanning 1, 2, or 4 tiles), run both vertically and horizontally, and cover the entire ProASICPLUS device (Figure 5 on page 7). Each tile can drive signals onto the efficient long-line resources, which can in turn, access every input of every tile. Active buffers are inserted automatically by routing software to limit the loading effects due to distance and fanout.

The high-speed very long-line resources, which span the entire device with minimal delay, are used to route very long or very high fanout nets. (Figure 6 on page 8).

The high-performance global networks are low skew, high fanout nets that are accessible from external pins or from internal logic (Figure 7 on page 9). These nets are typically used to distribute clocks, resets, and other high fanout nets requiring a minimum skew. The global networks are implemented as clock trees, and signals can be introduced at any junction. These can be employed hierarchically with signals accessing every input on all tiles.

Figure 2 • Flash Switch (labels: Switch In, Switch Out, Word, Floating Gate, Sensing, Switching)

Figure 3 • Core Logic Tile (labels: In 1, In 2 (CLK), In 3 (Reset), Local Routing, Efficient Long-Line Routing)

Circuit             APA1000
System gates        1 000 000
Tiles (registers)   56 320
RAM                 198 kbit
PLL                 2
Clocks              88

Actel (Axcelerator)

Source: Actel, "Axcelerator Family FPGAs" datasheet, Advanced v1.5.

Embedded Memory

As mentioned earlier, each core tile has either three (in a smaller tile) or four (in the regular tile) embedded SRAM blocks along the west side, and each variable-aspect-ratio SRAM block is 4,608 bits in size. Available memory configurations are: 128x36, 256x18, 512x9, 1kx4, 2kx2 or 4kx1 bits. The individual blocks have separate read and write ports that can be configured with different bit widths on each port. For example, data can be written in by 8 and read out by 1. The embedded SRAM blocks can be initialized at power up via the device JTAG port (ROM emulation mode).
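As a quick check of the aspect ratios listed above, the following sketch multiplies out each depth x width pair and compares it with the 4,608-bit block size. The names are illustrative only.

```python
BLOCK_BITS = 4608  # size of one embedded SRAM block, from the text above

# (depth, width) pairs listed in the datasheet excerpt
CONFIGS = [(128, 36), (256, 18), (512, 9), (1024, 4), (2048, 2), (4096, 1)]


def used_bits(depth, width):
    """Number of storage bits addressed by one depth x width configuration."""
    return depth * width


for depth, width in CONFIGS:
    bits = used_bits(depth, width)
    assert bits <= BLOCK_BITS
    print(f"{depth:>4} x {width:<2} -> {bits} bits ({BLOCK_BITS - bits} unused)")
```

The x36, x18 and x9 configurations use the full 4,608 bits, while the x4, x2 and x1 configurations address 4,096 bits and leave 512 unused.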

In addition, every SRAM block has an embedded FIFO control unit. The control unit allows the SRAM block to be configured as a synchronous FIFO without using core logic modules. The FIFO width and depth are programmable. The FIFO also features programmable ALMOST-EMPTY (AEMPTY) and ALMOST-FULL (AFULL) flags in addition to the normal EMPTY and FULL flags. The embedded FIFO control unit also contains the counters necessary for the generation of the read and write address pointers as well as control circuitry to prevent metastability and erroneous operation. The embedded SRAM/FIFO blocks can be cascaded to create larger configurations.

Figure 6 • AX Device Architecture (AX1000 shown) (labels: Chip Layout, Core Tile, SuperCluster, I/O Structure, 4k RAM/FIFO, RAMC, SC, HD, RD, RX, TX)

Table 1 • Number of Core Tiles per Device

Device    Number of Core Tiles
AX125     1 regular tile
AX250     4 smaller tiles
AX500     4 regular tiles
AX1000    9 regular tiles
AX2000    16 regular tiles

Logic Modules

Actel's Axcelerator family provides two types of logic modules, the register cell (R-cell) and the combinatorial cell (C-cell). The AX C-cell can implement more than 4,000 combinatorial functions of up to 5 inputs (Figure 3 on page 5). The C-cell contains carry logic for even more efficient implementation of arithmetic functions. With its small size, the C-cell structure is extremely synthesis-friendly, simplifying the overall design as well as reducing design time.

The R-cell contains a flip-flop featuring asynchronous clear, asynchronous preset, and active-low enable control signals (Figure 3 on page 5). The R-cell registers feature programmable clock polarity selectable on a register-by-register basis. This provides additional flexibility (e.g., easy mapping of dual-data-rate functions into the FPGA) while conserving valuable clock resources. The clock source for the R-cell can be chosen from the hard-wired clocks, the routed clocks, or the internal logic.

Two C-cells, a single R-cell, and two Transmit (TX) and two Receive (RX) routing buffers form a Cluster, and two Clusters comprise a SuperCluster (Figure 4 on page 5). Each SuperCluster contains an independent Buffer module, which supports automatic buffer insertion on high-fanout nets by the place-and-route tool, minimizing system delays while improving logic utilization.

The logic modules within the SuperCluster are arranged so that two combinatorial modules are side by side, giving a C-C-R - C-C-R pattern to the SuperCluster. This C-C-R pattern enables efficient implementation (minimum delay) of 2-bit carry logic for improved arithmetic performance (Figure 5 on page 5).

The AX architecture is fully fracturable, meaning that if one or more of the logic modules in a SuperCluster are used by a particular signal path, the other logic modules are still available for use by other paths.

At the chip level, SuperClusters are organized into core tiles, which are arrayed to build up the full chip. Each core tile consists of an array of 336 SuperClusters and four SRAM blocks (176 SuperClusters and 3 SRAM blocks for the AX250). The SRAM blocks are arranged in a column on the west side of the tile (Figure 6 on page 6). For example, the AX1000 is composed of a 3x3 array of 9 core tiles. Surrounding the array of core tiles are blocks of I/O Clusters and the I/O bank ring (Table 1 on page 6).
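The hierarchy described above (cells, Clusters, SuperClusters, core tiles) fixes the number of logic modules in a device. The sketch below multiplies it out. It is only an illustration: the function name is made up and I/O clusters are not counted.

```python
C_CELLS_PER_CLUSTER = 2
R_CELLS_PER_CLUSTER = 1
CLUSTERS_PER_SUPERCLUSTER = 2
SUPERCLUSTERS_PER_TILE = 336   # regular tile; the AX250 tile has 176


def cell_counts(core_tiles, superclusters_per_tile=SUPERCLUSTERS_PER_TILE):
    """Return the number of R-cells and C-cells in the core of a device."""
    clusters = core_tiles * superclusters_per_tile * CLUSTERS_PER_SUPERCLUSTER
    return {
        "R": clusters * R_CELLS_PER_CLUSTER,
        "C": clusters * C_CELLS_PER_CLUSTER,
    }


# Table 1 gives 16 regular tiles for the AX2000.
print(cell_counts(16))  # {'R': 10752, 'C': 21504}
```

The AX250 case can be obtained with `cell_counts(4, superclusters_per_tile=176)`.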

Figure 2 • Axcelerator Family Interconnect Elements

Circuit         AX2000
System gates    2 000 000
R-cells         10 752
C-cells         21 504
RAM             338 kbit
PLL             8
Clocks          4

Xilinx (Spartan 3/Virtex II)

Source: Xilinx, "Functional Description: FPGA", DS083-2 (v2.7), June 2, 2003, Advance Product Specification, p. 36, www.xilinx.com.

3. NO_CHANGE

The NO_CHANGE option maintains the content of the output registers, regardless of the write operation. The clock edge during the write mode has no effect on the content of the data output register DO. When the port is configured as NO_CHANGE, only a read operation loads a new value in the output register DO, as shown in Figure 42.
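The NO_CHANGE behaviour described above can be mimicked with a small behavioural model. This is a simplified sketch, not the vendor's model: the class and method names are invented, only one port is modelled, and the SSR input is left out.

```python
class NoChangePort:
    """Behavioural sketch of one block-RAM port in NO_CHANGE mode."""

    def __init__(self, depth, init=0):
        self.mem = [init] * depth
        self.do = init  # data output register DO

    def clock(self, addr, we, di=0, en=True):
        """Model one active clock edge and return the value of DO."""
        if not en:
            return self.do                # port disabled: nothing changes
        if we:
            self.mem[addr] = di           # write: memory updated, DO untouched
        else:
            self.do = self.mem[addr]      # read: DO loads the addressed word
        return self.do


port = NoChangePort(depth=16)
port.clock(addr=3, we=True, di=0xAA)        # write 0xAA, DO keeps its initial value
print(hex(port.clock(addr=3, we=False)))    # 0xaa (read loads DO)
port.clock(addr=3, we=True, di=0x55)        # overwrite the same word
print(hex(port.do))                         # 0xaa (DO still holds the last read)
```

The second write changes the stored word but not DO, which is the property that distinguishes NO_CHANGE from the other write modes.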

Control Pins and Attributes

Virtex-II Pro SelectRAM+ memory has two independent ports with the control signals described in Table 19. All control inputs including the clock have an optional inversion.

Initial memory content is determined by the INIT_xx attributes. Separate attributes determine the output register value after device configuration (INIT) and SSR is asserted (SRVAL). Both attributes (INIT_B and SRVAL) are available for each port when a block SelectRAM+ resource is configured as dual-port RAM.

Total Amount of SelectRAM+ Memory

Virtex-II Pro SelectRAM+ memory blocks are organized in multiple columns. The number of blocks per column depends on the row size, the number of Processor Blocks, and the number of RocketIO transceivers.

Table 20 shows the number of columns as well as the total amount of block SelectRAM+ memory available for each Virtex-II Pro device. The 18 Kb SelectRAM+ blocks are cascadable to implement deeper or wider single- or dual-port memory resources.

Figure 43 shows the layout of the block RAM columns in the XC2VP4 device.

18-Bit x 18-Bit Multipliers

Introduction

A Virtex-II Pro multiplier block is an 18-bit by 18-bit 2's complement signed multiplier. Virtex-II Pro devices incorporate many embedded multiplier blocks. These multipliers can be associated with an 18 Kb block SelectRAM+ resource or can be used independently. They are optimized for high-speed operations and have a lower power consumption compared to an 18-bit x 18-bit multiplier in slices.
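To make the "18-bit by 18-bit 2's complement signed" wording concrete, here is a bit-level sketch of such a multiplication. The function names are illustrative, and the 36-bit result width is simply what is needed to hold any product of two 18-bit signed operands.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mult18x18(a, b):
    """Multiply two 18-bit two's complement patterns; return the 36-bit product pattern."""
    product = to_signed(a, 18) * to_signed(b, 18)
    return product & ((1 << 36) - 1)


p = mult18x18(0x3FFFF, 2)    # 0x3FFFF is -1 on 18 bits
print(to_signed(p, 36))      # -2
```

The largest magnitude, (-2^17) x (-2^17) = 2^34, still fits in 36 signed bits, so the product never overflows.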

Figure 42: NO_CHANGE Mode (timing diagram; signals: CLK, WE, Data_in, Address, Data_out. The RAM contents change from old to new while DO keeps the content of the last read cycle: no change during write.)

Table 19: Control Functions

Control Signal    Function
CLK               Read and Write Clock
EN                Enable affects Read, Write, Set, Reset
WE                Write Enable
SSR               Set DO register to SRVAL (attribute)

Table 20: Virtex-II Pro SelectRAM+ Memory Available

                       Total SelectRAM+ Memory
Device      Columns    Blocks    in Kb     in Bits
XC2VP2      4          12        216       221,184
XC2VP4      4          28        504       516,096
XC2VP7      6          44        792       811,008
XC2VP20     8          88        1,584     1,622,016
XC2VP30     8          136       2,448     2,506,752
XC2VP40     10         192       3,456     3,538,944
XC2VP50     12         232       4,176     4,276,224
XC2VP70     14         328       5,904     6,045,696
XC2VP100    16         444       7,992     8,183,808
XC2VP125    18         556       10,008    10,248,192
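The last two columns of Table 20 follow from the block count: each block holds 18 Kb, and one Kb is 1,024 bits. The sketch below recomputes them and checks every row. The dictionary layout and the function name are illustrative.

```python
BLOCK_KBIT = 18  # one SelectRAM+ block is 18 Kb

# device -> (blocks, Kb, bits), copied from Table 20
TABLE_20 = {
    "XC2VP2": (12, 216, 221_184),
    "XC2VP4": (28, 504, 516_096),
    "XC2VP7": (44, 792, 811_008),
    "XC2VP20": (88, 1_584, 1_622_016),
    "XC2VP30": (136, 2_448, 2_506_752),
    "XC2VP40": (192, 3_456, 3_538_944),
    "XC2VP50": (232, 4_176, 4_276_224),
    "XC2VP70": (328, 5_904, 6_045_696),
    "XC2VP100": (444, 7_992, 8_183_808),
    "XC2VP125": (556, 10_008, 10_248_192),
}


def total_memory(blocks):
    """Return (Kb, bits) of block RAM for a given number of 18 Kb blocks."""
    kbit = blocks * BLOCK_KBIT
    return kbit, kbit * 1024


for device, (blocks, kbit, bits) in TABLE_20.items():
    assert total_memory(blocks) == (kbit, bits), device

print(total_memory(556))  # (10008, 10248192)
```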

Figure 43: XC2VP4 Block RAM Column Layout (labels: BRAM and Multiplier Blocks, PPC405 CPU, CLBs, RocketIO Serial Transceivers, DCM)



Configurable Logic Blocks (CLBs)

The Virtex-II Pro configurable logic blocks (CLB) are organized in an array and are used to build combinatorial and synchronous logic designs. Each CLB element is tied to a switch matrix to access the general routing matrix, as shown in Figure 23.

A CLB element comprises 4 similar slices, with fast local feedback within the CLB. The four slices are split in two columns of two slices with two independent carry logic chains and one common shift chain.

Slice Description

Each slice includes two 4-input function generators, carry logic, arithmetic logic gates, wide function multiplexers and two storage elements. As shown in Figure 24, each 4-input function generator is programmable as a 4-input LUT, 16 bits of distributed SelectRAM+ memory, or a 16-bit variable-tap shift register element.
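A 4-input function generator used as a LUT is, in effect, a 16-entry truth table indexed by its inputs. The following sketch models that idea. The class name and the bit ordering (input `a` as the least significant index bit) are assumptions made for illustration, not taken from the datasheet.

```python
class Lut4:
    """A 4-input function generator modelled as a 16-entry truth table."""

    def __init__(self, init):
        self.init = init & 0xFFFF  # bit i holds the output for input value i

    @classmethod
    def from_function(cls, fn):
        """Build the 16-bit table by evaluating fn on all input combinations."""
        init = 0
        for i in range(16):
            a, b, c, d = i & 1, (i >> 1) & 1, (i >> 2) & 1, (i >> 3) & 1
            if fn(a, b, c, d):
                init |= 1 << i
        return cls(init)

    def __call__(self, a, b, c, d):
        index = a | (b << 1) | (c << 2) | (d << 3)
        return (self.init >> index) & 1


and4 = Lut4.from_function(lambda a, b, c, d: a & b & c & d)
xor4 = Lut4.from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(and4.init), hex(xor4.init))  # 0x8000 0x6996
```

The same 16 bits, read and written by address instead of being fixed at configuration time, give the 16-bit distributed RAM mode mentioned above.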

Figure 23: Virtex-II Pro CLB Element (labels: Slice X0Y0, Slice X0Y1, Slice X1Y0, Slice X1Y1, Switch Matrix, Fast Connects to neighbors, CIN, COUT, SHIFT, TBUF)

Figure 24: Virtex-II Pro Slice Configuration (labels: LUT F, LUT G, RAM16, SRL16, CY, MUXF5, MUXFx, ORCY, Arithmetic Logic, Register/Latch)




Circuit          Spartan 3    Virtex II
Logic cells      74 880       125 136
Slices           33 080       55 616
RAM              2.5 Mbit     11 Mbit
Mult. (18x18)    104          556
Clock man.       4            12
µP               0            4 PPC

Altera (Apex/Stratix)

Circuit        Apex II (EP2A70)    Stratix (EP1S80)    Excalibur (EPXA10)
LEs            67 200              79 040              38 400
RAM            1 Mbit              7 Mbit              3 Mbit
Mult. (9x9)    -                   176                 -
PLL            4                   12                  ?
µP             -                   -                   ARM922T

FPGA costs

Example unit prices for large quantities:

Company    Part                        Price
Altera     EP20K200 (Apex 20K)         340 $
Altera     EP1S80                      800 $
Altera     EPXA1 (Excalibur ARM)       27 $
Xilinx     XC3S1000 (Spartan 3)        200 $
Xilinx     XC2V8000 (Virtex II)        8000 $
Xilinx     XC2VP100 (Virtex II Pro)    11000 $
Actel      APA1000 (ProASIC+)          400 $
Actel      AX2000 (Axcelerator)        630 $

To which must be added:

CAD tools
External EEPROMs

ASIC costs

Three components:

Design cost
  Engineers
  CAD tools: about 500 000 $ per year

NRE (Non-Recurring Engineering charges)
  Fixed manufacturing costs (masks, ...)
  About 50 000 $, up to 1.5 M$ for a 300 mm wafer in 90 nm technology

Unit cost
  Unit manufacturing cost: about 0.2 $ per mm2
  One 300 mm wafer (90000 mm2) = 300 $

Gate arrays reduce the NRE.

Comparison

Data: a 250K-gate system

        NRE ($)    Unit cost ($)
FPGA    -          3 200
ASIC    350 000    30

Device-only cost (ASIC includes NRE)

Units    FPGA cost    ASIC cost
5        16,000 $     350,150 $
10       32,000 $     350,300 $
50       160,000 $    351,500 $
100      320,000 $    353,000 $
150      480,000 $    354,500 $

Device + EDA tools estimate (ASIC includes NRE)

FPGA EDA: 82,000 $ (simulation + synthesis + FPGA place & route)
ASIC EDA: 343,000 $ (simulation + synthesis + timing + ATPG)

Units    FPGA cost    ASIC cost
10       114,000 $    693,300 $
50       242,000 $    694,500 $
100      402,000 $    696,000 $
150      562,000 $    697,500 $
200      722,000 $    699,000 $
250      882,000 $    700,500 $

[Figure: "FPGA/ASIC Cost vs Units (250KGates)". Two charts of total unit cost (US $) against number of units, one curve for FPGA cost and one for ASIC cost: circuit cost alone (5 to 150 units), and circuit cost with CAD tools (10 to 250 units).]

Source: http://www.altera.com/products/devices/cost/cst-cost_step1.jsp
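The two cost tables above follow a simple linear model: fixed costs (NRE, and optionally EDA tools) plus a unit price times the number of units. The sketch below reproduces that model and computes the volume at which the ASIC becomes cheaper. The function names are illustrative, and design engineering effort is not included, as in the tables.

```python
import math

# Figures from the 250K-gate comparison above (US $).
FPGA_UNIT, FPGA_NRE, FPGA_EDA = 3_200, 0, 82_000
ASIC_UNIT, ASIC_NRE, ASIC_EDA = 30, 350_000, 343_000


def total_cost(units, unit_price, nre, eda=0):
    """Total cost of `units` devices, fixed costs included."""
    return nre + eda + units * unit_price


def break_even(fixed_a, unit_a, fixed_b, unit_b):
    """Smallest unit count at which option B (higher fixed cost, lower
    unit price) becomes strictly cheaper than option A."""
    return math.floor((fixed_b - fixed_a) / (unit_a - unit_b)) + 1


# Device cost only.
print(break_even(FPGA_NRE, FPGA_UNIT, ASIC_NRE, ASIC_UNIT))  # 111
# Device cost plus EDA tools.
print(break_even(FPGA_NRE + FPGA_EDA, FPGA_UNIT,
                 ASIC_NRE + ASIC_EDA, ASIC_UNIT))            # 193
```

Under these figures the crossover lies between 100 and 150 units for the device alone, and between 150 and 200 units once the tools are counted, which matches the rows of the tables.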


Multi-project circuits

Several projects/circuits are fabricated on the same wafer to share the NRE.

Europractice: AMI Semiconductor 0.35 µm CMOS, 680 Euro/mm2

CMP: STMicroelectronics 0.18 µm CMOS HCMOS8D, 990 Euro/mm2

Outline

Introduction
Design methodology
  Common methods
  ASIC specifics
  FPGA specifics
  Prototyping: FPGA to ASIC
  Example of a "multi-platform" project: LEON
Radiation hardening
SoC
Summary

Design flow

[Figure: design flow chart. Boxes: Specification, VHDL (RTL), Test vectors, Simulation, Validation (yes/no), Synthesis, Placement, Routing, Simulation, Validation (yes/no), then Programming (FPGA) or Fabrication (ASIC).]


Direct synthesis

Descriptions of the functional blocks at a "high" level of abstraction are transformed into standard cells.

[Figure: a VHDL entity goes through synthesis to a netlist, then through placement and routing to a layout.]

No specific circuits such as RAM/CAM or PLL.


"Pre-characterized" components: IP

Complex circuits are offered as macro-blocks.

Foundries provide simulation models and masks (abstract view).
Synthesis is done by instantiating a "black box".

[Figure: a VHDL entity instantiating an IP block (RAM), carried through the netlist to the layout.]

The "back end"

Placement and routing breaks down into several steps:

Placement
Test insertion
Clock tree insertion
Clock routing
Full routing
Timing analysis
Verification (DRC, LVS, post place-and-route simulation, ...)

Functional blocks can be split up and placed and routed separately.

chiplet timing, clock matching, and I/O timing analysis.

To achieve timing closure, we made engineering change orders to the netlist after routing. Following each manipulation step, formal verification ensured that the modified netlist was functionally equivalent to the one after test insertion.

We aligned all clock domains having synchronous chiplet crossings. For example, if the memory interface clock in one chiplet was synchronously connected to the same clock in another chiplet, we phase-aligned these clocks and analyzed the signal paths to meet timing constraints. We achieved clock alignment by tweaking the clock insertion delays, using aligners in the clock module. Similarly, we made the clock trees as structurally identical as possible.

As part of the physical design process, we met design completion and manufacturability goals by implementing techniques such as design rule checks, antenna fixes, track filling, and doubling of vias wherever possible. Figure 4 shows the layout plot for the Viper design's initial version. Table 3 summarizes the major design parameters.

WE HAVE LEARNED much from the Viper design experience and trust it will guide us in the future, particularly since the next-generation SOC designs are significantly more complex, calling for still higher levels of integration. Some of our current activities, in addition to regular chip-development tasks, are investigating more efficient on-chip bus architectures and better design-reuse methodologies.


Source: IEEE Design & Test of Computers, "Application-Specific SOC Multiprocessors", p. 30.

Figure 4. Layout of Viper (PNX8500). (Block labels: CAB, MPEG, MBS+, VIP1+, VIP2, ICP1 + ICP2 + MMI, Conditional access, (MSP1 + MSP2), T-PI, M-PI, TM32, 1394, MSP3, PR3940)

Table 3. Design statistics.

Parameter             Value
Process technology    TSMC 0.18 µm, six metal layers
Transistors           About 35 million
Instances             1.2 million instances, or 8 million gates
Memories              243 instances, 750-Kbit memory
CPUs                  2 (TriMedia TM32 and MIPS PR3940)
Peripherals           50
Clock domains         82
Clock speed           TM32: 200 MHz; PR3940: 150 MHz; SDRAM: 143 MHz
Power                 4.5 W
Supply voltage        1.8-V core and 3.3-V I/O
Package               BGA456

PNX8500 (Philips)

The physics of the interconnect must be taken into account.

Design entry models

FPGA vendors offer "proprietary" tools for using their FPGAs:

Schematic capture
Vendor-specific description languages
  AHDL - Altera
  ABEL - Xilinx

Synthesis can be carried out by third-party tools (Leonardo, Synplicity, Synopsys, etc.).

Placement and routing

Placement and routing is carried out with proprietary tools. These tools make it possible to:

allocate the functional blocks
extract a timing analysis

The growing complexity of FPGAs makes hierarchical methodologies necessary.

Using the resources

How can the resources of an FPGA be used?

Direct instantiation
  Primitives (macro-cells, RAM, etc.)
  Libraries of macro-functions
  Depending on the synthesis tool, these instances cannot be synthesized in the usual way.

[Figure: the main design and a macro, each in a wrapper, going through synthesis, then placement and routing.]

High-level description / inference
  Synthesizers detect complex blocks.
  Example: RAM, multipliers, etc.

Principle

FPGAs are used to validate the design of an ASIC.

Generic, highly complex emulation platforms exist (Aptix, Quickturn, ...).

Faster simulation
No timing verification
The emulator architecture may be ill-suited to the project

Example: Aptix

"Nokia made a commitment to create real-time prototypes of all its new mobile phone designs. Prototypes are the only way to validate our algorithms by testing actual voice transmission quality. We adopted the Aptix solution because it provides a productive debug environment while maintaining our objective of real-time verification."

Stelios Podimatis, Member of Technical Staff, ASIC Engineering, Nokia (San Diego, CA)

The System Explorer MP3CF is optimized for prototyping DSP-based pipelined designs with moderate requirements for interconnect between prototyping components. The MP3CF architecture provides maximum performance for prototypes incorporating fixed-pin prototyping components such as CPUs, DSPs, memory cards, etc. Use the MP3CF for building high-speed prototypes of wireless communication and digital-imaging applications.

System Explorer MP3CF hardware

FPCB® user "freehole" area with 1,920 routable pins accommodates a wide variety of prototyping components
FPIC® Programmable Interconnect Components (3) provides software-controlled interconnect and diagnostic probing
Microcontroller configures all programmable hardware, performs system self-test and stores data for stand-alone configuration
Board-edge I/O
Modular hard-wired buses for high-fanout bi-directional nets
I/O cable connectors (20) with interleaved grounds provide flexible connection to target systems
User-controlled power supply voltage selection and monitoring to support advanced prototyping components today and tomorrow
Modular low-skew clock circuits (8)

System Explorer MP3CF interconnect architecture

[Figure: three regions (REGION #1 to #3), each with FPGAs and user component holes attached to one FPIC device (FPIC #1 to #3); the FPIC devices are linked by global interconnect lines (140 each).]

One-to-one connections between FPIC® device and component pins
All component pins in a given region connect through one FPIC® device
Component pins in different regions connect through two FPIC® devices

Solutions for Wireless Communications and Image Processing

“Nokia made a commitment to create real-time prototypes of

all its new mobile phone designs. Prototypes are the only way

to validate our algorithms by testing actual voice transmission

quality. We adopted the Aptix solution because it provides a

productive debug environment while maintaining our objective

of real-time verification.

Stelios Podimatis Member of Technical Staff, ASIC Engineering, Nokia (San Diego, CA)

The System Explorer MP3CF is optimizedfor prototyping DSP-based pipelineddesigns with moderate requirements forinterconnect between prototyping compo-nents. The MP3CF architecture providesmaximum performance for prototypesincorporating fixed-pin prototyping com-ponents such as CPUs, DSPs, memorycards, etc. Use the MP3CF for buildinghigh-speed prototypes of wireless commu-nication and digital-imaging applications.

FPCB® user “freehole” areawith 1,920 routable pins

accommodates a wide varietyof prototyping components

FPIC® Programmable InterconnectComponents (3) provides soft-

ware-controlled interconnect anddiagnostic probing

Microcontroller configures all programmable hardware,

performs system self-test andstores data for stand-alone

configurationBoard-edge I/O

Modular hard-wired buses forhigh-fanout bi-directional nets

System Explorer MP3CF hardware

I/O cable connectors (20) withinterleaved grounds provide flexibleconnection to target systems

System Explorer MP3CF interconnect architecture

/ /

/

REGION #3REGION #2REGION #1

FPGA FPGA

FPGA FPGA

FPGA FPGA

FPGA FPGA

FPGA FPGA

FPGA FPGA

USER COMPONENT HOLES

FPIC#1

FPIC#2

FPIC#3

140 140

140

GLOBAL INTERCONNECT LINES

One-to-one connectionsbetween FPIC®

Device and component pins

All component pins in agiven region connectthrough one FPIC® device

Component pins in differentregions connect through twoFPIC® devices

Solutions for Wireless Communications and Image Processing



LEON architecture

Excerpt from the LEON-2 User’s Manual, Version 1.0.19, Gaisler Research:

1.4 Functional overview

A block diagram of LEON-2 can be seen in figure 1.

1.4.1 Integer unit

The LEON integer unit implements the full SPARC V8 standard, including all multiply and divide instructions. The number of register windows is configurable within the limit of the SPARC standard (2 - 32), with a default setting of 8. To aid software debugging, up to four watchpoint registers can be configured. Each register can cause a trap on an arbitrary instruction or data address range. If the debug support unit is enabled, the watchpoints can be used to enter debug mode.

1.4.2 Floating-point unit and co-processor

The LEON model does not include an FPU, but provides a direct interface to the Meiko FPU core, and a general interface to connect other floating-point units. A generic co-processor interface is provided to allow interfacing of custom co-processors.

1.4.3 Cache sub-system

Separate, multi-set instruction and data caches are provided, each configurable with 1 - 4 sets, 1 - 64 kbyte/set, 16 - 32 bytes per line. Sub-blocking is implemented with one valid bit per 32-bit word. The instruction cache uses streaming during line-refill to minimise refill

Figure 1: LEON-2 block diagram

[Figure: blocks shown are the LEON processor (integer unit, FPU, CP, MMU, I-cache, D-cache, local RAM, debug support unit, debug serial link), the AMBA AHB with AHB controller, memory controller, PCI and Ethernet, the AHB/APB bridge, and the AMBA APB with UARTs, timers, interrupt controller and I/O port. The 8/16/32-bit memory bus connects to I/O, PROM, SRAM and SDRAM.]

Reference: http://www.gaisler.com

Target technologies

Technology                        RAM            Pads
Behavioral model                  inferred       inferred
Xilinx VIRTEX/2 FPGA              instantiated   inferred
Atmel ATC18/25/35                 instantiated   instantiated
UMC FS90A/B                       instantiated   instantiated
UMC 0.18 um CMOS                  instantiated   instantiated
TSMC 0.25 um w. Artisan rams      instantiated   instantiated
Actel Proasic FPGA                instantiated   inferred
Actel AX anti-fuse FPGA           instantiated   inferred


Project organisation

[Figure: memory wrapper hierarchy]
cache
  syncram
    generic_syncram (VHDL code)
    atc18_syncram: hdss1_128x32cm4sw0
    virtex2_syncram: RAMB16_S36
    proasic_syncram: RAM256x9SST

The instantiated memories are both:

Black boxes for synthesis: the entities are treated as library cells.

Behavioral descriptions for simulation: these may be supplied by the RAM vendor.


Code example

cachemem.vhd
  entity cachemem is
  ...
  dtags0 : syncram port map (...
  ...

tech_map.vhd
  entity syncram is
  ...
  inf : if INFER_RAM generate
    u0 : generic_syncram generic map (...
  hb : if (not INFER_RAM) generate
    atc1 : if TARGET_TECH = atc18 generate
      u0 : atc18_dpram generic map (...
  ...

tech_act18.vhd
  -- pragma translate_off
  entity hdss2_512x32cm4sw0 is
  ...
  architecture behavioral of hdss2_512x32cm4sw0 is
  ...
  -- pragma translate_on

  entity atc18_syncram is
  ...
  id0 : hdss1_128x32cm4sw port map (...
  ...
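The selection performed by the generate statements in tech_map.vhd can be paraphrased in a few lines of Python. This is only a behavioral sketch: the wrapper names are those of the project-organisation figure, while the technology keys "virtex2" and "proasic" and the function name are assumptions made for illustration (only atc18 appears in the excerpt).

```python
# Hypothetical model of the technology mapping done in tech_map.vhd.
# Keys other than "atc18" are assumed names, not taken from the excerpt.
WRAPPERS = {
    "atc18": "atc18_syncram",
    "virtex2": "virtex2_syncram",
    "proasic": "proasic_syncram",
}

def select_syncram(infer_ram, target_tech):
    """Return the name of the RAM entity that syncram would instantiate."""
    if infer_ram:
        # Inferred RAM: the synthesis tool builds the memory from generic VHDL.
        return "generic_syncram"
    if target_tech not in WRAPPERS:
        raise ValueError(f"no RAM wrapper for technology {target_tech!r}")
    # Instantiated RAM: a technology-specific wrapper around a vendor cell.
    return WRAPPERS[target_tech]
```

The point of the structure is that the cache code only ever sees syncram; retargeting the design means changing two configuration values, not the cache.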

Outline

Introduction
Design methodology
Radiation hardening
  Hardening ASICs
  Hardening FPGAs
SoC
Summary

Single Event Upset (SEU)

A particle strike can flip the state of storage elements (latches, registers, SRAM, ...).

[Figure: transistor cross-section (N substrate, P well, gnd and vdd contacts) and a memory cell with its Select lines.]


Single Event Transient (SET)

Combinational circuitry can be disturbed:

An error present at the sampling instant can be latched
The clock tree can generate spurious edges

[Figure: flip-flop waveforms (Clk, D, Q) illustrating a SET on the clock and a SET on the data.]


Latchup

[Figure: CMOS cross-section (P substrate, N well, gnd and vdd contacts).]



Main methods

Use of technologies that are:

Custom
  Charge dissipation (sizing, capacitances)
  Temporal filtering (delay + vote)
  Transistor isolation
  Intra-redundant cells

Standard
  TMR
  Error-correcting codes
  Self-test

Registers

TMR: Triple Modular Redundancy

[Figure: three registers sharing CLK and feeding a voter (Vote).]

The registers must be placed far apart so that they do not suffer the same fault. They must be updated with the corrected value.
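The behavior of a TMR register can be sketched in Python. This is a behavioral illustration, not the circuit of the slide: the class and method names are invented for the example, and the physical separation of the three copies is of course not modelled.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (a & c) | (b & c)

class TmrRegister:
    """Three copies of a register whose output is the voted value."""

    def __init__(self, value=0):
        self.copies = [value, value, value]

    def read(self):
        return majority(*self.copies)

    def clock(self, d=None):
        # On a clock edge every copy is loaded with the same value.
        # When no new data is given, the voted (corrected) value is
        # written back, which repairs a copy hit by an upset.
        value = self.read() if d is None else d
        self.copies = [value, value, value]

    def upset(self, copy, bit):
        """Model an SEU: flip one bit in one of the three copies."""
        self.copies[copy] ^= 1 << bit

reg = TmrRegister(0b1010)
reg.upset(copy=1, bit=0)      # one copy is corrupted
voted = reg.read()            # the vote masks the error
reg.clock()                   # write-back restores the corrupted copy
```

Writing the voted value back is what keeps a second, later upset in another copy from defeating the vote.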


Memories

SRAM

Standard: error-correcting codes protect the stored data. Extra bits are required.

Specific: the bits of a word are physically separated. The area increases.

(S)DRAM: SEUs accelerate the discharge of the memory cells.

The refresh rate can be increased.
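To make the cost in extra bits concrete, here is a minimal single-error-correcting code in Python. The slides do not name a particular code; Hamming(7,4) is chosen here purely as the smallest textbook example (3 check bits for 4 data bits), and the function names are illustrative.

```python
def hamming74_encode(data):
    """Encode 4 data bits as the codeword [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = [(data >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4          # covers codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(code):
    """Return (data, position); position is 0 when no error was found."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # 1-based index of the flipped bit
    if position:
        c[position - 1] ^= 1          # correct the single-bit error
    data = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return data, position

word = hamming74_encode(0b1011)
word[4] ^= 1                          # SEU in the stored word
recovered, where = hamming74_decode(word)
```

A code like this corrects one upset per word, which is why the "specific" SRAMs above spread the bits of a word apart: a single particle should not flip two bits of the same word.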


Hardening methodologies

Automatic methods

Specific technologies: hardened cells are used in place of the standard cells.
Atmel offers the hardened 0.18 µm technology ATC18RHA.

TMR: “classical” synthesis is followed by a netlist modification.
This can be done with synthesis-tool scripts or by modifying the output files.

Use of hardened gate arrays

By design

Introduce self-test techniques into the circuits.


Sources of malfunction

The FPGA elements likely to cause malfunctions are:

Cell registers
Embedded RAM
The configuration, which is sensitive to SEUs
  The SRAM can be corrupted (XC2VP125: 43 Mbits of configuration)
  Anti-fuses can “blow”
  EEPROMs can change state
The generic logic, which generates SETs
  Interconnect logic
  Clock tree

The external configuration elements (for SRAM-based FPGAs) must also be protected.

Remedies

FPGAs are trickier to harden:

Registers and RAM
  The same methods as for ASICs apply.

Configuration
  Adopt technologies less sensitive to SEUs: anti-fuses are less sensitive than SRAM/EEPROM.
  Verify the configuration: use the partial configuration of the FPGA to check the cells automatically.
  Insert self-checking of the computations: insert known sequences into the computations to verify the results.
    ROM of sequences and references
    LFSR

A detected fault triggers reconfiguration of the FPGA.
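The self-checking scheme (known stimuli, stored references) can be sketched as follows. This is a software model under stated assumptions: the 16-bit LFSR taps (16, 14, 13, 11) are a widely quoted maximal-length choice rather than anything given in the slides, and the computation under test, the seed and all function names are hypothetical.

```python
def lfsr16_step(state):
    """One step of a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return (state >> 1) | (bit << 15)

def lfsr_sequence(seed, n):
    """First n states of the LFSR, starting from a non-zero seed."""
    states, s = [], seed
    for _ in range(n):
        states.append(s)
        s = lfsr16_step(s)
    return states

def build_reference(unit, seed, n):
    """Golden results, computed once on a known-good unit (the 'ROM')."""
    return [unit(x) for x in lfsr_sequence(seed, n)]

def self_check(unit, reference, seed):
    """True if the unit still reproduces the stored reference results."""
    stimuli = lfsr_sequence(seed, len(reference))
    return all(unit(x) == r for x, r in zip(stimuli, reference))

SEED = 0xACE1

def good(x):
    return (3 * x) & 0xFFFF        # hypothetical computation under test

def faulty(x):
    return good(x) | 0x0001        # same unit with output bit 0 stuck at 1

rom = build_reference(good, SEED, 64)
needs_reconfiguration = not self_check(faulty, rom, SEED)
```

In hardware the LFSR replaces a stimulus ROM at the cost of a few flip-flops and XOR gates; only the reference results need to be stored.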


Hardening methodology

TMR can be implemented transparently.

For Actel FPGAs, Synplify can directly implement:

flip-flops built from combinational cells (CC)
TMR
flip-flops built from combinational cells, with TMR

In VHDL, this is done with attributes:

architecture top of top is
  attribute syn_radhardlevel of top : architecture is "tmr_cc";
  ...
  attribute syn_radhardlevel of counter_q : signal is "tmr";
  ...


Dedicated components

Actel offers radiation-resistant circuits:

Programmed with resistant anti-fuses
Without registers: the registers are built from combinational elements
With hardened registers

Excerpt from the datasheet “RT54SX-S RadTolerant FPGAs for Space Applications”, Advanced v1.4:

To achieve the SEU requirements, the D flip-flop in the RT54SX-S R-cell is enhanced (Figure 3). Both the master and slave “latches” are actually implemented with three latches. The feedback path of each of the three latches is voted with the outputs of the other two latches. If one of the three latches is struck by an ion and starts to change state, the voting with the other two latches prevents the change from feeding back and permanently latching. Care was taken in the layout to ensure that a single ion strike could not affect more than one latch.

Figure 4 is a simplified schematic of the test circuitry that has been added to test the functionality of all the components of the flip-flop. The inputs to each of the three latches are independently controllable so the voting circuitry in the feedback paths can be exhaustively tested. This testing is performed on an unprogrammed array during wafer sort, final test and post burn-in test. This test circuitry cannot be used to test the flip-flops once the device has been programmed.

Figure 3 • RT54SX-S R-Cell Implementation of D Flip-Flop Using Voter Gate Logic

Figure 4 • R-Cell Implementation: Test Circuitry

The latches are physically separated so that they are not hit by the same radiation.


Hardening effectiveness

Some Actel circuits:

LRH1280, 0.8 µm (A1280)

                  GEO SEU
Flip-flop         10^-6
Flip-flop (CC)    10^-7
TMR               10^-10

RTAX, 0.15 µm (AX 0.15 µm, S-cell = TMR)

          SRAM                       Register
Family    LETth   GEO SEU            LETth               GEO SEU
AX        1.4     3×10^-7            3.36 > .. > 2.89    10^-6
RTAX      1.4     10^-10 (EDAC)      > 37                < 10^-10

No SEL at LET = 120 MeV-cm2/mg
LETth in MeV-cm2/mg
GEO SEU = errors/bit/day in geostationary orbit
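Because GEO SEU is a rate per bit per day, the expected number of upsets scales linearly with the number of storage bits and the mission duration. The sketch below applies this to the LRH1280 figures; the design size (10,000 flip-flops) and the ten-year mission are assumptions chosen only to make the orders of magnitude visible.

```python
def expected_upsets(n_bits, rate_per_bit_day, days):
    """Expected number of upsets: bits x (errors/bit/day) x days."""
    return n_bits * rate_per_bit_day * days

N_FLIP_FLOPS = 10_000      # assumption: design size, not from the slides
MISSION_DAYS = 10 * 365    # assumption: ten-year mission

RATES = [("Flip-flop", 1e-6), ("Flip-flop (CC)", 1e-7), ("TMR", 1e-10)]

for name, rate in RATES:
    n = expected_upsets(N_FLIP_FLOPS, rate, MISSION_DAYS)
    print(f"{name:15s} {n:.4g} expected upsets")
```

With these assumed figures, plain flip-flops would see a few tens of upsets over the mission, while the TMR cells stay well below one.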

Outline

Introduction
Design methodology
Radiation hardening
SoC
  Review of SoCs
  Comparative study
Summary

SoC components

Current technologies make it possible to put on a single circuit:

ASIC
Processors
Memory (SRAM and DRAM)
System buses
Analog

SoC = System on Chip.

Programmable circuits allow the same kind of design: SoPCs (System on Programmable Chip).

An SoPC: Excalibur (Altera)

Microprocessors: ASIC

They are available to suit the needs.

Pre-characterized: optimized by the foundries, under licence.

Synthesizable: high-level models are available for synthesis. Some parts must be adapted to the technology.

Configurable: the processors adapt to the needs of the application:
  Cache size and type
  System mechanisms (TLB, virtual addressing, ...)
  Co-processors

Performance: 32-bit MIPS = 300 MHz

Microprocessors: FPGA

There are two types of processor:

Synthesizable
  Generic models (e.g. LEON) or processors supplied by FPGA vendors (e.g. NIOS (Altera), MicroBlaze (Xilinx)).
  Resources used: dual-port RAM, CAM.
  Performance ≈ 50 MHz
  The limited resources call for simple processors.

Embedded in the FPGA
  Examples: Excalibur ARM (Altera), Virtex II Pro (Xilinx)
  Performance ≈ 300 MHz
  Their characteristics are fixed.


Buses: ASIC

The technologies are adapted to the needs and can coexist in the same circuit.

[Figure: two bus structures, each with two masters and three slaves: a tri-state bus and a multiplexer-based bus.]


Buses: FPGA

The technology is dictated by the available resources.
Tri-state buses are discouraged (and often simply impossible).
To save logic, arbitration can be done at each slave: the interconnect wires are then numerous.
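Arbitration at each slave can be modelled in a few lines. The sketch below is an assumption-laden illustration: the slides do not specify the arbitration policy, so a fixed priority (lowest master index wins) is used, and the function names are invented.

```python
def arbitrate(targets, n_slaves):
    """Per-slave arbitration for one bus cycle.

    targets[m] is the slave addressed by master m, or None if idle.
    Returns grants[s]: the master granted by slave s, or None.
    Assumed policy: fixed priority, lowest master index wins.
    """
    grants = []
    for slave in range(n_slaves):
        requesting = [m for m, t in enumerate(targets) if t == slave]
        grants.append(min(requesting) if requesting else None)
    return grants

def request_paths(n_masters, n_slaves):
    """Each slave needs a request path from every master."""
    return n_masters * n_slaves

parallel = arbitrate([0, 1], n_slaves=3)   # masters target different slaves
conflict = arbitrate([2, 2], n_slaves=3)   # both masters target slave 2
```

Masters addressing different slaves are served in the same cycle, which is the benefit of this scheme; the price is that the number of master-to-slave paths grows as the product of the two counts.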

Excerpt from the Avalon Bus Specification (Altera Corporation):

The Avalon bus module is generated automatically by the SOPC Builder, so that the system designer is spared the task of connecting the bus and peripherals together. The Avalon bus module is very rarely used as a discrete unit, because the SOPC Builder will almost always be used to automate the integration of processors and other Avalon bus peripherals into a system module. The designer’s view of the Avalon bus module usually is limited to the specific ports that relate to the connection of custom Avalon peripherals.

Note that the Avalon bus module (an Avalon bus) is a unit of active logic that takes the place of passive, metal bus lines on a physical PCB. (See Example 2). In this context, the ports of the Avalon bus module could be thought of as the pin connections for all peripheral devices connected to a passive bus. The Avalon Bus Specification Reference Manual defines only the ports, logical behavior and signal sequencing that comprise the interface to the Avalon bus module. It does not specify any electrical or physical characteristics of a physical bus.

Figure 2. Avalon Bus Module Block Diagram - an example system

The Avalon bus module provides the following services to Avalon peripherals connected to the bus:

Avalon bus

Embedded CPUs impose system buses.


Memory: ASIC

Memories are available as pre-characterized blocks.
ROM and RAM are generated as needed.
Current technologies allow several types of memory to coexist (SRAM, SDRAM, associative, ...).
ROMs are custom-made.

UMC offers SRAM libraries and generators.
http://www.umc.com/english/design/b_1.asp

Performance at 0.13 µm: 1K x 16 SRAM, access time = 1.1 ns


Memory: FPGA

FPGAs provide elementary memory blocks (≈ 4 KBytes).
They can be assembled to form large memories.
ROMs are synthesized as combinational circuits.
No SDRAM.

Xilinx XC2VP125 (Virtex II Pro) (0.13 µm)
556 SRAM blocks of 18 Kbits = 10,008 Kbits

Configurations

16K x 1 bit     4K x 4 bits     1K x 18 bits
8K x 2 bits     2K x 9 bits     512 x 36 bits

Timings

             Setup    Prop    Clk min
SelectRAM    0.4      1.5     1.3
CLB          0.5      1.8     1.4
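Assembling elementary blocks into a larger memory is a tiling problem: each block is used in one of the listed aspect ratios and blocks are stacked in depth and width. The sketch below estimates the block count for a given memory; the function name and the example size are illustrative, and real tools may mix aspect ratios, which this simple model does not attempt.

```python
import math

# Aspect ratios of one 18-Kbit block, as listed above (depth, width).
CONFIGS = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]

TOTAL_KBITS = 556 * 18   # capacity of the XC2VP125 block RAM, in Kbits

def blocks_needed(depth, width):
    """Fewest blocks tiling a depth x width memory with a single aspect ratio."""
    return min(
        math.ceil(depth / d) * math.ceil(width / w)
        for d, w in CONFIGS
    )

n_blocks = blocks_needed(4096, 32)   # e.g. a 4K x 32 memory
```

For the 4K x 32 example, several aspect ratios tie at eight blocks (for instance eight 4K x 4 blocks side by side), while the 16K x 1 ratio would need thirty-two.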


Multiple clocks: ASIC

ASICs allow complex clock-domain architectures.

Suitable asynchronous FIFOs handle the domain crossings: metastability is resolved.
Each clock domain has its own clock tree.

82 clocks in the PNX8500 (Philips).
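The slides do not say how the asynchronous FIFOs cope with metastability. One common technique, assumed here for illustration, is to pass the read and write pointers across the domain boundary in Gray code, so that only one bit changes per increment and a pointer sampled during a transition is off by at most one position. The conversions are:

```python
def bin_to_gray(n):
    """Binary to reflected Gray code."""
    return n ^ (n >> 1)

def gray_to_bin(g):
    """Gray code back to binary, by XOR-ing all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Consecutive pointer values of a 16-entry FIFO, including wrap-around.
pointers = [bin_to_gray(i) for i in range(16)]
steps = [bits_changed(pointers[i], pointers[(i + 1) % 16]) for i in range(16)]
```

Every entry of `steps` is 1, including the wrap from the last pointer value to the first, which holds because the depth is a power of two.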

chiplet timing, clock matching, and I/O timing analysis.

To achieve timing closure, we made engineering change orders to the netlist after routing. Following each manipulation step, formal verification ensured that the modified netlist was functionally equivalent to the one after test insertion.

We aligned all clock domains having synchronous chiplet crossings. For example, if the memory interface clock in one chiplet was synchronously connected to the same clock in another chiplet, we phase-aligned these clocks and analyzed the signal paths to meet timing constraints. We achieved clock alignment by tweaking the clock insertion delays, using aligners in the clock module. Similarly, we made the clock trees as structurally identical as possible.

As part of the physical design process, we met design completion and manufacturability goals by implementing techniques such as design rule checks, antenna fixes, track filling, and doubling of vias wherever possible. Figure 4 shows the layout plot for the Viper design’s initial version. Table 3 summarizes the major design parameters.

WE HAVE LEARNED much from the Viper design experience and trust it will guide us in the future, particularly since the next-generation SOC designs are significantly more complex, calling for still higher levels of integration. Some of our current activities, in addition to regular chip-development tasks, are investigating more efficient on-chip bus architectures and better design-reuse methodologies.


Source page: Application-Specific SOC Multiprocessors, IEEE Design & Test of Computers, p. 30.

Figure 4. Layout of Viper (PNX8500).

[Figure: blocks labelled in the layout plot are CAB, MPEG, MBS + VIP1 + VIP2, ICP1 + ICP2 + MMI, Conditional access, MSP1 + MSP2, T-PI, M-PI, TM32, 1394, MSP3, PR3940.]

Table 3. Design statistics.

Parameter             Value
Process technology    TSMC 0.18 µm, six metal layers
Transistors           About 35 million
Instances             1.2 million instances, or 8 million gates
Memories              243 instances, 750-Kbit memory
CPUs                  2 (TriMedia TM32 and MIPS PR3940)
Peripherals           50
Clock domains         82
Clock speed           TM32: 200 MHz; PR3940: 150 MHz; SDRAM: 143 MHz
Power                 4.5 W
Supply voltage        1.8-V core and 3.3-V I/O
Package               BGA456


Multiple clocks: FPGA

The clock trees are already built.
The number of clocks is limited.

[Figure: Apex 20K macro block.]

Clock-domain crossings are delicate.

Xilinx provides Digital Clock Managers (DCM).
Asynchronous FIFOs are built from FPGA cells: their performance is limited.



Each global clock multiplexer buffer can be driven either by the clock pad to distribute a clock directly to the device, or by the Digital Clock Manager (DCM), discussed in Digital Clock Manager (DCM), page 40. Each global clock multiplexer buffer can also be driven by local interconnects. The DCM has clock output(s) that can be connected to global clock multiplexer buffer inputs, as shown in Figure 47.

Global clock buffers are used to distribute the clock to some or all synchronous logic elements (such as registers in CLBs and IOBs, and SelectRAM+ blocks).

Eight global clocks can be used in each quadrant of the Virtex-II Pro device. Designers should consider the clock distribution detail of the device prior to pin-locking and floorplanning. (See the Virtex-II Pro Platform FPGA User Guide.)

Figure 48 shows clock distribution in Virtex-II Pro devices.

In each quadrant, up to eight clocks are organized in clock rows. A clock row supports up to 16 CLB rows (eight up and eight down).

To reduce power consumption, any unused clock branches remain static.

Figure 47: Virtex-II Pro Clock Multiplexer Buffer Configuration

[Figure: clock pads, local interconnect and the DCM (CLKIN, CLKOUT) feed a clock multiplexer or clock buffer (input I, output O) that drives the clock distribution.]

Figure 48: Virtex-II Pro Clock Distribution

[Figure: four quadrants (NW, NE, SW, SE); 8 BUFGMUX at the top and 8 at the bottom provide 16 clocks, with 8 at most in each quadrant.]

Virtex II Pro clocks


Analog: ASIC

Most digital technologies are compatible with analog.

Analog blocks are designed separately and integrated at assembly.
The digital and analog areas are kept apart to reduce clock noise.


Analog: FPGA

No integrated analog.
Programmable analog circuits exist, but their performance is poor.

Outline

Introduction
Design methodology
Radiation hardening
SoC
Summary

Performance comparison

Performance and complexity of the LEON microprocessor implemented on different target technologies:

Technology                       Complexity                        Frequency

ASIC
Atmel 0.18 CMOS std-cell         35K gates + RAM                   165 MHz (pre-layout)
Atmel 0.25 CMOS std-cell         33K gates + RAM                   140 MHz (pre-layout)
UMC 0.25 CMOS std-cell           35K gates + RAM                   130 MHz (pre-layout)
Atmel 0.35 CMOS std-cell         2 mm2 + RAM                       65 MHz (pre-layout)

FPGA
Xilinx XC2V500-6 (0.15 µm)       4,800 LUT + 14/32 block RAM       65 MHz (post-layout)
Altera 20K200C-7 (0.15 µm)       5,700 LCELLs + EAB RAM (52%)      49 MHz (post-layout)
Actel AX1000-3 (0.15 µm)         7,600 cells + 14/36 RAM           48 MHz (post-layout)

Source: http://www.gaisler.com/
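The gap between the two families can be read directly off the table. The short calculation below compares the fastest ASIC and FPGA entries; it is only indicative, since the ASIC figures are pre-layout and the FPGA figures post-layout, and the dictionary keys are shortened labels introduced for the example.

```python
# Frequencies in MHz, copied from the table above.
ASIC_MHZ = {
    "Atmel 0.18": 165,
    "Atmel 0.25": 140,
    "UMC 0.25": 130,
    "Atmel 0.35": 65,
}
FPGA_MHZ = {
    "Xilinx XC2V500-6": 65,
    "Altera 20K200C-7": 49,
    "Actel AX1000-3": 48,
}

best_asic = max(ASIC_MHZ.values())
best_fpga = max(FPGA_MHZ.values())
ratio = best_asic / best_fpga   # roughly 2.5 in favour of the ASIC

print(f"best ASIC {best_asic} MHz, best FPGA {best_fpga} MHz, ratio {ratio:.2f}")
```

Note that the fastest FPGA entry only matches the slowest ASIC entry, which uses a 0.35 µm process.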


Summary: ASIC

Full control over the project
Control over radiation resistance
Low cost at high volume
High integration density
Maximum performance

Errors are expensive
In-depth knowledge of the technology
NRE (non-recurring engineering costs)


Summary: FPGA

Short development time
Radiation-resistant families
Low investment

Architectural constraints
Poor knowledge of internal details and characteristics
  Relaxed attention
  Increased risk of failures
High unit cost
Limited complexity
Limited performance

Conclusion

Choosing between an FPGA and an ASIC?

Area/cost
Efficiency
Flexibility
Reusability
Development time
Throughput
Power consumption
Memory architecture
Computing power
Technology
Functionality

... it depends ...



Detailed outline

Introduction
  Problem statement
  ASICs
    Families
    Technology evolution
  FPGA
    Principle
    Programming technologies
    Actel (ProAsic)
    Actel (Axcelerator)
    Xilinx (Spartan 3/Virtex II)
    Altera (Apex/Stratix)
  Cost models
    FPGA costs
    ASIC costs
    Comparison
    Multi-project circuits

Design methodology
  Common methods
    Design flow
  ASIC specifics
    Direct synthesis
    "Pre-characterized" components (IP)
    The “back end”
  FPGA specifics
    Entry models
    Place and route
    Resource usage
  Prototyping: FPGA to ASIC
    Principle
    Example: Aptix
  Example of a “multi-platform” project: LEON
    LEON architecture
    Target technologies
    Project organisation
    Code example

Radiation hardening
  Single Event Upset (SEU)
  Single Event Transient (SET)
  Latchup
  Hardening ASICs
    Main methods
    Registers
    Memories
    Hardening methodologies
  Hardening FPGAs
    Sources of malfunction
    Remedies
    Hardening methodology
    Dedicated components
    Hardening effectiveness

SoC
  Review of SoCs
    SoC components
    An SoPC: Excalibur (Altera)
  Comparative study
    Microprocessors
    Buses
    Memory
    Multiple clocks
    Analog

Summary
  Performance comparison
  Summary
  Conclusion
  References
