Virtex-4 DSP Architecture
- 2 DSP slices per tile
  - 16-256 tiles in 1-8 columns
- Each DSP includes:
  - 3-input, 48-bit adder/subtractor
    - P = Z ± (X + Y + Cin)
    - Optional accumulator register
  - 18x18-bit 2's-complement multiplier (without adder)
- User-controlled operational modes
  - For X, Y, & Z MUXs
- Configuration bits control other MUXs
  - Pipelining registers
  - Accumulator register
[Figure: DSP slice block diagram: A (18-bit), B (18-bit) and C (48-bit) inputs, multiplier (×), X, Y and Z MUXs, adder/subtractor (±), P (48-bit) output, inputs for cascading, outputs with dedicated routing]
C. Stroud 1/08 VLSI D&T Seminar
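The adder/subtractor function P = Z ± (X + Y + Cin) can be modelled in a few lines. This is a minimal behavioural sketch, assuming the result simply wraps around at 48 bits; the name `dsp48_alu` and its `subtract` flag are hypothetical and are not Xilinx identifiers.

```python
MASK48 = (1 << 48) - 1  # the adder/subtractor is 48 bits wide

def dsp48_alu(x, y, z, cin=0, subtract=False):
    """Behavioural model of P = Z +/- (X + Y + Cin), truncated to 48 bits.

    Hypothetical helper for illustration: x, y and z stand for the values
    selected by the X, Y and Z multiplexers.
    """
    inner = x + y + cin
    p = z - inner if subtract else z + inner
    return p & MASK48  # Python's & on a negative int gives the 2's-complement wrap

print(hex(dsp48_alu(1, 2, 10, cin=1)))                 # add mode
print(hex(dsp48_alu(1, 2, 10, cin=1, subtract=True)))  # subtract mode
```

A model like this is only a reference for expected responses; it says nothing about the gate-level structure, which is what the test algorithms have to cope with.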

Adder BIST
- Test algorithm depends on architecture
  - But architecture is not specified in data sheets
  - Eliminate sequential logic architectures
    - “Based on modified Booth”
  - Adder choices include:
    - Ripple carry
    - Carry select
    - Carry save
    - Carry-look-ahead (CLA)
      - Our assumption based on area/performance analysis
      - But multiple types of CLA
- Our goal: find/develop architecture-independent test algorithm(s)

Carry-Look-Ahead Adder
- Recall CLA was more difficult to test
- Basic CLA is 4 bits
- 4-bit CLAs then combined to form larger adders
  - Ripple CLAs
  - 2 types based on Lookahead Carry Unit (LCU):
    - Ripple LCU
    - Multi-stage LCU

Gi = Ai•Bi
Pi = Ai+Bi
C1 = G0 + P0•C0
C2 = G1 + G0•P1 + P1•P0•C0
C3 = G2 + G1•P2 + G0•P1•P2 + P2•P1•P0•C0
C4 = G3 + G2•P3 + G1•P2•P3 + G0•P1•P2•P3 + P3•P2•P1•P0•C0
PG = P0•P1•P2•P3
GG = G3 + G2•P3 + G1•P2•P3 + G0•P1•P2•P3

[Figure: 4-bit CLA: four full adders (inputs A3 B3 ... A0 B0, carry-in C0, sum outputs S3 ... S0) pass P3 G3 ... P0 G0 to the 4-bit carry look-ahead block, which supplies C1, C2, C3 and produces C4, PG and GG]
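The carry equations for the 4-bit CLA translate directly into code, which makes it easy to confirm them against ordinary addition. The sketch below is illustrative only: the function names `cla4` and `ripple_cla` are made up here, and the 48-bit ripple arrangement (each block's C4 feeding the next block's C0) is just one of the CLA variants listed above.

```python
def cla4(a, b, c0):
    """4-bit CLA built from the Gi/Pi carry equations; returns (sum, C4, PG, GG)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # Gi = Ai AND Bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # Pi = Ai OR Bi
    c = [c0,
         g[0] | p[0] & c0,
         g[1] | g[0] & p[1] | p[1] & p[0] & c0,
         g[2] | g[1] & p[2] | g[0] & p[1] & p[2] | p[2] & p[1] & p[0] & c0]
    pg = p[0] & p[1] & p[2] & p[3]
    gg = g[3] | g[2] & p[3] | g[1] & p[2] & p[3] | g[0] & p[1] & p[2] & p[3]
    c4 = gg | pg & c0  # same as the expanded C4 equation
    # Each sum bit is the full-adder output Ai XOR Bi XOR Ci.
    s = sum((((a >> i) ^ (b >> i) ^ c[i]) & 1) << i for i in range(4))
    return s, c4, pg, gg

def ripple_cla(a, b, c0, width=48):
    """Ripple CLA: chain 4-bit CLA blocks, each C4 feeding the next C0."""
    total, carry = 0, c0
    for k in range(0, width, 4):
        s, carry, _, _ = cla4((a >> k) & 0xF, (b >> k) & 0xF, carry)
        total |= s << k
    return total, carry

# Exhaustive check of the 4-bit block against integer addition.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            s, c4, _, _ = cla4(a, b, c0)
            assert s + 16 * c4 == a + b + c0
```

The ripple LCU and multi-stage LCU variants would instead combine the PG and GG outputs of each block in a second level of look-ahead logic; they are not modelled here.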

CLA Test Algorithms
- “On the Adders with Minimum Tests”
  - Kajihara and Sasao
  - Proc. VLSI Test Symp., pp. 10-15, 1997 (VTS’97)
  - 10 vectors detect all single and multiple faults
    - In any size ripple CLA (not an LCU implementation)
- “Scalable Test Generators for High-Speed Datapath Circuits”
  - Al-Asaad, Hayes, and Murray
  - J. Electronic Testing, vol. 12, pp. 111-125, 1998 (JETTA’98)
  - 2×(N+1) vector sequence (for an N-bit adder)
  - TPG implementation requires:
    - N+1-bit shift register
    - N XOR gates, N XNOR gates, and 1 inverter

CLA BIST Scheme
- Easy BIST circuit to implement
- But we found a problem in the design
  - 2 patterns needed for 100% fault coverage (FC) are missing
  - Fix: replace the inverter with a flip-flop
  - Gives a 2×(N+2) vector sequence
[Figure: TPG built around an N+1-bit serial shift register with reset; stages Qi and Qi+1 generate the adder inputs Ai and Bi, plus the CLA carry-in (Cin). Shown for the original and the modified design, each with a table of the resulting Ai / Bi / Cin test vector sequence]

Fault Simulation Results
- JETTA’98 approach gives best overall fault coverage regardless of adder implementation
- Faults left undetected by the JETTA’98 approach can be detected
  - Results in “New BIST” column for the 2×(N+2) vector sequence
- JETTA’98 also claims similar BIST approach for Modified-Booth multiplier
  - But description of test algorithm is very sketchy

48-bit CLA Adder    Gate     #        Test Algorithm
Implementation      Delays   Faults   VTS’97    JETTA’98   New BIST
Ripple CLA          28       1392     100%      99.9%      100%
Ripple LCU          12       1542     95.7%     99.9%      100%
Multi-stage LCU     10       1506     95.9%     99.9%      100%

Adder in Virtex-4 DSP
- Adder has 3 input ports
  - P = Z ± (X + Y + Cin)
  - We interpret this as a 2-stage CLA adder/subtractor implementation
- Apply test patterns to each stage in turn
  - 2 clock cycles per vector
  - OPMODE control
[Figure: two-stage adder model with two 48-bit CLAs; inputs (X MUX) A port, (Y MUX) B port, (Z MUX) C port, CIN and Subtract. Annotations: clock cycle #1, X test vector; clock cycle #2, Y test vector; clock cycle #2, Z test vector]

BIST Approach for Virtex-5 DSP
- Larger multiplier, but same test algorithm
- Optional registers like Virtex-4, but data sheets have less info
- Logical operations, but 48-bit cascade of A:B allows direct testing
- Pattern detect, but a known test algorithm exists for the equality (=) comparator

Multiplier BIST
- Test algorithm depends on architecture
  - Virtex-4/5 architecture is not specified in data sheets
  - Eliminate sequential logic architectures
    - “Based on modified Booth”
  - Multiplier choices include:
    - Unsigned array
    - Baugh-Wooley
    - Modified Booth
    - Modified Booth/Wallace tree
      - Our assumption based on area/performance analysis
- Our goal: find/develop architecture-independent test algorithm(s)
10/15/2010 VLSI D&T Seminar

Modified Booth Test Algorithms
- Test algorithm uses 8-bit counter (256 vectors)
  - “Effective Built-In Self-Test for Booth Multipliers”
    - Gizopoulos, Paschalis & Zorian
    - IEEE Design & Test of Computers, 1998
    - Claim fault coverage ~ 99.8%
  - 4×4 connections to multiplier inputs
[Figure: 8-bit counter (MSB ... LSB), split 4/4, drives the two n-bit inputs of an n×n multiplier with a Booth encoding port and a 2n-bit output]

4×4 algorithm
A7...A0    B7...B0
00000000   00000000
00000000   00010001
00000000   00100010
00000000   00110011
...        ...
00000000   11101110
00000000   11111111
00010001   00000000
00010001   00010001
00010001   00100010
Counter bits driving A7...A0: C7 C6 C5 C4 C7 C6 C5 C4
Counter bits driving B7...B0: C3 C2 C1 C0 C3 C2 C1 C0
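The 4×4 connections can be reproduced in software by stepping an 8-bit counter and replicating each nibble across an operand. This is a small sketch for an 8-bit multiplier model only; `vectors_4x4` is a made-up name, and how the pattern is replicated across wider operands is not modelled.

```python
def vectors_4x4():
    """Yield the (A, B) operand pair for each of the 256 counter states."""
    for count in range(256):
        hi = count >> 4   # counter bits C7..C4
        lo = count & 0xF  # counter bits C3..C0
        a = hi << 4 | hi  # A7..A0 = C7 C6 C5 C4 C7 C6 C5 C4
        b = lo << 4 | lo  # B7..B0 = C3 C2 C1 C0 C3 C2 C1 C0
        yield a, b

vecs = list(vectors_4x4())
# Print the same rows as the vector table: counts 0-3 and 14-18.
for a, b in vecs[:4] + vecs[14:19]:
    print(f"{a:08b} {b:08b}")
```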

Modified Booth Wallace Tree Algorithms
- Test algorithm also uses 8-bit counter (256 vectors)
  - “Effective BIST Architecture for Fast Multiplier Cores”
    - Paschalis, Kranitis, Psarakis, Gizopoulos & Zorian
    - Proc. Design Automation and Test in Europe Conf., 1999
  - “Low Power BIST for Wallace Tree-based Fast Multipliers”
    - Bakalis, Kalligeros, Nikolos, Vergos & Alexiou
    - Proc. Int. Symp. on Quality of Electronic Design, 2000
  - 5×3 connections with 5 inputs to Booth encoding port
  - Both papers claim fault coverage > 99%
[Figure: 8-bit counter (MSB ... LSB), split 5/3, drives the two n-bit inputs of an n×n multiplier with a Booth encoding port and a 2n-bit output]

5×3 algorithm
A7...A0    B7...B0
00000000   00000000
00000000   01001001
00000000   10010010
00000000   11011011
...        ...
00000000   10110110
00000000   11111111
00100001   00000000
00100001   01001001
00100001   10010010
Counter bits driving A7...A0: C5 C4 C3 C7 C6 C5 C4 C3
Counter bits driving B7...B0: C1 C0 C2 C1 C0 C2 C1 C0
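The 5×3 connections are easiest to express as a lookup from operand bit to counter bit. The sketch below uses maps chosen to reproduce the vectors listed above (for example, counter value 1 gives B = 01001001 and counter value 8 gives A = 00100001); the names `A_MAP`, `B_MAP` and `spread` are invented for this example, and only an 8-bit multiplier model is covered.

```python
# Counter bit feeding each operand bit, listed from bit 7 down to bit 0.
A_MAP = (5, 4, 3, 7, 6, 5, 4, 3)  # 5 MSBs C7..C3, repeated across A
B_MAP = (1, 0, 2, 1, 0, 2, 1, 0)  # 3 LSBs C2..C0, repeated across B

def spread(count, bit_map):
    """Build an 8-bit operand by copying one counter bit into each position."""
    word = 0
    for src in bit_map:  # MSB first
        word = word << 1 | (count >> src) & 1
    return word

vecs = [(spread(c, A_MAP), spread(c, B_MAP)) for c in range(256)]
# Print counts 0-3 and 6-10, matching the rows of the vector table.
for a, b in vecs[:4] + vecs[6:11]:
    print(f"{a:08b} {b:08b}")
```

Keeping the connections in a table rather than in shift-and-mask code makes it straightforward to try other splits of the counter.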

Modified Booth Test Algorithms
- Test algorithm uses 8-bit counter (256 vectors)
  - But which side is Booth encoding?
    - Xilinx does not specify
- Our original approach
  - Run 5×3 algorithm
    - 256 vectors
  - and run 3×5 algorithm
    - 512 vectors
  - Include 4×4 if fault coverage improves
    - 768 vectors
- Additional algorithms only require multiplexers to change inputs
  - Use same 8-bit counter
[Figure: 8-bit counter (MSB ... LSB) connected to the n×n multiplier (Booth encoding port, 2n-bit output) through the 5×3 algorithm (5/3 split) and the 3×5 algorithm (3/5 split)]
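Running several algorithms from one counter amounts to switching the counter-to-operand connections between passes. The following sketch models that with one connection table per algorithm. It assumes that the 3×5 algorithm is the 5×3 connections with the two operands swapped, which the slides imply but do not spell out; `MAPS`, `select` and `bist_session` are hypothetical names.

```python
# Counter bit feeding each operand bit (bit 7 down to bit 0), per algorithm.
FIVE = (5, 4, 3, 7, 6, 5, 4, 3)   # 5 MSBs of the counter
THREE = (1, 0, 2, 1, 0, 2, 1, 0)  # 3 LSBs of the counter
MAPS = {
    "5x3": (FIVE, THREE),
    "3x5": (THREE, FIVE),         # assumption: 5x3 with operands swapped
    "4x4": ((7, 6, 5, 4, 7, 6, 5, 4), (3, 2, 1, 0, 3, 2, 1, 0)),
}

def select(count, bit_map):
    """Model one operand's multiplexers: pick a counter bit per operand bit."""
    return sum(((count >> src) & 1) << (7 - k) for k, src in enumerate(bit_map))

def bist_session(algorithms=("5x3", "3x5", "4x4")):
    """Yield (algorithm, A, B); the same 8-bit counter is reused for every pass."""
    for name in algorithms:
        a_map, b_map = MAPS[name]
        for count in range(256):
            yield name, select(count, a_map), select(count, b_map)

print(len(list(bist_session(("5x3", "3x5")))))  # two passes
print(len(list(bist_session())))                # three passes
```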

Analysis
- Multipliers evaluated
  - Unsigned array
  - Signed array (Baugh-Wooley)
  - Modified Booth
    - Carry look-ahead adders sum partial products in every stage
  - Modified Booth Wallace Tree
    - Carry look-ahead adder sums final stage partial products
    - Carry select adder sums final stage partial products
    - Ripple carry adder sums final stage partial products
  - Custom implementation of Modified Booth

Analysis
- Designed 8-bit models of the multipliers
- Fault model: collapsed single stuck-at gate-level faults
- Exhaustive testing
  - To determine undetectable faults
- Test algorithms evaluated
  - 4×4
  - 5×3
  - 3×5
  - 5×3 & 3×5
  - 4×4, 5×3 & 3×5
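The role of exhaustive testing can be illustrated on a toy circuit: simulate every single stuck-at fault under all input vectors, and whatever is never detected is undetectable, so it should not count against a test algorithm. The sketch below is not the multiplier models from the study. It uses a made-up three-input netlist with a deliberately redundant gate, and it places faults on every net without fault collapsing.

```python
from itertools import product

# Toy netlist: f = a.b + a'.c + b.c, where the b.c term is redundant.
INPUTS = ("a", "b", "c")
GATES = [  # (output net, operation, input nets)
    ("na", "NOT", ("a",)),
    ("t1", "AND", ("a", "b")),
    ("t2", "AND", ("na", "c")),
    ("t3", "AND", ("b", "c")),
    ("f", "OR", ("t1", "t2", "t3")),
]
OPS = {"NOT": lambda v: 1 - v[0],
       "AND": lambda v: int(all(v)),
       "OR": lambda v: int(any(v))}
ALL_FAULTS = [(net, v) for net in INPUTS + tuple(g[0] for g in GATES)
              for v in (0, 1)]

def simulate(vector, fault=None):
    """Return output f; fault is (net, stuck_value) or None for fault-free."""
    nets = {}
    def drive(net, value):
        nets[net] = fault[1] if fault and fault[0] == net else value
    for name, value in zip(INPUTS, vector):
        drive(name, value)
    for out, op, ins in GATES:
        drive(out, OPS[op]([nets[i] for i in ins]))
    return nets["f"]

def detected_by(vectors):
    """Set of single stuck-at faults that change f for at least one vector."""
    return {f for f in ALL_FAULTS
            if any(simulate(x) != simulate(x, f) for x in vectors)}

detectable = detected_by(list(product((0, 1), repeat=3)))  # exhaustive test
undetectable = set(ALL_FAULTS) - detectable
subset = [(1, 1, 0), (0, 0, 1), (0, 0, 0), (1, 0, 0)]       # a shorter test
print("undetectable:", sorted(undetectable))
print("effective coverage of subset:",
      len(detected_by(subset)), "of", len(detectable))
```

For an 8-bit multiplier the exhaustive set has 2^16 vectors, which is still practical in simulation and is why 8-bit models are convenient for this kind of analysis.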

Multiplier                Total    Test algorithm: # faults detected (effective fault coverage, %)
                          faults   Exhaust      4×4           5×3           3×5           5×3 & 3×5     5×3, 3×5 & 4×4
Unsigned array            1648     1644 (100)   1644 (100)    1644 (100)    1621 (98.60)  1644 (100)    1644 (100)
Signed array              1648     1644 (100)   1644 (100)    1644 (100)    1644 (100)    1644 (100)    1644 (100)
Mod-Booth                 2499     2196 (100)   2180 (99.27)  2168 (98.72)  2179 (99.23)  2182 (99.36)  2193 (99.86)
Mod-Booth Wall-Tree CLA   2184     2090 (100)   2061 (98.61)  2068 (98.95)  2070 (99.04)  2071 (99.09)  2074 (99.23)
Mod-Booth Wall-Tree CSA   2422     2243 (100)   2215 (98.75)  2217 (98.84)  2218 (98.89)  2222 (99.06)  2228 (99.33)
Mod-Booth Wall-Tree RCA   2021     1962 (100)   1937 (98.73)  1944 (99.08)  1944 (99.08)  1944 (99.08)  1947 (99.24)
Custom Mod-Booth          1908     1805 (100)   1781 (98.67)  1787 (99.00)  1785 (98.89)  1791 (99.22)  1793 (99.34)
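The percentages in the table are effective fault coverage: faults detected divided by the number of faults that exhaustive testing detects, not by the total fault count. The short sketch below recomputes the Mod-Booth row from the table's own counts; `effective_fc` is a hypothetical helper name.

```python
def effective_fc(detected, detectable):
    """Percentage of detectable faults (found by exhaustive test) detected."""
    return 100.0 * detected / detectable

# Mod-Booth row: 2499 total faults, 2196 detected by exhaustive testing.
detectable = 2196
row = {"4x4": 2180, "5x3": 2168, "3x5": 2179,
       "5x3 & 3x5": 2182, "5x3, 3x5 & 4x4": 2193}
for name, detected in row.items():
    print(f"{name}: {effective_fc(detected, detectable):.2f}%")
print("undetectable faults:", 2499 - detectable)
```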

Summary
- If the architecture of the multiplier is not known:
  - 3×5 algorithm gives best overall fault coverage for most multipliers
    - Contradicting the claim of the authors who proposed 5×3
  - Running 3×5 & 5×3 gives better fault coverage for all multipliers
  - Running all three algorithms (3×5, 5×3 and 4×4) provides the best fault coverage for all multipliers
    - Architecture-independent testing
- Virtex-4 & Virtex-5 multipliers
  - Original approach was 3×5 and 5×3
  - Better approach would be 3×5 and 4×4

Summary
- Adder test algorithm in JETTA’98
  - Easy to implement and excellent FC on all adders (stuck-at and bridging faults)
    - 100% FC on most adders
  - Easily adapted to subtractors & adder/subtractors
  - Used in BIST for Virtex-4 & 5 FPGAs
- Both test algorithms (adder & multiplier) will be used on Spartan-6 FPGA DSPs
  - Currently under development

Read More About It
- M. Pulukuri, G. Starr & C. Stroud, “On BIST for Multipliers,” Proc. IEEE Southeast Regional Conf., pp. 25-28, 2010
- M. Pulukuri & C. Stroud, “On BIST for Adders,” J. Electronic Testing: Theory & Applications, pp. 343-346, 2009
- M. Pulukuri & C. Stroud, “BIST for DSPs in Virtex-4 FPGAs,” Proc. IEEE Southeast Symp. on System Theory, pp. 34-38, 2009
- M. Pulukuri, BIST of DSP Cores in Virtex-4 & 5 FPGAs, AU MS Thesis, 2010