A Behavioral Synthesis Frontend to the Haste/TiDE design flow
Sune F. Nielsen, Jens Sparsø, Jonas B. Jensen, Johan S. R. Nielsen
IEEE Async'09
DTU Informatics, Technical University of Denmark
Introduction / Motivation
• Most high-level async tools are based on syntax-directed translation (Haste/TiDE, Balsa, …)
• One-to-one mapping from source program to implementation. Handshake components = syntactic elements in source code.
• No automatic optimization. Designer has to implement alternative solutions.
• Optimized code very hard to understand.
• Very different from behavioural synthesis in synchronous design: specification + constraints/goals for the desired implementation.
• Idea underlying this work: adapt existing (synchronous) behavioural synthesis techniques to the domain of asynchronous design.
Outline
1. Motivation, background, and contributions.
2. Overview of the design flow.
3. Related work.
4. CDFG representation of a source program.
5. Data-path implementation.
6. Haste code generation.
7. Results.
8. Conclusion.
The Haste/TiDE design flow
This work – an extension to the Haste/TiDE design flow
• One-to-one mapping used to advantage.
• Direct control of synthesized implementation at high level
• Avoids tedious back-end design
Previous work at DTU
• S. F. Nielsen. Behavioral synthesis of asynchronous circuits. PhD thesis, DTU Informatics, 2005.
• S. F. Nielsen, J. Sparsø, and J. Madsen. "High-level synthesis of asynchronous circuits using syntax directed translation as backend." IEEE Transactions on VLSI Systems, 17(2):248–261, Feb. 2009. (subm. Nov. 2006)
Targeted the Balsa system from U. of Manchester. Demonstrated the idea.
Shortcomings:
• Assumed the existence of a CDFG representation of the code to be synthesized. CDFG -> Synthesis -> Balsa program.
• Limited set of CDFGs: a loop with a DFG body. No procedures and functions.
Contributions of the current work
A fully automatic, Haste-in-Haste-out synthesis tool, which can handle large non-trivial code examples.
1. The tool now supports the Haste language.
2. Front-end: Source code -> CDFG representation.
3. Much larger set of source language constructs, with arbitrary compositions of loops, procedures etc.
4. Haste allows certain circuit-level optimizations not available in Balsa.
5. The Haste/TiDE tool flow provides reliable (i.e. realistic) area and power figures:
  • Cost functions in optimization
  • Benchmark results
Limitations
• Arrays
• Arbitration
• Direct I/O
• Parallel write access to variables
• Function pointers
Flow overview (Haste code example)
& word = type [0..2^16-1]
& EX: main proc(X0,X1,X2?chan word & Y0,Y1!chan word).
begin
  & a0 = const 255
  & a1 = const 255
  & a2 = const 255
  & a3 = const 255
  & x0,x1,x2,y0,y1: var word
  | forever do
      X0?x0 || X1?x1 || X2?x2 ;
      y0 := ((a0+x0)+(x0*x1)-a1) fit word ;
      if x1>a2 then
        y1:= (a3*(x1+x2)) fit word
      else
        y1:= (x1-x2) fit word
      fi ;
      Y0!y0 || Y1!y1
    od
end
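As a rough reference for what one iteration of the loop computes, the body can be modelled in Python. This is only a sketch: it assumes that `fit word` keeps the low 16 bits of the result, and the names `fit_word` and `ex_iteration` are made up for illustration. Channel communication is reduced to function arguments and return values.

```python
WORD_BITS = 16
WORD_MASK = (1 << WORD_BITS) - 1  # word = [0..2^16-1]


def fit_word(value: int) -> int:
    """Assumed model of `fit word`: keep the low 16 bits of the value."""
    return value & WORD_MASK


def ex_iteration(x0: int, x1: int, x2: int,
                 a0: int = 255, a1: int = 255, a2: int = 255, a3: int = 255):
    """One pass through the forever-do body; returns the pair (y0, y1)."""
    y0 = fit_word((a0 + x0) + (x0 * x1) - a1)
    if x1 > a2:
        y1 = fit_word(a3 * (x1 + x2))
    else:
        y1 = fit_word(x1 - x2)
    return y0, y1
```

A model like this is useful mainly as a black-box check: a synthesized implementation should produce the same outputs for the same inputs.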
Flow overview (CDFG for loop body)
X0?x0 || X1?x1 || X2?x2 ;
y0 := ((a0+x0)+(x0*x1)-a1) fit word ;
if x1>a2 then
  y1:= (a3*(x1+x2)) fit word
else
  y1:= (x1-x2) fit word
fi ;
Y0!y0 || Y1!y1
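The data dependencies of this loop body can be written down as a small graph. The sketch below is an illustration in Python, not the tool's internal format: the node names (`t1`, `t2`, ...) are hypothetical, and the if/else is flattened into a single `select` node, whereas a real CDFG would use explicit control nodes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    op: str              # "recv", "const", "+", "-", "*", ">", "select", "send"
    inputs: tuple = ()   # names of the nodes this node reads from


# Hypothetical node names; edges follow the expressions in the loop body.
CDFG = {
    "x0": Node("recv"), "x1": Node("recv"), "x2": Node("recv"),
    "a0": Node("const"), "a1": Node("const"),
    "a2": Node("const"), "a3": Node("const"),
    "t1": Node("+", ("a0", "x0")),
    "t2": Node("*", ("x0", "x1")),
    "t3": Node("+", ("t1", "t2")),
    "y0": Node("-", ("t3", "a1")),
    "c": Node(">", ("x1", "a2")),
    "t4": Node("+", ("x1", "x2")),
    "t5": Node("*", ("a3", "t4")),            # then-branch value
    "t6": Node("-", ("x1", "x2")),            # else-branch value
    "y1": Node("select", ("c", "t5", "t6")),  # simplification of the if/else
    "Y0": Node("send", ("y0",)), "Y1": Node("send", ("y1",)),
}


def topological_order(graph):
    """Return node names so that every node appears after all of its inputs."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for pred in graph[name].inputs:
            visit(pred)
        order.append(name)

    for name in graph:
        visit(name)
    return order
```

The operator nodes (`+`, `-`, `*`, `>`) are the ones that scheduling, allocation and binding later map onto functional units.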
Flow overview (Generic data-path template)
Flow overview (Haste implementation of generic datapath)
Flow overview (Behavioral synthesis)
• Scheduling:
  – Fine-grain discrete-time model.
  – Operator nodes of CDFG placed into (one or more) time-slots.
• Allocation:
  – Determine (minimum) required hardware resources: functional units, registers/variables, multiplexing, and control.
• Binding:
  – Operator nodes and (temporary) variables are mapped onto specific hardware resources: latches/FFs and FUs.
• Optimization using simulated annealing (cost function = area or speed)
• Solution space reduced by ASAP and ALAP schedules.
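To illustrate how ASAP and ALAP schedules bound the solution space, here is a small Python sketch on the operator nodes of the example loop body. It is a toy model under stated assumptions: every operator takes exactly one time-slot (the tool's discrete-time model is finer grained), the branch operators are treated as independent of the comparison, and the operator names are hypothetical.

```python
# Operator nodes of the example and their data predecessors (hypothetical names).
DEPS = {
    "add1": [],                # a0 + x0
    "mul1": [],                # x0 * x1
    "add2": ["add1", "mul1"],  # (a0+x0) + (x0*x1)
    "sub1": ["add2"],          # ... - a1
    "cmp": [],                 # x1 > a2
    "add3": [],                # x1 + x2
    "mul2": ["add3"],          # a3 * (x1+x2)
    "sub2": [],                # x1 - x2
}


def asap(deps):
    """Earliest time-slot per operator: one slot after its latest predecessor."""
    step = {}

    def visit(n):
        if n not in step:
            step[n] = 1 + max((visit(p) for p in deps[n]), default=-1)
        return step[n]

    for n in deps:
        visit(n)
    return step


def alap(deps, last_step):
    """Latest time-slot per operator that still finishes by last_step."""
    succs = {n: [] for n in deps}
    for n, preds in deps.items():
        for p in preds:
            succs[p].append(n)
    step = {}

    def visit(n):
        if n not in step:
            step[n] = min((visit(s) for s in succs[n]), default=last_step + 1) - 1
        return step[n]

    for n in deps:
        visit(n)
    return step


early = asap(DEPS)
late = alap(DEPS, last_step=max(early.values()))
mobility = {n: late[n] - early[n] for n in DEPS}  # 0 means the slot is fixed
```

Operators with zero mobility lie on the critical path; only the others leave the optimizer any freedom, which is why the two schedules shrink the search space.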
Scheduling, assignment and binding
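The simulated-annealing optimization named in the flow overview can be sketched on a toy scheduling problem. Everything below is an assumption made for illustration: the operator table, the time-slot windows, the grouping of add/subtract/compare onto one "alu" type, the cooling parameters, and the cost function (number of FUs needed per type, as a crude stand-in for area). It is not the tool's actual cost function or move set.

```python
import math
import random

# name: (fu_type, earliest_slot, latest_slot); hypothetical toy data
OPS = {
    "add1": ("alu", 0, 0), "mul1": ("mul", 0, 0), "add2": ("alu", 1, 1),
    "sub1": ("alu", 2, 2), "cmp": ("alu", 0, 2), "add3": ("alu", 0, 1),
    "mul2": ("mul", 1, 2), "sub2": ("alu", 0, 2),
}
DEPS = [("add3", "mul2")]  # (producer, consumer) pairs the windows do not enforce


def cost(schedule):
    """Area proxy: per FU type, the peak number of operators sharing a slot."""
    usage = {}
    for op, slot in schedule.items():
        key = (OPS[op][0], slot)
        usage[key] = usage.get(key, 0) + 1
    peak = {}
    for (fu, _), count in usage.items():
        peak[fu] = max(peak.get(fu, 0), count)
    return sum(peak.values())


def feasible(schedule):
    """Every producer must run in an earlier slot than its consumer."""
    return all(schedule[p] < schedule[c] for p, c in DEPS)


def anneal(seed=0, iterations=2000, t0=2.0, alpha=0.995):
    rng = random.Random(seed)
    current = {op: lo for op, (_, lo, _) in OPS.items()}  # start from ASAP
    best, temp = dict(current), t0
    for _ in range(iterations):
        op = rng.choice(list(OPS))
        _, lo, hi = OPS[op]
        candidate = dict(current)
        candidate[op] = rng.randint(lo, hi)  # move one operator inside its window
        if feasible(candidate):
            delta = cost(candidate) - cost(current)
            # Always accept improvements; accept worse moves with falling probability.
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                current = candidate
                if cost(current) < cost(best):
                    best = dict(current)
        temp *= alpha  # geometric cooling
    return best
```

In this toy instance the ASAP start needs four ALUs and one multiplier (cost 5), while spreading the movable ALU operators over the three slots brings the cost down to 3.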
Related work
[Ref 6] Cortadella, Badia, Pastor and Pardo. (UPC Barcelona) 6th International Workshop on High-Level Synthesis, 1992. CDFG. Scheduling, allocation and binding. Distributed control specified at STG-level.
[Ref 1] Bachman, Zheng and Myers. (U. of Utah) ICCD’99. DFG only. Synthesis algorithm and its runtime. No implementation.
[Ref 15] Kim, Lee, Lee. (KAIST, Korea) ASYNC 2000. CDFG + sched., alloc. and bind. Focus on controller impl. STG.
[Ref 4] Budiu, Venkataramani, Chelcea and Goldstein. (CMU) ASPLOS’04. CASH-system. ANSI C. Functional dataflow representation. Direct implementation. Pipelining.
[Ref 21] Sacker, Brown, Wilson, Rushton. (U. Southampton) ASYNC’04. Extending the MOODS-system to handle async. VHDL subset. Average case performance. Unconventional implementation of control.
[Ref 12] Hansen, Singh. (UNC, Chapel Hill) ASYNC’08. Haste -> Haste. Basic blocks. Focus on speed, parallelism, pipelining.
Source code to CDFG
• More elaborate than what is hinted at in most literature. Precise definitions are hard to find.
• Fundamentally a CDFG is a 1-bounded Petri net.
• Source code and CDFG must have same black-box behaviour.
X0?x0 || X1?x1 || X2?x2 ;
y0 := ((a0+x0)+(x0*x1)-a1) fit word ;
if x1>a2 then
  y1:= (a3*(x1+x2)) fit word
else
  y1:= (x1-x2) fit word
fi ;
Y0!y0 || Y1!y1
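The 1-bounded property means that no place ever holds more than one token in any reachable marking. A minimal Python sketch of such a check follows; the fork/join net is a hypothetical stand-in for a CDFG fragment (two parallel operations inside a loop), not a net taken from the tool, and the check assumes an ordinary net in which each transition lists a place at most once as input.

```python
from collections import deque

# transition name: (places consumed, places produced); hypothetical net
FORK_JOIN = {
    "fork": (("p_in",), ("p_a", "p_b")),
    "op_a": (("p_a",), ("p_a_done",)),
    "op_b": (("p_b",), ("p_b_done",)),
    "join": (("p_a_done", "p_b_done"), ("p_out",)),
    "loop": (("p_out",), ("p_in",)),
}


def is_one_bounded(transitions, initial):
    """Explore all reachable markings; fail as soon as a place holds 2 tokens."""
    start = {p: n for p, n in initial.items() if n}
    if any(n > 1 for n in start.values()):
        return False
    seen = {frozenset(start.items())}
    queue = deque([start])
    while queue:
        marking = queue.popleft()
        for consume, produce in transitions.values():
            if all(marking.get(p, 0) >= 1 for p in consume):  # transition enabled
                nxt = dict(marking)
                for p in consume:
                    nxt[p] -= 1
                for p in produce:
                    nxt[p] = nxt.get(p, 0) + 1
                nxt = {p: n for p, n in nxt.items() if n}
                if any(n > 1 for n in nxt.values()):
                    return False
                key = frozenset(nxt.items())
                if key not in seen:
                    seen.add(key)
                    queue.append(nxt)
    return True
```

The search always terminates: a 1-bounded net has finitely many markings, and any other net is rejected at the first marking with two tokens in one place.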
Complete CDFG
… including forever-do loop
[Figure: complete CDFG, with Fork and Join / Sync nodes]
CDFG nodes for Send and Receive
CDFG nodes for procedure call
Implementation and optimizations
Variables
• Latches and flip-flops
Functional Units
• Chaining
• Multiple pull-type output channels. Same FU computes different results: a<b, a-b, etc.
Functions and procedures:
• Input muxes can be avoided when input parameters are the same (variables), by defining a parameterless function.
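The idea of one FU delivering several results can be illustrated with a subtractor: a single subtraction yields both the difference and the comparison a<b. The Python sketch below is a behavioural illustration only, assuming unsigned 16-bit operands and a 17-bit internal result; the name `sub_fu` is hypothetical and nothing here models handshaking or pull channels.

```python
WORD_BITS = 16


def sub_fu(a: int, b: int):
    """Hypothetical subtractor FU: one subtraction, two outputs.

    Assumes 0 <= a, b < 2**WORD_BITS. Returns (a - b as a 16-bit word, a < b).
    """
    raw = (a - b) & ((1 << (WORD_BITS + 1)) - 1)  # 17-bit two's-complement result
    difference = raw & ((1 << WORD_BITS) - 1)     # low 16 bits: the a-b output
    less_than = bool((raw >> WORD_BITS) & 1)      # bit 16 is set exactly when a < b
    return difference, less_than
```

Because both outputs come from the same arithmetic, a comparison and a subtraction on the same operands can share one unit instead of two.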
Functional Unit Templates
• Basic FU
• Chaining
• Multiple (alternative) pull-type outputs
• Negation enables swapping inputs
• Combining FU templates
Synthesized Haste code
Input and output channels declared
Variables declared
Basic FUs "MUL0" and "ALU0" declared
FU "MUL1" with two outputs declared:
- subb.0 is difference (i.e., a-b)
- subb.1 is sign (i.e., a>b)
Constant declarations. Missing!
Synthesized Haste code
ALU0 used to compute v3:=a0+v0
ALU0 chained with MUL0 v3:=v3+v1*v2
ALU1 used to compute v3:=v3-a1
if v1>a2 then …[v1-a2; if not bit_16 then … ]
Results – optimizing for area
Area: 5-58% reduction (avg. 30%)
Results – optimizing for speed
Speed: Up to 67% reduction (avg. 40%)
Conclusion
• A fully automatic Haste-in-Haste-out synthesis tool.
• The tool can handle a large non-trivial subset of Haste.
• Results:
  – Area: 5-58% reduction (avg. 30%)
  – Speed: 0-67% reduction (avg. 40%)
• Source-to-source optimization (behavioural synthesis) combined with syntax-directed translation is a promising approach.
• Using syntax-directed translation as backend for a synthesis system from <your favourite high level language> seems promising as well.
References
• S. F. Nielsen. Behavioral synthesis of asynchronous circuits. PhD thesis, DTU Informatics, 2005.
• S. F. Nielsen, J. Sparsø, and J. Madsen. High-level synthesis of asynchronous circuits using syntax directed translation as backend. IEEE Transactions on VLSI Systems, 17(2):248–261, Feb. 2009.
• J. B. Jensen and J. S. R. Nielsen. Compiling from Haste to CDFG: A Front end for an Asynchronous Circuit Synthesis System. Bachelor’s thesis, Dept. of Informatics and Mathematical Modelling, Technical University of Denmark, June 2007.