DynaCORE
Dynamically Reconfigurable Coprocessor for Network Processors
Carsten Albrecht, Roman Koch, Christoph Osterloh, Thilo Pionteck, Erik Maehle
Institut für Technische Informatik (Institute of Computer Engineering)
Universität zu Lübeck (University of Lübeck)
Head: Prof. Dr.-Ing. Erik Maehle
DFG-SPP-1148 Final Colloquium
Karlsruhe, September 24th, 2009
Overview
Introduction
System Architecture
  • Key Components
  • Internal Interconnect
Runtime-Adaptive Network-on-Chip
  • Architecture
  • Buffer Sizes
Fault Tolerance
  • Fault Scenarios
  • Stepwise Procedure
Modelling DynaCORE
  • Principles
  • DynaCORE Model
  • Simulation
Runtime Reconfiguration
  • Point of Reconfiguration
  • Technical Aspects
Evaluation and Demonstrator
Publications
Summary
Introduction (1/2)
Situation
In-transit packet processing in edge routers
Processing tasks
Header processing
  • Routing
  • Quality-of-Service
  • Accounting
Payload processing
  • Encryption/decryption
  • Compression
  • Intrusion detection
Introduction (2/2)
DynaCORE Approach
DynaCORE = Dynamically adaptable COprocessor based on Reconfiguration
Reconfigurable hardware accelerator for payload processing
Allows flexible adaptation to changes in the network traffic profile
→ Dynamic partial reconfiguration of the FPGA
Combination of
  • Network processor (e.g. FlexPath NP) → header processing
  • DynaCORE (in Xilinx Virtex-4 FX) → payload processing
Loose coupling
  • Gigabit Ethernet
  • Suitable for various network processors
System Architecture (1/3)
Overview
[Figure: DynaCORE block diagram, divided into a static partition and a dynamic partition. Labels: Interface, Receive Unit, Dispatcher, Transmit Unit, Reconfiguration Manager (HW + SW), Reconfiguration Logic, ICAP, external memory, application-specific Hardware Assists 1 to 4, tile types H, S, V and 0.]
System Architecture (2/3)
Components in the Static Partition
I/O Interface
Transmit Unit
  • Send processed packets back to NP
Receive Unit/Dispatcher
  • Recognise requested type of processing
  • Assign packets to suitable hardware assists
  • Report to reconfiguration manager in case of unassignable packets
Reconfiguration Manager
  • Implemented as software running on embedded PowerPC
  • Collect utilisation information from hardware assists, decide when and how to reconfigure
  • Control actual process of reconfiguration, i.e. send configuration data to reconfiguration logic
Reconfiguration Control Logic
  • Write configuration data to FPGA-internal configuration access port (ICAP)
Software-based Hardware Assist
  • Backup processing unit
  • Utilises additional hard-wired PowerPC cores (UltraController II)
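The dispatcher's role in the static partition can be illustrated with a small software sketch. This is only an illustration in Python, not the hardware implementation: the names `Packet`, `HardwareAssist` and `Dispatcher`, and the choice of the least-loaded matching hardware assist, are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    processing_type: str  # requested type of processing, e.g. "encrypt"
    payload: bytes = b""


@dataclass
class HardwareAssist:
    name: str
    processing_type: str
    queue: list = field(default_factory=list)


class Dispatcher:
    """Toy model: assign packets to suitable hardware assists (HAs)."""

    def __init__(self, assists):
        self.assists = list(assists)
        # Types that could not be assigned; in this sketch this list stands
        # in for the report to the reconfiguration manager.
        self.unassignable = []

    def assign(self, packet):
        candidates = [ha for ha in self.assists
                      if ha.processing_type == packet.processing_type]
        if not candidates:
            self.unassignable.append(packet.processing_type)
            return None
        # Assumption: pick the matching HA with the shortest queue.
        target = min(candidates, key=lambda ha: len(ha.queue))
        target.queue.append(packet)
        return target.name
```

The `unassignable` list is the kind of information a reconfiguration manager could combine with utilisation data when deciding when and how to reconfigure.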
System Architecture (3/3)
Components in the Dynamic Partition
Hardware Assists
  • Actual payload processing modules
  • Equipped with universal, algorithm-independent interface
  • Embedded off-the-shelf IP cores
Switches
  • Forward packets from static partition to HAs and back
Runtime-Adaptive Network-on-Chip (1/2)
CoNoChi = Configurable Network on Chip
• NoC architecture for runtime reconfigurable FPGAs
• Virtual cut-through switches with four equal full-duplex links (16 bit)
• Low hardware overhead compared to other NoCs
• Switches not needed for a certain setting of processing units can be removed from the network → low latency
• Support for QoS
• Physical and logical addresses
  • Physical addresses: refer to specific switches at specific locations within the NoC topology
  • Logical addresses: refer to processing entities inside hardware modules
[Figure: interface between a switch and a hardware assist, labelled with physical address (phy add) and logical address (log add).]
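The split between logical and physical addresses can be pictured as a lookup table. The following Python sketch is an assumption-laden illustration, not CoNoChi's actual mechanism: the class `AddressTable`, its methods and the integer switch identifiers are hypothetical.

```python
class AddressTable:
    """Toy mapping from logical addresses (processing entities)
    to physical addresses (switch locations in the NoC)."""

    def __init__(self):
        self._table = {}

    def bind(self, logical, physical):
        # Bind or re-bind a processing entity to a switch location.
        self._table[logical] = physical

    def resolve(self, logical):
        # Look up the switch that currently serves the entity.
        return self._table[logical]


table = AddressTable()
table.bind("decrypt-unit", 3)   # entity currently reachable via switch 3
table.bind("decrypt-unit", 7)   # after relocation only the binding changes
```

In this picture, senders keep using the logical address while a module is moved to another tile; only the binding is updated.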
Runtime-Adaptive Network-on-Chip (2/2)
Topology Adaptation
• Network topology can be adapted at runtime
• Coarse-grained tiles
• Merging/separation of neighbouring tiles
  → Provides space for modules of varying complexity
[Figure: NoC with hardware assists HA 5 and HA 6, each attached via an interface.]
Fault Tolerance (1/3)
Fault scenarios:
• User data
  • Non-permanent fault
  • Huge hardware effort to detect and correct
  • Tolerated by application area
• Processing units and infrastructure
  • Device degradation: fault in hardware structure
  • Single-Event Functional Interrupts (SEFIs): bitflip in configuration data
  Permanent faults → need to be corrected
Approach: Combination of
• Configuration readback
  • Slow (33 ms for one tile)
  • Does not detect hardware faults
• Test packets
  • Do not cover all faults
• Alive messages
  • Missing alive message indicates problem
Fault Tolerance (2/3)
Fault detection
• Alive messages
• Test packets
• Periodic configuration readback
Fault localization and correction
• Stepwise procedure using test packets
• Test against different assumptions
• SEU in control registers → tile reset
• SEFI → rewriting the configuration
• Permanent hardware fault → reorganization
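Detection via alive messages amounts to a timeout check per monitored unit. The Python sketch below illustrates that idea only; the class `AliveMonitor`, the abstract time units and the timeout value are assumptions, not details of the DynaCORE implementation.

```python
class AliveMonitor:
    """Toy watchdog: flags units whose alive message is overdue."""

    def __init__(self, timeout):
        self.timeout = timeout      # maximum allowed gap between messages
        self.last_seen = {}         # unit name -> time of last alive message

    def alive(self, unit, now):
        # Record an alive message received from a unit at time `now`.
        self.last_seen[unit] = now

    def missing(self, now):
        # Units whose last alive message is older than the timeout.
        return sorted(unit for unit, seen in self.last_seen.items()
                      if now - seen > self.timeout)
```

A unit returned by `missing()` only indicates a problem; locating and correcting the fault is a separate step.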
Fault Tolerance (3/3)
Example: no alive message from switch 1
1. Identification of faulty segment
• Identify path under test (known by the reconfiguration manager)
• Send test packets to all switches along the path under test
• If a test packet does not return correctly, the faulty segment has been identified
2. Assumption: SEU in control registers of switches or routing tables
• Reset switches in affected section
• Send new routing tables
• Repeat test
3. Assumption: SEFI
• Read back configuration data for each tile and compare with reference
• In case of mismatch, reconfigure tile; if the tile contains a switch, send new routing tables
• Repeat test
If the test still fails: permanent hardware error → reorganize system
The procedure takes time and does not cover all fault scenarios, yet it is hardware efficient.
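The stepwise procedure can be summarised as a short control flow. The Python sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, and all hardware actions (test packets, reset, readback, reorganization) are passed in as caller-supplied callables rather than modelled.

```python
def find_faulty_segment(path, test_packet_returns):
    """Step 1: probe every switch along the path under test.

    `test_packet_returns(switch)` is a hypothetical hook that reports
    whether a test packet sent to that switch came back correctly.
    """
    for switch in path:
        if not test_packet_returns(switch):
            return switch
    return None


def correct_fault(test_ok, reset_and_reroute, readback_and_repair, reorganize):
    """Steps 2 and 3, then the fallback for permanent faults."""
    reset_and_reroute()      # assumption: SEU in control registers or tables
    if test_ok():
        return "seu"
    readback_and_repair()    # assumption: SEFI, repaired by reconfiguration
    if test_ok():
        return "sefi"
    reorganize()             # remaining case: permanent hardware fault
    return "permanent"
```

Cheaper corrective actions are tried first, and the test is repeated after each one, which mirrors the ordering of the assumptions on the slides.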
Modelling DynaCORE (1/4)
Abstract DynaCORE Model
Dynamically Structured Discrete Event-Based System Network (DSDEVN)
• Extends the Discrete Event System Specification (DEVS) formalism
• States of the controller $\chi$ can again be models
• A "simple" DEVS simulator is sufficient for simulation of a DSDEVN
DynaCORE Model:
$\mathrm{DSDEVN}_\Delta = \langle X_\Delta, Y_\Delta, \chi, M_\chi \rangle$
• $\Delta$ identifies DynaCORE
• $X_\Delta$, valid inputs of the system, and $Y_\Delta$, outputs of the system: messages received from and sent to the NP
• $\chi$: DynaCORE-specific controller
• $M_\chi$: model description of the controller (as DEVS)
Modelling DynaCORE (2/4)
Abstract DynaCORE Model
Controller Description as DEVS:
$M_\chi = \langle X_\chi, S_\chi, Y_\chi, \delta_{\mathrm{int}}^{\chi}, \delta_{\mathrm{ext}}^{\chi}, \lambda_\chi, \tau_\chi \rangle$
• $X_\chi$: Set of valid controller inputs
• $S_\chi$: Controller state space
• $Y_\chi$: Set of valid controller outputs
• $\delta_{\mathrm{int}}^{\chi}$: State transition function for internal events, including "timeouts"
• $\delta_{\mathrm{ext}}^{\chi}$: State transition function for external events
• $\lambda_\chi$: Output function
• $\tau_\chi$: Timeout function (assigns a timeout value to states from $S_\chi$)
Controller States
• Include information on system configuration, i.e. configured HAs
• Contain, in turn, models of system components active in respective state
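To make the seven-tuple concrete, the following Python sketch maps its components onto methods of a toy atomic DEVS model. It covers only the plain DEVS part; the dynamic structure, in which states themselves contain models, is not modelled. The class `ControllerModel`, the helper `inject` and the state layout (configured HAs plus a pending request) are assumptions for illustration.

```python
import math


class ControllerModel:
    """Toy atomic DEVS: state = (configured HA types, pending request)."""

    def __init__(self, configured_has):
        self.state = (tuple(configured_has), None)

    def tau(self, state):
        # Timeout function: fire immediately if a request is pending,
        # otherwise stay passive until the next external event.
        _, pending = state
        return 0.0 if pending is not None else math.inf

    def delta_ext(self, state, elapsed, x):
        # External transition: remember the requested processing type.
        config, _ = state
        return (config, x)

    def output(self, state):
        # Output function (lambda), evaluated before delta_int.
        config, pending = state
        kind = "processed" if pending in config else "unassignable"
        return (kind, pending)

    def delta_int(self, state):
        # Internal transition: request handled, return to idle.
        config, _ = state
        return (config, None)


def inject(model, x, elapsed=0.0):
    """Apply one external event and the internal event it schedules."""
    model.state = model.delta_ext(model.state, elapsed, x)
    y = model.output(model.state)
    model.state = model.delta_int(model.state)
    return y
```

Calling `inject(model, "A")` on a controller configured with HA types A and B yields a "processed" output, while a request for an unconfigured type yields "unassignable".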
Modelling DynaCORE (3/4)
Simulation
Structure of SystemC Simulation Model
[Figure: structure of the SystemC simulation model.]
Simulation Stimulus and Output
• Input burst
  • Aggregated traffic composed of four b-modeled packet streams
• No packet loss (sufficient buffer sizes)
[Figure: input data rate, output data rate and reconfiguration events. Left axis: bandwidth [Mbit/s], 0 to 1400. Right axis: reconfiguration. Horizontal axis: 0.0005 to 0.0645, unit not given.]
Modelling DynaCORE (4/4)
Simulation
Influence of Buffer Sizes
[Figure: latency [ms], 8.0 to 13.0, over switch buffer size and NoC-interface buffer size, each 2 to 128 packets.]
[Figure: data rate and packet loss (ratio, 0.00 to 1.20) and latency (time [ms], 0.00 to 12.00) over buffer size, 4 to 128 packets.]
• Low impact of buffer sizes between NoC and HA
• Large switch buffers:
  • Only little advantage for latency
  • Increased packet loss in case of reconfiguration
Runtime Reconfiguration (1/3)
Determining the Point of Reconfiguration
Configuration State Space
• Three modules
• Three types of HA
• Possible transitions between configurations
• Transition costs (number of HAs to be reconfigured)
[Figure: state graph of the ten configurations {A A A}, {A A B}, {A A C}, {A B B}, {A B C}, {A C C}, {B B B}, {B B C}, {B C C}, {C C C}; edges are labelled with transition costs 1, 2 or 3.]
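The state space and its transition costs can be reproduced with a few lines of Python. The sketch assumes that modules are interchangeable, so a configuration is a multiset of HA types, as the set notation in the figure suggests; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def configurations(ha_types, modules):
    """All multisets of HA types that fit into the given number of modules."""
    return list(combinations_with_replacement(sorted(ha_types), modules))


def transition_cost(src, dst):
    """Number of HAs that must be reconfigured to get from src to dst."""
    # HAs present in both configurations can stay in place.
    kept = sum((Counter(src) & Counter(dst)).values())
    return len(src) - kept


configs = configurations("ABC", 3)
```

With three modules and three HA types this yields ten configurations. For example, going from {A B C} to {A B B} costs 1, and going from {A A A} to {B B B} costs 3.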
Runtime Reconfiguration (2/3)
Determining the Point of Reconfiguration
Reduced Configuration State Space
• Transition cost limited
[Figure: reduced state graph over the same ten configurations; the remaining transitions are numbered 1 to 30.]
Reconfiguration Trigger
• Configurable per-HA utilisation threshold exceeded multiple times in sequence
[Figure: monitoring data over time with threshold T; the controller state alternates between $S_{\chi_u}$ and $S_{\chi_v}$.]
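The trigger condition, a threshold exceeded several times in sequence, can be sketched as a small counter. The Python class below is an illustrative assumption: `ReconfigurationTrigger`, its parameters and the reset after firing are not taken from the DynaCORE implementation.

```python
class ReconfigurationTrigger:
    """Fires when utilisation exceeds a threshold several times in a row."""

    def __init__(self, threshold, required_hits):
        self.threshold = threshold          # per-HA utilisation threshold
        self.required_hits = required_hits  # consecutive exceedances needed
        self.hits = 0

    def update(self, utilisation):
        # Count consecutive samples above the threshold; any sample at or
        # below the threshold resets the count.
        if utilisation > self.threshold:
            self.hits += 1
        else:
            self.hits = 0
        if self.hits >= self.required_hits:
            self.hits = 0  # assumption: start counting afresh after firing
            return True
        return False
```

Requiring several consecutive exceedances keeps a single short burst from causing a reconfiguration.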
Runtime Reconfiguration (3/3)
Technical Aspects
Merging and Separating Tiles
• Changes number and shapes of partially reconfigurable regions
• Different sets of bus macros
[Figure: Scenario 1 and Scenario 2. Labels: bus macros, removed bus macro, static elements in original design as part of hard macro.]
Reconfiguration Speed
• Achievable maximum speed dependent on
  • Memory bandwidth
  • Allowable clock ratios between system components
• Fraction of theoretically possible speed
Evaluation/Demonstrator (1/2)
Demonstrator Structure
Evaluation/Demonstrator (2/2)
FlexPath and DynaCORE Demonstrator
FlexPath NP
• NP with reconfigurable data path
• Virtex-4 FX 60
DynaCORE
• Reconfigurable processing modules (HAs)
• Virtex-4 FX 60
[Figure labels: stimulus; analysis, visualisation]
Publications
[PKA09] Pionteck, T.; Koch, R.; Albrecht, C.; Maehle, E.: A Design Technique for Adapting Number and Boundaries of Reconfigurable Modules at Runtime. International Journal of Reconfigurable Computing, vol. 2009, Article ID 942930, Hindawi Publishing Corporation, New York 2009
[PAK08a] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Adaptive Communication Architectures for Runtime Reconfigurable System-on-Chips. Parallel Processing Letters, 2008
[AFK09] Albrecht, C.; Foag, J.; Koch, R.; Maehle, E.; Pionteck, T.: DynaCORE – Dynamically Reconfigurable Coprocessor for Network Processors. To appear: Dynamically Reconfigurable Systems Architectures: Design Methods and Applications, Springer, 2009
[AKP09] Albrecht, C.; Koch, R.; Pionteck, T.; Glösekötter, P.: Towards a Flexible Fault-Tolerant System-on-Chip. 22nd International Conference on Architecture of Computing Systems - Workshop Proceedings, 83-90, VDE Verlag GmbH, Berlin 2009
[KAP09] Koch, R.; Albrecht, C.; Pionteck, T.: Adaptive Health Monitoring in a Reconfigurable Network-on-Chip. Workshop on Diagnostic Services in Network-on-Chips (DSNOC), Nice 2009
[AOP08] Albrecht, C.; Osterloh, Ch.; Pionteck, T.; Koch, R.; Maehle, E.: An Application-Oriented Synthetic Network Traffic Generator. European Conference on Modelling and Simulation 2008, 299-305, ECMS, Nicosia, Cyprus 2008
[ARK08] Albrecht, C.; Roß, P.; Koch, R.; Pionteck, T.; Maehle, E.: Performance Analysis of Bus-Based Interconnects for a Run-Time Reconfigurable Co-Processor Platform. PDP 08, 200-205, IEEE Computer Society, Toulouse, France 2008
[AWP08] Albrecht, C.; Werner, M.; Pionteck, T.; Fuchsen, R.; Koch, R.; Maehle, E.: WCET Determination Tool for Embedded Systems Software. SIMUTools08 Proceedings, 1, ICST, Marseille, France 2008
[PAK08] Pionteck, T.; Albrecht, C.; Koch, R.; Brix, T.; Maehle, E.: Design and Simulation of Runtime Reconfigurable Systems. IEEE Workshop on Design and Diagnostics of Electronic Circuits and Systems (DDECS 2008), 2008
[PAK08b] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: Performance and Reliability Monitoring in Network-on-Chips. To appear: Workshop on Diagnostic Services in Network-on-Chips (DSNOC), 2008
[PAK08c] Pionteck, T.; Albrecht, C.; Koch, R.; Maehle, E.: On the Design Parameters of Runtime Reconfigurable Systems. Accepted for: International Conference on Field Programmable Logic and Applications (FPL 2008), Heidelberg, Germany 2008
[AKP07] Albrecht, C.; Koch, R.; Pionteck, T.; Maehle, E.: Simulation System for Run-Time Reconfigurable Networks-on-Chip. Proceedings of the 6th EUROSIM Congress on Modelling and Simulation, ARGESIM - ARGE Simulation News, Wiedner Hauptstrasse 8-10, 1040 Vienna 2007
[APK07] Albrecht, C.; Pionteck, T.; Koch, R.; Maehle, E.: Modelling Tile-Based Run-Time Reconfigurable Systems Using SystemC. European Conference on Modelling and Simulation 2007, Prague, Czech Republic 2007
• • •
Summary
DynaCORE-specific aspects
• Interconnect performance analysis
  • Bus versus NoC
  • Based on a formally derived simulation model
• Synthetic traffic generator
• Performance enhancement compared to software-based systems
• Proof of concept by means of demonstrator
  • In cooperation with FlexPath / TU Munich
Universal aspects
• SystemC simulation methodology for runtime reconfigurable systems
  • SystemC kernel does not need to be adapted
• Reconfiguration management
  • Determining point of reconfiguration
• NoC for runtime adaptable systems
• Tile-based design methodology for runtime reconfigurable designs
  • Merging/separating reconfigurable regions