ing-kovacs-levente-kalman
8/13/2019 BL Eloadas3 4
1
Embedded Processor Architectures
2
Embedded CPUs
PowerPC 405 (hard core)
  32-bit embedded PowerPC RISC architecture
  Up to 450 MHz
  2x16 kB instruction and data caches
  Memory management unit (MMU)
  Embedded in Virtex-II Pro and Virtex-4 (Virtex-5 FXT uses the PowerPC 440)
ARM Cortex-A9 (hard core)
  32-bit multicore processor
  Up to 900 MHz
  Xilinx Zynq-7000 processing platform
  Processor-based device with attached FPGA fabric
  High level of performance
  Reduces power, cost, and size
MicroBlaze (soft core)
  32-bit RISC architecture
  2x64 kB instruction and data caches
  Hardware multiply and divide
  OPB and LMB bus interfaces...
3
Embedded Processors

Embedded Processor     Core Type  Max Clock Frequency  Slices  CLBs  Block RAMs
PowerPC                Hard       222 MHz              1000    250   9
MicroBlaze             Soft       180 MHz              940     235   9
PicoBlaze              Soft       221 MHz              333     84    3
PicoBlaze (optimized)  Soft       233 MHz              274     69    3

Hard core: faster, fixed position, available in few devices
Soft core: slower, can be placed anywhere, applicable to many devices
Virtex-4 processors: PowerPC, MicroBlaze, PicoBlaze
4
MicroBlaze Core
RISC architecture
3/5-stage single-issue pipeline
Separate data and instruction interfaces
32 32-bit GP registers
32-bit instructions
3 operands, 2 addressing modes
Optional MMU
Optional buses:
  LMB (Local Memory Bus)
  OPB (On-chip Peripheral Bus)
  PLB (Processor Local Bus)
  AXI Interconnect
The PLB is inherited from the IBM PowerPC
5
Performance
A soft architecture gains configurability at the cost of limited performance
Because performance is limited, software optimization offers great potential
Software algorithms need to be monitored for efficiency to get the most performance from a given logic area
Logic could be added to improve performance; the designer must decide whether this is necessary
6
Core Options
OPB (data or instruction)
LMB (data or instruction)
PLB (data or instruction)
Divider / barrel shifter
HW debug
FSL links (multi-processor)
Data and instruction caches
Exception support
FPU
HW floating-point convert
MMU
Each option adds to the processor footprint on the FPGA
Special registers:
  MSR (Machine Status) (1)
  EAR (Exception Address) (3)
  ESR (Exception Status) (5)
  PC (Program Counter) (0)
  FSR (FPU Status) (7)
  BTR (Branch Target) (11)
All are accessed via SPR[x], e.g. PC is SPR[0]
7
Data Layout
Bit-reversed big-endian layout for word, half word, and byte
Word: Byte n (MSByte), Byte n+1, Byte n+2, Byte n+3 (LSByte)
Bit 0 is the MSBit, bit 31 is the LSBit
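The byte and bit numbering on this slide can be illustrated with a short Python sketch. The helper names `word_to_bytes` and `get_bit` are hypothetical and exist only for this illustration; the sketch assumes a 32-bit word.

```python
import struct

def word_to_bytes(word):
    """Return the four bytes of a 32-bit word in big-endian order
    (byte n is the MSByte, byte n+3 is the LSByte)."""
    return list(struct.pack(">I", word & 0xFFFFFFFF))

def get_bit(word, index):
    """Read one bit using MSB-first numbering: index 0 is the MSBit,
    index 31 is the LSBit."""
    if not 0 <= index <= 31:
        raise ValueError("bit index must be in 0..31")
    return (word >> (31 - index)) & 1

print([hex(b) for b in word_to_bytes(0x12345678)])
print(get_bit(0x80000000, 0), get_bit(0x00000001, 31))
```

Note that bit 0 here is the most significant bit, the opposite of the numbering most C programmers expect.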
8
Instruction Format
3-operand instructions (5-bit register fields)
16-bit immediate operands
Load/Store addressing: *(Ra+Rb) and *(Ra+Immediate)
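As a rough illustration of these field widths, the sketch below packs and unpacks a 32-bit instruction word. It assumes the remaining 32 - 5 - 5 - 16 = 6 bits form the opcode in the most significant positions, followed by Rd, Ra, and then either Rb or the 16-bit immediate, and it assumes the immediate is sign-extended; the function names are hypothetical.

```python
def encode_imm_form(opcode, rd, ra, imm):
    """Pack opcode (6 bits, assumed), Rd and Ra (5 bits each) and a
    16-bit immediate into one 32-bit instruction word."""
    return ((opcode & 0x3F) << 26) | ((rd & 0x1F) << 21) | \
           ((ra & 0x1F) << 16) | (imm & 0xFFFF)

def decode(instr):
    """Split a 32-bit instruction word into its fields."""
    imm = instr & 0xFFFF
    return {
        "opcode": (instr >> 26) & 0x3F,
        "rd": (instr >> 21) & 0x1F,
        "ra": (instr >> 16) & 0x1F,
        "rb": (instr >> 11) & 0x1F,   # meaningful for the register form only
        "imm": imm - 0x10000 if imm & 0x8000 else imm,  # sign-extended
    }

def effective_address(regs, ra, rb=None, imm=0):
    """Load/store address: Ra+Rb when rb is given, else Ra+Immediate."""
    offset = regs[rb] if rb is not None else imm
    return (regs[ra] + offset) & 0xFFFFFFFF

fields = decode(encode_imm_form(0x2A, 3, 1, -8))
print(fields)
```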
9
Pipeline Latency

Instruction                             Latency
Branch instructions                     1 cycle (if branch is not taken)
                                        2 cycles (taken with delay slot)
                                        3 cycles (taken without delay slot)
Division                                2 cycles (if register A = 0)
                                        34 cycles
Multiply                                3 cycles
Any Load/Store or FSL transaction       2 cycles
Instruction Break or Subroutine return  2 cycles
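The latencies above can be turned into a rough cycle estimate for an instruction mix. This is a minimal sketch: the dictionary keys are hypothetical labels, and the one-cycle cost for all other instructions is an assumption that the slide does not state.

```python
# Latencies in clock cycles, copied from the table above.
LATENCY = {
    "branch_not_taken": 1,
    "branch_taken_delay_slot": 2,
    "branch_taken_no_delay_slot": 3,
    "divide_ra_zero": 2,
    "divide": 34,
    "multiply": 3,
    "load_store_or_fsl": 2,
    "break_or_return": 2,
    "other": 1,  # assumption: instructions not listed take one cycle
}

def estimate_cycles(instruction_counts):
    """Sum count * latency over an instruction mix (ignores cache misses
    and bus wait states)."""
    total = 0
    for kind, count in instruction_counts.items():
        if kind not in LATENCY:
            raise KeyError(f"unknown instruction class: {kind}")
        total += LATENCY[kind] * count
    return total

mix = {"other": 100, "load_store_or_fsl": 20,
       "branch_taken_no_delay_slot": 5, "multiply": 2}
print(estimate_cycles(mix))  # 100 + 40 + 15 + 6 = 161
```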
10
GP Registers
11
Processor Version Register
11 32-bit status registers describe the processor options, a unique identifier, the cache sizes, the TLB options, and the target FPGA design.
Required because there are dozens of optional processor components; the registers allow software to configure itself for the hardware options.
12
3- or 5-stage Pipeline
Choice of pipeline depth
  5-stage offers a faster clock but longer latency
  A branch requires 3 cycles in the Execution step
Delay slots
  Like the MIPS design, only the fetch stage is flushed on a taken branch
  The instruction in the decode stage completes (branch delay slot)
  An IMM, branch, or break instruction cannot be placed in a delay slot
  Recoverable exceptions are allowed in a branch delay slot
13
Harvard Memory Architecture
Separate data and instruction interfaces and address spaces
  Can overlap if desired (debug: user-modifiable code)
All I/O is memory mapped
  Bus selection is mapped into address ranges
Cache line is 4 or 8 words
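Mapping bus selection into address ranges can be sketched as a simple range lookup. The address map below is entirely hypothetical (real ranges are chosen per design when the system is configured), as is the function name `select_bus`.

```python
# Hypothetical address map: (low address, high address, bus).
ADDRESS_MAP = [
    (0x00000000, 0x0000FFFF, "LMB"),  # local BRAM
    (0x40000000, 0x4FFFFFFF, "OPB"),  # on-chip peripherals
    (0x80000000, 0x8FFFFFFF, "PLB"),  # external memory
]

def select_bus(address):
    """Return the bus whose address range contains the address,
    or None if the address is unmapped."""
    for low, high, bus in ADDRESS_MAP:
        if low <= address <= high:
            return bus
    return None

print(select_bus(0x00001000), select_bus(0x40600000), select_bus(0x20000000))
```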
14
Privileged Instructions
GET, PUT, NGET, NPUT.., MTS, MSRCLR, MSRSET, BRK, and RTID are all privileged
  They raise a protection exception in user code
  Exception: BRKI 0x8 or BRKI 0x18 performs a user vector exception
Hardware exceptions, interrupts, and software breaks cause entry to privileged mode
  Prolog and epilog code is needed to protect the user-mode registers
  RTED/RTID (return from exception or interrupt) goes back to user or virtual mode
15
Exceptions
1. Reset
2. Hardware Exception
3. NMI
4. Break
5. Interrupt
6. User Vector (exception)
Exceptions are prioritized from the top
Vectors are in the low address space
Return addresses are stored in the register file
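The priority order in the list above can be sketched as a lookup that picks the highest-priority pending event. The priority order follows the slide; the vector addresses are not given on the slide and are stated here as an assumption based on the usual MicroBlaze vector layout, so check them against the processor reference guide before relying on them.

```python
# Priority order from the slide, highest first.
PRIORITY = ["reset", "hardware_exception", "nmi", "break",
            "interrupt", "user_vector"]

# Assumed vector addresses (not from the slide).
VECTOR = {
    "reset": 0x00,
    "user_vector": 0x08,
    "interrupt": 0x10,
    "break": 0x18,
    "nmi": 0x18,            # assumed to share the break vector
    "hardware_exception": 0x20,
}

def next_exception(pending):
    """Return (name, vector) of the highest-priority pending event,
    or None when nothing is pending."""
    for name in PRIORITY:
        if name in pending:
            return name, VECTOR[name]
    return None

print(next_exception({"interrupt", "break"}))
```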
16
Reset
PC
19
Caches
Optional hardware caches for instructions or data
  1-way (direct-mapped) cache
  Cacheable address range is user settable
  Variable size (set during configuration): 64 B to 64 kB
  Caches can be disabled through bits in the MSR (ICE and DCE)
  WIC and WDC instructions allow software invalidation of cache lines
  Cache lines are 4 or 8 words (configurable)
Caches use the BRAM of the Spartan for both cache data and tags
  Be wary of physical memory constraints!
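For a direct-mapped cache with a configurable size and line length, an address splits into a tag, a line index, and a byte offset. The sketch below shows that split for a generic direct-mapped cache; it is not the exact tag layout of the MicroBlaze cache, which depends on the configured cacheable range, and `split_address` is a hypothetical helper.

```python
def split_address(address, cache_bytes, line_words):
    """Split a byte address into (tag, index, offset) for a direct-mapped
    cache of cache_bytes bytes with lines of line_words 32-bit words."""
    line_bytes = line_words * 4
    num_lines = cache_bytes // line_bytes
    offset = address % line_bytes
    index = (address // line_bytes) % num_lines
    tag = address // (line_bytes * num_lines)
    return tag, index, offset

# 8 kB cache with 4-word lines: 512 lines of 16 bytes each.
tag, index, offset = split_address(0x12345678, 8192, 4)
print(hex(tag), hex(index), hex(offset))
```

Two addresses that share an index but differ in tag evict each other, which is why the size and line length matter for a given algorithm.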
20
FPU
IEEE 754 standard single-precision floating point
  ADD, SUB, MUL, DIV, compare, convert, SQRT
  NaN is supported (quiet exception)
  Overflow returns signed infinity
32-bit float: sign bit, 8-bit exponent, 23-bit mantissa
Values are taken from and returned to the GP register set of the processor
Exceptions (when enabled) are regular hardware exceptions (FSR keeps the bits)
  The result register is not overwritten if an exception occurs
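The single-precision layout (one sign bit, 8-bit exponent, 23-bit mantissa) can be inspected from Python. This sketch only decodes the IEEE 754 bit pattern on the host; it does not model the MicroBlaze FPU, and `float_fields` is a hypothetical helper name.

```python
import struct

def float_fields(x):
    """Return (sign, biased exponent, mantissa) of x encoded as an
    IEEE 754 single-precision value."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit exponent, bias 127
    mantissa = bits & 0x7FFFFF       # 23-bit fraction
    return sign, exponent, mantissa

print(float_fields(1.0))
print(float_fields(-2.5))
print(float_fields(float("inf")))   # exponent all ones, mantissa zero
```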
23
Data Types
Byte 8-bit, Short 16-bit, and Long 32-bit
C types:
  char 8-bit
  short 16-bit
  long or int 32-bit
  float 32-bit
  enum 32-bit
Pointers can be 16 or 32 bit, depending on data area size
24
Register Conventions (GCC)
25
Register Conventions II
26
Register Use Notes
R3-R12 are volatile: not retained across function calls
  R3, R4 hold function return values
  R5-R10 are used to pass parameters
R19-R31 are stable across function calls (non-volatile)
  The called function needs to save these to the stack in its prologue and restore them in its epilogue
R14-R17 store return addresses from interrupts, subroutines, traps, and exceptions
  Subroutine call: BRL (Branch and Link) saves the PC in R15
Short pointers (SDA) use R2 and R13 as address anchors for the read-only and read/write small data areas, respectively
R1 is the stack pointer
R18 is the assembler temporary register
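The register notes above can be condensed into a small lookup, which is handy when reading disassembly. This is a sketch of the conventions as listed on the slide; the role strings and function names are hypothetical, and the treatment of R0 (not mentioned on the slide) is an assumption.

```python
def register_role(n):
    """Return the conventional role of general-purpose register Rn."""
    if not 0 <= n <= 31:
        raise ValueError("register number must be in 0..31")
    if n == 0:
        return "not covered by the notes (assumed hard-wired zero)"
    if n == 1:
        return "stack pointer"
    if n == 2:
        return "read-only small data area anchor"
    if n in (3, 4):
        return "return value (volatile)"
    if 5 <= n <= 10:
        return "parameter (volatile)"
    if n in (11, 12):
        return "volatile temporary"
    if n == 13:
        return "read/write small data area anchor"
    if 14 <= n <= 17:
        return "return address"
    if n == 18:
        return "assembler temporary"
    return "non-volatile (callee-saved)"

def is_callee_saved(n):
    """R19-R31 must be preserved by the called function."""
    return 19 <= n <= 31

for reg in (1, 5, 15, 19):
    print(f"R{reg}: {register_role(reg)}")
```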
27
Stack Convention
Stack grows toward lower addresses
Caller passes parameters using R5-R10 or by adding a stack frame
Callee returns values via R3-R4 or by writing to the caller's stack frame
28
Memory
Types: SDA (small data area), data area, common area, literals (constants)
SDA
  Global initialized variables
  Max object size threshold in mb-gcc: 8 bytes
  Accessed as R13 + 16-bit immediate offset, or by absolute (32-bit) address
Data area
  Larger initialized variables (could also use SDA access if < 64 kB)
Common
  Uninitialized global space
Literals
  R2 is the read-only data anchor (hardware enforced)
  Could be overwritten by absolute address
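The 64 kB limit on SDA access follows from the 16-bit immediate offset added to the anchor register. The sketch below checks whether an address is reachable from an anchor, assuming the immediate is a signed 16-bit value; the anchor value and the function name `sda_offset` are hypothetical.

```python
def sda_offset(anchor, address):
    """Return the 16-bit offset that reaches address from the anchor
    register value, or None if the address is out of range."""
    offset = address - anchor
    if -0x8000 <= offset <= 0x7FFF:   # assumed signed 16-bit immediate
        return offset
    return None

R13_ANCHOR = 0x00018000  # hypothetical anchor in the middle of the SDA
print(sda_offset(R13_ANCHOR, 0x00018010))   # small positive offset
print(sda_offset(R13_ANCHOR, 0x00010000))   # lowest reachable address
print(sda_offset(R13_ANCHOR, 0x00030000))   # needs an absolute address
```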
29
Performance Monitoring Plan
Detect cache parameters for given algorithm
Monitor Instruction side memory bus for accesses
Store accesses into VHDL counter
Read counter upon completion of micro-benchmark
30
Cache Operation
1. Detect if address is cacheable
2. If cacheable, look up the address in tag memory
3. If the tag matches and the valid bit is set, drive the ready signal (cache hit)
On a cache miss, the cache does not assert the ready signal and waits for the OPB to fetch the data from memory
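The three steps above, together with the hit and miss counting that the monitoring plan calls for, can be modelled in a few lines. This is a behavioural sketch in Python rather than the VHDL counter the slides refer to; the class name, the cacheable range, and the cache geometry are hypothetical.

```python
class DirectMappedCache:
    """Behavioural model: cacheable check, tag lookup, hit/miss count."""

    def __init__(self, size_bytes, line_words, base, high):
        self.line_bytes = line_words * 4
        self.num_lines = size_bytes // self.line_bytes
        self.base, self.high = base, high      # cacheable address range
        self.tags = [None] * self.num_lines    # None means valid bit clear
        self.hits = 0
        self.misses = 0

    def access(self, address):
        # Step 1: addresses outside the cacheable range bypass the cache.
        if not (self.base <= address <= self.high):
            return "bypass"
        # Step 2: look up the tag for this line index.
        line = address // self.line_bytes
        index = line % self.num_lines
        tag = line // self.num_lines
        # Step 3: hit only if the stored tag is valid and matches.
        if self.tags[index] == tag:
            self.hits += 1
            return "hit"
        # Miss: the line is fetched from memory and the tag is updated.
        self.misses += 1
        self.tags[index] = tag
        return "miss"

cache = DirectMappedCache(size_bytes=64, line_words=4, base=0, high=0xFFFF)
print([cache.access(a) for a in (0, 4, 64, 0, 0x10000)])
print(cache.hits, cache.misses)
```

Addresses 0 and 64 map to the same line in this tiny cache, so they evict each other, which is the kind of conflict a micro-benchmark would expose.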
33
Stream Processor
[Block diagram: Stream Collector, Detector, FIFO, and Hash Table; signals: Clock, Reset, PC, ID, IDValid, NewStream, Valid, Start Address, Length]
34
System Outline
35
Next Step...
[Block diagram: board-level system with FPGA (custom IF-logic), clocks (CLK), SDRAM, SRAM, memory controller, UART, display controller, timer, power supply, L/C, audio codec, CPU (uP / DSP), co-processor, GP I/O, address decode unit, Ethernet MAC, interrupt controller]
36
Configurable System on a Chip (CSoC)
[Block diagram; components shown: power supply, SDRAM, SRAM, L/C, audio codec, EPROM]
37
Soft CPU Core: MicroBlaze (Xilinx Inc.)
38
MicroBlaze-based Embedded Design
[Block diagram: MicroBlaze 32-bit RISC core (flexible soft IP) with BRAM on the Local Memory Bus, I-Cache and D-Cache BRAM (configurable sizes), and LocalLink FIFO channels (0, 1 .. 32) to custom functions; On-Chip Peripheral Bus (OPB) with arbiter, on-chip peripherals (UART, 10/100 E-Net), and off-chip memory (FLASH/SRAM); bus bridge to a Processor Local Bus (PLB, instruction and data) with arbiter, PowerPC 405 core (dedicated hard IP, possible in Virtex-II Pro), and hi-speed peripherals (GB E-Net, e.g. memory controller)]
39
PowerPC-based Embedded Design
Full system customization to meet performance, functionality, and cost goals
IBM CoreConnect on-chip bus standard: PLB, OPB, and DCR
[Block diagram: PowerPC 405 core (dedicated hard IP) with ISOCM and DSOCM BRAM, RocketIO, and DCR bus; Processor Local Bus (PLB, instruction and data) with arbiter, hi-speed peripherals (GB E-Net, e.g. memory controller), and off-chip memory (ZBT SRAM, DDR SDRAM, SDRAM); bus bridge to the On-Chip Peripheral Bus (OPB) with arbiter and on-chip peripherals (UART, GPIO) built as flexible soft IP]
40
MicroBlaze: Architecture & Features
Architecture
  RISC
  Thirty-two 32-bit general purpose registers
  32-bit instruction word with three operands and two addressing modes
Features
  Separate 32-bit instruction and data buses: OPB (On-chip Peripheral Bus)
  Separate 32-bit instruction and data buses: LMB (Local Memory Bus)
41
MicroBlaze: Bus Configurations
[Diagram: six bus configurations (1. to 6.) of the MicroBlaze core]
LMB: Memory Controller (BRAMs)
OPB: Ext. Memory Ctrl., Interrupt Ctrl., UART, Timer, Watchdog, SPI, JTAG-UART, etc.
42
AXI is an Interface Specification
AXI is an interface specification, not a bus specification
PLB46: the processor and peripherals share one access bus through an arbiter
AXI: AXI masters connect to AXI slaves through an AXI interconnect
AXI Interconnect IP
  Implementation is not described in the spec
  Several companies build and sell AXI interconnect IP
  Xilinx is building its own
[Diagram: PLB46 shared access bus compared with AXI masters, interconnect, and slaves; arrows indicate master/slave relationship, not direction of dataflow]
43
AXI is Part of AMBA
AMBA: APB, AHB, AXI, ATB
  AMBA 3.0 (2003)
  AMBA 4.0 (just announced): AXI-4 Memory Map, AXI-4 Stream, AXI-4 Lite
  Same spec, with enhancements for FPGAs

Interface          Features                                          Similar to
Memory Map / Full  Traditional address/data, burst                   PLBv46, PCI
                   (single address, multiple data)
Streaming          Data-only, burst                                  Local Link / DSP interfaces / FIFO / FSL
Lite               Traditional address/data, no burst                PLBv46-single, OPB
                   (single address, single data)
44
Embedded Development Tool Flow Overview
Standard embedded SW development flow
  C code -> compiler/linker -> object code -> (simulator), debugger
  CPU code in on-chip memory: ?
  CPU code in off-chip memory: download to board & FPGA
Standard FPGA HW development flow
  VHDL/Verilog -> synthesizer -> place & route -> download to FPGA
  Simulator
  ?
45
EDK
The Embedded Development Kit (EDK) consists of the following:
Xilinx Platform Studio XPS
Base System Builder BSB
Create and Import Peripheral Wizard
Hardware generation tool PlatGen
Library generation tool LibGen
Simulation generation tool SimGen
GNU software development tools
System verification tool XMD
Virtual Platform generation tool - VPgen
Software Development Kit (Eclipse)
Processor IP
Drivers for IP
Documentation
Use the GUI or the shell command tool to run EDK
46
EDK Files
MHS = Microprocessor Hardware Specification
MSS = Microprocessor Software Specification
MPD = Microprocessor Peripheral Description
PAO = Peripheral Analyze Order
BBD = Black-Box Definition
MDD = Microprocessor Driver Description
BMM = BRAM Memory Map
47
Design Flow: Combine HW + SW
Hardware (XPS, ISE Platform Ext.)
  Platform definition (peripherals, configuration, connectivity, address space): *.mhs
  Generate netlist
  Generate bitstream (ISE Proj. Nav. / VHDL, *.ucf): *.bit, *.bmm
Software (XPS)
  Generate libraries: *.h
  Compile & link (*.c, *.asm): *.elf
Update bitstream (*.bit, *.bmm, *.elf): *.bit
EDK: Embedded Development Kit
XPS: Xilinx Platform Studio
ISE: Integrated Software Environment
MHS: Microprocessor Hardware Specification
48
Xilinx vs. Altera
49
Head-to-Head
Xilinx Virtex-II Pro
  1.5 V, 130 nm copper process
  125,136 logic cells
  10 Mb RAM
  556 18x18 multipliers
  Up to four PowerPC 405 cores
Altera Stratix
  1.5 V, 130 nm copper process
  114,140 logic elements
  10 Mb RAM
  224 9x9 multipliers
  No hard processor cores (Excalibur, based on Apex 20k)
50
Xilinx Virtex-II Pro
51
Altera Stratix
52
Xilinx Virtex CLB
53
Virtex Slice
54
Half Slice
58
Logic Element
59
Embedded RAM
Xilinx Block SelectRAM
  18 Kb dual-port RAM arranged in columns
Altera TriMatrix dual-port RAM
  M512: 512 x 1
  M4K: 4096 x 1
  M-RAM: 64K x 8
62
Altera Multiplier Sub-block
63
Virtex: Active Interconnect
64
Virtex Hierarchical Interconnect
65
Altera: MultiTrack Interconnect
Direct link between LABs and adjacent blocks
Row interconnects span 4, 8, or 24 blocks left or right
Column interconnects span 4, 8, or 16 blocks up or down
66
Stratix: R4 Interconnect
67
Xilinx MicroBlaze
68
Altera Nios
69
Virtex PowerPC Core
70
Applying Upset Mitigation Techniques in Reconfigurable Embedded Processors
71
System on Programmable Chip
A soft-core processor implemented in an SRAM-based FPGA is very attractive to spacecraft designers.
A complete computer system can be created on a single FPGA chip.
72
Space application issues
Radiation environment
In space, high-energy ionizing particles exist as part of the natural background.
In addition, there are solar particle events and high-energy protons trapped in the Earth's magnetosphere (Van Allen radiation belts).
This radiation poses a potential threat to electronic devices.
Single Event Upset (SEU)
An SEU is a change of state caused by ions or electromagnetic radiation striking a sensitive node in a microelectronic device, such as a microprocessor, semiconductor memory, or power transistor. The state change is a result of the free charge created by ionization in or close to an important node of a logic element (e.g. a memory "bit").
FPGAs are susceptible to SEUs in:
  data/instructions stored in block memory
  configuration bits stored in distributed RAM
73
Proposed upset mitigation
To ensure reliable space applications based on SRAM FPGAs, three levels of upset mitigation can be investigated:
  Functional-block design triplication
  Continuous external configuration scrubbing
  Independent internal BRAM scrubbing (also triplicated)
74
Tool, device and environment
Tools:
  Xilinx TMR: easily trade off maximum radiation effect immunity against area, pinout, and board layout considerations.
Device:
  Xilinx Virtex-II XQR2V6000 FPGA
Program running in MicroBlaze:
  Integer-based FFT
Test environment:
  Crocker Nuclear Laboratory at the University of California at Davis, using a proton beam of 63.3 MeV.
Test board:
  Two FPGAs: one is the device under test (DUT), the other is the service FPGA
75
DUT and Service FPGA
The service FPGA performs two functions:
1) configuration readback, and scrubbing of the DUT when there is a readback error
2) control and monitoring of the functional operation of the MicroBlaze running the FFT program
The program (FFT) is stored in internal BRAM each time the DUT is configured.
Data is sent to the DUT internal BRAM by the service FPGA.
The results of the FFT program are returned to the service FPGA and compared to the expected result.
[Diagram: service FPGA connected to the DUT, which contains the uBlaze and BRAM]
76
TMR
Triple Modular Redundancy
Three modules perform the same task; the voter passes only the majority value to the output.
If any one of the three systems fails, the other two systems can correct and mask the fault. If the voter fails, then the complete system fails. In a good TMR system, however, the voter is a critical component and should be much more reliable than the other components.
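The voter described above is commonly implemented as a bitwise two-out-of-three majority function. The sketch below shows that function in Python; it illustrates the principle and is not the logic generated by the Xilinx TMR tool. The function name is hypothetical.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority: each output bit takes the value that
    at least two of the three inputs agree on."""
    return (a & b) | (a & c) | (b & c)

good = 0b1010
upset = 0b0110            # one module corrupted by an upset
print(bin(majority_vote(good, good, upset)))   # fault is masked
```

A single faulty module is masked, but two modules failing on the same bit would outvote the correct one, which is why scrubbing is used to keep errors from accumulating.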
77
Xilinx TMR
78
External Configuration Scrubbing
Configuration scrubbing is the process of rewriting the configuration memory of an FPGA to correct any errors that may have accumulated since the device was last configured.
The service FPGA detects readback errors and scrubs the configuration by reloading the bitstream to correct upsets.
Transparent process: normal device operation runs concurrently and without interruption
Configuration scrubbing frequency: 16 MHz, i.e. 4 scrub cycles per second
79
Independent internal BRAM scrubbing
82
Testing
Two mitigated versions of the MicroBlaze design architecture can be implemented and tested:
  with the BRAM scrubber
  without the BRAM scrubber
Error types:
  Type 1 errors: FFT outputs were wrong.
    Type 1a: Corrected after a configuration scrub cycle
    Type 1b: Not corrected after a scrub cycle, even after a reset of the DUT design
  Type 2 errors: Nonresponsiveness of the DUT, requiring a reset and synchronization
    Type 2a: Corrected by scrubbing and hence referred to as a recovering reset
    Type 2b: Not corrected by scrubbing and referred to as a runaway reset.
      This type of error (runaway reset) is an uncorrected error condition that causes the functional monitor to continually attempt to reset the MicroBlaze processor each time the watchdog timer set for the handshaking between the two FPGAs reaches its limit value.
  Type 3 errors: Occurrence of an exception or interrupt detection.
This is what the emphasis is on!
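The error taxonomy above can be written as a small classification function, for example for post-processing a test log. This is a sketch under assumptions: the flag names are hypothetical, and the precedence between overlapping conditions (exception first, then nonresponsiveness, then wrong outputs) is an assumption that the slides do not specify.

```python
def classify_error(outputs_wrong, responsive, exception_detected,
                   corrected_by_scrub):
    """Map observed symptoms to the error types listed above.
    Returns None when no error was observed."""
    if exception_detected:
        return "3"
    if not responsive:
        # Recovering reset if scrubbing fixed it, runaway reset otherwise.
        return "2a" if corrected_by_scrub else "2b"
    if outputs_wrong:
        return "1a" if corrected_by_scrub else "1b"
    return None

print(classify_error(outputs_wrong=True, responsive=True,
                     exception_detected=False, corrected_by_scrub=True))
print(classify_error(outputs_wrong=False, responsive=False,
                     exception_detected=False, corrected_by_scrub=False))
```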
83
Standalone test
To confirm that BRAM code corruption is the likely cause of these runaway resets, the BRAM mitigation design can be implemented in standalone mode and tested under proton beams at similar fluxes and at the same facility.
84
Runaway Resets Caused by BRAM Corruption
At a flux of 1.70×10^8, at least 17% (1.21×10^-11 / 6.82×10^-11) of the runaway resets are due to errors in the BRAM code, while at a flux of 1.70×10^9, 23% of them are caused by code corruption.
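The 17% figure is the ratio of the two values quoted in parentheses. The short check below reproduces it; the slide does not give units for these values, so they are treated here simply as two rates expressed in the same unit, and the variable names are hypothetical.

```python
def share_percent(part, total):
    """Return part/total as a percentage."""
    return 100.0 * part / total

bram_related = 1.21e-11   # value attributed to BRAM code errors
all_runaway = 6.82e-11    # value for all runaway resets

share = share_percent(bram_related, all_runaway)
print(f"{share:.1f}%")    # consistent with "at least 17%"
```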
85
Exceptions Caused by BRAM Runaway Resets
Design 1: An average of 64% of the unrecovered resets (due to BRAM code corruption) were detected by exceptions (64% at flux 1 and 80% at flux 2).
Design 2: Exceptions were observed only after an increase of two orders of magnitude of the flux (1.70×10^9), and only 25% of the runaway resets were detected.
Not all illegal states are detected by the exception mechanism.
  At the lower flux (1.70×10^8), although seven resets were observed, no exceptions were detected.
  The MicroBlaze was optimized to fit in Xilinx FPGAs, and the exception circuitry was designed to detect only major illegal operations.