Slide-1
Quicklook
MIT Lincoln Laboratory
Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pMatlab and a Bladed Linux Cluster

Jeremy Kepner, Tim Currie, Hahn Kim, Bipin Mathew, Andrew McCabe, Michael Moore, Dan Rabinkin, Albert Reuther, Andrew Rhoades, Lou Tella and Nadya Travinin
September 28, 2004
This work is sponsored by the Department of the Air Force under Air Force contract F19628-00-C-002.
Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily
endorsed by the United States Government.
Slide-2
Outline

Introduction
  - LiMIT
  - Technical Challenge
  - pMatlab
  - QuickLook Concept
System
Software
Results
Summary
Slide-3
LiMIT

Lincoln Multifunction Intelligence, Surveillance and Reconnaissance Testbed
  - Boeing 707 aircraft
  - Fully equipped with sensors and networking
  - Airborne research laboratory for development, testing, and evaluation of sensors and processing algorithms

Employs standard processing model for research platforms
  - Collect in the air / process on the ground
Slide-4
Processing Challenge

Can we process radar data (SAR & GMTI) in flight and provide feedback on sensor performance in flight?

Requirements and Enablers
  - Record and playback data: high speed RAID disk system
  - High speed network
  - High density parallel computing: ruggedized bladed Linux cluster
  - Rapid algorithm development: pMatlab

Hardware: SGI RAID disk recorder; 14x2 CPU IBM blade cluster
Slide-5
pMatlab: Parallel Matlab Toolbox

[Layer diagram: high performance Matlab applications — DoD sensor processing, DoD decision support, scientific simulation, and commercial applications — sit on a user interface to the Parallel Matlab Toolbox, which provides a hardware interface to parallel computing hardware. Related technologies: Matlab*P, PVL, MatlabMPI.]

Goals
  - Matlab speedup through transparent parallelism
  - Near-real-time rapid prototyping

Lab-Wide Usage
  - Ballistic Missile Defense
  - Laser Propagation Simulation
  - Hyperspectral Imaging
  - Passive Sonar
  - Airborne Ground Moving Target Indicator (GMTI)
  - Airborne Synthetic Aperture Radar (SAR)
Slide-6
QuickLook Concept

[Diagram: streaming sensor data is recorded to a RAID disk recorder; SAR and GMTI data files flow to a 28 CPU bladed cluster running pMatlab (new); results are viewed on an analyst workstation running Matlab.]
Slide-7
Outline

Introduction
System
  - ConOps
  - Ruggedization
  - Integration
Software
Results
Summary
Slide-8
Concept of Operations

[Diagram: streaming sensor data is recorded at 600 MB/s (1x real time) to the RAID disk recorder; files are split and copied with rcp over Gbit Ethernet (1/4x real-time rate) to the bladed cluster running pMatlab (1 TB local storage ~ 20 min of data); SAR and GMTI (new) results are viewed via X Windows over LAN and passed to other systems.]

Timeline
  - Record streaming data: ~1 second = 1 dwell
  - Copy to bladed cluster: ~30 seconds
  - Process on bladed cluster: 2 dwells ~2 minutes; 1st CPI ~1 minute
  - Process on SGI: 2 dwells ~1 hour; 1st CPI ~2 minutes
Net benefit: 2 Dwells in 2 minutes vs. 1 hour
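The storage and turnaround numbers above can be sanity-checked with a short script (all figures are taken from the slide and treated as approximate):

```python
# Sanity-check the ConOps numbers quoted above (values approximate,
# taken directly from the slide).
record_rate_mb_s = 600          # streaming record rate, 1x real time
local_storage_mb = 1_000_000    # ~1 TB of local storage

# Minutes of data that fit on disk at the 1x real-time record rate.
minutes_of_data = local_storage_mb / record_rate_mb_s / 60
# ~28 min of raw capacity; the slide's "~20 min" is presumably the
# usable figure after overheads.

# Net benefit of processing on the bladed cluster instead of the SGI.
cluster_minutes = 2             # "2 dwells ~2 minutes"
sgi_minutes = 60                # "2 dwells ~1 hour"
turnaround_speedup = sgi_minutes / cluster_minutes
```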
Slide-9
Vibration Tests

[Figure: throughput (MBps, 0-60) vs. message size (16 bytes to 33,554,432 bytes) for no vibration, ~0.7G (-6dB), ~1.0G (-3dB), and 1.4G (0dB); x-axis, 13 CPU / 13 HD.]

Tested only at operational (i.e. in-flight) levels:
  - 0dB = 1.4G (above normal)
  - -3dB = ~1.0G (normal)
  - -6dB = ~0.7G (below normal)
Tested in all 3 dimensions
Ran MatlabMPI file-based communication test up to 14 CPUs / 14 hard drives
Throughput decreases seen at 1.4G
Slide-10
Thermal Tests

Temperature ranges
  - Test range: -20C to 40C
  - Bladecenter spec: 10C to 35C

Cooling tests
  - Successfully cooled to -10C
  - Failed at -20C
  - Cargo bay typically 0C

Heating tests
  - Used duct to draw outside air to cool cluster inside oven
  - Successfully heated to 40C
  - Outside air cooled cluster to 36C
Slide-11
Mitigation Strategies

IBM Bladecenter is not designed for the 707's operational environment

Strategies to minimize risk of damage:
1. Power down during takeoff/landing
   - Avoids damage to hard drives
   - Radar is also powered down
2. Construct duct to draw cabin air into cluster
   - Stabilizes cluster temperature
   - Prevents condensation of cabin air moisture within cluster
Slide-12
Integration

SGI RAID system
  - Scan catalog files, select dwells and CPIs to process (C / C shell)
  - Assign dwells/CPIs to nodes, package up signature/aux data, one CPI per file
  - Transfer data from SGI to each processor's disk (Matlab)

IBM bladed cluster
  - Nodes process CPIs in parallel, write results onto node 1's disk
  - Node 1 processor performs final processing
  - Results displayed locally

pMatlab allows integration to occur while the algorithm is being finalized

[Diagram: SGI RAID connected by rcp over a Gigabit connection to the bladed cluster, nodes 1-14, each with two processors (P1, P2) running two virtual processes (VP1, VP2).]
Slide-13
Outline

Introduction
Hardware
Software
  - pMatlab architecture
  - GMTI
  - SAR
Results
Summary
Slide-14
MatlabMPI & pMatlab Software Layers

[Layer diagram: an application (input, analysis, output) sits on a parallel library and parallel hardware; the library layer (pMatlab) provides vector/matrix, task, and conduit components behind a user interface; the kernel layer provides math (Matlab) and messaging (MatlabMPI) behind a hardware interface.]

Can build applications with a few parallel structures and functions; pMatlab provides parallel arrays and functions:

X = ones(n,mapX);
Y = zeros(n,mapY);
Y(:,:) = fft(X);

Can build a parallel library with a few messaging primitives; MatlabMPI provides this messaging capability:

MPI_Send(dest,comm,tag,X);
X = MPI_Recv(source,comm,tag);
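MatlabMPI implements these primitives with ordinary file I/O on a shared file system rather than a native MPI library. A toy Python analogue of that idea (function and file names here are illustrative, not MatlabMPI's own) might look like:

```python
# Toy sketch of file-based point-to-point messaging in the spirit of
# MatlabMPI, which moves data through files on a shared file system.
# Names and protocol details are illustrative, not the real implementation.
import os
import pickle
import tempfile

COMM_DIR = tempfile.mkdtemp()   # stands in for the shared directory

def msg_file(source, dest, tag):
    return os.path.join(COMM_DIR, f"msg_{source}_{dest}_{tag}.pkl")

def send(source, dest, tag, X):
    """Write the payload to a temp file, then rename it into place."""
    path = msg_file(source, dest, tag)
    with open(path + ".tmp", "wb") as f:
        pickle.dump(X, f)
    os.rename(path + ".tmp", path)  # atomic: receiver never sees a partial file

def recv(source, dest, tag):
    """Spin until the message file appears, then read and delete it."""
    path = msg_file(source, dest, tag)
    while not os.path.exists(path):
        pass                         # real code would sleep / back off
    with open(path, "rb") as f:
        X = pickle.load(f)
    os.remove(path)
    return X

send(0, 1, 42, [1.0, 2.0, 3.0])
received = recv(0, 1, 42)
```

The file-based design is what made the vibration tests on the previous slides exercise the hard drives: every message is a disk write and read.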
Slide-15
LiMIT GMTI

Demonstrates pMatlab in a large multi-stage application
  - ~13,000 lines of Matlab code

Driving new pMatlab features
  - Parallel sparse matrices for targets (dynamic data sizes)
    - Potential enabler for a whole new class of parallel algorithms
    - Applying to DARPA HPCS Graph Theory and NSA benchmarks
  - Mapping functions for system integration
    - Needs expert components!

[GMTI block diagram: input SIG, aux, and LOD data are dealt to nodes; per subband (1, 12, or 48): EQ subband (SIG, LOD, EQC), PC range walk correction, crab correction, Doppler processing, adaptive beamforming, STAP; per CPI (N/dwell, parallel): recon range walk correction, beam resteer correction, CPI detection processing; then dwell detection processing, angle/parameter estimation, geolocation (INS), and display.]

Parallel Implementation
  - Approach: deal out CPIs to different CPUs
  - Performance:
    - Time/node/CPI ~100 sec
    - Time for all 28 CPIs ~200 sec
    - Speedup ~14x
Slide-16
GMTI pMatlab Implementation
GMTI pMatlab code fragment
% Create distribution spec: b = block, c = cyclic.
dist_spec(1).dist = 'b';
dist_spec(2).dist = 'c';
% Create Parallel Map.
pMap = map([1 MAPPING.Ncpus],dist_spec,0:MAPPING.Ncpus-1);
% Get local indices.
[lind.dim_1_ind lind.dim_2_ind] = global_ind(zeros(1,C*D,pMap));
% Loop over local part.
for index = 1:length(lind.dim_2_ind)
...
end

pMatlab is primarily used to determine which CPIs to work on; CPIs are dealt out using a cyclic distribution
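The fragment above uses global_ind to recover which global indices each processor owns. A pure-Python sketch of how block and cyclic ownership work (simplified, not the pMatlab API):

```python
# Sketch of block vs. cyclic index ownership, mirroring the role of
# global_ind in the fragment above (0-based indices; simplified model,
# not the actual pMatlab function).
def global_ind(n, ncpus, rank, dist):
    """Return the global indices owned by processor `rank`."""
    if dist == 'c':                       # cyclic: deal indices like cards
        return list(range(rank, n, ncpus))
    if dist == 'b':                       # block: contiguous chunks
        chunk = -(-n // ncpus)            # ceil division
        return list(range(rank * chunk, min((rank + 1) * chunk, n)))
    raise ValueError(dist)

# 8 CPIs dealt cyclically to 4 CPUs: CPU 1 works on CPIs 1 and 5.
cyclic = global_ind(8, 4, 1, 'c')
block = global_ind(8, 4, 1, 'b')
```

Cyclic dealing keeps the CPUs evenly loaded as CPIs stream in, which is why it is the natural choice here.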
Slide-17
LiMIT SAR

Most complex pMatlab application built (at that time)
  - ~4,000 lines of Matlab code
  - Corner turns of ~1 GByte data cubes

Drove new pMatlab features
  - Improving corner turn performance
    - Working with Mathworks to improve
  - Selection of submatrices
    - Will be a key enabler for parallel linear algebra (LU, QR, ...)
  - Large memory footprint applications
    - Can the file system be used more effectively?

[SAR block diagram: real samples at 480 MS/sec are A/D converted; an FFT selects FFT bins (downsample), giving complex samples at 180 MS/sec; pulses are collected and buffered into a data cube; equalization and chirp coefficients feed a polar remap; SAR image formation with autofocus; histogram & register; image display with IMU output.]
Slide-18
SAR pMatlab Implementation
SAR pMatlab code fragment
% Create Parallel Maps.
mapA = map([1 Ncpus],0:Ncpus-1);
mapB = map([Ncpus 1],0:Ncpus-1);
% Prepare distributed Matrices.
fd_midc=zeros(mw,TotalnumPulses,mapA);
fd_midr=zeros(mw,TotalnumPulses,mapB);
% Corner Turn (columns to rows).
fd_midr(:,:) = fd_midc;
Corner turn communication is performed by the overloaded = operator
  - Determines which pieces of the matrix belong where
  - Executes the appropriate MatlabMPI send commands
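Going from the column-distributed mapA to the row-distributed mapB means every processor must exchange a sub-block with every other processor. The bookkeeping behind the overloaded = can be sketched as follows (a simplified model, not the pMatlab implementation):

```python
# Sketch of corner-turn bookkeeping: redistributing a matrix from
# column-distributed (mapA: [1 Ncpus]) to row-distributed (mapB:
# [Ncpus 1]). Simplified model assuming exact division.
def corner_turn_blocks(nrows, ncols, ncpus):
    """List (src, dst, row_range, col_range) transfers for the turn."""
    rblk, cblk = nrows // ncpus, ncols // ncpus
    transfers = []
    for src in range(ncpus):        # src owns columns [src*cblk, (src+1)*cblk)
        for dst in range(ncpus):    # dst will own rows [dst*rblk, (dst+1)*rblk)
            transfers.append((src, dst,
                              (dst * rblk, (dst + 1) * rblk),
                              (src * cblk, (src + 1) * cblk)))
    return transfers

# 4 processors: 16 block transfers, i.e. all-to-all communication.
plan = corner_turn_blocks(1024, 1024, 4)
```

The all-to-all pattern is why the corner turn dominates SAR runtime on later slides: with file-based MatlabMPI messaging, every one of these transfers is a disk write and read.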
Slide-19
Outline

Introduction
Implementation
Results
  - Scaling Results
  - Mission Results
  - Future Work
Summary
Slide-20
Parallel Performance

[Figure: parallel speedup vs. number of processors (0-30) for GMTI (1 per node), GMTI (2 per node), and SAR (1 per node), plotted against linear speedup.]
Slide-21
SAR Parallel Performance

Application memory requirements too large for 1 CPU
  - pMatlab a requirement for this application

Corner turn performance is the limiting factor
  - Optimization efforts have improved time by 30%
  - Believe additional improvement is possible

[Figure: corner turn bandwidth.]
Slide-22
July Mission Plan

Final integration
  - Debug pMatlab on plane
  - Working ~1 week before mission (~1 week after first flight)
  - Development occurred during mission

Flight plan
  - Two data collection flights
  - Flew a 50 km diameter box

Six GPS-instrumented vehicles
  - Two 2.5T trucks
  - Two CUCVs
  - Two M577s
Slide-23
July Mission Environment
Stressing desert environment
Slide-24
July Mission GMTI Results

GMTI successfully run on 707 in flight
  - Target reports
  - Range-Doppler images

Plans to use QuickLook for streaming processing in October mission
Slide-25
Embedded Computing Alternatives

Embedded computer systems
  - Designed for embedded signal processing
  - Advantages:
    1. Rugged - certified Mil Spec
    2. Lab has in-house experience
  - Disadvantage:
    1. Proprietary OS; no Matlab

Octave (Matlab clone)
  - Advantage:
    1. MatlabMPI demonstrated using Octave on SKY computer hardware
  - Disadvantages:
    1. Less functionality
    2. Slower?
    3. No object-oriented support; no pMatlab support; greater coding effort
Slide-26
Petascale pMatlab

  - pMapper: automatically finds best parallel mapping
  - pOoc: allows disk to be used as memory
  - pMex: allows use of optimized parallel libraries (e.g. PVL)

[Diagram: a signal chain (A -FFT-> B -MULT-> C -FFT-> D -> E) optimally mapped onto a parallel computer; Matlab (~1 GByte RAM) scales to pMatlab (N x GByte: many ~1 GByte RAM nodes) and to Petascale pMatlab (N x TByte: ~1 GByte RAM nodes each backed by ~1 TByte RAID disk).]

[Figure: performance (MFlops, 0-100) vs. matrix size (10-10,000 MBytes) for in-core FFT and out-of-core FFT.]

[Diagram: pMatlab user interface maps through Matlab*P client/server, the pMatlab Toolbox, and pMex (via a dmat/ddens translator) onto Matlab math libraries and parallel libraries: PVL, ||VSIPL++, ScaLapack.]
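The pOoc idea of treating disk as memory can be illustrated by streaming a vector through RAM in fixed-size chunks instead of loading it whole (toy sizes and names; purely illustrative, not the pOoc design):

```python
# Toy out-of-core computation in the spirit of pOoc: stream a large
# vector through memory in fixed-size chunks. Sizes are tiny here;
# the same pattern scales to the TByte RAID files in the diagram above.
import os
import struct
import tempfile

CHUNK = 1024                                  # elements held in RAM at once

def write_vector(path, n):
    """Write doubles 0..n-1 to a binary file (our 'RAID disk' stand-in)."""
    with open(path, "wb") as f:
        for i in range(n):
            f.write(struct.pack("d", float(i)))

def out_of_core_sum(path):
    """Accumulate the sum while reading only CHUNK elements at a time."""
    total = 0.0
    with open(path, "rb") as f:
        while True:
            buf = f.read(8 * CHUNK)           # 8 bytes per double
            if not buf:
                break
            total += sum(struct.unpack(f"{len(buf)//8}d", buf))
    return total

path = os.path.join(tempfile.mkdtemp(), "vec.bin")
write_vector(path, 10_000)
total = out_of_core_sum(path)
```

An out-of-core FFT works the same way in principle, at the cost of the in-core vs. out-of-core performance gap shown in the figure above.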
Slide-27
Summary

Airborne research platforms typically collect data and process it later

pMatlab, bladed clusters and high speed disks enable parallel processing in the air
  - Reduces execution time from hours to minutes
  - Uses rapid prototyping environment required for research

Successfully demonstrated in LiMIT Boeing 707
  - First ever in-flight use of bladed clusters or parallel Matlab

Planned for continued use
  - Real-time streaming of GMTI to other assets

Drives new requirements for pMatlab
  - Expert mapping
  - Parallel out-of-core
  - pmex