
Slide 1

Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pMatlab and a Bladed Linux Cluster

Jeremy Kepner, Tim Currie, Hahn Kim, Bipin Mathew, Andrew McCabe, Michael Moore, Dan Rabinkin, Albert Reuther, Andrew Rhoades, Lou Tella and Nadya Travinin

MIT Lincoln Laboratory
September 28, 2004

This work is sponsored by the Department of the Air Force under Air Force contract F19628-00-C-002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Slide 2: Outline

  • Introduction
      - LiMIT
      - Technical Challenge
      - pMatlab
      - QuickLook Concept
  • System
  • Software
  • Results
  • Summary

Slide 3: LiMIT

  • Lincoln Multifunction Intelligence, Surveillance and Reconnaissance Testbed
      - Boeing 707 aircraft
      - Fully equipped with sensors and networking
      - Airborne research laboratory for development, testing, and evaluation of sensors and processing algorithms
  • Employs the standard processing model for research platforms: collect in the air, process on the ground

Slide 4: Processing Challenge

  • Can we process radar data (SAR & GMTI) in flight and provide feedback on sensor performance in flight?
  • Requirements and enablers:
      - Record and playback data: high speed RAID disk system (SGI RAID disk recorder)
      - High speed network
      - High density parallel computing: ruggedized bladed Linux cluster (14x2 CPU IBM Blade Cluster)
      - Rapid algorithm development: pMatlab

Slide 5: High Performance Matlab Applications

pMatlab: Parallel Matlab Toolbox

[Figure: layered architecture. Applications (DoD sensor processing, DoD decision support, scientific simulation, commercial applications) sit on a user interface above the Parallel Matlab Toolbox, which sits on a hardware interface above the parallel computing hardware; the toolbox builds on Matlab*P, PVL, and MatlabMPI]

  • Goals
      - Matlab speedup through transparent parallelism
      - Near-real-time rapid prototyping
  • Lab-wide usage
      - Ballistic missile defense
      - Laser propagation simulation
      - Hyperspectral imaging
      - Passive sonar
      - Airborne Ground Moving Target Indicator (GMTI)
      - Airborne Synthetic Aperture Radar (SAR)

Slide 6: QuickLook Concept

[Figure: streaming sensor data is captured by a RAID disk recorder; data files (SAR, GMTI) flow to the new element in the chain, a 28 CPU bladed cluster running pMatlab; results are viewed on an analyst workstation running Matlab]

Slide 7: Outline

  • Introduction
  • System
      - ConOps
      - Ruggedization
      - Integration
  • Software
  • Results
  • Summary

Slide 8: Concept of Operations

[Figure: streaming sensor data enters the RAID disk recorder at 600 MB/s (1x real time); files are split and copied with rcp over Gbit Ethernet (1/4x real-time rate) to the bladed cluster running pMatlab, which holds ~20 min of data in 1 TB of local storage; results go to an analyst workstation running Matlab via X Windows over LAN, and to other systems]

Timeline:
  • Record streaming data: ~1 second = 1 dwell
  • Copy to bladed cluster: ~30 seconds
  • Process on bladed cluster: 1st CPI ~1 minute, 2 dwells ~2 minutes
  • Process on SGI: 1st CPI ~2 minutes, 2 dwells ~1 hour

Net benefit: 2 dwells in 2 minutes vs. 1 hour

Slide 9: Vibration Tests

[Figure: MatlabMPI throughput (MBps, 0 to 60) vs. message size (16 bytes to 33,554,432 bytes), 13 CPU/13 hard drive case, for no vibration, ~0.7G (-6dB), ~1.0G (-3dB), and 1.4G (0dB)]

  • Tested only at operational (i.e. in-flight) levels:
      - 0dB = 1.4G (above normal)
      - -3dB = ~1.0G (normal)
      - -6dB = ~0.7G (below normal)
  • Tested in all 3 dimensions
  • Ran MatlabMPI file-based communication test on up to 14 CPUs/14 hard drives (a sketch of such a test follows this list)
  • Throughput decreases seen at 1.4G
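For illustration, a MatlabMPI throughput measurement of the kind plotted above can be written as a simple two-rank ping-pong loop. This is only a sketch, not the test actually flown: it assumes myRank and comm have been set up by MatlabMPI initialization, and it uses the MPI_Send/MPI_Recv argument order shown later on slide 14.

    % Sketch of a two-rank ping-pong throughput test (sizes illustrative).
    sizes = 2.^(4:2:25);                  % 16 bytes up to ~33 MBytes
    for s = sizes
      X = zeros(1, s/8);                  % message of s bytes (8-byte doubles)
      if myRank == 0
        tic;
        MPI_Send(1, comm, 99, X);         % argument order per slide 14
        Y = MPI_Recv(1, comm, 99);        % wait for the echo
        t = toc/2;                        % one-way time
        fprintf('%9d bytes: %6.1f MBps\n', s, s/t/1e6);
      else
        Y = MPI_Recv(0, comm, 99);
        MPI_Send(0, comm, 99, Y);         % echo the message back
      end
    end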

Slide 10: Thermal Tests

  • Temperature ranges
      - Test range: -20C to 40C
      - Bladecenter spec: 10C to 35C
  • Cooling tests
      - Successfully cooled to -10C
      - Failed at -20C
      - Cargo bay typically 0C
  • Heating tests
      - Used duct to draw outside air to cool cluster inside oven
      - Successfully heated to 40C
      - Outside air cooled cluster to 36C

Slide 11: Mitigation Strategies

  • IBM Bladecenter is not designed for the 707's operational environment
  • Strategies to minimize risk of damage:
      1. Power down during takeoff/landing
          - Avoids damage to hard drives
          - Radar is also powered down
      2. Construct duct to draw cabin air into cluster
          - Stabilizes cluster temperature
          - Prevents condensation of cabin air moisture within cluster

Slide 12: Integration

  • SGI RAID system
      - Scan catalog files, select dwells and CPIs to process (C/C shell)
      - Assign dwells/CPIs to nodes; package up signature/aux data, one CPI per file; transfer data from the SGI to each processor's disk (Matlab; a sketch of the deal-out follows this slide)
  • IBM bladed cluster
      - Nodes process CPIs in parallel, writing results onto node 1's disk
      - Node 1's processor performs final processing
      - Results displayed locally
  • pMatlab allows integration to occur while the algorithm is being finalized

[Figure: the SGI RAID system feeds the bladed cluster by rcp over a Gigabit connection; each of nodes 1 through 14 runs two processes (P1, P2), each with two virtual processors (VP1, VP2)]
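The assign-and-transfer step above amounts to a round-robin file copy. A minimal sketch under stated assumptions: the catalog scan has already produced one MAT-file per CPI, and the file names, paths, and host names below are hypothetical, not from the actual system.

    % Deal one-CPI-per-file data out to cluster nodes round-robin via rcp.
    cpiFiles = dir('/raid/mission/cpi_*.mat');      % hypothetical path
    Nnodes = 14;
    for k = 1:length(cpiFiles)
      node = mod(k-1, Nnodes) + 1;                  % round-robin assignment
      src  = fullfile('/raid/mission', cpiFiles(k).name);
      dest = sprintf('node%d:/local/data/', node);  % hypothetical host names
      unix(['rcp ' src ' ' dest]);                  % copy over Gbit Ethernet
    end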

Slide 13: Outline

  • Introduction
  • Hardware
  • Software
      - pMatlab architecture
      - GMTI
      - SAR
  • Results
  • Summary

Slide 14: MatlabMPI & pMatlab Software Layers

[Figure: layered architecture. An application (input, analysis, output) sits on a user interface above the pMatlab library layer (vector/matrix, comp task, conduit), which sits on a kernel layer of math (Matlab) and messaging (MatlabMPI) above a hardware interface and the parallel hardware]

  • Can build applications with a few parallel structures and functions; pMatlab provides parallel arrays and functions:

        X = ones(n,mapX);
        Y = zeros(n,mapY);
        Y(:,:) = fft(X);

  • Can build a parallel library with a few messaging primitives; MatlabMPI provides this messaging capability:

        MPI_Send(dest,comm,tag,X);
        X = MPI_Recv(source,comm,tag);
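Putting the two fragments above in context, a complete distributed FFT is only a few lines. This is a sketch under stated assumptions: the two-argument map form and the mapped ones/zeros are taken from these slides, while the processor count and the surrounding initialization (which the slides omit) are assumptions.

    % Sketch: distributed FFT with a redistribution, per the fragments above.
    Ncpus = 4;                          % assumed processor count
    mapX = map([1 Ncpus], 0:Ncpus-1);   % deal columns across CPUs
    mapY = map([Ncpus 1], 0:Ncpus-1);   % deal rows across CPUs
    n = 1024;
    X = ones(n, mapX);                  % distributed n-by-n array
    Y = zeros(n, mapY);                 % distributed result, different map
    Y(:,:) = fft(X);                    % FFT; assignment moves data between maps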

Slide 15: LiMIT GMTI

  • Demonstrates pMatlab in a large multi-stage application
      - ~13,000 lines of Matlab code
  • Driving new pMatlab features
      - Parallel sparse matrices for targets (dynamic data sizes): a potential enabler for a whole new class of parallel algorithms; applying to DARPA HPCS graph theory and NSA benchmarks
      - Mapping functions for system integration
      - Needs expert components!

[Figure: GMTI block diagram. Signature (SIG), auxiliary (AUX), and LOD data are input and dealt to nodes. Per subband (1, 12, or 48): subband equalization of SIG and LOD, pulse-compression range walk correction, crab correction, Doppler processing, adaptive beamforming, and STAP. Per CPI (N per dwell, processed in parallel): recon range walk correction, beam resteer correction, and CPI detection processing. Per dwell: detection processing, angle/parameter estimation, geolocation with INS data, and display]

  • Parallel implementation approach: deal out CPIs to different CPUs
  • Performance
      - Time per node per CPI: ~100 sec
      - Time for all 28 CPIs: ~200 sec
      - Speedup: ~14x (28 CPIs x ~100 sec ≈ 2800 sec serially vs. ~200 sec in parallel)

Slide 16: GMTI pMatlab Implementation

GMTI pMatlab code fragment:

    % Create distribution spec: b = block, c = cyclic.
    dist_spec(1).dist = 'b';
    dist_spec(2).dist = 'c';

    % Create parallel map.
    pMap = map([1 MAPPING.Ncpus], dist_spec, 0:MAPPING.Ncpus-1);

    % Get local indices.
    [lind.dim_1_ind lind.dim_2_ind] = global_ind(zeros(1,C*D,pMap));

    % Loop over local part.
    for index = 1:length(lind.dim_2_ind)
      ...
    end

pMatlab is primarily used for determining which CPIs to work on; CPIs are dealt out using a cyclic distribution (illustrated below).
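What the cyclic distribution buys is a simple round-robin deal of CPI indices. The effect can be spelled out in plain Matlab (the CPI and CPU counts below are the ones quoted on the previous slide; the real code obtains its indices from global_ind as above):

    % Effect of the cyclic map: CPIs dealt round-robin across CPUs.
    Ncpis = 28;  Ncpus = 14;
    for cpu = 0:Ncpus-1
      myCpis = (cpu+1):Ncpus:Ncpis;   % CPU 0 gets CPIs 1,15; CPU 1 gets 2,16; ...
      fprintf('CPU %2d processes CPIs: %s\n', cpu, mat2str(myCpis));
    end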

Slide 17: LiMIT SAR

  • Most complex pMatlab application built (at that time)
      - ~4,000 lines of Matlab code
      - Corner turns of ~1 GByte data cubes
  • Drove new pMatlab features
      - Improving corner turn performance (working with The MathWorks to improve)
      - Selection of submatrices: will be a key enabler for parallel linear algebra (LU, QR, ...)
      - Large memory footprint applications: can the file system be used more effectively?

[Figure: SAR block diagram. Real samples at 480 MS/sec are A/D converted, FFT'd, and downsampled by FFT bin selection using equalization and chirp coefficients, yielding complex samples at 180 MS/sec; pulses are collected and buffered, cubes are collected and polar remapped, and SAR imaging with autofocus follows; images are histogrammed, registered against IMU data, and displayed]

Slide 18: SAR pMatlab Implementation

SAR pMatlab code fragment:

    % Create parallel maps.
    mapA = map([1 Ncpus], 0:Ncpus-1);
    mapB = map([Ncpus 1], 0:Ncpus-1);

    % Prepare distributed matrices.
    fd_midc = zeros(mw, TotalnumPulses, mapA);
    fd_midr = zeros(mw, TotalnumPulses, mapB);

    % Corner turn (columns to rows).
    fd_midr(:,:) = fd_midc;

Corner turn communication is performed by the overloaded = operator: it determines which pieces of the matrix belong where and executes the appropriate MatlabMPI send commands (sketched below).
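The assignment hides an all-to-all exchange: each CPU owns a block of columns of fd_midc and a block of rows of fd_midr, so every pair of CPUs trades one sub-block. A sketch of that pattern in bare MatlabMPI, under stated assumptions: myRank, comm, and the local column block localCols are assumed to exist, mw and TotalnumPulses are assumed divisible by Ncpus, the index arithmetic is illustrative, and the MPI_Send/MPI_Recv argument order is the one shown on slide 14.

    % Sketch of the all-to-all exchange behind the corner turn.
    tag = 314;                              % illustrative message tag
    rowsPer = mw / Ncpus;                   % rows each CPU owns after the turn
    colsPer = TotalnumPulses / Ncpus;       % columns each CPU owned before it
    for p = 0:Ncpus-1                       % send my columns' piece of p's rows
      rowIdx = p*rowsPer + (1:rowsPer);
      if p ~= myRank
        MPI_Send(p, comm, tag, localCols(rowIdx,:));
      end
    end
    localRows = zeros(rowsPer, TotalnumPulses);
    for p = 0:Ncpus-1                       % assemble my rows from every CPU
      colIdx = p*colsPer + (1:colsPer);
      if p == myRank                        % my own piece: local copy, no message
        localRows(:, colIdx) = localCols(myRank*rowsPer + (1:rowsPer), :);
      else
        localRows(:, colIdx) = MPI_Recv(p, comm, tag);
      end
    end
    % MatlabMPI sends are file-based, so send-all-then-receive does not deadlock.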

Slide 19: Outline

  • Introduction
  • Implementation
  • Results
      - Scaling results
      - Mission results
      - Future work
  • Summary

Slide 20: Parallel Performance

[Figure: parallel speedup vs. number of processors (0 to 30) for GMTI (1 per node), GMTI (2 per node), and SAR (1 per node), plotted against the linear-speedup reference]

Slide 21: SAR Parallel Performance

  • Application memory requirements are too large for 1 CPU, so pMatlab is a requirement for this application
  • Corner turn performance is the limiting factor
      - Optimization efforts have improved time by 30%
      - Believe additional improvement is possible

[Figure: corner turn bandwidth]

Slide 22: July Mission Plan

  • Final integration
      - Debug pMatlab on plane
      - Working ~1 week before mission (~1 week after first flight)
      - Development occurred during mission
  • Flight plan
      - Two data collection flights
      - Flew a 50 km diameter box
      - Six GPS-instrumented vehicles: two 2.5T trucks, two CUCVs, two M577s

Slide 23: July Mission Environment

  • Stressing desert environment

Slide 24: July Mission GMTI Results

  • GMTI successfully run on the 707 in flight
      - Target reports
      - Range-Doppler images
  • Plans to use QuickLook for streaming processing in the October mission

Slide 25: Embedded Computing Alternatives

  • Embedded computer systems: designed for embedded signal processing
      - Advantages:
          1. Rugged: certified Mil Spec
          2. Lab has in-house experience
      - Disadvantage:
          1. Proprietary OS, so no Matlab
  • Octave (Matlab clone)
      - Advantage:
          1. MatlabMPI demonstrated using Octave on SKY computer hardware
      - Disadvantages:
          1. Less functionality
          2. Slower?
          3. No object-oriented support, hence no pMatlab support: greater coding effort

Slide 26: Petascale pMatlab

  • pMapper: automatically finds the best parallel mapping
  • pOoc: allows disk to be used as memory (see the sketch below)
  • pMex: allows use of optimized parallel libraries (e.g. PVL)

[Figure: three panels. pMapper maps an FFT/MULT pipeline over arrays A through E onto a parallel computer to find the optimal mapping. pOoc scales from Matlab (~1 GByte RAM) and pMatlab (N x GByte of RAM across nodes) to petascale pMatlab (N x TByte, each node backed by ~1 TByte of RAID disk), with in-core vs. out-of-core FFT performance (MFlops) plotted against matrix size (10 to 10,000 MBytes). pMex sits under the pMatlab user interface alongside the Matlab*P client/server and a dmat/ddens translator, giving access to parallel libraries (PVL, ||VSIPL++, ScaLapack) and the Matlab math libraries]
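The pOoc idea, using disk as memory, can be illustrated in plain Matlab: keep a matrix on disk as one file per column block and bring only one block into RAM at a time. This is only a sketch of the concept; the slide does not describe pOoc's actual design, and the sizes and file names below are made up.

    % Out-of-core column FFT sketch: one MAT-file per column block,
    % only one block in RAM at a time. Sizes and names are illustrative.
    n = 4096;  nBlocks = 8;  colsPerBlock = 64;
    for b = 1:nBlocks
      fname = sprintf('blk_%02d.mat', b);
      if ~exist(fname, 'file')             % create test data on first run
        A = randn(n, colsPerBlock);
        save(fname, 'A');
      end
      S = load(fname);                     % read one block into RAM
      A = fft(S.A);                        % in-core work on that block
      save(sprintf('fft_blk_%02d.mat', b), 'A');  % result back to disk
    end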

Slide 27: Summary

  • Airborne research platforms typically collect in the air and process the data later
  • pMatlab, bladed clusters and high speed disks enable parallel processing in the air
      - Reduces execution time from hours to minutes
      - Uses the rapid prototyping environment required for research
  • Successfully demonstrated on the LiMIT Boeing 707
      - First ever in-flight use of bladed clusters or parallel Matlab
  • Planned for continued use
      - Real-time streaming of GMTI to other assets
  • Drives new requirements for pMatlab
      - Expert mapping
      - Parallel out-of-core
      - pmex