
Slide 1

Deployment of SAR and GMTI Signal Processing on a Boeing 707 Aircraft using pMatlab and a Bladed Linux Cluster

Jeremy Kepner, Tim Currie, Hahn Kim, Bipin Mathew, Andrew McCabe, Michael Moore, Dan Rabinkin, Albert Reuther, Andrew Rhoades, Lou Tella and Nadya Travinin

MIT Lincoln Laboratory
September 28, 2004

This work is sponsored by the Department of the Air Force under Air Force contract F19628-00-C-002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.

Slide 2: Outline

  • Introduction
      - LiMIT
      - Technical Challenge
      - pMatlab
      - QuickLook Concept
  • System
  • Software
  • Results
  • Summary

Slide 3: LiMIT

  • Lincoln Multifunction Intelligence, Surveillance and Reconnaissance Testbed
      - Boeing 707 aircraft
      - Fully equipped with sensors and networking
      - Airborne research laboratory for development, testing, and evaluation of sensors and processing algorithms
  • Employs the standard processing model for research platforms: collect in the air, process on the ground

Slide 4: Processing Challenge

  • Can we process radar data (SAR & GMTI) in flight and provide feedback on sensor performance in flight?
  • Requirements and enablers:
      - Record and playback data: high speed RAID disk system (SGI RAID disk recorder)
      - High speed network
      - High density parallel computing: ruggedized bladed Linux cluster (14x2 CPU IBM Blade Cluster)
      - Rapid algorithm development: pMatlab

Slide 5: High Performance Matlab Applications

pMatlab: Parallel Matlab Toolbox

[Figure: layered architecture. Applications (DoD sensor processing, DoD decision support, scientific simulation, commercial applications) sit on a user interface above the Parallel Matlab Toolbox, which sits on a hardware interface above the parallel computing hardware; the toolbox builds on Matlab*P, PVL, and MatlabMPI]

  • Goals
      - Matlab speedup through transparent parallelism
      - Near-real-time rapid prototyping
  • Lab-wide usage
      - Ballistic missile defense
      - Laser propagation simulation
      - Hyperspectral imaging
      - Passive sonar
      - Airborne Ground Moving Target Indicator (GMTI)
      - Airborne Synthetic Aperture Radar (SAR)

Slide 6: QuickLook Concept

[Figure: streaming sensor data is captured by a RAID disk recorder; data files (SAR, GMTI) flow to the new element in the chain, a 28 CPU bladed cluster running pMatlab; results are viewed on an analyst workstation running Matlab]

Slide 7: Outline

  • Introduction
  • System
      - ConOps
      - Ruggedization
      - Integration
  • Software
  • Results
  • Summary

Slide 8: Concept of Operations

[Figure: streaming sensor data enters the RAID disk recorder at 600 MB/s (1x real time); files are split and copied with rcp over Gbit Ethernet (1/4x real-time rate) to the bladed cluster running pMatlab, which holds ~20 min of data in 1 TB of local storage; results go to an analyst workstation running Matlab via X Windows over LAN, and to other systems]

Timeline:
  • Record streaming data: ~1 second = 1 dwell
  • Copy to bladed cluster: ~30 seconds
  • Process on bladed cluster: 1st CPI ~1 minute, 2 dwells ~2 minutes
  • Process on SGI: 1st CPI ~2 minutes, 2 dwells ~1 hour

Net benefit: 2 dwells in 2 minutes vs. 1 hour

Slide 9: Vibration Tests

[Figure: MatlabMPI throughput (MBps, 0 to 60) vs. message size (16 bytes to 33,554,432 bytes), 13 CPU/13 hard drive case, for no vibration, ~0.7G (-6dB), ~1.0G (-3dB), and 1.4G (0dB)]

  • Tested only at operational (i.e. in-flight) levels:
      - 0dB = 1.4G (above normal)
      - -3dB = ~1.0G (normal)
      - -6dB = ~0.7G (below normal)
  • Tested in all 3 dimensions
  • Ran MatlabMPI file-based communication test on up to 14 CPUs/14 hard drives (a sketch of such a test follows this list)
  • Throughput decreases seen at 1.4G
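For illustration, a MatlabMPI throughput measurement of the kind plotted above can be written as a simple two-rank ping-pong loop. This is only a sketch, not the test actually flown: it assumes myRank and comm have been set up by MatlabMPI initialization, and it uses the MPI_Send/MPI_Recv argument order shown later on slide 14.

    % Sketch of a two-rank ping-pong throughput test (sizes illustrative).
    sizes = 2.^(4:2:25);                  % 16 bytes up to ~33 MBytes
    for s = sizes
      X = zeros(1, s/8);                  % message of s bytes (8-byte doubles)
      if myRank == 0
        tic;
        MPI_Send(1, comm, 99, X);         % argument order per slide 14
        Y = MPI_Recv(1, comm, 99);        % wait for the echo
        t = toc/2;                        % one-way time
        fprintf('%9d bytes: %6.1f MBps\n', s, s/t/1e6);
      else
        Y = MPI_Recv(0, comm, 99);
        MPI_Send(0, comm, 99, Y);         % echo the message back
      end
    end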

Slide 10: Thermal Tests

  • Temperature ranges
      - Test range: -20C to 40C
      - Bladecenter spec: 10C to 35C
  • Cooling tests
      - Successfully cooled to -10C
      - Failed at -20C
      - Cargo bay typically 0C
  • Heating tests
      - Used duct to draw outside air to cool cluster inside oven
      - Successfully heated to 40C
      - Outside air cooled cluster to 36C

Slide 11: Mitigation Strategies

  • IBM Bladecenter is not designed for the 707's operational environment
  • Strategies to minimize risk of damage:
      1. Power down during takeoff/landing
          - Avoids damage to hard drives
          - Radar is also powered down
      2. Construct duct to draw cabin air into cluster
          - Stabilizes cluster temperature
          - Prevents condensation of cabin air moisture within cluster

Slide 12: Integration

  • SGI RAID system
      - Scan catalog files, select dwells and CPIs to process (C/C shell)
      - Assign dwells/CPIs to nodes; package up signature/aux data, one CPI per file; transfer data from the SGI to each processor's disk (Matlab; a sketch of the deal-out follows this slide)
  • IBM bladed cluster
      - Nodes process CPIs in parallel, writing results onto node 1's disk
      - Node 1's processor performs final processing
      - Results displayed locally
  • pMatlab allows integration to occur while the algorithm is being finalized

[Figure: the SGI RAID system feeds the bladed cluster by rcp over a Gigabit connection; each of nodes 1 through 14 runs two processes (P1, P2), each with two virtual processors (VP1, VP2)]
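The assign-and-transfer step above amounts to a round-robin file copy. A minimal sketch under stated assumptions: the catalog scan has already produced one MAT-file per CPI, and the file names, paths, and host names below are hypothetical, not from the actual system.

    % Deal one-CPI-per-file data out to cluster nodes round-robin via rcp.
    cpiFiles = dir('/raid/mission/cpi_*.mat');      % hypothetical path
    Nnodes = 14;
    for k = 1:length(cpiFiles)
      node = mod(k-1, Nnodes) + 1;                  % round-robin assignment
      src  = fullfile('/raid/mission', cpiFiles(k).name);
      dest = sprintf('node%d:/local/data/', node);  % hypothetical host names
      unix(['rcp ' src ' ' dest]);                  % copy over Gbit Ethernet
    end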

Slide 13: Outline

  • Introduction
  • Hardware
  • Software
      - pMatlab architecture
      - GMTI
      - SAR
  • Results
  • Summary

Slide 14: MatlabMPI & pMatlab Software Layers

[Figure: layered architecture. An application (input, analysis, output) sits on a user interface above the pMatlab library layer (vector/matrix, comp task, conduit), which sits on a kernel layer of math (Matlab) and messaging (MatlabMPI) above a hardware interface and the parallel hardware]

  • Can build applications with a few parallel structures and functions; pMatlab provides parallel arrays and functions:

        X = ones(n,mapX);
        Y = zeros(n,mapY);
        Y(:,:) = fft(X);

  • Can build a parallel library with a few messaging primitives; MatlabMPI provides this messaging capability:

        MPI_Send(dest,comm,tag,X);
        X = MPI_Recv(source,comm,tag);
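Putting the two fragments above in context, a complete distributed FFT is only a few lines. This is a sketch under stated assumptions: the two-argument map form and the mapped ones/zeros are taken from these slides, while the processor count and the surrounding initialization (which the slides omit) are assumptions.

    % Sketch: distributed FFT with a redistribution, per the fragments above.
    Ncpus = 4;                          % assumed processor count
    mapX = map([1 Ncpus], 0:Ncpus-1);   % deal columns across CPUs
    mapY = map([Ncpus 1], 0:Ncpus-1);   % deal rows across CPUs
    n = 1024;
    X = ones(n, mapX);                  % distributed n-by-n array
    Y = zeros(n, mapY);                 % distributed result, different map
    Y(:,:) = fft(X);                    % FFT; assignment moves data between maps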

Slide 15: LiMIT GMTI

  • Demonstrates pMatlab in a large multi-stage application
      - ~13,000 lines of Matlab code
  • Driving new pMatlab features
      - Parallel sparse matrices for targets (dynamic data sizes): a potential enabler for a whole new class of parallel algorithms; applying to DARPA HPCS graph theory and NSA benchmarks
      - Mapping functions for system integration
      - Needs expert components!

[Figure: GMTI block diagram. Signature (SIG), auxiliary (AUX), and LOD data are input and dealt to nodes. Per subband (1, 12, or 48): subband equalization of SIG and LOD, pulse-compression range walk correction, crab correction, Doppler processing, adaptive beamforming, and STAP. Per CPI (N per dwell, processed in parallel): recon range walk correction, beam resteer correction, and CPI detection processing. Per dwell: detection processing, angle/parameter estimation, geolocation with INS data, and display]

  • Parallel implementation approach: deal out CPIs to different CPUs
  • Performance
      - Time per node per CPI: ~100 sec
      - Time for all 28 CPIs: ~200 sec
      - Speedup: ~14x (28 CPIs x ~100 sec ≈ 2800 sec serially vs. ~200 sec in parallel)

Slide 16: GMTI pMatlab Implementation

GMTI pMatlab code fragment:

    % Create distribution spec: b = block, c = cyclic.
    dist_spec(1).dist = 'b';
    dist_spec(2).dist = 'c';

    % Create parallel map.
    pMap = map([1 MAPPING.Ncpus], dist_spec, 0:MAPPING.Ncpus-1);

    % Get local indices.
    [lind.dim_1_ind lind.dim_2_ind] = global_ind(zeros(1,C*D,pMap));

    % Loop over local part.
    for index = 1:length(lind.dim_2_ind)
      ...
    end

pMatlab is primarily used for determining which CPIs to work on; CPIs are dealt out using a cyclic distribution (illustrated below).
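What the cyclic distribution buys is a simple round-robin deal of CPI indices. The effect can be spelled out in plain Matlab (the CPI and CPU counts below are the ones quoted on the previous slide; the real code obtains its indices from global_ind as above):

    % Effect of the cyclic map: CPIs dealt round-robin across CPUs.
    Ncpis = 28;  Ncpus = 14;
    for cpu = 0:Ncpus-1
      myCpis = (cpu+1):Ncpus:Ncpis;   % CPU 0 gets CPIs 1,15; CPU 1 gets 2,16; ...
      fprintf('CPU %2d processes CPIs: %s\n', cpu, mat2str(myCpis));
    end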

Slide 17: LiMIT SAR

  • Most complex pMatlab application built (at that time)
      - ~4,000 lines of Matlab code
      - Corner turns of ~1 GByte data cubes
  • Drove new pMatlab features
      - Improving corner turn performance (working with The MathWorks to improve)
      - Selection of submatrices: will be a key enabler for parallel linear algebra (LU, QR, ...)
      - Large memory footprint applications: can the file system be used more effectively?

[Figure: SAR block diagram. Real samples at 480 MS/sec are A/D converted, FFT'd, and downsampled by FFT bin selection using equalization and chirp coefficients, yielding complex samples at 180 MS/sec; pulses are collected and buffered, cubes are collected and polar remapped, and SAR imaging with autofocus follows; images are histogrammed, registered against IMU data, and displayed]

Slide 18: SAR pMatlab Implementation

SAR pMatlab code fragment:

    % Create parallel maps.
    mapA = map([1 Ncpus], 0:Ncpus-1);
    mapB = map([Ncpus 1], 0:Ncpus-1);

    % Prepare distributed matrices.
    fd_midc = zeros(mw, TotalnumPulses, mapA);
    fd_midr = zeros(mw, TotalnumPulses, mapB);

    % Corner turn (columns to rows).
    fd_midr(:,:) = fd_midc;

Corner turn communication is performed by the overloaded = operator: it determines which pieces of the matrix belong where and executes the appropriate MatlabMPI send commands (sketched below).
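The assignment hides an all-to-all exchange: each CPU owns a block of columns of fd_midc and a block of rows of fd_midr, so every pair of CPUs trades one sub-block. A sketch of that pattern in bare MatlabMPI, under stated assumptions: myRank, comm, and the local column block localCols are assumed to exist, mw and TotalnumPulses are assumed divisible by Ncpus, the index arithmetic is illustrative, and the MPI_Send/MPI_Recv argument order is the one shown on slide 14.

    % Sketch of the all-to-all exchange behind the corner turn.
    tag = 314;                              % illustrative message tag
    rowsPer = mw / Ncpus;                   % rows each CPU owns after the turn
    colsPer = TotalnumPulses / Ncpus;       % columns each CPU owned before it
    for p = 0:Ncpus-1                       % send my columns' piece of p's rows
      rowIdx = p*rowsPer + (1:rowsPer);
      if p ~= myRank
        MPI_Send(p, comm, tag, localCols(rowIdx,:));
      end
    end
    localRows = zeros(rowsPer, TotalnumPulses);
    for p = 0:Ncpus-1                       % assemble my rows from every CPU
      colIdx = p*colsPer + (1:colsPer);
      if p == myRank                        % my own piece: local copy, no message
        localRows(:, colIdx) = localCols(myRank*rowsPer + (1:rowsPer), :);
      else
        localRows(:, colIdx) = MPI_Recv(p, comm, tag);
      end
    end
    % MatlabMPI sends are file-based, so send-all-then-receive does not deadlock.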

Slide 19: Outline

  • Introduction
  • Implementation
  • Results
      - Scaling results
      - Mission results
      - Future work
  • Summary

Slide 20: Parallel Performance

[Figure: parallel speedup vs. number of processors (0 to 30) for GMTI (1 per node), GMTI (2 per node), and SAR (1 per node), plotted against the linear-speedup reference]

Slide 21: SAR Parallel Performance

  • Application memory requirements are too large for 1 CPU, so pMatlab is a requirement for this application
  • Corner turn performance is the limiting factor
      - Optimization efforts have improved time by 30%
      - Believe additional improvement is possible

[Figure: corner turn bandwidth]

Slide 22: July Mission Plan

  • Final integration
      - Debug pMatlab on plane
      - Working ~1 week before mission (~1 week after first flight)
      - Development occurred during mission
  • Flight plan
      - Two data collection flights
      - Flew a 50 km diameter box
      - Six GPS-instrumented vehicles: two 2.5T trucks, two CUCVs, two M577s

Slide 23: July Mission Environment

  • Stressing desert environment

Slide 24: July Mission GMTI Results

  • GMTI successfully run on the 707 in flight
      - Target reports
      - Range-Doppler images
  • Plans to use QuickLook for streaming processing in the October mission

Slide 25: Embedded Computing Alternatives

  • Embedded computer systems: designed for embedded signal processing
      - Advantages:
          1. Rugged: certified Mil Spec
          2. Lab has in-house experience
      - Disadvantage:
          1. Proprietary OS, so no Matlab
  • Octave (Matlab clone)
      - Advantage:
          1. MatlabMPI demonstrated using Octave on SKY computer hardware
      - Disadvantages:
          1. Less functionality
          2. Slower?
          3. No object-oriented support, hence no pMatlab support: greater coding effort

Slide 26: Petascale pMatlab

  • pMapper: automatically finds the best parallel mapping
  • pOoc: allows disk to be used as memory (see the sketch below)
  • pMex: allows use of optimized parallel libraries (e.g. PVL)

[Figure: three panels. pMapper maps an FFT/MULT pipeline over arrays A through E onto a parallel computer to find the optimal mapping. pOoc scales from Matlab (~1 GByte RAM) and pMatlab (N x GByte of RAM across nodes) to petascale pMatlab (N x TByte, each node backed by ~1 TByte of RAID disk), with in-core vs. out-of-core FFT performance (MFlops) plotted against matrix size (10 to 10,000 MBytes). pMex sits under the pMatlab user interface alongside the Matlab*P client/server and a dmat/ddens translator, giving access to parallel libraries (PVL, ||VSIPL++, ScaLapack) and the Matlab math libraries]
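The pOoc idea, using disk as memory, can be illustrated in plain Matlab: keep a matrix on disk as one file per column block and bring only one block into RAM at a time. This is only a sketch of the concept; the slide does not describe pOoc's actual design, and the sizes and file names below are made up.

    % Out-of-core column FFT sketch: one MAT-file per column block,
    % only one block in RAM at a time. Sizes and names are illustrative.
    n = 4096;  nBlocks = 8;  colsPerBlock = 64;
    for b = 1:nBlocks
      fname = sprintf('blk_%02d.mat', b);
      if ~exist(fname, 'file')             % create test data on first run
        A = randn(n, colsPerBlock);
        save(fname, 'A');
      end
      S = load(fname);                     % read one block into RAM
      A = fft(S.A);                        % in-core work on that block
      save(sprintf('fft_blk_%02d.mat', b), 'A');  % result back to disk
    end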

Slide 27: Summary

  • Airborne research platforms typically collect in the air and process the data later
  • pMatlab, bladed clusters and high speed disks enable parallel processing in the air
      - Reduces execution time from hours to minutes
      - Uses the rapid prototyping environment required for research
  • Successfully demonstrated on the LiMIT Boeing 707
      - First ever in-flight use of bladed clusters or parallel Matlab
  • Planned for continued use
      - Real-time streaming of GMTI to other assets
  • Drives new requirements for pMatlab
      - Expert mapping
      - Parallel out-of-core
      - pmex