COCOA(1/19) Real Time Systems LAB. COCOA MAY 31, 2001 김경임, 박성호

COCOA (1/19) Real Time Systems LAB.

COCOA

May 31, 2001
김경임, 박성호

COCOA (2/19) Real Time Systems LAB.

Contents
• Background
• COCOA Overview
• System Architecture
• Key Technologies
• Application Area
• Evaluation
• Conclusion
• References

COCOA (3/19) Real Time Systems LAB.

Background
• A thesis in Aerospace Engineering, Pennsylvania State Univ., by Anirudh Modi, 1999
  – “Unsteady separated flow simulations using a cluster of workstations”
• Need for a suitable platform on which PUMA (a parallel flow solver) runs efficiently and accurately
  – Resolving several steady solutions
  – A fully three-dimensional unsteady separated flow around a sphere
• PUMA: Parallel Unstructured Maritime Aerodynamics
• Financial support: the Rotorcraft Center of Excellence (RCOE) at Penn State

COCOA (4/19) Real Time Systems LAB.

COCOA Overview
• The COst effective COmputing Array (COCOA)
• A Beowulf cluster that has 50 processors
• Goal: to bring low-cost parallel computing
  – The whole system cost approximately $100,000 (1998 US dollars)
• Performance
  – Benchmarks show that it was almost twice as fast as the Penn State IBM SP (older RS/6000-370 nodes) supercomputer for these applications

COCOA (5/19) Real Time Systems LAB.

System Architecture
• Computing nodes (26 Dell WS-410 workstations), each with:
  – Dual 400 MHz Intel Pentium II processors w/ 512 KB L2 cache
  – 512 MB SDRAM
  – 4 GB UW-SCSI2 disk
  – 3Com 3c905B 100 Mbits/sec Fast Ethernet card
  – 32x SCSI CD-ROM drive
  – 1.44 MB FDD
  – Cables
• In addition:
  – One Baynetworks 450T 24-way 100 Mbits/sec switch
  – Two 16-way monitor/keyboard/mouse switches
  – Four 500 kVA APC UPS
  – For the one server: one monitor, keyboard, and mouse, plus an extra 54 GB UW-SCSI2 HDD

COCOA (6/19) Real Time Systems LAB.

System Architecture cont.
• Setting up H/W

[Figure: hardware layout showing Node1, Node2, Node3, ..., Node25, the Switch, and the Server]

COCOA (7/19) Real Time Systems LAB.

System Architecture cont.
• Operating system
  – RedHat Linux 5.1
• Software
  – Base packages from RedHat Linux 5.1, kernel 2.0.36
  – Freeware GNU C/C++ compilers (gcc, pgcc)
  – Fortran 77/90 compiler and debugger by the Portland Group
  – Freeware MPI libraries for parallel programming in C/C++/Fortran 77/90
  – ssh-1.2.26 for secure access
  – DQS v3.0, a queueing system
  – Scientific visualization software TECPLOT from Amtec Corp.

COCOA (8/19) Real Time Systems LAB.

Key Technologies
• Beowulf cluster
  – A system that usually consists of one server node and one or more client nodes, connected via Ethernet or some other fast network
  – Developed for large-scale computing in fields such as aerodynamics, atmospheric science, and physics
  – First developed in 1994 at NASA
  – Low-price supercomputing is possible
    • High-performance, low-price processors
    • High-speed network devices available
  – Numerous Beowulf clusters have been developed
    • Used in various computational science fields

COCOA (9/19) Real Time Systems LAB.

Key Technologies cont.
• DQS (Distributed Queuing System)
  – Developed as an experimental batch queuing system at the Supercomputer Computations Research Institute, Florida State Univ.
  – Provides a single, coherent system for resource allocation and management
• MPI (Message Passing Interface)
  – Standard for parallel programming
• SSH (Secure Shell)
  – Program for logging into a remote machine and executing commands on it
  – Provides secure, encrypted communication between untrusted hosts over an insecure network

COCOA (10/19) Real Time Systems LAB.

Application Area
• Analysis of maritime aerodynamics
  – Analyzes flows over complex configurations (such as ships and helicopter fuselages)
  – Uses PUMA
  – Problem detail: a helicopter can safely land on a frigate in the North Sea only 10 percent of the time in winter

COCOA (11/19) Real Time Systems LAB.

PUMA (Parallel Unstructured Maritime Aerodynamics)
• Program for the analysis of internal and external non-reacting compressible flows over arbitrarily complex 3D geometries
• Written entirely in ANSI C, using the MPI library for message passing, and hence highly portable while giving good performance

COCOA (12/19) Real Time Systems LAB.

PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
• Uses domain decomposition
  – Domain decomposition
    • Distributes data across processes, with each process performing approximately the same operations on its data
    • Problem-level parallelism rather than loop-level (not SIMD)
    • Minimizes communication cost
  – Functional decomposition
    • Divides a problem into several distinct tasks that may be executed in parallel
• Parallelization in PUMA
  – Each compute node reads its own portion of the grid file at startup
  – Each compute node generates the flow solution over its given grid, in parallel
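
The idea of giving each process its own portion of the grid can be sketched as follows. This is a minimal illustration only: the function name `block_partition` and the contiguous, near-equal split are assumptions made for the example, not PUMA's actual partitioning scheme, which the slides do not describe.

```python
def block_partition(num_cells, num_procs):
    """Split cell indices 0..num_cells-1 into contiguous, near-equal blocks.

    Returns one half-open (start, stop) range per process.
    Hypothetical helper for illustration; not taken from PUMA.
    """
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    base, extra = divmod(num_cells, num_procs)
    ranges = []
    start = 0
    for rank in range(num_procs):
        # The first `extra` ranks take one additional cell each,
        # so block sizes differ by at most one.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For example, `block_partition(10, 4)` assigns 3, 3, 2, and 2 cells to the four processes, so every process performs roughly the same amount of work on its own data.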

COCOA (13/19) Real Time Systems LAB.

PUMA (Parallel Unstructured Maritime Aerodynamics) cont.

[Workflow diagram]
• CAD package, GridTool, VGrid, ...: make the grid
• PUMA: generates the flow solution over the given grid in parallel and combines the solution into a single file
• toTecplot utility, Tecplot: display the solution

COCOA (14/19) Real Time Systems LAB.

PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
• Modifications to PUMA
  – Modify PUMA to read several hundred lines at a time and broadcast the combined data to every processor using a reasonably sized buffer
  – Modify MPI to combine several small messages into one before starting communication

[Figure: Mbits/sec vs. packet size on COCOA for the MPI_Send/Recv test]
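
Why combining small messages helps can be illustrated with a simple cost model in which every message pays a fixed latency plus a size-dependent transfer time. This is a sketch under assumptions: the function `transfer_time` and the latency and bandwidth values are illustrative placeholders, not measurements from COCOA.

```python
def transfer_time(num_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send `num_messages` messages one after another.

    Each message costs a fixed latency plus its size divided by the bandwidth.
    """
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return num_messages * per_message


# Assumed, illustrative figures (not taken from the slides).
LATENCY_S = 100e-6        # 100 microseconds of fixed cost per message
BANDWIDTH = 100e6 / 8     # 100 Mbits/sec expressed in bytes per second

# 500 separate 80-byte messages versus one combined 40,000-byte message.
separate = transfer_time(500, 80, LATENCY_S, BANDWIDTH)
combined = transfer_time(1, 500 * 80, LATENCY_S, BANDWIDTH)
```

Under these assumed numbers the same payload takes about 53 ms as 500 small messages and about 3.3 ms as one combined message, because the fixed latency is paid once instead of 500 times.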

COCOA (15/19) Real Time Systems LAB.

PUMA (Parallel Unstructured Maritime Aerodynamics) cont.

[Figure: Improvement in PUMA performance after combining several small MPI messages into one]

COCOA (16/19) Real Time Systems LAB.

Evaluation

[Figure: Total Mflops vs. number of processors on COCOA for the PUMA test case]

[Figure: Speed-up vs. number of processors on COCOA for the PUMA test case]
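
For readers unfamiliar with the quantities plotted, speed-up and parallel efficiency are conventionally computed from wall-clock times as below. The function names and the timings in the usage note are hypothetical and are not data from the COCOA runs.

```python
def speedup(t_serial, t_parallel):
    """Speed-up: single-processor run time divided by parallel run time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, num_procs):
    """Parallel efficiency: speed-up divided by the number of processors."""
    return speedup(t_serial, t_parallel) / num_procs
```

With hypothetical timings of 1200 s on one processor and 100 s on 16 processors, `speedup(1200.0, 100.0)` gives 12.0 and `efficiency(1200.0, 100.0, 16)` gives 0.75.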

COCOA (17/19) Real Time Systems LAB.

Evaluation cont.

[Figure: NAS Parallel Benchmark on COCOA: comparison with other machines for the Class “C” LU test]

COCOA (18/19) Real Time Systems LAB.

Conclusion
• Beowulf-class supercomputer (PC, Linux, MPI, DQS, SSH)
• Cost-effective supercomputer for numerical simulations
  – Almost twice as fast as the Penn State IBM SP supercomputer for our production codes, including PUMA, given the same number of processors, while being built at a fraction of the cost ($100,000 in 1998 US dollars)
• Suitable only for numerical simulations (weather, fluids, ...) that do not have high communication-to-computation ratios, because of the high communication latency
• Good scalability with most of the MPI applications used
• The objective, to build a cost-effective supercomputer for the numerical simulations dealt with at Penn State, has been fulfilled.
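
The point about communication-to-computation ratios can be made concrete with a simple model. This is only a sketch, and all symbols are assumptions introduced here rather than taken from the slides: each of $p$ processes spends $t_{\mathrm{comp}}$ computing and $t_{\mathrm{comm}}$ communicating, and an ideal serial run takes $T_1 = p\,t_{\mathrm{comp}}$.

```latex
T_p = t_{\mathrm{comp}} + t_{\mathrm{comm}}, \qquad
E = \frac{T_1}{p\,T_p}
  = \frac{t_{\mathrm{comp}}}{t_{\mathrm{comp}} + t_{\mathrm{comm}}}
  = \frac{1}{1 + r}, \qquad
r = \frac{t_{\mathrm{comm}}}{t_{\mathrm{comp}}}
```

Under this model a ratio of $r = 0.1$ gives an efficiency of about 0.91, while $r = 1$ gives 0.5, which is why a high-latency network suits codes with a low ratio.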

COCOA (19/19) Real Time Systems LAB.

References
• COCOA : http://cocoa.ihpca.psu.edu

• NAS Parallel Benchmarks : http://science.nas.nasa.gov/Software/NPB

• Beowulf : http://www.beowulf.org

• RedHat : http://www.redhat.com

• MPI : http://www.mcs.anl.gov/mpi

• DQS : http://www.scri.fsu.edu/~pasko/dqs.html

• Tons of references…