COCOA (2/19) Real Time Systems LAB.
Contents
• Background
• COCOA Overview
• System Architecture
• Key Technologies
• Application Area
• Evaluation
• Conclusion
• References
Background
• A thesis in Aerospace Engineering at Pennsylvania State Univ. by Anirudh Modi, 1999
  – "Unsteady separated flow simulations using a cluster of workstations"
• Needed a suitable platform for running PUMA (a parallel flow solver) efficiently and accurately, for
  – Resolving several steady solutions
  – Simulating a fully three-dimensional unsteady separated flow around a sphere
• PUMA: the Parallel Unstructured Maritime Aerodynamics code
• Financial support: the Rotorcraft Center of Excellence (RCOE) at Penn State
COCOA Overview
• The COst effective COmputing Array (COCOA)
• A Beowulf cluster with 50 processors
• Goal: to bring low-cost parallel computing
  – The whole system cost approximately $100,000 (1998 US dollars)
• Performance
  – Benchmarks showed it to be almost twice as fast as the Penn State IBM SP supercomputer (older RS/6000-370 nodes) for these applications
System Architecture
• Computing nodes (26 Dell WS-410 workstations)
  – Dual 400 MHz Intel Pentium II processors w/ 512 KB L2 cache
  – 512 MB SDRAM
  – 4 GB UW-SCSI2 disk
  – 3Com 3c509B 100 Mbit/s Fast Ethernet card
  – 32x SCSI CD-ROM drive
  – 1.44 MB floppy drive
  – Cables
• In addition:
  – One Bay Networks 450T 24-way 100 Mbit/s switch
  – Two 16-way monitor/keyboard/mouse switches
  – Four 500 kVA APC UPS units
  – For the server: one monitor, keyboard, and mouse, plus an extra 54 GB UW-SCSI2 HDD
System Architecture cont.
• Setting up H/W
[Figure: hardware setup with compute nodes Node1, Node2, Node3, ..., Node25, a Switch, and the Server]
System Architecture cont.
• Operating system
  – Red Hat Linux 5.1
• Software
  – Base packages from Red Hat Linux 5.1, kernel 2.0.36
  – Freeware GNU C/C++ compilers (gcc, pgcc)
  – Fortran 77/90 compiler and debugger by the Portland Group
  – Freeware MPI libraries for parallel programming in C/C++/Fortran 77/90
  – ssh-1.2.26 for secure access
  – DQS v3.0, a queueing system
  – Scientific visualization software TECPLOT from Amtec Corp.
Key Technologies
• Beowulf cluster
  – A system that usually consists of one server node and one or more client nodes connected via Ethernet or some other fast network
  – Developed for large-scale computing, such as aerodynamics, atmospheric science, physics, etc.
  – First developed in 1994 at NASA
  – Low-price supercomputing is possible thanks to
    • High-performance, low-price processors
    • Available high-speed network devices
  – Numerous Beowulf clusters have been developed
    • Used in various computational science fields
Key Technologies cont.
• DQS (Distributed Queuing System)
  – Developed as an experimental batch queuing system at the Supercomputer Computations Research Institute, Florida State Univ.
  – Provides single, coherent allocation and management of cluster resources
• MPI (Message Passing Interface)
  – A standard for message-passing parallel programming
• SSH (Secure Shell)
  – A program for logging into and executing commands on a remote machine
  – Provides secure, encrypted communication between untrusted hosts over an insecure network
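DQS itself is a complete batch system whose interface is not described here. As a minimal illustration of what a batch queue does, the sketch below assumes a strict first-come, first-served policy over a pool of identical nodes; the `Job` and `BatchQueue` classes are hypothetical and are not part of DQS.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    nodes: int  # number of compute nodes requested (hypothetical job model)


@dataclass
class BatchQueue:
    free_nodes: int
    waiting: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)

    def submit(self, job: Job) -> None:
        # New jobs join the back of the queue, then we try to start work.
        self.waiting.append(job)
        self.schedule()

    def schedule(self) -> None:
        # Strict FIFO: start jobs from the head of the queue while they fit;
        # a job that does not fit blocks the jobs behind it.
        while self.waiting and self.waiting[0].nodes <= self.free_nodes:
            job = self.waiting.popleft()
            self.free_nodes -= job.nodes
            self.running[job.name] = job

    def finish(self, name: str) -> None:
        # Release the finished job's nodes and start any jobs that now fit.
        job = self.running.pop(name)
        self.free_nodes += job.nodes
        self.schedule()
```

With 25 free nodes, submitting two 16-node jobs starts the first immediately and holds the second until the first finishes. Production batch systems typically add priorities, multiple queues and resource matching on top of this basic idea.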
Application Area
• Analysis of maritime aerodynamics
  – Analyzing flows over complex configurations (such as ships and helicopter fuselages)
  – Using PUMA
  – Details of the problem: a helicopter can safely land on a frigate in the North Sea only 10 percent of the time in winter
PUMA (Parallel Unstructured Maritime Aerodynamics)
• A program for analyzing internal and external non-reacting compressible flows over arbitrarily complex 3D geometries
• Written entirely in ANSI C using the MPI library for message passing, and hence highly portable while giving good performance
PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
• Uses domain decomposition
  – Domain decomposition
    • Distributes data across processes, with each process performing approximately the same operation on its own data
    • Problem-level parallelism rather than loop-level parallelism (not SIMD)
    • Aims to minimize communication cost
  – Functional decomposition
    • Divides a problem into several distinct tasks that may be executed in parallel
• Parallelization in PUMA
  – Each compute node reads its own portion of the grid file at startup
  – Each compute node generates the flow solution over its portion of the grid, in parallel with the others
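How PUMA actually partitions its unstructured grids is not described on these slides. A minimal sketch, assuming cells are simply numbered 0 to N-1 and split into contiguous, nearly equal blocks, shows how each process can work out which portion of the grid it owns before reading it; the function name `block_range` is hypothetical.

```python
def block_range(n_cells: int, n_procs: int, rank: int) -> range:
    """Return the contiguous range of global cell indices owned by `rank`.

    Cells are split into n_procs nearly equal blocks: the first
    (n_cells % n_procs) ranks receive one extra cell, so block sizes
    differ by at most one.
    """
    if not 0 <= rank < n_procs:
        raise ValueError("rank must be in [0, n_procs)")
    base, extra = divmod(n_cells, n_procs)
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


if __name__ == "__main__":
    # Example: 10 cells on 3 processes -> blocks of 4, 3 and 3 cells.
    for r in range(3):
        print(r, list(block_range(10, 3, r)))
```

Contiguous index blocks balance the cell count but ignore grid connectivity; partitioners for unstructured grids usually also try to minimize the number of faces cut between subdomains, since those faces determine how much data must be exchanged.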
PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
[Figure: PUMA workflow]
• Make grid: CAD package, GridTool, VGrid ...
• PUMA: generates the flow solution over the given grid in parallel and combines the solution into a single file
• Display the solution: toTecplot utility, Tecplot
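The step where PUMA combines the per-process solutions into a single file can be illustrated with a small merge by global cell index. This is a sketch under assumptions: the plain "index value" text format and the function name `combine_solution` are hypothetical, and this is not the format that the toTecplot utility produces.

```python
import io


def combine_solution(pieces: list[dict[int, float]], out: io.TextIOBase) -> None:
    """Merge per-process results into one file ordered by global cell index.

    Each piece maps a global cell index to the solution value computed by
    one process. Overlapping pieces are rejected, because every cell should
    be owned by exactly one process.
    """
    merged: dict[int, float] = {}
    for piece in pieces:
        for cell, value in piece.items():
            if cell in merged:
                raise ValueError(f"cell {cell} reported by more than one process")
            merged[cell] = value
    # One "index value" line per cell, in global order.
    for cell in sorted(merged):
        out.write(f"{cell} {merged[cell]}\n")


if __name__ == "__main__":
    buf = io.StringIO()
    combine_solution([{2: 0.5, 3: 0.25}, {0: 1.0, 1: 0.75}], buf)
    print(buf.getvalue(), end="")
```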
PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
• Modifications to PUMA
  – Modify PUMA to read several hundred lines at a time and broadcast the combined data to every processor using a reasonably sized buffer
  – Modify MPI to combine several small messages into one before starting communication
[Figure: Mbits/sec vs packet size on COCOA for the MPI_Send/Recv test]
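The benefit of combining small messages can be seen with the simple latency-bandwidth cost model, time ≈ latency + size / bandwidth, under which every message pays the fixed start-up latency once. The same model predicts effective bandwidth rising with packet size toward the link rate. The sketch below assumes an illustrative 100 µs latency and a nominal 100 Mbit/s link rather than measured COCOA figures, and its length-prefixed packing format is a hypothetical stand-in for whatever buffer layout PUMA actually uses; all function names are hypothetical.

```python
import struct


def send_time(n_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Latency-bandwidth model: fixed start-up cost plus transfer time."""
    return latency_s + 8 * n_bytes / bandwidth_bps


def effective_mbits(n_bytes: int, latency_s: float, bandwidth_bps: float) -> float:
    """Achieved throughput in Mbit/s for a single message of n_bytes."""
    return 8 * n_bytes / send_time(n_bytes, latency_s, bandwidth_bps) / 1e6


def separate_vs_combined(sizes, latency_s, bandwidth_bps):
    """Time to send messages one by one vs. packed into one message."""
    separate = sum(send_time(n, latency_s, bandwidth_bps) for n in sizes)
    combined = send_time(sum(sizes), latency_s, bandwidth_bps)
    return separate, combined


def pack_messages(msgs: list[bytes]) -> bytes:
    """Concatenate messages, each prefixed by a 4-byte big-endian length."""
    buf = bytearray()
    for m in msgs:
        buf += struct.pack("!I", len(m))
        buf += m
    return bytes(buf)


def unpack_messages(buf: bytes) -> list[bytes]:
    """Split a buffer produced by pack_messages back into messages."""
    msgs, pos = [], 0
    while pos < len(buf):
        (n,) = struct.unpack_from("!I", buf, pos)
        pos += 4
        msgs.append(buf[pos:pos + n])
        pos += n
    return msgs


if __name__ == "__main__":
    LATENCY_S = 100e-6      # assumed, for illustration only
    BANDWIDTH_BPS = 100e6   # nominal Fast Ethernet rate
    sep, comb = separate_vs_combined([64] * 50, LATENCY_S, BANDWIDTH_BPS)
    print(f"50 x 64 B separately: {sep * 1e3:.3f} ms, combined: {comb * 1e3:.3f} ms")
```

Under these assumed numbers, fifty 64-byte messages take about 5.3 ms when sent separately but about 0.36 ms when packed into one, because the latency is paid once instead of fifty times.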
PUMA (Parallel Unstructured Maritime Aerodynamics) cont.
[Figure: Improvement in PUMA performance after combining several small MPI messages into one]
Evaluation
[Figure: Total Mflops vs number of processors on COCOA for the PUMA test case]
[Figure: Speed-up vs number of processors on COCOA for the PUMA test case]
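The evaluation curves rest on the conventional definitions below (standard definitions, not formulas given in the slides), where T(p) is the wall-clock time of the test case on p processors and F is the total floating-point operation count of the run:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad
E(p) = \frac{S(p)}{p}, \qquad
\mathrm{Mflops}(p) = \frac{F}{10^{6}\,T(p)}
```

Ideal linear scaling corresponds to S(p) = p and E(p) = 1.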
Evaluation cont.
[Figure: NAS Parallel Benchmark on COCOA: comparison with other machines for the Class "C" LU test]
Conclusion
• A Beowulf-class supercomputer (PC, Linux, MPI, DQS, SSH)
• A cost-effective supercomputer for numerical simulations
  – Almost twice as fast as the Penn State IBM SP supercomputer for our production codes, including PUMA, given the same number of processors, while being built at a fraction of the cost ($100,000 in 1998 US dollars)
• Suitable only for numerical simulations (weather, fluid flow, ...) that do not have high communication-to-computation ratios, because of the high communication latency
• Good scalability with most of the MPI applications used
• The objective, to build a cost-effective supercomputer for the numerical simulations dealt with at Penn State, has been fulfilled
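The latency limitation can be made concrete with the usual surface-to-volume argument, sketched here under the simplifying assumption (not stated in the slides) of a 3D grid of N cells split into p equal, roughly cubic subdomains: computation per process scales with the subdomain volume N/p, while the data exchanged with neighbours scales with its surface.

```latex
\frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
  \;\propto\; \frac{(N/p)^{2/3}}{N/p}
  \;=\; \left(\frac{p}{N}\right)^{1/3}
```

The ratio grows as more processors share a fixed problem, and on a network with high per-message latency each exchange also pays a fixed start-up cost, so codes that do a lot of computation per cell relative to the data they exchange are the ones that scale well on such a cluster.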
References
• COCOA: http://cocoa.ihpca.psu.edu
• NAS Parallel Benchmarks: http://science.nas.nasa.gov/Software/NPB
• Beowulf: http://www.beowulf.org
• RedHat: http://www.redhat.com
• MPI: http://www.mcs.anl.gov/mpi
• DQS: http://www.scri.fsu.edu/~pasko/dqs.html
• Tons of references…