Dilemma of Parallel Programming. Xinhua Lin (林新华), HPC Lab of SJTU @XJTU, 17th Oct 2011. Disclaimers: I am not funded by CRAY. Slides marked with the Chapel logo are taken from Brad Chamberlain’s talk ‘The Mother of All Chapel Talks’, with his permission.
Dilemma of Parallel Programming
Xinhua Lin (林新华 ) HPC Lab of SJTU
@XJTU, 17th Oct 2011
Disclaimers
• I am not funded by CRAY
• Slides marked with the Chapel logo are taken from Brad Chamberlain’s talk ‘The Mother of All Chapel Talks’, with his permission
• Funny pictures are from the Internet
About me and HPC Lab in SJTU
• Directing the HPC Lab
• Co-translator of PPP
• Co-founder of the HMPP CoC for AP & Japan
• An MS HPC invited institute @SH
• Support for the HPC Center of SJTU
• Hold the SJTU HPC Seminar monthly
http://itis.grid.sjtu.edu.cn/blog
Three Challenges for ParaProg in the Multi/Many-Core Era
• Revolution vs. evolution
• Low level vs. high level
  – Performance vs. programmability
• Performance vs. performance portability
For more detail:
Paper version: China Education Network (中国教育网络), special issue on HPC and Cloud, Sep 2011
Online version: http://itis.grid.sjtu.edu.cn/blog
Outline
• Right Level to Expose Parallelism
• Review of ParaProg Languages
• Multiresolution and Chapel
Right Level to Expose Parallelism
Can we stop water/parallelism?
Hardware
ISA
OS
Library
Language
Performance vs. Programmability
[Figure: two stacks, each sitting on a Target Machine. Low level: MPI, OpenMP, and pthreads expose implementing mechanisms ("Why is everything so tedious?"). High level: ZPL and HPF provide higher-level abstractions ("Why don’t I have more control?").]
ParaProg Education
• Tired of teaching yet another specific language
  – MPI for clusters
  – OpenMP for SMP, then multi-core CPUs
  – CUDA for GPUs, and now OpenCL
  – More on the way…
• Had to explain the same concepts with different tools
  – A single language to explain them all?
• Similar to OS education
  – Production OSes: Linux, Unix, and Windows
  – OS only for education: Minix
Review of ParaProg Languages
Hybrid Programming Model
• MPI is insufficient in the multi/many-core era
  – OpenMP for multi-core
  – CUDA/OpenCL for many-core*
• So-called hybrid programming was invented as a temporary solution: workable but ugly
  – MPI+OpenMP for multi-core clusters
  – MPI+CUDA/OpenCL for GPU clusters like Tianhe-1A
• A similar two-level idea is used inside CUDA (threads and thread blocks) and OpenCL (work-items and work-groups)
* We will wait and see how OpenMP works on Intel MIC
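The two-level structure behind hybrid programming can be illustrated with a small Python sketch (Python rather than MPI or CUDA, purely for illustration). The function name `two_level_ranges` and its parameters are hypothetical: the "outer" units stand in for MPI ranks or thread blocks, and the "inner" units for OpenMP threads or CUDA threads.

```python
def split(lo, hi, parts):
    """Cut the half-open range [lo, hi) into `parts` contiguous pieces."""
    size = hi - lo
    return [(lo + size * k // parts, lo + size * (k + 1) // parts)
            for k in range(parts)]


def two_level_ranges(n, num_outer, num_inner):
    """Hypothetical helper: split range(n) among outer units (think ranks
    or thread blocks), then split each outer share among inner units
    (think threads or work-items)."""
    return [split(lo, hi, num_inner) for lo, hi in split(0, n, num_outer)]


if __name__ == "__main__":
    for outer_id, shares in enumerate(two_level_ranges(10, 2, 3)):
        for inner_id, (lo, hi) in enumerate(shares):
            print(f"outer {outer_id}, inner {inner_id}: indices {lo}..{hi - 1}")
```

Every index is owned by exactly one (outer, inner) pair. In a real hybrid program, though, the programmer expresses the two levels with two different tools.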
ParaProg in Different Ways
• Low level (exposes implementation mechanisms)
  – MPI, CUDA, and OpenCL
  – OpenMP
• High level
  – PGAS: CAF, UPC, and Titanium
  – Global view: NESL, ZPL
  – APGAS: Chapel, X10
• Directive based
  – HMPP, PGI, Cray directives
Multiresolution and Chapel
What is Multiresolution?
Structure the language in a layered manner, permitting it to be used at multiple levels as required/desired
– support high-level features and automation for convenience
– provide the ability to drop down to lower, more manual levels
– use appropriate separation of concerns to keep these layers clean
[Figure: Chapel language concepts, layered over the Target Machine: Distributions, Data Parallelism, Task Parallelism, Locality Control, Base Language]
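The layering idea can be sketched in a few lines of Python (not Chapel). This is only an analogy under stated assumptions: `map_chunks` and `parallel_map` are hypothetical names, and the default of 4 workers is arbitrary. The high-level call picks everything automatically, while the lower-level call lets the user control chunking and worker count, and the former is built on the latter.

```python
from concurrent.futures import ThreadPoolExecutor


def map_chunks(fn, data, num_workers, chunk_size):
    """Lower layer: the caller decides how the data is chunked and how
    many workers process the chunks."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves chunk order, so results stay in input order
        parts = list(pool.map(lambda chunk: [fn(x) for x in chunk], chunks))
    return [y for part in parts for y in part]


def parallel_map(fn, data):
    """Higher layer: same operation, with the manual knobs filled in
    automatically (the default of 4 workers is an assumption)."""
    num_workers = 4
    chunk_size = max(1, -(-len(data) // num_workers))  # ceiling division
    return map_chunks(fn, data, num_workers, chunk_size)


if __name__ == "__main__":
    print(parallel_map(lambda x: x * x, list(range(10))))
    print(map_chunks(lambda x: x * x, list(range(10)), num_workers=2, chunk_size=3))
```

Both calls return the same result. The difference is how much of the mechanism the user chooses to see, which is the separation of concerns the bullets above describe.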
Where Chapel Was Born: HPCS
HPCS: High Productivity Computing Systems (DARPA et al.)
– Goal: raise the productivity of high-end computing users by 10×
– Productivity = Performance + Programmability + Portability + Robustness
• Phase II: Cray, IBM, Sun (July 2003 – June 2006)
  – Evaluated the entire system architecture’s impact on productivity…
    • processors, memory, network, I/O, OS, runtime, compilers, tools, …
    • …and new languages: Cray: Chapel; IBM: X10; Sun: Fortress
• Phase III: Cray, IBM (July 2006 – 2010)
  – Implement the systems and technologies resulting from Phase II
  – (Sun also continues work on Fortress, without HPCS funding)
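Global-view vs. Fragmented
Problem: “Apply 3-pt stencil to vector”
[Figure: in the global view, the result vector is written as a single ( … + … )/2 over the whole vector; in the fragmented view, the same ( … + … )/2 is written separately for each of three pieces of the vector.]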
Global-view vs. SPMD Code

Global-View

    def main() {
      var n: int = 1000;
      var a, b: [1..n] real;

      // one data-parallel loop over the interior of the whole vector
      forall i in 2..n-1 {
        b(i) = (a(i-1) + a(i+1))/2;
      }
    }

SPMD

    def main() {
      var n: int = 1000;
      var locN: int = n/numProcs;
      var a, b: [0..locN+1] real;   // local block plus two ghost cells

      // exchange boundary elements with neighboring processes
      if (iHaveRightNeighbor) {
        send(right, a(locN));
        recv(right, a(locN+1));
      }
      if (iHaveLeftNeighbor) {
        send(left, a(1));
        recv(left, a(0));
      }
      forall i in 1..locN {
        b(i) = (a(i-1) + a(i+1))/2;
      }
    }
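To make the contrast concrete, here is a minimal Python sketch (not Chapel) of the same 3-point stencil written both ways. It runs in one process: the "processes" of the fragmented version are simulated by a loop, and the ghost-cell copies stand in for the send/recv pairs. The function names are hypothetical, and the fragmented version assumes the vector length divides evenly among the processes.

```python
def stencil_global(a):
    """Global view: one loop over the interior of the whole vector."""
    b = [0.0] * len(a)
    for i in range(1, len(a) - 1):
        b[i] = (a[i - 1] + a[i + 1]) / 2
    return b


def stencil_fragmented(a, num_procs):
    """Fragmented (SPMD-style) view: each simulated process owns a block
    of the vector plus two ghost cells for its neighbors' edge values."""
    n = len(a)
    assert n % num_procs == 0, "sketch assumes n divides evenly"
    loc_n = n // num_procs
    b = [0.0] * n
    for p in range(num_procs):
        lo = p * loc_n                      # global index of local[1]
        local = [0.0] * (loc_n + 2)         # local[0], local[loc_n+1]: ghosts
        local[1:loc_n + 1] = a[lo:lo + loc_n]
        has_left = p > 0
        has_right = p < num_procs - 1
        if has_left:
            local[0] = a[lo - 1]            # plays the role of recv(left, ...)
        if has_right:
            local[loc_n + 1] = a[lo + loc_n]  # plays the role of recv(right, ...)
        # processes at the ends skip the global end points of the vector
        first = 1 if has_left else 2
        last = loc_n if has_right else loc_n - 1
        for i in range(first, last + 1):
            b[lo + i - 1] = (local[i - 1] + local[i + 1]) / 2
    return b


if __name__ == "__main__":
    a = [float(i * i) for i in range(12)]
    print(stencil_global(a) == stencil_fragmented(a, 3))
```

The arithmetic is identical in both versions. The extra lines in the fragmented one are all bookkeeping about who owns which elements, which is exactly the detail a global-view language hides.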
Chapel Overview
• A design principle for HPC
  – “Support the general case, optimize for the common case”
• Data parallel (ZPL) + task parallel (Cray MTA) + scripting languages
• The latest version, 1.3.0, is available as OSS:
  http://sourceforge.net/projects/chapel
[Figure: Chapel language concepts, layered over the Target Machine: Distributions, Data Parallelism, Task Parallelism, Locality Control, Base Language]
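Chapel example: Heat Transfer
[Figure: n × n array A with a boundary value of 1.0; each element is updated from its neighbors with a division by 4; repeat until the max change falls below a threshold]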
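The iteration in the figure can be sketched as a plain Jacobi loop in Python (not Chapel, and sequential). Several details here are assumptions for illustration rather than taken from the slide: the function name `heat_transfer`, the choice of which edge is held at 1.0, and the default threshold `epsilon`.

```python
def heat_transfer(n, epsilon=1e-3):
    """Jacobi iteration on an n x n interior surrounded by a fixed boundary.

    Assumptions (not from the slide): the last row of the boundary is held
    at 1.0, all other boundary cells at 0.0, and epsilon defaults to 1e-3.
    """
    a = [[0.0] * (n + 2) for _ in range(n + 2)]
    a[n + 1] = [1.0] * (n + 2)              # the hot edge
    iterations = 0
    while True:
        new = [row[:] for row in a]         # boundary cells are copied as-is
        delta = 0.0
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                # average of the four neighbors from the previous sweep
                new[i][j] = (a[i - 1][j] + a[i + 1][j]
                             + a[i][j - 1] + a[i][j + 1]) / 4
                delta = max(delta, abs(new[i][j] - a[i][j]))
        a = new
        iterations += 1
        if delta < epsilon:                 # repeat until max change < epsilon
            return a, iterations


if __name__ == "__main__":
    grid, steps = heat_transfer(4)
    print(f"converged after {steps} sweeps")
    for row in grid[1:5]:
        print(" ".join(f"{v:.3f}" for v in row[1:5]))
```

Each sweep reads only the previous grid, so all interior updates are independent of one another. That independence is what makes this a natural data-parallel example.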
Chapel Code For Heat Transfer
Chapel as Minix in ParaProg
• If I were to offer a ParaProg class, I’d want to teach about:
  – data parallelism
  – task parallelism
  – concurrency
  – synchronization
  – locality/affinity
  – deadlock, livelock, and other pitfalls
  – performance tuning
  – …
Conclusion: Major Points
• Programmability vs. performance is the perennial dilemma of ParaProg
• Multiresolution sounds perfect in theory but is not yet mature enough for production
• However, Chapel could serve as the Minix of ParaProg
Q&A