Calculating π in Parallel Using MPI

Aiichiro Nakano
Collaboratory for Advanced Computing & Simulations
Department of Computer Science
Department of Physics & Astronomy
Department of Chemical Engineering & Materials Science
Department of Biological Sciences
University of Southern California
Email: [email protected]

1. Task decomposition (parallel programming = who does what)
2. Scalability analysis
Integral Representation of π

$$\int_0^1 \frac{4\,dx}{1+x^2} = \int_0^{\pi/4} \frac{d\theta}{\cos^2\theta}\,\frac{4}{1+\tan^2\theta} = \int_0^{\pi/4} 4\,d\theta = \pi$$

(substitute x = tan θ, so that dx = dθ/cos²θ and 1 + x² = 1/cos²θ)
Numerical Integration of π

• Integration: approximate the integral by the midpoint (rectangle) rule
• Discretization: Δ = 1/N (step = 1.0/NBIN in the code); xᵢ = (i+0.5)Δ (i = 0,…,N−1)
#include <stdio.h>
#define NBIN 10000000
int main() {
  int i;
  double step, x, sum = 0.0, pi;
  step = 1.0/NBIN;
  for (i=0; i<NBIN; i++) {
    x = (i+0.5)*step;
    sum += 4.0/(1.0+x*x);
  }
  pi = sum*step;
  printf("PI = %f\n", pi);
  return 0;
}
$$\int_0^1 \frac{4\,dx}{1+x^2} = \pi \qquad\Longrightarrow\qquad \sum_{i=0}^{N-1} \frac{4}{1+x_i^2}\,\Delta \cong \pi$$
Parallelization: Who Does What?
...
for (i=myid; i<NBIN; i+=nprocs) {
  x = (i+0.5)*step;
  partial += 4.0/(1.0+x*x);
}
partial *= step;
pi = global_sum(partial);
...

Interleaved assignment of quadrature points (bins) to MPI processes:
  myid = MPI rank
  nprocs = number of MPI processes
• Use double MPI_Wtime() to measure the running time in seconds
MPI_Comm_rank(MPI_COMM_WORLD,&myid)
MPI_Comm_size(MPI_COMM_WORLD,&nprocs)
Single program multiple data (SPMD)
Parallel Running Time

global_pi.c: NBIN = 10^7, on HPC
How Efficient Is the Parallel Program?
#PBS -l nodes=16:ppn=1,arch=x86_64
...
np=$(cat $PBS_NODEFILE | wc -l)
mpirun -np $np -machinefile $PBS_NODEFILE ./global_pi
mpirun -np 8 -machinefile $PBS_NODEFILE ./global_pi
mpirun -np 4 -machinefile $PBS_NODEFILE ./global_pi
mpirun -np 2 -machinefile $PBS_NODEFILE ./global_pi
mpirun -np 1 -machinefile $PBS_NODEFILE ./global_pi
Scalability Analysis
• Parallel computing = Solving a big problem (W) in a short time (T) using many processors (P)
• How do W, T & P scale with each other?
• How do we define the efficiency of a parallel program?
Parallel Efficiency
• Execution time: T(W,P)
  W: workload
  P: number of processors
• Speed: S(W,P) = W/T(W,P)
• Speedup: S_P = S(W_P,P)/S(W,1)
• Efficiency: E_P = S_P/P
How to scale W_P with P?
Fixed Problem-Size Scaling

W_P = W = constant (strong scaling)
• Speedup: S_P = T(W,1)/T(W,P)
• Efficiency: E_P = S_P/P = T(W,1)/(P·T(W,P))
• Amdahl's law: f (= sequential fraction of the workload) limits the asymptotic speedup:
  S_P = 1/(f + (1−f)/P) → 1/f as P → ∞
Isogranular Scaling

W_P = Pw (weak scaling)
w = constant workload per processor (granularity)
• Speedup: S_P = S(Pw,P)/S(w,1) = P·T(w,1)/T(Pw,P)
• Efficiency: E_P = S_P/P = T(w,1)/T(Pw,P)
Analysis of Global_Pi Program

• Workload ∝ number of quadrature points, N (or NBIN in the program)
• Parallel execution time on P processors:
  > Local computation ∝ N/P
  > Butterfly computation/communication in global() ∝ log P
for (i=myid; i<N; i+=P) {
  x = (i+0.5)*step;
  partial += 4.0/(1.0+x*x);
}
for (l=0; l<log2(P); ++l) {
  partner = myid XOR 2^l;
  send mydone to partner;
  receive hisdone from partner;
  mydone += hisdone;
}
$$T(N,P) = T_{\mathrm{comp}}(N,P) + T_{\mathrm{global}}(P) = \alpha\frac{N}{P} + \beta\log P$$
Fixed Problem-Size (Strong) Scaling

global_pi.c: N = 10^7, on HPC

• Speedup:
$$S_P = \frac{T(N,1)}{T(N,P)} = \frac{\alpha N}{\alpha N/P + \beta\log P} = \frac{P}{1 + \dfrac{\beta}{\alpha}\dfrac{P\log P}{N}}$$

• Efficiency:
$$E_P = \frac{S_P}{P} = \frac{1}{1 + \dfrac{\beta}{\alpha}\dfrac{P\log P}{N}}$$
Fixed Problem-Size Scaling

global_pi.c: N = 10^7, on HPC

• Speedup model:
$$E_P = \frac{S_P}{P} = \frac{1}{1 + \dfrac{\beta}{\alpha}\dfrac{P\log P}{N}}$$

Computation ~ ns; communication ~ ms
Runtime Variance among Ranks
Isogranular (Weak) Scaling

global_pi_iso.c: N/P = 10^7, on HPC

• n = N/P = constant
• Efficiency:
$$E_P = \frac{T(n,1)}{T(nP,P)} = \frac{\alpha n}{\alpha n + \beta\log P} = \frac{1}{1 + \dfrac{\beta}{\alpha n}\log P}$$
Fitting Fixed Problem-Size Scaling

global_pi_breakdown.c: N = 10^7, on 3.0 GHz Xeon

$$T(N,P) = \alpha\frac{N}{P} + \beta\log_2 P$$
Scalability Analysis: Example