
Parallel Programming in MPI, part 1
情報ネットワーク特論 (Advanced Topics in Information Networks)
南里 豪志


Preparation: Request your account on the server

• If you have not yet created a key pair for public-key cryptography, create one.
    • http://okaweb.ec.kyushu-u.ac.jp/lectures/in-ng/2014/pki-2014.pptx
• Send an e-mail to the following address with your public key attached, and write your student ID and name in the body.
    [email protected]
• Your account information will be sent later.

Purpose of the "parallel programming tutorial"

• "Communication" = data transfer among multiple processes.
• Each process executes its own program concurrently.
• This is "parallel processing".
• Parallel processing requires a "parallel program".

Goal: learn how to write "parallel programs".

How to Describe Communications in a Program?

• TCP, UDP?
    • Good: portable, since they are available on many networks.
    • Bad: the procedures for connections and data transfer are complicated.

Possible, but the program needs additional code to handle the protocols.

MPI (Message Passing Interface)

• A set of communication functions designed for parallel processing.
• Can be called from C, C++, and Fortran programs.
• "Message Passing" = Send + Receive
    • In practice, many functions other than Send and Receive are available.
• Let's look at a sample program first.

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, procs, ierr, i;
        double myval, val;
        MPI_Status status;
        FILE *fp;
        char s[64];

        MPI_Init(&argc, &argv);                   /* set up the MPI environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* get own ID (= rank) of the process */
        MPI_Comm_size(MPI_COMM_WORLD, &procs);    /* get the total number of processes */
        if (myid == 0) {                          /* if my ID is 0 */
            fp = fopen("test.dat", "r");
            fscanf(fp, "%lf", &myval);            /* input data for this process and keep it in myval */
            for (i = 1; i < procs; i++) {         /* i = 1 to procs-1 */
                fscanf(fp, "%lf", &val);          /* input data and keep it in val */
                /* use MPI_Send to send the value in val to process i */
                MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
            }
            fclose(fp);
        } else                                    /* processes with an ID other than 0 */
            /* use MPI_Recv to receive data from process 0 and keep it in myval */
            MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);

        /* print out its own myval */
        printf("PROCS: %d, MYID: %d, MYVAL: %e\n", procs, myid, myval);
        MPI_Finalize();                           /* end of parallel computing */

        return 0;
    }

Flow of the sample program

• Multiple "processes" execute the program, each according to its own number (= rank).

Figure (execution flow): rank 0 reads data from the file into myval, reads another value into val and sends it to rank 1, reads another value and sends it to rank 2, and then prints myval. Ranks 1 and 2 each wait for the arrival of the data, receive it from rank 0 into myval, and print myval.

Sample of the Result of Execution

• The order of the output lines may differ from run to run, since each process proceeds independently.

    PROCS: 4 MYID: 1 MYVAL: 20.0000000000000000    (rank 1)
    PROCS: 4 MYID: 2 MYVAL: 30.0000000000000000    (rank 2)
    PROCS: 4 MYID: 0 MYVAL: 10.0000000000000000    (rank 0)
    PROCS: 4 MYID: 3 MYVAL: 40.0000000000000000    (rank 3)

Characteristics of MPI Interface

• MPI programs are ordinary C programs.
    • Not a new language.
• Every process executes the same program.
    • Each process does its own work according to its rank (= process number).
• A process cannot directly read or write the variables of another process.

Figure: ranks 0, 1 and 2 run the same program. Rank 0 reads the file and sends val to ranks 1 and 2, each of which receives it into its own myval; every rank then prints its own myval.

TCP, UDP vs MPI

• MPI: a simple interface dedicated to parallel computing
    • SPMD (Single Program, Multiple Data-stream) model
    • All processes execute the same program.
• TCP, UDP: generic interfaces for many kinds of communication, such as internet servers
    • Server/Client model
    • Each process executes its own program.

TCP Client (fragment; declarations and most error checks omitted)

    /* initialize */
    sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    memset(&echoServAddr, 0, sizeof(echoServAddr));
    echoServAddr.sin_family = AF_INET;
    echoServAddr.sin_addr.s_addr = inet_addr(servIP);
    echoServAddr.sin_port = htons(echoServPort);
    connect(sock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));

    echoStringLen = strlen(echoString);
    send(sock, echoString, echoStringLen, 0);

    totalBytesRcvd = 0;
    printf("Received: ");
    while (totalBytesRcvd < echoStringLen) {
        bytesRcvd = recv(sock, echoBuffer, RCVBUFSIZE - 1, 0);
        if (bytesRcvd <= 0)
            break;                        /* error or connection closed */
        totalBytesRcvd += bytesRcvd;
        echoBuffer[bytesRcvd] = '\0';
        printf("%s", echoBuffer);         /* never pass received data as the format string */
    }
    printf("\n");
    close(sock);

TCP Server (fragment; declarations and error checks omitted)

    /* initialize */
    servSock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    memset(&echoServAddr, 0, sizeof(echoServAddr));
    echoServAddr.sin_family = AF_INET;
    echoServAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    echoServAddr.sin_port = htons(echoServPort);
    bind(servSock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));
    listen(servSock, MAXPENDING);

    for (;;) {
        clntLen = sizeof(echoClntAddr);
        clntSock = accept(servSock, (struct sockaddr *) &echoClntAddr, &clntLen);
        recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
        while (recvMsgSize > 0) {
            send(clntSock, echoBuffer, recvMsgSize, 0);
            recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
        }
        close(clntSock);
    }

MPI

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, procs, ierr, i;
        double myval, val;
        MPI_Status status;
        FILE *fp;
        char s[64];

        /* initialize */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);

        if (myid == 0) {
            fp = fopen("test.dat", "r");
            fscanf(fp, "%lf", &myval);
            for (i = 1; i < procs; i++) {
                fscanf(fp, "%lf", &val);
                MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
            }
            fclose(fp);
        } else
            MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);

        printf("PROCS: %d, MYID: %d, MYVAL: %e\n", procs, myid, myval);
        MPI_Finalize();

        return 0;
    }

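Layer of MPI

• MPI hides the differences among networks.

Figure (layer diagram): Applications at the top; below them the communication interfaces Sockets, XTI, ... and MPI; below those TCP and UDP over IP over the Ethernet driver and Ethernet card, and high-speed interconnects (InfiniBand, etc.).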

How to compile MPI programs

• Compile command: mpicc
  Example) mpicc -O3 test.c -o test
    • -O3: optimization option (the letter O, not the digit 0)
    • test.c: source file to compile
    • -o test: executable file to create

How to execute MPI programs

• Prepare a script file.
  Sample:

    #!/bin/sh

    #PBS -l nodes=2,walltime=00:01:00
    #PBS -j oe
    #PBS -q p4

    cd $PBS_O_WORKDIR

    /usr/local/bin/mpiexec -f $PBS_NODEFILE -np 8 ./test-mpi

    • #PBS -l nodes=2,walltime=00:01:00: number of nodes (maximum: 4) and maximum execution time
    • #PBS -j oe: store standard output and standard error in the same file
    • #PBS -q p4: name of the job queue
    • The remaining lines are the commands to be executed:
        • cd $PBS_O_WORKDIR: change to the directory from which the job was submitted
        • mpiexec ... -np 8 ./test-mpi: run the MPI program with the specified number of processes (here 8)
• Submit the script file: qsub test.sh
• Other commands: qstat (= check status), qdel job_number (= cancel job)

Ex 0) Execution of an MPI program

• First of all, log in to 133.5.152.195.
    • Windows: use PuTTY and specify your private key.
    • MacOS X: use the ssh command from a terminal and specify your private key.
• After logging in, try the following commands.

    $ cp /tmp/test-mpi.c .
    $ cp /tmp/test.dat .
    $ cp /tmp/test.sh .
    $ cat test-mpi.c
    $ cat test.dat
    $ mpicc test-mpi.c -o test-mpi
    $ qsub test.sh
    (wait for a while)
    $ ls                      (check the name of the result file: test.sh.o????)
    $ less test.sh.o????

• If you have time, try changing the number of processes or modifying the source program.

MPI Library

• The bodies of the MPI functions are stored in the "MPI library".
• mpicc automatically links the MPI library to the program.

Figure: mpicc compiles the source program (a main() that calls MPI_Init, MPI_Comm_rank, MPI_Send, ...) and links it with the MPI library (which contains MPI_Init, MPI_Comm_rank, ...) to produce the executable file.

Basic Structure of MPI Programs

    #include <stdio.h>
    #include "mpi.h"                  /* header file "mpi.h" (crucial) */

    int main(int argc, char *argv[])
    {
        ...

        MPI_Init(&argc, &argv);       /* function for start-up (crucial) */

        /* MPI functions can be called from here ... */
        ...
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        ...
        MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        ...
        MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        ...
        /* ... up to here */

        MPI_Finalize();               /* function for finishing (crucial) */

        return 0;
    }

MPI Functions Today

• MPI_Init: initialization
• MPI_Finalize: finalization
• MPI_Comm_size: get the number of processes
• MPI_Comm_rank: get the rank (= process number) of this process
• MPI_Send & MPI_Recv: message passing
• MPI_Bcast & MPI_Gather: collective communication (= group communication)

MPI_Init

Usage: int MPI_Init(int *argc, char ***argv);

• Starts parallel execution in MPI.
    • Starts the processes and establishes connections among them.
    • Must be called once before calling any other MPI function.
• Parameters:
    • Pass pointers to both arguments of the 'main' function.
    • They are referenced so that all processes can share the name of the executable file and the options given to the mpirun command.

Example:

    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, procs, ierr;
        double myval, val;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        ...

MPI_Finalize

Usage: int MPI_Finalize(void);

• Finishes parallel execution.
    • MPI functions cannot be called after this function.
    • Every process must call this function before exiting the program.

Example:

    main()
    {
        ...

        MPI_Finalize();
    }

MPI_Comm_rank

Usage: int MPI_Comm_rank(MPI_Comm comm, int *rank);

• Gets the rank (= process number) of the calling process.
    • The rank is returned in the second argument.
• 1st argument = "communicator"
    • An identifier for a group of processes.
    • In most cases, just specify MPI_COMM_WORLD here.
        • MPI_COMM_WORLD: the group that consists of all the processes in this execution.
        • Processes can also be divided into multiple groups, with a different job assigned to each group.

Example:

    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    ...

MPI_Comm_size

Usage: int MPI_Comm_size(MPI_Comm comm, int *size);

• Gets the number of processes.
    • The number is returned in the second argument.

Example:

    ...
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    ...

Message Passing

• Communication between a "sender" and a "receiver".
• The send function and the receive function must be called so that they match:
    • the "From" rank and the "To" rank are correct,
    • the specified size of the data to be transferred is the same on both sides,
    • the same "tag" is specified on both sides.

Figure: rank 0 sends (To: rank 1, Size: 10 integers, Tag: 100); rank 1 waits for the message and receives it (From: rank 0, Size: 10 integers, Tag: 100).

MPI_Send

Usage: int MPI_Send(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm);

• Information about the message to send:
    • start address of the data, number of elements, data type, rank of the destination, tag, communicator (= MPI_COMM_WORLD in most cases)
• Data types:

    Integer          MPI_INT
    Real (single)    MPI_FLOAT
    Real (double)    MPI_DOUBLE
    Character        MPI_CHAR

• tag: a number (integer) attached to each message.
    • Used in programs that handle messages arriving from unspecified processes.
    • Usually, you can just specify 0.

Example:

    ...
    MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
    ...

Example of MPI_Send

• Send the value of an integer variable 'd' (one integer):

    MPI_Send(&d, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

• Send the first 100 elements of the real array 'mat' (of type MPI_DOUBLE):

    MPI_Send(mat, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);

• Send 50 elements of the integer array 'data', starting at data[10] (that is, data[10] through data[59]):

    MPI_Send(&(data[10]), 50, MPI_INT, 1, 0, MPI_COMM_WORLD);

MPI_Recv

Usage: int MPI_Recv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Status *st);

• Information about the message to receive:
    • start address for storing the received data, number of elements, data type, rank of the source, tag (= 0 in most cases), communicator (= MPI_COMM_WORLD in most cases), status
• status: a variable of type MPI_Status for storing information about the arrived message.
    • It contains the source rank and the tag of the message. (Not used in most cases.)

Example:

    ...
    MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    ...

Collective Communications

• Communications performed by all of the processes in a group.
• Examples:
    • MPI_Bcast: copies data to the other processes.
      Figure: the array 3 1 8 2 is copied so that ranks 0, 1 and 2 all hold it.
    • MPI_Gather: gathers data from the processes into one array.
      Figure: the values 7, 5 and 9 held by ranks 0, 1 and 2 are gathered into the array 7 5 9.
    • MPI_Reduce: applies a "reduction" operation to the distributed data to produce one array.
      Figure: the arrays 1 2 3, 4 5 6 and 7 8 9 on ranks 0, 1 and 2 are summed element by element to give 12 15 18.

MPI_Bcast

Usage: int MPI_Bcast(void *b, int c, MPI_Datatype d, int root, MPI_Comm comm);

• Copies data on one process to all of the processes.
• Parameters: start address, number of elements, data type, root rank, communicator
    • root rank: the rank of the process that holds the original data.
• Example:

    MPI_Bcast(a, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

Figure: the array a on rank 0 is copied into a on ranks 1, 2 and 3.

MPI_Gather

Usage: int MPI_Gather(void *sb, int sc, MPI_Datatype st, void *rb, int rc, MPI_Datatype rt, int root, MPI_Comm comm);

• Gathers data from all of the processes to construct one array.
• Parameters:
    • send data: start address, number of elements, data type
    • receive data: start address, number of elements, data type (significant only on the root rank)
    • root rank, communicator
    • root rank: the rank of the process that stores the resulting array.
• Example:

    MPI_Gather(a, 3, MPI_DOUBLE, b, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);

Figure: the arrays a on ranks 0 to 3 are gathered, in rank order, into the array b on the root rank.

Usage of Collective Communications

• Every process must call the same function.
    • For example, MPI_Bcast must be called not only by the root rank but by all of the other ranks as well.
• For functions that take both a send buffer and a receive buffer, the address ranges of the send data and the receive data must not overlap.
    • MPI_Gather, MPI_Allgather, MPI_Gatherv, MPI_Allgatherv, MPI_Reduce, MPI_Allreduce, MPI_Alltoall, MPI_Alltoallv, etc.

Summary

• In MPI, multiple processes run the same program.
• Work is assigned to each process according to its rank (number).
• Each process runs in its own memory space.
• Data held by another process can be accessed only through explicit communication between the processes.
• MPI functions:
    • MPI_Init, MPI_Finalize, MPI_Comm_rank
    • MPI_Send, MPI_Recv
    • MPI_Bcast, MPI_Gather

References

• MPI Forum: http://www.mpi-forum.org/
    • Specification of the "MPI standard"
• MPI specification (Japanese translation): http://phase.hpcc.jp/phase/mpi-j/ml/
• Training course material from RIKEN: http://accc.riken.jp/HPC/training/mpi/mpi_all_2007-02-07.pdf

Ex 1) A program that displays random numbers

• Write a program in which each process displays its own rank together with one random integer.
• Sample (a sequential program to start from):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    int main(int argc, char *argv[])
    {
        int r;
        struct timeval tv;

        gettimeofday(&tv, NULL);
        srand(tv.tv_usec);        /* seed with the microsecond part of the current time */
        r = rand();

        printf("%d\n", r);
        return 0;
    }

Ex 1) (cont.)

• Example of the result of execution:

    1: 520391
    0: 947896500
    3: 1797525940
    2: 565917780
    4: 1618651506
    5: 274032293
    6: 1248787350
    7: 828046128

Ex 1) Sample of the answer

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int r, myid, procs;
        struct timeval tv;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);

        gettimeofday(&tv, NULL);
        srand(tv.tv_usec);        /* each process seeds with its own current time */
        r = rand();

        printf("%d: %d\n", myid, r);
        MPI_Finalize();
        return 0;
    }

Note that the data is not printed in the order of the ranks.

Report: Display in order

• Modify the program written in Ex 1) so that it satisfies the following condition: the random numbers generated by the processes are printed in rank order, starting from rank 0.
• Example of the result of the execution:

    0: 1524394631
    1: 999094501
    2: 941763604
    3: 526956378
    4: 152374643
    5: 1138154117
    6: 1926814754
    7: 156004811

At the next class, 2 or 3 students will be designated to show and explain their answers.

Hint

• At least two methods are possible.
    • Method 1) (easy)
        • First, gather the data to rank 0.
        • Then, let rank 0 print the data in order.
    • Method 2) (a little harder)
        • Before rank i prints its data, it receives a message from rank i-1 (i > 0).
        • After rank i prints its data, it sends a message to rank i+1 (i < P-1).
        • P is the total number of processes.

First of all, run the sample programs in this material and understand the meaning of each line.