Parallel Programming in MPI, part 1
Advanced Course on Information Networks (情報ネットワーク特論)
Takeshi Nanri (南里 豪志)
Preparation: Your Account on the Server
• Check whether you can log in to the ITO system.
  • Windows: Use MobaXterm
  • macOS: Use "ssh -A -Y [email protected]"
• If you have forgotten your user ID or passphrase and cannot log in, come to the teacher during the tutorial.
Purpose of the "Parallel Programming Tutorial"
Learn how to write "parallel programs".
• "Communication" = data transfer among multiple processes.
• Each of the processes executes its own program concurrently.
• This is "parallel processing".
• A "parallel program" is required for parallel processing.
How Do We Describe Communication in a Program?
• TCP, UDP?
  • Good: portable, available on many networks.
  • Bad: the protocols for connection and data transfer are complicated.
• Possible, but requires additional code for protocol handling.
MPI (Message Passing Interface)
• A set of communication functions designed for parallel processing.
• Can be called from C, C++, and Fortran programs.
• "Message Passing" = Send + Receive
  • Actually, many functions other than Send and Receive are available.
• Let's look at a sample program first.
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, procs, i;
    double myval, val;
    MPI_Status status;
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    if (myid == 0) {
        fp = fopen("test.dat", "r");
        fscanf(fp, "%lf", &myval);
        for (i = 1; i < procs; i++) {
            fscanf(fp, "%lf", &val);
            MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
        }
        fclose(fp);
    } else
        MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
    printf("PROCS: %d, MYID: %d, MYVAL: %e\n", procs, myid, myval);
    MPI_Finalize();
    return 0;
}
Annotations on the sample program:
• MPI_Init: set up the MPI environment.
• MPI_Comm_rank: get this process's own ID (= rank).
• MPI_Comm_size: get the total number of processes.
• If my ID is 0: read input data for this process and keep it in myval; then, for i = 1 to procs-1, read data into val and use MPI_Send to send the value in val to process i.
• Processes with an ID other than 0: use MPI_Recv to receive data from process 0 and keep it in myval.
• Each process prints out its own myval.
• MPI_Finalize: end of parallel computing.
Flow of the Sample Program
• Multiple "processes" execute the program according to their number (= rank).
[Figure: rank 0 reads data from the file into myval, then reads data into val and sends it to rank 1, then reads again and sends to rank 2. Ranks 1 and 2 wait for the arrival of the data, receive it from rank 0 into myval, and then print it. Every rank prints its own myval.]
Sample Result of Execution
• The order of the output can differ from run to run, since each process proceeds independently.

    PROCS: 4 MYID: 1 MYVAL: 20.0000000000000000   (rank 1)
    PROCS: 4 MYID: 2 MYVAL: 30.0000000000000000   (rank 2)
    PROCS: 4 MYID: 0 MYVAL: 10.0000000000000000   (rank 0)
    PROCS: 4 MYID: 3 MYVAL: 40.0000000000000000   (rank 3)
Characteristics of the MPI Interface
• MPI programs are ordinary C programs.
  • MPI is not a new language.
• Every process executes the same program.
  • Each process does its own work according to its rank (= process number).
• A process cannot read or write variables on another process directly.
[Figure: same flow as before. Rank 0 reads the file and sends val to ranks 1 and 2; ranks 1 and 2 receive the data into myval; every rank prints its own myval.]
TCP, UDP vs. MPI
• MPI: a simple communication interface dedicated to parallel computing.
  • SPMD (Single Program, Multiple Data-stream) model.
  • All processes execute the same program.
• TCP, UDP: generic communication interfaces intended for various uses, such as internet servers.
  • Server/Client model.
  • Each process executes its own program.
MPI:
    #include <stdio.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int myid, procs, i;
        double myval, val;
        MPI_Status status;
        FILE *fp;

        /* initialize */
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        if (myid == 0) {
            fp = fopen("test.dat", "r");
            fscanf(fp, "%lf", &myval);
            for (i = 1; i < procs; i++) {
                fscanf(fp, "%lf", &val);
                MPI_Send(&val, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
            }
            fclose(fp);
        } else
            MPI_Recv(&myval, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("PROCS: %d, MYID: %d, MYVAL: %e\n", procs, myid, myval);
        MPI_Finalize();
        return 0;
    }

TCP Client:
    /* initialize */
    sock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    memset(&echoServAddr, 0, sizeof(echoServAddr));
    echoServAddr.sin_family = AF_INET;
    echoServAddr.sin_addr.s_addr = inet_addr(servIP);
    echoServAddr.sin_port = htons(echoServPort);
    connect(sock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));
    echoStringLen = strlen(echoString);
    send(sock, echoString, echoStringLen, 0);
    totalBytesRcvd = 0;
    printf("Received: ");
    while (totalBytesRcvd < echoStringLen) {
        bytesRcvd = recv(sock, echoBuffer, RCVBUFSIZE - 1, 0);
        totalBytesRcvd += bytesRcvd;
        echoBuffer[bytesRcvd] = '\0';
        printf("%s", echoBuffer);
    }
    printf("\n");
    close(sock);

TCP Server:
    /* initialize */
    servSock = socket(PF_INET, SOCK_STREAM, IPPROTO_TCP);
    memset(&echoServAddr, 0, sizeof(echoServAddr));
    echoServAddr.sin_family = AF_INET;
    echoServAddr.sin_addr.s_addr = htonl(INADDR_ANY);
    echoServAddr.sin_port = htons(echoServPort);
    bind(servSock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr));
    listen(servSock, MAXPENDING);
    for (;;) {
        clntLen = sizeof(echoClntAddr);
        clntSock = accept(servSock, (struct sockaddr *) &echoClntAddr, &clntLen);
        recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
        while (recvMsgSize > 0) {
            send(clntSock, echoBuffer, recvMsgSize, 0);
            recvMsgSize = recv(clntSock, echoBuffer, RCVBUFSIZE, 0);
        }
        close(clntSock);
    }
Layer of MPI
• MPI hides the differences between networks.
[Figure: layer diagram. Applications sit on top of Sockets, XTI, MPI, and other interfaces. Sockets and XTI run over TCP and UDP, which run over IP and the Ethernet driver and Ethernet card. MPI runs over high-speed interconnects (InfiniBand, etc.) as well as ordinary networks.]
How to Compile MPI Programs on ITO
• Preparation: module load openmpi
  • Sets up the environment to use an MPI library, "Open MPI".
• Compile command: mpicc
  Example) mpicc -O3 test.c -o test
  • -O3: optimization option (capital O, not zero)
  • test.c: source file to compile
  • test: executable file to create
How to Execute MPI Programs on ITO
• Prepare a script file. Sample:

    #!/bin/bash
    #PJM -L "vnode=4"
    #PJM -L "vnode-core=36"
    #PJM -L "rscunit=ito-a"
    #PJM -L "rscgrp=ito-s-dbg"
    #PJM -L "elapse=00:10:00"

    module load openmpi
    mpirun -np 144 -mca plm_rsh_agent /bin/pjrsh -machinefile ${PJM_O_NODEINF} ./test-mpi

  • The #PJM lines are settings: number of nodes, number of cores per node, resource unit, resource group, and maximum execution time.
  • The lines after them are the commands to be executed; here, mpirun runs the MPI program with 144 (= 4 nodes x 36 cores) processes.
• Submit the script file: pjsub test.sh
• Other commands:
  • pjstat (check status)
  • pjdel job_number (cancel a job)
Ex 0) Execution of an MPI Program
• First of all, log in to ito.cc.kyushu-u.ac.jp
  • Windows: Use MobaXterm
  • macOS: Use the ssh command from a terminal
Ex 0) Execution of an MPI Program (cont.)
• After login, try the following commands:

    $ cp /home/tmp/in-ng/* .
    $ cat test-mpi.c
    $ cat test.dat
    $ module load openmpi
    $ mpicc test-mpi.c -o test-mpi
    $ pjsub test.sh
      (wait for a while)
    $ ls          (check the name of the result file (test.sh.o????))
    $ less test.sh.o????

• If you have time, try changing the number of processes or modifying the source program.
MPI Library
• The bodies of the MPI functions are stored in the "MPI library".
• mpicc automatically links the MPI library to the program.
[Figure: mpicc compiles the source program (main() calling MPI_Init, MPI_Comm_rank, MPI_Send, ...) and links it with the MPI library (which contains the bodies of MPI_Init, MPI_Comm_rank, ...) to create the executable file.]
Basic Structure of MPI Programs

    #include <stdio.h>
    #include "mpi.h"                /* header file "mpi.h" */

    int main(int argc, char *argv[])
    {
        ...
        MPI_Init(&argc, &argv);     /* function for start-up */
        ...
        /* You can call MPI functions in this area */
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        ...
        MPI_Send(&val, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
        ...
        MPI_Recv(&myval, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        ...
        MPI_Finalize();             /* function for finish */
        return 0;
    }

• Crucial lines: the include of "mpi.h", MPI_Init, and MPI_Finalize.
MPI Functions Today
• MPI_Init: initialization
• MPI_Finalize: finalization
• MPI_Comm_size: get the number of processes
• MPI_Comm_rank: get the rank (= process number) of this process
• MPI_Send & MPI_Recv: message passing
• MPI_Bcast & MPI_Gather: collective communication (= group communication)
MPI_Init
Usage: int MPI_Init(int *argc, char ***argv);
• Starts parallel execution in MPI.
  • Starts the processes and establishes connections among them.
• Must be called once, before any other MPI function.
• Parameters:
  • Pass pointers to both arguments of the 'main' function.
  • They are used at start-up so that each process can share the name of the executable file and the options given to the mpirun command.

Example:
    #include <stdio.h>
    #include "mpi.h"
    int main(int argc, char *argv[])
    {
        int myid, procs;
        int myval, val;
        MPI_Status status;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        ...
MPI_Finalize
Usage: int MPI_Finalize(void);
• Finishes parallel execution.
  • MPI functions cannot be called after this function.
• Every process must call this function before exiting the program.

Example:
    main()
    {
        ...
        MPI_Finalize();
    }
MPI_Comm_rank
Usage: int MPI_Comm_rank(MPI_Comm comm, int *rank);
• Gets the rank (= process number) of this process.
  • Returned in the second argument.
• The first argument is a "communicator":
  • An identifier for a group of processes.
  • In most cases, just specify MPI_COMM_WORLD here.
    • MPI_COMM_WORLD: the group that consists of all the processes in this execution.
  • Processes can also be divided into multiple groups, each assigned a different job.

Example:
    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    ...
MPI_Comm_size
Usage: int MPI_Comm_size(MPI_Comm comm, int *size);
• Gets the number of processes.
  • Returned in the second argument.

Example:
    ...
    MPI_Comm_size(MPI_COMM_WORLD, &procs);
    ...
Message Passing (One-to-One Communication)
• Communication between a "sender" and a "receiver".
• The sending and receiving functions must be called in a matching manner:
  • The "from" rank and "to" rank are correct.
  • The specified size of the data to be transferred is the same on both sides.
  • The same "tag" is specified on both sides.
[Figure: rank 0 sends (To: rank 1, Size: 10 integer data, Tag: 100); rank 1 waits for the arrival of the message and receives (From: rank 0, Size: 10 integer data, Tag: 100).]
MPI_Send
Usage: int MPI_Send(void *b, int c, MPI_Datatype d, int dest, int t, MPI_Comm comm);
• Information of the message to send:
  • start address of the data, number of elements, data type, rank of the destination, tag, and communicator (= MPI_COMM_WORLD, in most cases).
• Data types:
    Integer         MPI_INT
    Real (single)   MPI_FLOAT
    Real (double)   MPI_DOUBLE
    Character       MPI_CHAR
• tag: a number (integer) attached to each message.
  • Used in programs that handle messages arriving from unspecified processes.
  • Usually, you can specify 0.

Example:
    ...
    MPI_Send(&val, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
    ...
Examples of MPI_Send
• Send the value of an integer variable 'd' (one integer):
    MPI_Send(&d, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
• Send the first 100 elements of the array 'mat' (with MPI_DOUBLE type):
    MPI_Send(mat, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
• Send 50 elements of the integer array 'data', starting from data[10]:
    MPI_Send(&(data[10]), 50, MPI_INT, 1, 0, MPI_COMM_WORLD);
MPI_Recv
Usage: int MPI_Recv(void *b, int c, MPI_Datatype d, int src, int t, MPI_Comm comm, MPI_Status *st);
• Information of the message to receive:
  • start address for storing the data, number of elements, data type, rank of the source, tag (= 0, in most cases), communicator (= MPI_COMM_WORLD, in most cases), and status.
• status: an integer array for storing the information of the arrived message.
  • Contains the source rank and the tag of the message (not used in most cases).

Example:
    ...
    MPI_Recv(&myval, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    ...
Collective Communications
• Communications among all of the processes in a group.
• Examples:
  • MPI_Bcast: copy data on one process to all the other processes.
  • MPI_Gather: gather data from the processes into one array.
  • MPI_Reduce: apply a "reduction" operation to the distributed data to produce one result.
[Figures: MPI_Bcast copies the array (3 1 8 2) on rank 0 to ranks 1 and 2. MPI_Gather collects the values 7, 5, 9 from ranks 0, 1, 2 into one array (7 5 9) on rank 0. MPI_Reduce sums the arrays (1 2 3), (4 5 6), (7 8 9) on ranks 0, 1, 2 element-wise into (12 15 18).]
MPI_Bcast
Usage: int MPI_Bcast(void *b, int c, MPI_Datatype d, int root, MPI_Comm comm);
• Copies data on one process to all of the processes.
• Parameters: start address, number of elements, data type, root rank, and communicator.
  • root rank: the rank of the process that has the original data.
• Example: MPI_Bcast(a, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
[Figure: the array 'a' on rank 0 is copied to 'a' on ranks 1, 2, and 3.]
MPI_Gather
Usage: int MPI_Gather(void *sb, int sc, MPI_Datatype st, void *rb, int rc, MPI_Datatype rt, int root, MPI_Comm comm);
• Gathers data from the processes to construct one array.
• Parameters:
  • send data: start address, number of elements, data type,
  • receive data: start address, number of elements per process, data type (meaningful only on the root rank),
  • root rank and communicator.
  • root rank: the rank of the process that stores the result array.
• Example: MPI_Gather(a, 3, MPI_DOUBLE, b, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
[Figure: the array 'a' on each of ranks 0 to 3 is gathered into consecutive sections of the array 'b' on rank 0.]
Usage of Collective Communications
• Every process must call the same function.
  • For example, MPI_Bcast must be called not only by the root rank but also by all of the other ranks.
• For functions that take separate send and receive information, the specified address ranges for sending and receiving must not overlap:
  • MPI_Gather, MPI_Allgather, MPI_Gatherv, MPI_Allgatherv, MPI_Reduce, MPI_Allreduce, MPI_Alltoall, MPI_Alltoallv, etc.
Summary
• In MPI, multiple processes run the same program.
  • Jobs are assigned according to the rank (= number) of each process.
• Each process runs in its own memory space.
  • Data on other processes can be accessed only by explicit communication among the processes.
• MPI functions introduced today:
  • MPI_Init, MPI_Finalize, MPI_Comm_rank, MPI_Comm_size
  • MPI_Send, MPI_Recv
  • MPI_Bcast, MPI_Gather
References
• MPI Forum: http://www.mpi-forum.org/
  • Specification of the "MPI standard".
• MPI specification (Japanese translation): http://phase.hpcc.jp/phase/mpi-j/ml/
• RIKEN training material: http://accc.riken.jp/HPC/training/mpi/mpi_all_2007-02-07.pdf
Ex 1) A Program That Displays Random Numbers
• Make a program in which each process displays its own rank together with one integer random number.
• Sample (sequential version):

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>

    int main(int argc, char *argv[])
    {
        int r;
        struct timeval tv;

        gettimeofday(&tv, NULL);
        srand(tv.tv_usec);
        r = rand();
        printf("%d\n", r);
        return 0;
    }
Ex 1) (cont.)
• Example of the result of execution:

    1: 520391
    0: 947896500
    3: 1797525940
    2: 565917780
    4: 1618651506
    5: 274032293
    6: 1248787350
    7: 828046128
Ex 1) Sample Answer

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/time.h>
    #include "mpi.h"

    int main(int argc, char *argv[])
    {
        int r, myid, procs;
        struct timeval tv;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myid);
        MPI_Comm_size(MPI_COMM_WORLD, &procs);
        gettimeofday(&tv, NULL);
        srand(tv.tv_usec);
        r = rand();
        printf("%d: %d\n", myid, r);
        MPI_Finalize();
        return 0;
    }

• Note that the data is not printed out in the order of the ranks.
Report: Display in Order
• Modify the program in Ex 1) so that the random numbers generated by the processes are printed out in rank order, starting from rank 0.
• Example of the result of execution:

    0: 1524394631
    1: 999094501
    2: 941763604
    3: 526956378
    4: 152374643
    5: 1138154117
    6: 1926814754
    7: 156004811
Hint
• At least two methods are possible.
• Method 1 (easy):
  • Gather the data to rank 0 first.
  • Then let rank 0 print the data in order.
• Method 2 (a little harder):
  • Before rank i prints its data, it receives a message from rank i-1 (for i > 0).
  • After rank i prints its data, it sends a message to rank i+1 (for i < P-1, where P is the total number of processes).
First of all, try the sample programs in this material and understand their meaning, line by line.