Introduction to Parallel Processing, Lecture No. 8, 10/12/2001. Lecture topics: the quiz that was held; numerical computations: Chapter 10 of the book by Wilkinson & Allen

Introduction to Parallel Processing

Lecture No. 8

10/12/2001

Lecture Topics

• The quiz that was held

• Numerical computations: Chapter 10 of the book by Wilkinson & Allen

• Continuing the same topic with the chapter "Intermediate MPI" from the book Using MPI

• Final projects: final assignment of the topics

• Homework assignment No. 3 (the last one)

The Quiz That Was Held…

• Number of students registered for the course: 56

• Number of students who took the quiz: 53

• => Almost 95% participation

• Grading will take time. Please be patient!

• Explanation of question No. 2 on the board

Numerical Computations

Chapter 10 of the book by Wilkinson & Allen

Go to the presentation (PDF)

Chapter Summary: Numerical Computations

• Different approaches to matrix multiplication: Direct, Recursive and Mesh

• Solving a system of linear equations by Gauss Elimination, and its implementation in parallel computation

• The Jacobi method for solving partial differential equations (PDEs)

• Methods for faster convergence (Gauss-Seidel, Red-Black ordering, Over-relaxation)

Key Concepts in the Chapter

• Matrix Addition

• Matrix Multiplication

• Matrix-Vector Multiplication

• Linear Equations

• Matrix Multiplication

• Recursive Implementation

• Mesh Implementation

• 2D pipeline: Systolic Array

• Gauss Elimination

• Jacobi Iteration

Key Concepts in the Chapter

• Gauss-Seidel Relaxation

• Red-Black Ordering

• Over-relaxation

• Multi-Grid

Intermediate MPI

Parts from: "Using MPI" book by Gropp, Lusk and Skjellum, Chapter 4.

The source codes can be downloaded from:
http://www-unix.mcs.anl.gov/mpi/usingmpi/examples/intermediate/main.htm

Topics

• The Poisson Problem

• Topologies

• Jacobi Iterations

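$$\nabla^2 u = f(x, y) \quad \text{in the interior}$$

$$u(x, y) = g(x, y) \quad \text{on the boundary}$$

Define a square mesh (grid):

$$x_i = \frac{i}{n+1}, \quad i = 0, \ldots, n+1$$

$$y_j = \frac{j}{n+1}, \quad j = 0, \ldots, n+1$$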

5 point stencil approx. for 2-D Poisson problem


Jacobi Iterations:
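As a sketch of the formulas behind the two slide titles above, assuming the standard five-point discretization with mesh spacing $h = 1/(n+1)$ and writing $u^k_{i,j}$ for the value at mesh point $(x_i, y_j)$ after $k$ iterations (the superscript notation is an assumption, chosen to match the `u`/`unew` arrays in the sweep code):

```latex
% Five-point stencil approximation of the Poisson equation at an interior point
\frac{u_{i-1,j} + u_{i+1,j} + u_{i,j-1} + u_{i,j+1} - 4u_{i,j}}{h^2} = f_{i,j}

% Jacobi iteration: solve the stencil for u_{i,j}, using the old values on the right
u^{k+1}_{i,j} = \frac{1}{4}\left(u^{k}_{i-1,j} + u^{k}_{i+1,j} + u^{k}_{i,j-1} + u^{k}_{i,j+1} - h^2 f_{i,j}\right)
```

The second line follows from the first by moving $4u_{i,j}$ to one side and dividing by 4.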

Next Slide: Serial Jacobi Iteration


Jacobi Iteration: Serial Version

      subroutine sweep
      integer i, j, n
      double precision u(0:n+1, 0:n+1), unew(0:n+1, 0:n+1)
      do 10 j = 1, n
         do 10 i = 1, n
            unew(i,j) = 0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1)
     $                  - h*h*f(i,j))
 10   continue
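The serial sweep can also be tried out without a Fortran compiler. The following is a minimal Python sketch, not the course code: the names `jacobi_sweep`, `u`, `f` and `h` are illustrative assumptions, and the test problem (boundary fixed at 1, zero right-hand side) is chosen only because its exact solution is known to be 1 everywhere.

```python
# Minimal sketch of a serial Jacobi sweep on an (n+2) x (n+2) grid.
# jacobi_sweep, u, f and h are illustrative names, not from the lecture code.

def jacobi_sweep(u, f, h):
    """Return a new grid after one Jacobi sweep; boundary values are kept."""
    n = len(u) - 2                      # number of interior points per side
    unew = [row[:] for row in u]        # copying keeps the boundary unchanged
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            unew[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                 + u[i][j - 1] + u[i][j + 1]
                                 - h * h * f[i][j])
    return unew


n = 3
h = 1.0 / (n + 1)
# Boundary fixed at 1.0, interior guess 0.0, right-hand side f = 0.
u = [[1.0 if i in (0, n + 1) or j in (0, n + 1) else 0.0
      for j in range(n + 2)] for i in range(n + 2)]
f = [[0.0] * (n + 2) for _ in range(n + 2)]

first = jacobi_sweep(u, f, h)           # state after a single sweep
for _ in range(200):
    u = jacobi_sweep(u, f, h)           # iterate towards the fixed point
center = u[2][2]
```

Each sweep reads only the old array and writes only the new one, which is what makes the method easy to split between processes.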

Jacobi Iteration for a Slice

      integer i, j, n
      double precision u(0:n+1, s:e), unew(0:n+1, s:e)
      do 10 j = s, e
         do 10 i = 1, n
            unew(i,j) = 0.25*(u(i-1,j)+u(i+1,j)+u(i,j-1)+u(i,j+1)
     $                  - h*h*f(i,j))
 10   continue

1-D Decomposition of the Domain

Ghost Points

      double precision u(0:n+1, s-1:e+1)

Topology

• Virtual Topology

• Cartesian Topology

• In our case: 2-D Cartesian Topology

A 2D Cartesian Decomposition

Domain Decomposition

C bindings:

• MPI_Cart_create(MPI_Comm comm_old, int ndims, int *dims, int *isperiodic, int reorder, MPI_Comm *new_comm)

• MPI_Cart_get

MPI_CART_CREATE

FORTRAN examples:

      integer dims(2)
      logical isperiodic(2), reorder

      dims(1) = 4
      dims(2) = 3
      isperiodic(1) = .false.
      isperiodic(2) = .false.
      reorder = .true.
      ndim = 2
      call MPI_CART_CREATE(MPI_COMM_WORLD, ndim, dims, isperiodic,
     $                     reorder, comm2d, ierr)

To determine the coordinates of a calling process

      call MPI_CART_GET(comm2d, 2, dims, periods, coords, ierr)
      print *, '(', coords(1), ',', coords(2), ')'

c     Alternatively, look up the coordinates from the rank
      call MPI_COMM_RANK(comm2d, myrank, ierr)
      call MPI_CART_COORDS(comm2d, myrank, 2, coords, ierr)
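To see what coordinates a process gets in a 4 x 3 Cartesian topology, the rank-to-coordinate mapping can be sketched in plain Python. MPI numbers the processes of a Cartesian communicator in row-major order (last dimension varying fastest); the helper names `cart_coords` and `cart_rank` below are hypothetical stand-ins and do not call MPI.

```python
# Sketch of the row-major numbering used inside a Cartesian communicator.
# cart_coords and cart_rank are hypothetical helpers, not MPI calls.

def cart_coords(rank, dims):
    """Rank -> coordinates, with the last dimension varying fastest."""
    coords = []
    for extent in reversed(dims):
        coords.append(rank % extent)
        rank //= extent
    return list(reversed(coords))


def cart_rank(coords, dims):
    """Coordinates -> rank (inverse of cart_coords)."""
    rank = 0
    for c, extent in zip(coords, dims):
        rank = rank * extent + c
    return rank


dims = [4, 3]                                   # same sizes as the slide
layout = [cart_coords(r, dims) for r in range(dims[0] * dims[1])]
for r, c in enumerate(layout):
    print(r, c)
```

With `reorder = .true.` the ranks in the new communicator may differ from those in `MPI_COMM_WORLD`, which is why the slide queries the rank in `comm2d` before asking for coordinates.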

2-Step Process to Transfer Data

More Exotic MPI Functions

• MPI_Cart_shift(MPI_Comm comm, int direction, int displ, int *src, int *dest)

• MPI_PROC_NULL

For example:

      if (source .ne. MPI_PROC_NULL) then
         call MPI_SEND(…, source, …)
      endif
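The neighbour lookup that MPI_Cart_shift performs along one dimension can be sketched as follows. This is a plain-Python illustration, not an MPI call: `cart_shift_1d` is a hypothetical helper, and `PROC_NULL = -1` is only a stand-in, since the actual value of MPI_PROC_NULL is implementation-defined.

```python
# Sketch of a 1-D shift: who do I receive from, and who do I send to?
PROC_NULL = -1   # stand-in sentinel; the real MPI_PROC_NULL value varies


def cart_shift_1d(rank, nprocs, displ=1, periodic=False):
    """Return (source, dest) for a shift by displ along a line of processes."""
    def wrap(r):
        if periodic:
            return r % nprocs
        return r if 0 <= r < nprocs else PROC_NULL
    return wrap(rank - displ), wrap(rank + displ)


# Non-periodic line of 4 processes, as in the 1-D Jacobi decomposition.
neighbors = [cart_shift_1d(r, 4) for r in range(4)]
for rank, (src, dst) in enumerate(neighbors):
    print(rank, "source:", src, "dest:", dst)
```

In the non-periodic case the two end processes each get the null process on one side, which is the situation the `MPI_PROC_NULL` test above guards against.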

MPE_DECOMP1D

• Determine the array limits (s and e in our code):

      call MPE_DECOMP1D(n, nprocs, myrank, s, e)

Where:
nprocs = # of processes in the Cartesian coordinates,
myrank = Cartesian coordinate of the calling process,
n = size of the array (1..n)

MPE_DECOMP1D

• Similar to:

      s = 1 + myrank*(n/nprocs)
      e = s + (n/nprocs) - 1

A code to exchange data for ghost points using blocking send/recv

      subroutine exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
      include "mpif.h"
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1)
      integer comm1d, nbrbottom, nbrtop
      integer status(MPI_STATUS_SIZE), ierr
c     Send the top owned row up; receive the bottom ghost row
      call MPI_SEND( a(1,e), nx, MPI_DOUBLE_PRECISION,
     $               nbrtop, 0, comm1d, ierr )
      call MPI_RECV( a(1,s-1), nx, MPI_DOUBLE_PRECISION,
     $               nbrbottom, 0, comm1d, status, ierr )
c     Send the bottom owned row down; receive the top ghost row
      call MPI_SEND( a(1,s), nx, MPI_DOUBLE_PRECISION,
     $               nbrbottom, 1, comm1d, ierr )
      call MPI_RECV( a(1,e+1), nx, MPI_DOUBLE_PRECISION,
     $               nbrtop, 1, comm1d, status, ierr )
      return
      end
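The role of the ghost points can be checked with a single-process simulation. The sketch below is an illustration under stated assumptions, not MPI code: two hypothetical "processes" are just Python lists of grid columns, `exchange` copies edge columns where the real program would send and receive, and the slice bounds `(1, 3)` and `(4, 6)` are made up for a 6 x 6 interior. It verifies that sweeping the slices with exchanged ghost columns gives the same result as sweeping the whole grid.

```python
# Simulated 1-D decomposition with ghost columns (no MPI involved).

def sweep(cols, f_cols, h):
    """Jacobi sweep over columns 1..len-2; columns 0 and -1 are ghosts."""
    new = [c[:] for c in cols]
    nx = len(cols[0]) - 2
    for j in range(1, len(cols) - 1):
        for i in range(1, nx + 1):
            new[j][i] = 0.25 * (cols[j][i - 1] + cols[j][i + 1]
                                + cols[j - 1][i] + cols[j + 1][i]
                                - h * h * f_cols[j][i])
    return new


def exchange(slices):
    """Fill ghost columns from the neighbours (stand-in for send/recv)."""
    for lower, upper in zip(slices, slices[1:]):
        lower[-1] = upper[1][:]   # lower's top ghost <- upper's first owned column
        upper[0] = lower[-2][:]   # upper's bottom ghost <- lower's last owned column


n = 6
h = 1.0 / (n + 1)
# Global grid stored as columns j = 0..n+1; boundary 1.0, interior 0.0.
grid = [[1.0 if j in (0, n + 1) or i in (0, n + 1) else 0.0
         for i in range(n + 2)] for j in range(n + 2)]
f = [[1.0] * (n + 2) for _ in range(n + 2)]

bounds = [(1, 3), (4, 6)]         # assumed (s, e) for two processes
slices = [[grid[j][:] for j in range(s - 1, e + 2)] for s, e in bounds]
f_slices = [[f[j][:] for j in range(s - 1, e + 2)] for s, e in bounds]

serial = [c[:] for c in grid]
for _ in range(10):
    exchange(slices)
    slices = [sweep(sl, fs, h) for sl, fs in zip(slices, f_slices)]
    serial = sweep(serial, f, h)

# Reassemble the owned columns and compare with the serial result.
parallel = [serial[0]] + [c for sl in slices for c in sl[1:-1]] + [serial[-1]]
max_diff = max(abs(a - b)
               for pc, sc in zip(parallel, serial) for a, b in zip(pc, sc))
```

Skipping the `exchange` call makes the two results drift apart after the first iteration, since each slice would keep sweeping with stale neighbour values.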

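• The previous example was simple

• But it is not necessarily the best way to implement the exchange of ghost points: every process first sends upward and only then receives, so unless the messages are buffered the exchanges complete one after another, starting from the top process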

Sendrecv (exchange data ver. 2)

      subroutine exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
      include "mpif.h"
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1)
      integer comm1d, nbrbottom, nbrtop
      integer status(MPI_STATUS_SIZE), ierr

      call MPI_SENDRECV(
     $     a(1,e), nx, MPI_DOUBLE_PRECISION, nbrtop, 0,
     $     a(1,s-1), nx, MPI_DOUBLE_PRECISION, nbrbottom, 0,
     $     comm1d, status, ierr )
      call MPI_SENDRECV(
     $     a(1,s), nx, MPI_DOUBLE_PRECISION, nbrbottom, 1,
     $     a(1,e+1), nx, MPI_DOUBLE_PRECISION, nbrtop, 1,
     $     comm1d, status, ierr )
      return
      end

Implementation of the Jacobi Iteration-1

      program main
      include "mpif.h"
      integer maxn
      parameter (maxn = 128)
      double precision a(maxn,maxn), b(maxn,maxn), f(maxn,maxn)
      integer nx, ny
      integer myid, numprocs, ierr
      integer comm1d, nbrbottom, nbrtop, s, e, it
      double precision diff, diffnorm, dwork
      double precision t1, t2
      double precision MPI_WTIME
      external MPI_WTIME
      external diff
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )

Implementation of the Jacobi Iteration-2

      if (myid .eq. 0) then
c        Get the size of the problem
c        print *, 'Enter nx'
c        read *, nx
         nx = 110
      endif
      call MPI_BCAST(nx,1,MPI_INTEGER,0,MPI_COMM_WORLD,ierr)
      ny = nx

c     Get a new communicator for a decomposition of the domain
      call MPI_CART_CREATE( MPI_COMM_WORLD, 1, numprocs, .false.,
     $                      .true., comm1d, ierr )

Implementation of the Jacobi Iteration-3

c     Get my position in this communicator, and my neighbors
      call MPI_COMM_RANK( comm1d, myid, ierr )
      call MPI_Cart_shift( comm1d, 0, 1, nbrbottom, nbrtop, ierr )

c     Compute the actual decomposition
      call MPE_DECOMP1D( ny, numprocs, myid, s, e )

c     Initialize the right-hand-side (f) and the initial solution
c     guess (a)
      call onedinit( a, b, f, nx, s, e )

Implementation of the Jacobi Iteration-4

c     Actually do the computation. Note the use of a collective
c     operation to check for convergence, and a do-loop to bound the
c     number of iterations.
      call MPI_BARRIER( MPI_COMM_WORLD, ierr )
      t1 = MPI_WTIME()
      do 10 it=1, 100
         call exchng1( a, nx, s, e, comm1d, nbrbottom, nbrtop )
         call sweep1d( a, f, nx, s, e, b )
         call exchng1( b, nx, s, e, comm1d, nbrbottom, nbrtop )
         call sweep1d( b, f, nx, s, e, a )
         dwork = diff( a, b, nx, s, e )
         call MPI_Allreduce( dwork, diffnorm, 1,
     $        MPI_DOUBLE_PRECISION, MPI_SUM, comm1d, ierr )
         if (diffnorm .lt. 1.0e-5) goto 20
         if (myid .eq. 0) print *, 2*it, ' Difference is ', diffnorm
 10   continue

Implementation of the Jacobi Iteration-5

      if (myid .eq. 0) print *, 'Failed to converge'
 20   continue
      t2 = MPI_WTIME()
      if (myid .eq. 0) then
         print *, 'Converged after ', 2*it, ' Iterations in ',
     $        t2 - t1, ' secs '
      endif
c
      call MPI_FINALIZE(ierr)
      end

c Perform a Jacobi sweep for a 1-d decomposition.
c Sweep from a into b
      subroutine sweep1d( a, f, nx, s, e, b )
      integer nx, s, e
      double precision a(0:nx+1,s-1:e+1), f(0:nx+1,s-1:e+1),
     +                 b(0:nx+1,s-1:e+1)
      integer i, j
      double precision h
      h = 1.0d0 / dble(nx+1)
c     h*h*f is scaled by 0.25, as in the serial sweep
      do 10 j=s, e
         do 10 i=1, nx
            b(i,j) = 0.25 * (a(i-1,j)+a(i,j+1)+a(i,j-1)+a(i+1,j)
     +               - h * h * f(i,j))
 10   continue
      return
      end

c
c The rest of the 1-d program
c
      double precision function diff( a, b, nx, s, e )
      integer nx, s, e
      double precision a(0:nx+1, s-1:e+1), b(0:nx+1, s-1:e+1)
      double precision sum
      integer i, j
      sum = 0.0d0
      do 10 j=s,e
         do 10 i=1,nx
            sum = sum + (a(i,j) - b(i,j)) ** 2
 10   continue
      diff = sum
      return
      end

MPE_DECOMP1D – 1/2 (attached for completeness)

C  This file contains a routine for producing a decomposition of
C  a 1-d array when given a number of processors.
C  It may be used in "direct" product decomposition.
C  The values returned assume a "global" domain in [1:n]
      subroutine MPE_DECOMP1D( n, numprocs, myid, s, e )
      integer n, numprocs, myid, s, e
      integer nlocal
      integer deficit

MPE_DECOMP1D – 2/2

      nlocal  = n / numprocs
      s       = myid * nlocal + 1
      deficit = mod(n,numprocs)
      s       = s + min(myid,deficit)
      if (myid .lt. deficit) then
         nlocal = nlocal + 1
      endif
      e = s + nlocal - 1
      if (e .gt. n .or. myid .eq. numprocs-1) e = n
      return
      end
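To see why the routine handles the remainder explicitly, here is a line-by-line Python port of the logic above next to the simpler "similar to" formula from the earlier slide. The function names `decomp1d` and `naive_decomp` are assumptions made for this sketch; the example sizes (n = 110 on 4 processes) match the `nx = 110` used in the main program.

```python
# Python port of the block decomposition, for experimenting with sizes.
# decomp1d and naive_decomp are illustrative names, not library functions.

def decomp1d(n, numprocs, myid):
    """Return (s, e), the 1-based inclusive range owned by process myid."""
    nlocal = n // numprocs
    deficit = n % numprocs                 # leftover points to hand out
    s = myid * nlocal + 1 + min(myid, deficit)
    if myid < deficit:
        nlocal += 1                        # first `deficit` ranks get one extra
    e = s + nlocal - 1
    if e > n or myid == numprocs - 1:
        e = n
    return s, e


def naive_decomp(n, numprocs, myid):
    """The simplified formula; exact only when numprocs divides n."""
    s = 1 + myid * (n // numprocs)
    return s, s + n // numprocs - 1


ranges = [decomp1d(110, 4, r) for r in range(4)]
naive = [naive_decomp(110, 4, r) for r in range(4)]
print(ranges)
print(naive)
```

With 110 points on 4 processes the simplified formula stops at point 108, leaving two points unassigned, while the full routine gives the first two processes 28 points each and the others 27.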

Timing for variants of the 1-D decomposition of the Poisson problem (times in seconds)

 P    Blocking Send   Ordered Send   Sendrecv   Buffered Send   Nonblocking Isend
 1    5.38            5.54           5.54       5.38            5.40
 2    2.77            2.88           2.91       2.75            2.77
 4    1.58            1.56           1.57       1.50            1.51
 8    1.15            0.947          0.931      0.854           0.849
 16   1.18            0.574          0.534      0.521           0.545
 32   1.94            0.443          0.451      0.425           0.397
 64   3.73            0.447          0.391      0.362           0.391

Efficiency ~20% at P = 64 (about 14 times faster than 1 process)
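The efficiency figure can be reproduced from the table. The sketch below uses the usual definitions, speedup = T(1)/T(P) and efficiency = speedup/P, which are assumed here rather than stated on the slide, and copies only the Blocking Send and Sendrecv columns.

```python
# Speedup and efficiency from two columns of the timing table.
procs = [1, 2, 4, 8, 16, 32, 64]
times = {
    "Blocking Send": [5.38, 2.77, 1.58, 1.15, 1.18, 1.94, 3.73],
    "Sendrecv":      [5.54, 2.91, 1.57, 0.931, 0.534, 0.451, 0.391],
}


def speedup_and_efficiency(t):
    """Return a list of (speedup, efficiency) pairs, one per process count."""
    return [(t[0] / ti, t[0] / ti / p) for ti, p in zip(t, procs)]


results = {name: speedup_and_efficiency(t) for name, t in times.items()}
for name, rows in results.items():
    for p, (s, e) in zip(procs, rows):
        print(f"{name:14s} P={p:2d} speedup={s:5.2f} efficiency={e:5.1%}")
```

For Sendrecv on 64 processes this gives a speedup of roughly 14 and an efficiency a little above 20%, while the Blocking Send variant is slower on 64 processes than on 4.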

Reminder…

Next week: image processing

Chapter 10 of the book by Wilkinson & Allen

The class will end earlier than usual,

at 10:00!

*No office hours will be held that day*

Homework Assignment No. 3

• The last homework assignment in the course

• It must be submitted within two weeks, i.e., by Lecture No. 10 (24/12/2001).

• Goal of the assignment: solving the time-independent (steady-state) heat equation using parallel computation

Homework Assignment No. 3

• The heat equation will be solved in two dimensions. The problem domain is rectangular (square).

• The boundary conditions are: 20 degrees on all edges and 100 degrees on the middle third of one of the edges; see the figure two slides ahead.

• Use a square array of size at least 100X100.

• Set a stopping criterion for the program based on convergence, as learned in class.
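One way to read the boundary conditions above is sketched below in Python. Everything here is an assumption made for illustration: the helper name `make_plate`, the choice of the first row as the hot edge, the rounding of the "middle third" to whole grid points, and the initial interior value of 20.

```python
# Sketch of the boundary set-up for the steady-state heat problem.
# make_plate and its defaults are assumptions, not part of the assignment.

def make_plate(n, edge_temp=20.0, hot_temp=100.0, initial=20.0):
    """Build an (n+2) x (n+2) grid with the boundary temperatures applied."""
    size = n + 2
    u = [[initial] * size for _ in range(size)]
    for k in range(size):
        u[0][k] = u[size - 1][k] = edge_temp      # first and last row
        u[k][0] = u[k][size - 1] = edge_temp      # first and last column
    lo, hi = size // 3, size - size // 3          # middle third of one edge
    for k in range(lo, hi):
        u[0][k] = hot_temp
    return u


plate = make_plate(100)
hot_points = sum(1 for v in plate[0] if v == 100.0)
print(hot_points, "of", len(plate[0]), "points on the hot edge are at 100")
```

Only the boundary rows and columns are fixed; the interior values serve as the initial guess and are what the Jacobi iterations update.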


• Attach a plot of the final temperature distribution (contour plot, lego plot, surface plot etc.)

• Work with double-precision variables (double)

• Perform the data exchange stage between the computers in two ways:

a. Blocking SEND/RECV

b. Combined transfer: Sendrecv

Check whether there is a difference in performance, as described in the table shown in class


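[Figure: the square domain, with a boundary temperature of 20 on all edges and 100 on the middle third of one edge]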


• The solution will use Jacobi Iterations

• Perform a one-dimensional decomposition of the problem domain

• Attach a plot of the convergence (the global error) as a function of the number of iterations.

• Run the program on 1, 2, 4, and 8 processors and attach a plot of the run time as a function of the number of processors

• Use the MPI commands learned in class!


• Besides the boundary conditions, assume some temperature distribution as the initial condition.

• Graphics: the plots may be produced with any software, with the plotting done offline (the data are saved to a file). It is recommended to try the Gnuplot program available on the Linux computers.

Final Projects: Assignment of the Topics - 1

Arik Tal PP1 Condor -DES

Lior,Ran&Gilad PP2,21 Special Project: TSP- A larger Project!

Ilya Dimitri PP3 Barnes-Hut algorithm

Uri Ari PP4 Special Project: Image Processing for DSP

Ida Victoria PP5 The Map Cover Problem

Uri Ofer PP6 Merge/Sort

Yuval Haim PP7 DLA

Dan Amir PP8 Special Project: Image Processing: Target Detection

Ziv Chen PP9 Game of Life

Iliya Boris PP10 Parallel Python


Ziv Kobi PP11 Fractal Dimension

Ran Yotam PP12 Solving TSP using Genetic Algorithm

Ronen Gal PP13 Parallel Tennis Game (* I prefer Hough Trans)

Yaron Erez PP14 Point on the Perimeter

Yaniv Alon PP15 Ships and Submarines

Guy Haim PP16 Parallel Bingo game simulation

Roy Amos PP17 Special Project: Packing

Yuval Nimrod PP18 Parallel Random Number Generator

Eitan Ori PP19 Characters Recognition

Tomer Rotem PP20 Condor Protein


Gilad -------- PP21 Will do a project with group 2

Natan Ohad PP22 Fish&Sharks

Eran Liat PP23 Special Project: Hash-Tables

Elad Yariv PP24 Parallel Java

Noam Ron PP25 ???

Amichay Segev PP26 Robo-Soccer

Adi Itai PP27 Special Project: Viterbi Algorithm Using PP

Shaul Yochay PP28 Parallel Prime Numbers Search

The End
