Author
tamarr
Introductory Parallel Applications
Courtesy: Dr. David Walker, Cardiff University
Example 1 - Wave equation
Courtesy: David Walker, Cardiff University
Problem: a vibrating string of length L, fixed at both ends, is given an initial displacement.
To determine: $\Psi(x, t)$
[Figure: displacement of the string between x = 0 and x = L]
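Vibrating String Problem
To solve: $\frac{1}{c^2} \frac{\partial^2 \Psi}{\partial t^2} - \frac{\partial^2 \Psi}{\partial x^2} = 0$
Subject to: $\Psi(0, t) = \Psi(L, t) = 0$; $\Psi(x, 0) = u(x)$
Numerical approximation: $\Psi_i(t + \Delta t) = 2\Psi_i(t) - \Psi_i(t - \Delta t) + \tau^2 \left( \Psi_{i-1}(t) - 2\Psi_i(t) + \Psi_{i+1}(t) \right)$
where $\tau = c \, \Delta t / \Delta x$ and $u(x) = \sin x$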
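The numerical approximation above can be sketched as a single sequential time step. This is a minimal illustration, not the original course code: the function name wave_step and the three-array layout (previous, current and next time level) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* One time step of the vibrating string update:
 *   psi_new[i] = 2*psi[i] - psi_old[i]
 *              + tau^2 * (psi[i-1] - 2*psi[i] + psi[i+1])
 * psi_old holds the values at t - dt, psi the values at t.
 * Both end points are held at zero (string fixed at x = 0 and x = L). */
void wave_step(const double *psi_old, const double *psi, double *psi_new,
               size_t npoints, double tau)
{
    assert(npoints >= 2);
    psi_new[0] = 0.0;
    psi_new[npoints - 1] = 0.0;
    for (size_t i = 1; i + 1 < npoints; i++) {
        psi_new[i] = 2.0 * psi[i] - psi_old[i]
                   + tau * tau * (psi[i - 1] - 2.0 * psi[i] + psi[i + 1]);
    }
}
```

A caller would rotate the three arrays after every step so that the newest values become the current ones.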
Vibrating String - Parallel Formulation
The points are divided across processes.
Communication takes place between neighboring processors.
Data structure: an array on every processor of size local_size+2.
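The two extra entries in the local_size+2 array act as ghost cells that hold copies of the neighbours' edge points. The sketch below imitates the exchange inside one program, without MPI, by keeping all local arrays in one flat buffer. The name exchange_ghosts, the flat layout, and the choice of 0.0 for a ghost cell beyond a fixed end are assumptions for illustration only.

```c
#include <assert.h>

/* local holds nprocs arrays of length local_size + 2, stored back to back.
 * In each array, index 0 and index local_size + 1 are ghost cells;
 * indices 1..local_size are the points owned by that process. */
void exchange_ghosts(double *local, int nprocs, int local_size)
{
    assert(nprocs > 0 && local_size > 0);
    int stride = local_size + 2;
    for (int p = 0; p < nprocs; p++) {
        double *mine = local + p * stride;
        /* left ghost: last owned point of the left neighbour */
        mine[0] = (p > 0) ? local[(p - 1) * stride + local_size] : 0.0;
        /* right ghost: first owned point of the right neighbour */
        mine[local_size + 1] =
            (p < nprocs - 1) ? local[(p + 1) * stride + 1] : 0.0;
    }
}
```

Only owned points are read and only ghost cells are written, so the order in which the simulated processes are visited does not matter. In the MPI version each copy becomes one message between neighbouring ranks.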
Vibrating String - Parallel Code
/* Set up 1D topology */
MPI_Cart_create(MPI_COMM_WORLD, 1, &nprocs, &periods, reorder, &new_comm);
MPI_Cart_coords(new_comm, rank, 1, &mypos);
MPI_Cart_shift(new_comm, 0, -1, &right, &left);

/* Initialise array */
local_npts = npoints/nprocs;
lbound = mypos*local_npts;
for(i=0;i
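Example 2 - Laplace equation problem
Courtesy: Dr. David Walker, Cardiff University
Electric field around a conducting object in a box
To solve: $\frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} = 0$
Subject to: $\Phi(0, y) = \Phi(L_1, y) = 0$; $\Phi(x, 0) = \Phi(x, L_2) = 0$; $\Phi(x, y) = 1$ on $S(x, y)$
[Figure: box from x = 0 to x = L1 and y = 0 to y = L2 containing the object S(x, y)]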
Laplace equation - Numerical approximation
We consider a square problem, $L_1 = L_2 = L$
$\Phi_{i,j}^{k} = \frac{1}{4} \left( \Phi_{i-1,j}^{k-1} + \Phi_{i+1,j}^{k-1} + \Phi_{i,j-1}^{k-1} + \Phi_{i,j+1}^{k-1} \right)$
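The iteration above replaces each free grid point by the average of its four neighbours from the previous iteration. A minimal sequential sketch follows. It assumes a row-major grid and assumes that a nonzero mask entry marks a point whose value is fixed (the box walls or the conducting object); the source does not show how mask is actually encoded, and the name jacobi_sweep is hypothetical.

```c
#include <assert.h>

/* One sweep of the four-point averaging update on an nx-by-ny grid.
 * old_phi holds iteration k-1 and phi receives iteration k.
 * Edge points and points with mask != 0 keep their old value. */
void jacobi_sweep(const double *old_phi, double *phi, const int *mask,
                  int nx, int ny)
{
    assert(nx >= 3 && ny >= 3);
    for (int i = 0; i < nx; i++) {
        for (int j = 0; j < ny; j++) {
            int idx = i * ny + j;
            int edge = (i == 0 || i == nx - 1 || j == 0 || j == ny - 1);
            if (edge || mask[idx]) {
                phi[idx] = old_phi[idx];
            } else {
                phi[idx] = 0.25 * (old_phi[idx - ny] + old_phi[idx + ny]
                                 + old_phi[idx - 1] + old_phi[idx + 1]);
            }
        }
    }
}
```

Repeating the sweep while swapping phi and old_phi gives the iterative solver; a convergence test would normally decide when to stop.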
Laplace Equation - Sequential code
double phi[NXMAX][NYMAX];
double old_phi[NXMAX][NYMAX];
int mask[NXMAX][NYMAX];
setup_grid(phi, nptsx, nptsy, mask);
for(k=0; k
Laplace Equation - Sequential code
setup_grid(phi, nptsx, nptsy, mask){
for(i=0; i
Laplace Equation - Parallelization
A 2-D topology of processes is considered.
Updating the boundary values held by a processor may require communication with its neighboring processors.
The main data structures are:
double phi[NYMAX+2][NXMAX+2];
double oldphi[NYMAX+2][NXMAX+2];
int mask[NYMAX+2][NXMAX+2];
Laplace Equation - Parallelization
/* Work out number of processes in each direction of the process mesh */
dims[0] = dims[1] = 0;
MPI_Dims_create(nprocs, 2, dims);
npy = dims[0];
npx = dims[1];

/* Set up 2D topology */
periods[0] = periods[1] = 0;
MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &new_comm);
MPI_Cart_coords(new_comm, rank, 2, coords);
myposy = coords[0];
myposx = coords[1];

/* Determine neighbouring processes for communication shifts */
MPI_Cart_shift(new_comm, 0, -1, &down, &up);
MPI_Cart_shift(new_comm, 1, -1, &right, &left);

/* Initialise arrays */
setup_grid(phi, npx, npy, myposx, myposy, nptsx, nptsy, &local_nptsx, &local_nptsy, mask);
Laplace Equation - Parallelization
setup_grid (phi, npx, npy, myposx, myposy, nptsx, nptsy, local_ptrx, local_ptry, mask){

local_nptsx = nptsx/npx;
local_nptsy = nptsy/npy;
for(j=0;j
Laplace Equation - Performance Analysis
For an $n^2$ grid, an $N = P \times Q$ processor mesh, $t_{calc}$ the time for a flop, and $t_{shift}$ the time for communication per grid point:
$T_{seq} = 4 n^2 t_{calc}$
$T(N) = (4 n^2 / N) \, t_{calc} + (2n/P) \, t_{shift} + (2n/Q) \, t_{shift}$
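The two cost expressions can be evaluated directly to see how communication limits the speedup. The sketch below only transcribes the model; the function names are hypothetical, and the timing values a caller passes in are assumptions rather than measurements.

```c
#include <assert.h>

/* Sequential time: 4 n^2 tcalc. */
double t_seq(double n, double tcalc)
{
    return 4.0 * n * n * tcalc;
}

/* Parallel time on a P x Q mesh:
 * (4 n^2 / N) tcalc + (2n/P) tshift + (2n/Q) tshift, with N = P*Q. */
double t_par(double n, int P, int Q, double tcalc, double tshift)
{
    assert(P > 0 && Q > 0);
    double N = (double)P * (double)Q;
    return (4.0 * n * n / N) * tcalc
         + (2.0 * n / P) * tshift
         + (2.0 * n / Q) * tshift;
}

/* Speedup predicted by the model. */
double speedup(double n, int P, int Q, double tcalc, double tshift)
{
    return t_seq(n, tcalc) / t_par(n, P, Q, tcalc, tshift);
}
```

With tshift set to zero the model gives the ideal speedup N. Any nonzero tshift lowers it, and the loss shrinks as n grows because the computation scales with n^2 while the communication scales with n.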
A more irregular problem - Molecular Dynamics Simulation
Courtesy: Dr. David Walker, Cardiff University
In the previous problems, the communication requirements are known in advance.
In the Molecular Dynamics simulation problem, the amount of data communicated between processors is not known in advance.
The communication is slightly irregular.
The Problem
A domain contains a number of particles (molecules).
Each molecule i experiences a force $f_{ij}$ exerted by another molecule j.
The sum of all the forces, $F_i = \sum_j f_{ij}$, makes the particle assume a new position and velocity.
Particles that are more than a distance r apart do not influence each other.
Given the initial velocities and positions of the particles, their movements are followed over discrete time steps.
MD - Solution
The cutoff distance r is used to reduce the time for the summation from $O(n^2)$.
The domain is decomposed into cells of size r x r.
Particles in one cell interact with particles in the 8 neighbouring cells and with particles in the same cell.
[Figure: domain divided into cells of size r x r]
MD - Solution
Data structures:
An array of all particles. Each element holds a position and a velocity.
A 2D array of linked lists, one for each cell. Each element of a linked list contains a pointer to a particle.
struct particle{
  double position[2];
  double velocity[2];
} Particles[MAX_PARTICLES];

struct list{
  struct particle* part;
  struct list* next;
} *List[MAX_CELLSX][MAX_CELLSY];
[Figure: linked list whose elements point into the Particles array]
MD - Sequential Logic
Initialize Particles and Lists;

for each time step
  for each particle i
    Let cell(k, l) hold i
    F[i] = 0;
    for each particle j in this cell and the neighboring 8 cells that is within distance r of i {
      F[i] += f[i, j];
    }
    update particle[i].{position, velocity} due to F[i];
    if new position in new cell (m, n)
      update Lists[k,l] and Lists[m,n]
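The inner loop of the sequential logic, which visits only the particle's own cell and the 8 cells around it, can be sketched as follows. Two things differ from the slides and are assumptions of this example: the per-cell lists are stored as index arrays (head and next) instead of pointer-based lists, and the code counts the particles within distance r instead of summing f[i, j], because the force law is not given. The grid size NCX x NCY, the limit MAXP and all function names are hypothetical.

```c
#include <assert.h>

#define NCX 4    /* assumed number of cells in x */
#define NCY 4    /* assumed number of cells in y */
#define MAXP 64  /* assumed maximum number of particles */

struct particle { double position[2]; double velocity[2]; };

/* head[cx][cy] is the first particle index in a cell, next[i] the
 * particle after i in the same cell, and -1 ends a list. */
struct cells { int head[NCX][NCY]; int next[MAXP]; };

void build_cells(const struct particle *p, int n, double r, struct cells *c)
{
    assert(n >= 0 && n <= MAXP && r > 0.0);
    for (int cx = 0; cx < NCX; cx++)
        for (int cy = 0; cy < NCY; cy++)
            c->head[cx][cy] = -1;
    for (int i = 0; i < n; i++) {
        int cx = (int)(p[i].position[0] / r);
        int cy = (int)(p[i].position[1] / r);
        assert(cx >= 0 && cx < NCX && cy >= 0 && cy < NCY);
        c->next[i] = c->head[cx][cy];  /* push i on the front of the list */
        c->head[cx][cy] = i;
    }
}

/* Number of particles j != i within distance r of particle i,
 * found by scanning only the 3 x 3 block of cells around i. */
int count_neighbours(const struct particle *p, const struct cells *c,
                     int i, double r)
{
    int cx = (int)(p[i].position[0] / r);
    int cy = (int)(p[i].position[1] / r);
    int count = 0;
    for (int dx = -1; dx <= 1; dx++) {
        for (int dy = -1; dy <= 1; dy++) {
            int nx = cx + dx, ny = cy + dy;
            if (nx < 0 || nx >= NCX || ny < 0 || ny >= NCY) continue;
            for (int j = c->head[nx][ny]; j != -1; j = c->next[j]) {
                double ex = p[j].position[0] - p[i].position[0];
                double ey = p[j].position[1] - p[i].position[1];
                if (j != i && ex * ex + ey * ey <= r * r) count++;
            }
        }
    }
    return count;
}
```

Replacing count++ with an accumulation of f[i, j] would turn the count into the force sum F[i].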
MD - Parallel Version
A 2D array of processors, similar to Laplace.
Each processor holds a set of cells.
Differences:
A processor can communicate with its diagonal neighbors.
The amount of data communicated varies over time steps.
The receiver does not know the amount of data in advance.
[Figure: cells of size r x r]
MDS parallel solution
Steps
1. Communication: each processor communicates the parameters of the particles in its boundary cells to its 8 neighboring processors.
   Challenge: communicating the diagonal cells.
2. Update: each processor calculates new particle velocities and positions.
3. Migration: particles may migrate to cells in other processors.
Other challenges:
Appropriate packing of data.
Particles may have to go through several hops during migration.
Assumptions:
1. For simplicity, let us assume that particles are transported only to neighboring cells during migration.
MDS parallel solution - 1st step
Communication of boundary data
[Figure: boundary data exchanged among the processors A, B, C and D]
How can this be achieved? Shift left, shift right, shift up, shift down.
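Why four shifts are enough to reach the diagonal neighbours can be seen in a small single-program imitation. Each simulated process owns one value and keeps a 3 x 3 halo around it. The horizontal shifts run first, and the vertical shifts then forward whole rows, ghost entries included, so corner data arrives in two hops. The mesh size, the NONE marker and the name exchange_halos are assumptions for illustration; no MPI is used here.

```c
#include <assert.h>

#define PY 3       /* assumed process rows */
#define PX 3       /* assumed process columns */
#define NONE (-1)  /* marks a halo entry with no neighbour behind it */

/* halo[y][x] is the 3 x 3 view held by process (y, x).
 * Entry [1][1] is the process's own value; the rest are ghost entries. */
void exchange_halos(int halo[PY][PX][3][3])
{
    /* Shift left and shift right: exchange own values along x. */
    for (int y = 0; y < PY; y++) {
        for (int x = 0; x < PX; x++) {
            assert(halo[y][x][1][1] >= 0);  /* must not collide with NONE */
            halo[y][x][1][0] = (x > 0) ? halo[y][x - 1][1][1] : NONE;
            halo[y][x][1][2] = (x < PX - 1) ? halo[y][x + 1][1][1] : NONE;
        }
    }
    /* Shift up and shift down: exchange the whole middle row, including
     * the ghost entries filled above, which carry the diagonal data. */
    for (int y = 0; y < PY; y++) {
        for (int x = 0; x < PX; x++) {
            for (int k = 0; k < 3; k++) {
                halo[y][x][0][k] = (y > 0) ? halo[y - 1][x][1][k] : NONE;
                halo[y][x][2][k] = (y < PY - 1) ? halo[y + 1][x][1][k] : NONE;
            }
        }
    }
}
```

After the call, the centre process of the 3 x 3 mesh holds the values of all 8 neighbours although it exchanged data only with its 4 direct neighbours.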
MDS parallel solution - 2nd step
Update:
Similar to the sequential program.
A processor has all the information required to calculate $F_i$ for all its particles.
Thus the new position and velocity are determined.
If the new position belongs to the same cell in the same processor, do nothing.
If the new position belongs to a different cell in the same processor, update the linked lists for the old and new cells.
MDS parallel solution - 3rd step
If the new position belongs to a different cell in a different processor: particle migration
for each particle p
  update {position, velocity}
  determine new cell
  if new cell != old cell
    delete p from list of old cell
    if (different processor)
      pack p into appropriate communication buffer
      remove p from particle array
Shift left, shift right, shift up, shift down
MDS parallel solution - 3rd step
This shifting is a bit different from the previous shifting.
A processor may just act as a transit point for a particle.
Hence particles have to be packed with care.
Shift left:
MPI_Sendrecv(leftbuf, nsend_left, .., left, .., rbuf, max_size*4, .., right, .., &status);
MPI_Get_count(&status, MPI_DOUBLE, &nrecv);
particles = nrecv/4;
for(i=0; i
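The division by 4 above follows from sending each particle as four doubles: two position components and two velocity components. Packing and unpacking can be sketched as below. The function names and the field order inside the buffer are assumptions for illustration; the struct mirrors the particle structure shown earlier.

```c
#include <assert.h>
#include <stddef.h>

struct particle { double position[2]; double velocity[2]; };

/* Pack n particles into buf as 4 doubles each.
 * Returns the number of doubles written (the send count). */
size_t pack_particles(const struct particle *p, size_t n,
                      double *buf, size_t buf_len)
{
    assert(4 * n <= buf_len);
    for (size_t i = 0; i < n; i++) {
        buf[4 * i + 0] = p[i].position[0];
        buf[4 * i + 1] = p[i].position[1];
        buf[4 * i + 2] = p[i].velocity[0];
        buf[4 * i + 3] = p[i].velocity[1];
    }
    return 4 * n;
}

/* Unpack nrecv doubles, as reported by the receive, into particles.
 * Returns the number of particles recovered (nrecv / 4). */
size_t unpack_particles(const double *buf, size_t nrecv, struct particle *out)
{
    assert(nrecv % 4 == 0);
    size_t particles = nrecv / 4;
    for (size_t i = 0; i < particles; i++) {
        out[i].position[0] = buf[4 * i + 0];
        out[i].position[1] = buf[4 * i + 1];
        out[i].velocity[0] = buf[4 * i + 2];
        out[i].velocity[1] = buf[4 * i + 3];
    }
    return particles;
}
```

Because the receiver learns the count only after the message arrives, its buffer has to be sized for the largest possible message, which is why the receive count above is max_size*4.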
MDS - comments
Generic solution
A particle can move to any cell
Force can be from any distance
Load balancing
A Dynamical System - WaTor
Courtesy: Dr. David Walker, Cardiff
A 2-D ocean in which sharks and fish survive
2 important features:
Need for dynamic load distribution
Potential conflicts due to updates by different processors
These features are shared by other advanced parallel applications.
WaTor - The problem
The ocean is divided into a grid of cells.
Each grid cell can be empty or hold a fish or a shark.
The grid is initially populated with fish and sharks at random.
The population evolves over discrete time steps according to certain rules.
WaTor - Rules
Fish:
At each time step, a fish tries to move to a neighboring empty cell. If no neighboring cell is empty, it stays.
If a fish has reached the breeding age, it breeds when it moves, leaving behind a fish of age 0. A fish cannot breed if it does not move.
Fish never starve.
WaTor - Rules
Shark:
At each time step, if one of the neighboring cells has a fish, the shark moves to that cell and eats the fish. If not, and if one of the neighboring cells is empty, the shark moves there. Otherwise, it stays.
If a shark has reached the breeding age, it breeds when it moves, leaving behind a shark of age 0. A shark cannot breed if it does not move.
Sharks eat only fish. If a shark reaches the starvation age (time steps since it last ate), it dies.
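The movement, breeding and starvation rules can be written as small decision functions. This is a sketch under stated assumptions: each cell has 4 neighbours, the first suitable neighbour in a fixed order is chosen (a real simulation would more likely pick one at random), and "reaches" an age means the counter is greater than or equal to the threshold. All names are hypothetical.

```c
#include <assert.h>

enum cell { EMPTY = 0, FISH = 1, SHARK = 2 };

/* Index (0..3) of the neighbouring cell a fish moves to, or -1 to stay. */
int fish_target(const enum cell nb[4])
{
    for (int k = 0; k < 4; k++)
        if (nb[k] == EMPTY) return k;
    return -1;
}

/* Index of the cell a shark moves to: a fish if there is one,
 * otherwise an empty cell, otherwise -1 to stay. */
int shark_target(const enum cell nb[4])
{
    for (int k = 0; k < 4; k++)
        if (nb[k] == FISH) return k;
    for (int k = 0; k < 4; k++)
        if (nb[k] == EMPTY) return k;
    return -1;
}

/* A swimmer breeds only if it moved and has reached the breeding age. */
int breeds(int age, int breeding_age, int moved)
{
    assert(breeding_age > 0);
    return moved && age >= breeding_age;
}

/* A shark dies once the time since its last meal reaches the limit. */
int shark_starves(int last_ate, int starvation_age)
{
    assert(starvation_age > 0);
    return last_ate >= starvation_age;
}
```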
Inputs and Data Structures
Inputs:
Size of the grid
Distribution of sharks and fish
Shark and fish breeding ages
Shark starvation age
Data structures:
A 2-D grid of cells
struct ocean{
  int type; /* shark or fish or empty */
  struct swimmer* occupier;
} ocean[MAXX][MAXY];
A linked list of swimmers
struct swimmer{
  int type;
  int x,y;
  int age;
  int last_ate;
  int iteration;
  struct swimmer* prev;
  struct swimmer* next;
} *List;
Sequential Code Logic
Initialize the ocean array and the swimmers list.
In each time step, go through the swimmers in the order in which they are stored and perform the updates.
Towards a Parallel Code
A 2-D data distribution similar to Laplace and molecular dynamics is used. Each processor holds a grid of ocean cells.
For communication, each processor needs data from its 4 neighboring processors.
2 new challenges: potential for conflicts, load balancing
1st Challenge - Potential for Conflicts
Unlike in the previous problems, border cells may change during updates because of fish or shark movement.
Border cells need to be communicated back to the original processor. Hence the update step involves communication.
In the meantime, the original processor may have updated the border cell. Hence potential conflicts.
[Figure: grids of sharks (S) and fish (F) at time T and at time T+1]
2 Techniques
1st technique: roll back the updates for those particles (fish or shark) that have crossed a processor boundary and are in potential conflict.
This may lead to several rollbacks until a free space is found.
2nd technique: synchronize during updates to avoid conflicts in the first place.
2 Techniques
During an update, a processor x first sends its data to processor y, allows y to perform its updates, gets the updates back from y, and then performs its own updates.
Synchronization can be done by sub-partitioning.
Divide the grid owned by a processor into sub-grids.
[Figure: sub-grids numbered 1 to 4 in each processor's grid; update]
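One way to read the sub-partitioning idea is that every processor splits its grid into four quadrants numbered 1 to 4 and all processors update the same-numbered quadrant at the same time. The sketch below assumes that layout (1 and 2 on top, 3 and 4 below), which the slides do not spell out; the function name subgrid_phase is hypothetical.

```c
#include <assert.h>

/* Phase (1..4) in which the local cell (li, lj) of a local_ny x local_nx
 * grid is updated. Quadrants are numbered
 *   1 2
 *   3 4
 * inside every processor's grid. */
int subgrid_phase(int li, int lj, int local_ny, int local_nx)
{
    assert(local_ny >= 2 && local_nx >= 2);
    assert(li >= 0 && li < local_ny && lj >= 0 && lj < local_nx);
    int lower = (li >= local_ny / 2);
    int right = (lj >= local_nx / 2);
    return 1 + 2 * lower + right;
}
```

With this numbering, two cells that face each other across a processor boundary always fall into different phases, so two processors never update both sides of a shared border at the same time.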
Load Imbalance
The workload distribution changes over time.
A 2-D block distribution is not optimal.
Techniques:
Dynamic load balancing
Static load balancing by a different data distribution
Dynamic load balancing
Performed at each time step
Orthogonal recursive bisection
Problems: complexity in finding the neighbors and the data for communication
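Orthogonal recursive bisection splits the set of swimmers in half along one axis, then splits each half along the other axis, and so on until there is one part per processor. A minimal sketch follows, assuming the number of parts is a power of two and that every swimmer counts as one unit of load. The struct, the function names and the use of a full sort to find the median are choices made for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct point {
    double coord[2];  /* x and y position of one swimmer */
    int part;         /* processor assigned by the bisection */
};

static int cmp_x(const void *a, const void *b)
{
    double u = ((const struct point *)a)->coord[0];
    double v = ((const struct point *)b)->coord[0];
    return (u > v) - (u < v);
}

static int cmp_y(const void *a, const void *b)
{
    double u = ((const struct point *)a)->coord[1];
    double v = ((const struct point *)b)->coord[1];
    return (u > v) - (u < v);
}

/* Assign the n points to parts first_part .. first_part + nparts - 1.
 * The points are reordered in place; axis is 0 (x) or 1 (y). */
void orb(struct point *pts, int n, int nparts, int axis, int first_part)
{
    assert(n >= 0 && nparts >= 1 && (nparts & (nparts - 1)) == 0);
    if (nparts == 1) {
        for (int i = 0; i < n; i++) pts[i].part = first_part;
        return;
    }
    /* Sort along the current axis and cut at the median. */
    qsort(pts, (size_t)n, sizeof pts[0], axis == 0 ? cmp_x : cmp_y);
    int half = n / 2;
    /* Recurse on both halves, cutting along the other axis next. */
    orb(pts, half, nparts / 2, 1 - axis, first_part);
    orb(pts + half, n - half, nparts / 2, 1 - axis, first_part + nparts / 2);
}
```

The resulting regions are rectangles of unequal size, which is what makes finding neighbours and the data to communicate harder than in a regular block distribution.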
Static Data Distribution
Cyclic
0,0 0,1 0,2 0,3 0,0 0,1 0,2 0,3
1,0 1,1 1,2 1,3 1,0 1,1 1,2 1,3
2,0 2,1 2,2 2,3 2,0 2,1 2,2 2,3
3,0 3,1 3,2 3,3 3,0 3,1 3,2 3,3
Problems: increase in boundary data; increase in communication
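The cyclic layout in the table assigns cell (i, j) to processor (i mod P, j mod Q), whereas a block layout gives each processor one contiguous tile. The two mappings can be compared with the sketch below. The names are hypothetical, and the block version assumes the grid size divides evenly by the mesh size.

```c
#include <assert.h>

struct owner { int py; int px; };  /* processor coordinates in the mesh */

/* Cyclic distribution: cells are dealt out round-robin in both directions. */
struct owner cyclic_owner(int i, int j, int P, int Q)
{
    assert(P > 0 && Q > 0 && i >= 0 && j >= 0);
    struct owner o = { i % P, j % Q };
    return o;
}

/* Block distribution of an ny x nx grid: contiguous tiles of
 * (ny / P) x (nx / Q) cells per processor. */
struct owner block_owner(int i, int j, int ny, int nx, int P, int Q)
{
    assert(P > 0 && Q > 0 && ny % P == 0 && nx % Q == 0);
    assert(i >= 0 && i < ny && j >= 0 && j < nx);
    struct owner o = { i / (ny / P), j / (nx / Q) };
    return o;
}
```

Under the cyclic mapping every cell's neighbours belong to other processors, which spreads a locally dense population evenly but turns every cell into boundary data.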