A Parallel Computational Model for Heterogeneous Clusters. Jose Luis Bosque, Luis Pastor, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 12, DECEMBER 2006. Presented by 張肇烜

A Parallel Computational Model for Heterogeneous Clusters

Jose Luis Bosque, Luis Pastor, IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 12, DECEMBER 2006

Presented by 張肇烜

Outline

Introduction
Heterogeneous LogGP
HLogGP Validation
Experimental Results
Conclusions

Introduction

During the last decade, Beowulf clusters have had tremendous dissemination and acceptance.

However, the design and implementation of efficient parallel algorithms for clusters is still a problematic issue.

Introduction (cont.)

In this paper, a new heterogeneous parallel computational model based on the LogGP model is proposed.

Heterogeneous LogGP

Reasons for selecting the LogGP model:

The architecture is very similar to a cluster.

LogGP removes the synchronization points needed in other models.

LogGP allows considering both short and long messages.

Heterogeneous LogGP (cont.)

LogGP assumes finite network capacity, avoiding situations where the network becomes a bottleneck.

This model encourages techniques that yield good results in practice, such as designing algorithms with balanced communication patterns.

Heterogeneous LogGP (cont.)

HLogGP Definition:

Latency, L: Communication latency depends on both network technology and topology.

The Latency Matrix of a heterogeneous cluster can be defined as a square matrix L = {l_{1,1}, …, l_{m,m}}.

Heterogeneous LogGP (cont.)

Overhead, o: the time needed by a processor to send or receive a message is referred to as overhead.

Sender overhead vector, O_s = {o_{s1}, …, o_{sm}}.

Receiver overhead vector, O_r = {o_{r1}, …, o_{rm}}.

Gap between messages, g: this parameter reflects each node’s proficiency at sending consecutive short messages.

A gap vector g = {g_1, …, g_m}.

Heterogeneous LogGP (cont.)

Gap per byte, G: The Gap per byte depends on network technology.

In a heterogeneous network, a message can cross different switches with different bandwidths.

Gap matrix G = {G_{1,1}, …, G_{m,m}}.

Heterogeneous LogGP (cont.)

Computational power, P_i: The number of nodes cannot be used in a heterogeneous model for measuring the system’s computational power.

A computational power vector P = {P_1, …, P_m}.
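
The HLogGP parameters defined above can be grouped into a single structure. The following is a minimal Python sketch, not the authors' implementation: the class name `HLogGPParams`, its field names, and the helper `total_power` are hypothetical, and the time unit (cycles or seconds) is left to the caller.

```python
from dataclasses import dataclass
from typing import List

Matrix = List[List[float]]
Vector = List[float]


@dataclass
class HLogGPParams:
    """Illustrative container for the parameters of an m-node cluster."""
    L: Matrix    # latency matrix, L[i][j] = latency from node i to node j
    o_s: Vector  # sender overhead of each node
    o_r: Vector  # receiver overhead of each node
    g: Vector    # gap between consecutive short messages, per node
    G: Matrix    # gap per byte, G[i][j] for the path from node i to node j
    P: Vector    # computational power of each node

    def __post_init__(self) -> None:
        # The vectors must have m entries and the matrices must be m x m.
        m = len(self.P)
        for name in ("o_s", "o_r", "g"):
            if len(getattr(self, name)) != m:
                raise ValueError(f"{name} must have {m} entries")
        for name in ("L", "G"):
            mat = getattr(self, name)
            if len(mat) != m or any(len(row) != m for row in mat):
                raise ValueError(f"{name} must be a {m}x{m} matrix")

    @property
    def m(self) -> int:
        """Number of nodes in the cluster."""
        return len(self.P)

    def total_power(self) -> float:
        """Sum of the per-node computational powers."""
        return sum(self.P)
```

Keeping L and G as full matrices, rather than single scalars as in homogeneous LogGP, is what lets a pair of nodes behind a slow link be described differently from a pair behind a fast one.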

HLogGP Validation

Cluster Description:

[Figure: cluster diagram; recoverable labels: 100 Mbps, 10 Mbps, slow (S), fast (F)]

HLogGP Validation (cont.)

Benchmark 1:

HLogGP Validation (cont.)

Benchmark 2: Source code of the benchmark for measuring the gap between messages.

HLogGP Validation (cont.)

Overhead:

HLogGP Validation (cont.)

Overhead:

HLogGP Validation (cont.)

Latency:

[Figure: latency results; panel labels: switch-switch, hub-hub, switch-hub]

HLogGP Validation (cont.)

Gap between messages:

HLogGP Validation (cont.)

Gap per Byte:

HLogGP Validation (cont.)

Computational power:

Experimental Results

Three objectives were pursued in the tests presented here:

To verify that HLogGP is accurate enough to predict the response time of a parallel program.

To verify that heterogeneity has a strong impact on system performance.

To show how the cluster parametrization may be used for determining the performance of a parallel program in a real application environment.

Experimental Results (cont.)

A volumetric magnetic resonance image compression application was selected.

The sequential process may be divided into the following stages:

Data acquisition.

Data read and memory allocation.

Computation of the 3D Haar wavelet transform.

Experimental Results (cont.)

Thresholding.

Encoding of the subbands using the run-length encoding compression algorithm.

Write back of the compressed image.
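
The three computational stages (wavelet transform, thresholding, run-length encoding) can be illustrated on a one-dimensional signal. This is a simplified Python sketch, not the application's code: it uses one level of the unnormalized average/difference form of the Haar transform in 1D rather than the 3D transform, and the function names `haar_step`, `threshold`, and `run_length_encode` are hypothetical.

```python
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of a 1D Haar transform: pairwise averages and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]
    details = [(a - b) / 2 for a, b in pairs]
    return averages, details


def threshold(coeffs: List[float], limit: float) -> List[float]:
    """Zero every coefficient whose magnitude is below limit."""
    return [c if abs(c) >= limit else 0.0 for c in coeffs]


def run_length_encode(values: List[float]) -> List[Tuple[float, int]]:
    """Collapse runs of equal values into (value, count) pairs."""
    encoded: List[Tuple[float, int]] = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1] = (v, encoded[-1][1] + 1)
        else:
            encoded.append((v, 1))
    return encoded
```

Thresholding turns small detail coefficients into zeros, and those runs of zeros are what make run-length encoding effective on the subbands.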

Experimental Results (cont.)

A theoretical analysis of the application’s response time is presented.

First stage: The master distributes the raw data among the slave processors.

The total number of slices.

The cluster’s total computational power.

The number of slices each slave i will receive.
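
A minimal Python sketch of a power-proportional split follows. It assumes that each slave receives a share of the slices proportional to its entry in the computational power vector, that is n_i = N · P_i / P_T with P_T the sum of all P_i; the symbols N, n_i, and P_T and the function name `distribute_slices` are assumptions introduced here, and the rounding rule (largest fractional part first) is one possible choice.

```python
from typing import List


def distribute_slices(total_slices: int, power: List[float]) -> List[int]:
    """Split total_slices among nodes in proportion to their computational power."""
    total_power = sum(power)
    if total_power <= 0:
        raise ValueError("total computational power must be positive")
    exact = [total_slices * p / total_power for p in power]
    shares = [int(x) for x in exact]  # floor, since every share is non-negative
    leftover = total_slices - sum(shares)
    # Give the remaining slices to the nodes with the largest fractional parts.
    order = sorted(range(len(power)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

For example, 64 slices over powers [1, 1, 2] come out as 16, 16, and 32 slices, so the node with twice the power receives twice the data.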

Experimental Results (cont.)

The total time for this stage is:

o_s cycles of sending overhead to get the first byte into the network.

Subsequent bytes take G cycles each to be sent.

Each byte travels through the network for L cycles.

The receiving processor spends o_r cycles in receiving overhead.
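
The four terms listed above match the usual LogGP cost of delivering one long message, o_s + (k - 1)·G + L + o_r for a message of k bytes. A minimal Python sketch of that single-message cost is given below; the function name `long_message_time` and the symbol k are assumptions, and in HLogGP the caller would pass the sender's o_s, the receiver's o_r, and the G and L entries for that particular pair of nodes.

```python
def long_message_time(k_bytes: int, o_s: float, G: float, L: float, o_r: float) -> float:
    """Time to deliver one k-byte message: o_s + (k - 1) * G + L + o_r."""
    if k_bytes < 1:
        raise ValueError("message must contain at least one byte")
    # o_s puts the first byte on the wire, each later byte costs G,
    # the last byte travels for L, and the receiver then spends o_r.
    return o_s + (k_bytes - 1) * G + L + o_r
```

With o_s = 2, G = 0.5, L = 10, and o_r = 3, a 101-byte message costs 65 time units, of which 50 come from the per-byte gap, which shows why G dominates for long messages.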

Experimental Results (cont.)

Second stage: In this case, the response time is the time spent by the last slave to finish its work.

The total response time for the second stage is estimated as the response time of a generic slave processor:
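
One way to see why a generic slave can stand in for the last one is sketched below in Python. The sketch assumes that each P_i is expressed in work units per unit time (the unit is an assumption made here) and uses a hypothetical helper name, `compute_stage_time`; it is not the slide's expression.

```python
from typing import List


def compute_stage_time(work: List[float], power: List[float]) -> float:
    """Stage time when slaves run in parallel: the slowest slave decides."""
    if len(work) != len(power):
        raise ValueError("work and power must have the same length")
    if any(p <= 0 for p in power):
        raise ValueError("computational power must be positive")
    return max(w / p for w, p in zip(work, power))
```

When the work is split in proportion to power, for instance 16, 16, and 32 units over powers 1, 1, and 2, every slave finishes at the same time, so any one of them gives the stage time. An even split of 64 units as 22, 21, and 21 over the same nodes would leave the stage waiting on a slow node.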

Experimental Results (cont.)

Third stage: The master process has to first gather the partial results produced by all of the slave processes.

The total response time of the third stage is calculated as a summation:
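
A sketch of what such a summation could look like, written with the HLogGP symbols defined earlier, is given below. It is an assumption, not the slide's formula: it supposes that the master (index M) receives the partial results one after another with no overlap, that there are m - 1 slaves, and that slave i sends a message of k_i bytes.

```latex
T_{3} \approx \sum_{i=1}^{m-1} \left[ o_{s_i} + (k_i - 1)\, G_{i,M} + L_{i,M} + o_{r_M} \right]
```

Each bracket is the cost of one long message from slave i to the master, so slaves behind a slower link contribute larger G and L terms to the total.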

Experimental Results (cont.)

Fourth stage: The master process has to send an image subband to each of the slave processes.

The total time for this stage is:

Experimental Results (cont.)

Fifth stage: This stage is similar to the second, but the amount of work is not distributed according to the nodes’ computational power.

This time can be approximated by the following expression:

Experimental Results (cont.)

Sixth stage: This stage is similar to the third stage, but the message sizes cannot be determined a priori.

K is determined by the subband size.

Experimental Results (cont.)

Execution Results

Experimental Results (cont.)

Execution Results

Conclusion

In this paper, the HLogGP model for heterogeneous clusters has been proposed and validated.

The model can be applied to heterogeneous clusters where either the nodes, the interconnection network, or both are heterogeneous.