Grid Computing

Preview:

Citation preview

1/27

Grid Computing

李孝治

2/27

Outline

Introduction to Grid Computing

Standards for Grid Computing

Scheduling for Grid Systems

3/27

1. Introduction to Grid Computing “Grid Services for Distributed System

Integration”Foster, Kesselman, Nick, and Tuecke,

IEEE Computer, June 2002.

Grid technologies and infrastructures support the sharing and coordinated use of diverse resources in dynamic, distributed virtual organizations (VO)

4/27

Virtual Organizations (VO) includeCreation of virtual computing systemsGeographically distributed components operated by

distinct organizations with differing policies, Sufficiently integrated to deliver the desired QoS.

Resources:Computational devices, Networks, Online instruments, Storage archives, etc.

5/27

“What is Grid? A Three Point Checklist” http://www-ip.mcs.anl.gov/~foster

/Articles/WhatIsTheGrid.pdf

1) coordinates resources that are not subject to centralized control …

2)… using standard, open, general-purpose protocols and interfaces…

3)… to deliver nontrivial 重要 qualities of service.

6/27

“The Anatomy 解剖 of the Grid” http://www.globus.org/research/papers/anatomy.pdf

“Grid” computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation.

7/27

Grid Applications Application service providersStorage service providersConsultants engaged by a car manufacturer to perform

scenario evaluation during planning for a new factory

Members of an industrial consortium 財團 bidding on a new aircraft

A crisis management team and the databases and simulation systems that they use to plan a response to an emergency situation

Members of a large, international, multiyear high-energy physics collaboration

8/27

A de facto standard for Grid systems: OGSA (Open Grid Services Architecture) Globus Toolkit version 3 (GT3)Defines a uniform exposed service semantics

(the Grid service); Defines standard mechanisms for creating, naming,

and discovering transient Grid service instances; Provides location transparency and multiple protocol

bindings for service instances; Supports integration with underlying native platform

facilities.

2. Standards for Grid Computing

9/27

Core Grid service interfaces

Base resource management data transfer information services reservation monitoring

Data data management

Grid workload management diagnostics

10/27

Figure 1. OGSA Grid service.

11/27

Figure 2. Three different VO structures

(1) A simple hosting environmentA set of resources locatedwithin a single administrative domain

(2) A virtual hosting environmentA set of resources span two B2B domains

(3) Collective servicesa set of resources span two E2E domains

12/27

3. Scheduling in Grid Systems

“Scheduling strategies for mixed data and task parallelism on heterogeneous clusters and grids” O. Beaumont, A. Legrand, and Y. RobertProceedings of the Eleventh Euromicro Conference on

Parallel-Distributed and Network-Based Processing (Euro-PDP’03)

13/27

The complex application consists of a suite of identical, independent problems to be solved.

Each problem consists of a set of tasks.

There are dependences (precedence constraints) between these tasks.

Fork graph (task graph) A problem instance: e.g. a loop iteration of 4 tasks Operates on different data

14/27

Platform graph E.g.

4 resources and 5 communication links

Both with different rates Master node:

an initial node

The question for the master is to decidewhich tasks to execute itself, and how many tasks to forward to each of its

neighbors.

Each neighbor faces in turn the same dilemma.

15/27

Due to heterogeneity, the neighbors may receive different amounts of work (maybe none for some of them).

Because the problems are independent, their execution can be pipelined. At a given time-step, different processors may well

compute different tasks belonging to different problem instances.

16/27

ObjectiveTo determine the optimal steady state scheduling

policy for each processorThat is, the fraction of time spent computing, and the

fraction of time spent sending or receiving each type of tasks along each communication link, so that the (averaged) overall number of problems processed at each time-step is maximum.

17/27

The modelThe application

Let P(1), P(2), …, P(n) be the n problems to solve, where n is large.

Each problem P(m) corresponds to a copy G(m) = (V (m), E(m)) of the same task graph (V, E).

The number |V| of nodes in V is the number of task types. In the example of Figure 1, there are four task types,

denoted as T1, T2, T3 and T4.

Overall, there are n * |V| tasks to process, since there are n copies of each task type.

18/27

The architecture The target heterogeneous platform is represented by a

directed graph, the platform graph. There are p nodes P1, P2, … , Pp that represent the

processors. In the example of Figure 2 there are four processors, hence

p = 4. Each edge represents a physical interconnection. Each edge eij : Pi Pj is labeled by a value cij which

represents the time to transfer a message of unit length between Pi and Pj, in either direction.

We assume a full overlap, single-port operation mode, where a processor node can simultaneously receive data from one of its neighbor, perform some (independent) computation, and send data to one of its neighbor.

19/27

Execution time Processor Pi requires wi,k time units to process a task of type

Tk.

Assume that wi,k = wi ×δk, where wi is the inverse of the relative speed of processor Pi, and δk the weight of task Tk.

Communication time Each edge ek,l : Tk Tl in the task graph is weighted by a

communication cost datak,l that depends on the tasks Tk and Tl. It corresponds to the amount of data output by Tk and required as input to Tl.

The time to transfer the data from Pi to Pj is equal to datak,l × ci,j

20/27

Steady-state equations The (average) fraction of time spent each time-unit by Pi to

send to Pj data involved by the edge ek,l.

s(Pi Pj, ek,l) = sent(Pi Pj, ek,l) × (datak,l × ci,j) (1)

where sent(Pi Pj, ek,l) denotes the (fractional) number of files sent per time-unit

The (average) fraction of time spent each time-unit by Pi to process tasks of type Tk:

α(Pi, Tk) = cons(Pi, Tk) × wi,k (2)

where cons(Pi, Tk) denotes the (fractional) number of tasks of type Tk processed per time unit by processor Pi.

21/27

Activities during one time-unit

One-port model for outgoing communications

One-port model for incoming communications

22/27

Full overlap

23/27

Conservation laws

24/27

25/27

Theorem 1. The solution to the previous linear programming problem provides the optimal solution to SSSP(G).

26/27

27/27

ConclusionsWe have shown how to determine the best steady state

scheduling strategy for a general task graph and for a general platform graph, using a linear programming approach.

This work can be extended in the following two directions: On the theoretical side, we could try to solve the problem of

maximizing the number of tasks that can be executed within K time-steps, where K is a given time bound. [Initialization phase]

On the practical side, we need to run actual experiments rather than simulations.

End

28/27

Other scheduling for Grid systems Local grid scheduling techniques using

performance prediction Spooner, D.P.; Jarvis, S.A.; Cao, J.; Saini, S.; Nudd, G.R.; Computers and Digital Techniques, IEE Proceedings- , Volume: 150 Issue: 2 , March 2003

Ant algorithm-based task scheduling in grid computing Zhihong Xu; Xiangdan Hou; Jizhou Sun; Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on , Volume: 2 , May 4-7, 2003

29/27

Near-optimal dynamic task scheduling of independent coarse-grained tasks onto a computational grid Fujimoto, N.; Hagihara, K.; Parallel Processing, 2003. Proceedings. 2003 International Conference on , 6-9 Oct. 2003

Scheduling in a grid computing environment using genetic algorithms Di Martino, V.; Mililotti, M.; Parallel and Distributed Processing Symposium., Proceedings International, IPDPS 2002, Abstracts and CD-ROM , 15-19 April 2002

End