
1

VLDB - Data Management in Grids

B. Del-Fabbro, D. Laiymani, J.M. Nicod and L. Philippe

Laboratoire d’Informatique de l’Université de Franche-Comté

Seoul, Korea, 11 September 2006

Design and experimentations of an efficient data management service for NES architectures

2

Outline

Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work

VLDB - DMG Workshop - Seoul, Korea - 11 September 2006

3

Introduction: the NES context

Example: antenna positioning

4

Introduction: the NES context

Exec(EXTRACTION, img1, img2)

Agent (Broker)

RPC-based model

Servers (provide services)

Client
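The RPC-based model pictured on this slide (a client, an agent acting as broker, servers providing services) can be sketched as follows. This is a minimal sketch under stated assumptions: the classes `Server` and `Agent`, the helper `exec_request`, the toy load metric used to pick a server and the toy services are all hypothetical and are not the DIET API.

```python
# Hypothetical sketch of the RPC-based model: the client asks the agent
# (broker) which server can run a service, then calls that server directly.
# Class and method names are illustrative; this is not the DIET API.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Server:
    name: str
    services: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    load: int = 0  # toy stand-in for a performance forecast

    def execute(self, service: str, *args: Any) -> Any:
        self.load += 1
        try:
            return self.services[service](*args)
        finally:
            self.load -= 1


@dataclass
class Agent:
    servers: List[Server]

    def submit(self, service: str) -> Server:
        # Broker step: keep only servers offering the service, pick the least loaded.
        candidates = [s for s in self.servers if service in s.services]
        if not candidates:
            raise LookupError(f"no server provides {service!r}")
        return min(candidates, key=lambda s: s.load)


def exec_request(agent: Agent, service: str, *args: Any) -> Any:
    server = agent.submit(service)          # 1. ask the agent for a server
    return server.execute(service, *args)   # 2. call that server with the data


# Usage with toy services standing in for EXTRACTION and ANTENNA.
s1 = Server("s1", {"EXTRACTION": lambda a, b: a + b})
s2 = Server("s2", {"ANTENNA": len})
agent = Agent([s1, s2])
img3 = exec_request(agent, "EXTRACTION", b"img1", b"img2")
```

In this plain model every argument travels from the client to the chosen server, and every result travels back to the client.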

5

Introduction: the NES context

Exec(ANTENNA, img3)

Agent

Data can be reused for further computations

6

Introduction: the NES context

Exec(EXTRACTION, img1, img2)

Agent

It is necessary to allow the storage of some data: data persistency

7

Introduction: the NES context

Exec(ANTENNA, &img3)

Agent

It is necessary to allow the storage of some data: data persistency
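The difference between Exec(ANTENNA, img3) and Exec(ANTENNA, &img3) can be sketched by counting the bytes that cross the client link. This is a minimal sketch, assuming a single toy `Platform` object and a `Handle` type that plays the role of `&img3`. Both names are hypothetical, and concatenation stands in for the real services.

```python
# Hypothetical sketch: by-value arguments cross the client link on every call,
# while a handle ("&img3") refers to data kept persistent on the platform.
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Handle:
    key: str  # reference to a persistent data item, e.g. "&img3"


@dataclass
class Platform:
    store: Dict[str, bytes] = field(default_factory=dict)  # persistent data
    client_traffic: int = 0  # bytes exchanged over the client <-> platform link

    def _resolve(self, arg: Union[bytes, Handle]) -> bytes:
        if isinstance(arg, Handle):
            return self.store[arg.key]   # already on the platform: nothing sent
        self.client_traffic += len(arg)  # by-value argument is uploaded
        return arg

    def exec(self, service: str, *args: Union[bytes, Handle],
             persist_as: Optional[str] = None) -> Union[bytes, Handle]:
        inputs = [self._resolve(a) for a in args]
        result = b"".join(inputs)  # stand-in for the real `service` computation
        if persist_as is not None:
            self.store[persist_as] = result  # result stays on the platform
            return Handle(persist_as)
        self.client_traffic += len(result)   # volatile result is downloaded
        return result


img1, img2 = b"a" * 100, b"b" * 100

# Without persistence: img3 goes back to the client, then is sent again.
volatile = Platform()
img3 = volatile.exec("EXTRACTION", img1, img2)
volatile.exec("ANTENNA", img3)

# With persistence: img3 stays on the platform and is passed by handle.
persistent = Platform()
ref = persistent.exec("EXTRACTION", img1, img2, persist_as="img3")
persistent.exec("ANTENNA", ref)
```

With these toy sizes the volatile scenario moves 800 bytes over the client link and the persistent one moves 400, because img3 never makes the round trip.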

8

Introduction: the NES context

Exec(ANTENNA, &img3)
Exec(RENDU, &img3)
Exec(ANTENNA, &img3)

Agent

It is necessary to take advantage of the parallelism offered by independent tasks: data replication
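Why replication helps independent tasks that share img3 can be shown with a toy scheduling model. This is a minimal sketch, assuming unit costs chosen for illustration: each task takes `task_time`, copying img3 between servers takes `copy_time`, and each task is placed greedily on the server that would finish it first. The function `completion_time` and its cost model are hypothetical, not the DTM scheduler.

```python
# Toy model (hypothetical): n independent tasks all read img3, which starts on
# server 0. A task takes task_time; copying img3 to another server takes
# copy_time. Each task goes greedily to the server that would finish it first.
from typing import List


def completion_time(n_tasks: int, n_servers: int, task_time: float,
                    copy_time: float, replicate: bool) -> float:
    finish: List[float] = [0.0] * n_servers     # when each server becomes free
    holds = [s == 0 for s in range(n_servers)]  # only server 0 has img3
    for _ in range(n_tasks):
        best, best_end = 0, float("inf")
        for s in range(n_servers):
            if not holds[s] and not replicate:
                continue  # without replication, only the holder can run the task
            extra = 0.0 if holds[s] else copy_time  # server-to-server copy first
            end = finish[s] + extra + task_time
            if end < best_end:
                best, best_end = s, end
        holds[best] = True  # once made, the replica is reused by later tasks
        finish[best] = best_end
    return max(finish)
```

With three tasks of 10 time units on three servers and a copy cost of 2, the model gives 30 without replication (the tasks are serialized on the single holder) and 12 with it. These are toy figures, not measurements.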

9

Goal

To propose a data management service for NES architectures that implements the data persistency and data replication concepts as transparently as possible for end users

10

Outline

Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work

11

Related work: non-NES architectures

Concepts

Data Grid context: separating the physical and logical views of data (European Data Grid…)

Grid Computing context: large number of widely distributed nodes (GASS, LegionFS…)

Stork: pre-placement tool, generally coupled with a meta-scheduler

12

Related work: non-NES architectures

Drawbacks

Mainly storage and system oriented

Difficult to adapt to NES environments

Data transfers are explicitly performed at the client level

Lack of transparency

13

Related work: NES architectures

Concepts

Decreasing network traffic between clients and servers: ensuring that no unnecessary data are transmitted

NetSolve: Request Sequencing, Distributed Storage Infrastructure (DSI)

Drawbacks

Data management is performed for only one computation sequence

Data transfers are explicit at the client level

14

Outline

Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work

15

Issues

An NES data management service must address the following issues:

Replica consistency: for update operations, do all the replicas have to be updated, or are the replicas independent copies?

Data storage: store data as close as possible to the servers, while coping with the physical limitations of storage resources

Security: a secure access policy is needed, since data can be shared (access rights)
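The two consistency options raised under "Replica consistency" can be contrasted in a few lines. This is a minimal sketch, assuming string values and the hypothetical policy names "update_all" and "independent". It only counts how many copies a write touches and does not model network cost.

```python
# Toy model of the two replica-consistency options raised above.
# "update_all": every write is propagated to all copies.
# "independent": each replica is a separate copy; a write touches one server.
from typing import Dict, Iterable


class ReplicaSet:
    def __init__(self, servers: Iterable[str], value: str, policy: str) -> None:
        if policy not in ("update_all", "independent"):
            raise ValueError(f"unknown policy {policy!r}")
        self.policy = policy
        self.copies: Dict[str, str] = {s: value for s in servers}

    def write(self, server: str, value: str) -> int:
        """Apply a write issued on `server`; return the number of copies touched."""
        if self.policy == "update_all":
            for s in self.copies:
                self.copies[s] = value  # all replicas stay identical
            return len(self.copies)
        self.copies[server] = value  # the other copies keep their old value
        return 1
```

The trade-off is visible in the return value. Propagating updates keeps the replicas identical but costs one update per copy, while independent copies are cheap to write but diverge.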

16

Issues

An NES data management service must address the following issues:

Data localization: for data items stored inside the platform, find where each data item is stored

Data identification: a data item must be fully identified, so that a client does not have to know where its data are stored
Data handle = unique reference to a data item

Data redistribution: bandwidth is better between servers than between clients and servers
Move data between computational servers
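Identification, localization and redistribution can be illustrated with a single flat registry. This is a minimal sketch under stated assumptions: the `DataRegistry` class is hypothetical, handles are random UUIDs, and the source of a redistribution is chosen alphabetically only as a placeholder. DTM itself distributes this knowledge over a tree rather than a single table.

```python
# Hypothetical flat registry illustrating the three issues above:
# identification (unique handle), localization (handle -> servers),
# redistribution (copy server-to-server instead of re-uploading from the client).
import uuid
from typing import Dict, Set


class DataRegistry:
    def __init__(self) -> None:
        self._where: Dict[str, Set[str]] = {}

    def register(self, server: str) -> str:
        handle = str(uuid.uuid4())  # unique reference, independent of location
        self._where[handle] = {server}
        return handle

    def locate(self, handle: str) -> Set[str]:
        return set(self._where.get(handle, set()))

    def redistribute(self, handle: str, target: str) -> str:
        """Record a copy of `handle` on `target`, taken from a server that
        already holds it; return the chosen source server."""
        holders = self._where[handle]
        source = min(holders)  # placeholder choice (alphabetical order)
        holders.add(target)
        return source
```

The client only keeps the returned handle. Where the item lives, and from which server a new copy is taken, stays internal to the service.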

17

Outline

Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work

18

The data management service: DTM

Basics

Data Tree Manager (DTM): distributed as part of the DIET platform, and flexible enough to be implemented in other platforms

Distributed Interactive Engineering Toolbox (DIET): NES CORBA-based platform with a hierarchical architecture (Master and Local Agents) and a performance forecasting tool (FAST)

19

The data management service: DTM

Architecture

20

The data management service: DTM

Components

The Logical Data Manager: it manages a list of (data handle, owners) tuples for the data present in its sub-tree, and thereby provides the localization knowledge

The Physical Data Manager: it manages a list of persistent data, stores these data and provides them to its server, and informs its parent when update operations (add, move, delete) occur
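The division of work between the two managers on a DIET-like tree can be sketched as follows. This is a minimal sketch, not DTM's implementation. The `Node` class, its `ldm`/`pdm` dictionaries and the method names are hypothetical. Two further assumptions: an update stops climbing the tree as soon as an ancestor's view of "is this handle in my sub-tree?" does not change, and a "move" would be a delete on one server plus an add on another.

```python
# Hypothetical sketch of DTM-style localization on a DIET-like tree.
# Each node's `ldm` maps a data handle to the children whose sub-tree holds it
# (the "(data handle, owners)" tuples); a server's `pdm` stores the data itself.
# Assumption: an update stops climbing once an ancestor's answer to "is this
# handle in my sub-tree?" does not change, so it stays within a sub-tree.
from typing import Dict, Optional, Set


class Node:
    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.ldm: Dict[str, Set[str]] = {}  # Logical Data Manager view
        self.pdm: Dict[str, bytes] = {}     # Physical Data Manager storage
        if parent is not None:
            parent.children[name] = self

    def add_data(self, handle: str, data: bytes) -> None:
        self.pdm[handle] = data
        self._inform_parent(handle, added=True)

    def delete_data(self, handle: str) -> None:
        del self.pdm[handle]
        self._inform_parent(handle, added=False)

    def _inform_parent(self, handle: str, added: bool) -> None:
        node, child = self.parent, self.name
        while node is not None:
            owners = node.ldm.setdefault(handle, set())
            had_it = bool(owners)
            if added:
                owners.add(child)
            else:
                owners.discard(child)
            if not owners:
                del node.ldm[handle]
            if had_it == (handle in node.ldm):
                break  # this ancestor's view is unchanged: stop propagating
            node, child = node.parent, node.name

    def locate(self, handle: str) -> Set[str]:
        """Names of the servers in this sub-tree that store `handle`."""
        found = {self.name} if handle in self.pdm else set()
        for child in self.ldm.get(handle, set()):
            found |= self.children[child].locate(handle)
        return found


# A toy tree: one Master Agent, two Local Agents, three servers.
ma = Node("MA")
la1, la2 = Node("LA1", ma), Node("LA2", ma)
s1, s2, s3 = Node("s1", la1), Node("s2", la1), Node("s3", la2)
```

In this model, adding a second copy of a handle under LA1 updates LA1 only. The Master Agent already knows that the handle lives somewhere below LA1.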

21

The data management service: DTM

Components

The Data Mover: it provides mechanisms for data transfers between Data Managers
Data transfer management and data recording are separated, which allows the integration of different transfer protocols: GridFTP, RFT…

The Replica Manager: it sends replication orders to the Data Mover and allows the choice of the best replica to be transferred (NWS tool)
It uses a distributed protocol: there is no distinction between the original data item and its replicas
Replicas are read-only, but the architecture allows the implementation of any consistency technique
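The Replica Manager's choice of the best replica can be sketched as picking the holder with the smallest predicted transfer time. This is a minimal sketch, assuming that a `forecast(src, dst)` callable returns a bandwidth in Mbit/s and a latency in seconds. That callable stands in for a forecasting service such as NWS, and its signature is an assumption, not the NWS API. The `move` callback is a hypothetical stand-in for the Data Mover.

```python
# Hypothetical sketch of the Replica Manager's choice: among the servers that
# hold a copy (all copies are equivalent, read-only peers), pick the one with
# the smallest predicted transfer time, then send a replication order.
# `forecast` stands in for a forecasting service such as NWS; its signature
# here is an assumption, not the NWS API.
from typing import Callable, Iterable, Set, Tuple

Forecast = Callable[[str, str], Tuple[float, float]]  # (Mbit/s, latency in s)


def best_replica(holders: Iterable[str], target: str, size_mbit: float,
                 forecast: Forecast) -> str:
    def eta(src: str) -> float:
        bandwidth, latency = forecast(src, target)
        return latency + size_mbit / bandwidth
    return min(holders, key=eta)


def replicate(handle: str, holders: Set[str], target: str, size_mbit: float,
              forecast: Forecast,
              move: Callable[[str, str, str], None]) -> str:
    if target in holders:
        return target  # already local: nothing to transfer
    source = best_replica(holders, target, size_mbit, forecast)
    move(handle, source, target)  # replication order sent to the Data Mover
    holders.add(target)  # the new copy is a peer of the existing ones
    return source
```

Because the protocol makes no distinction between the original and its replicas, `holders` is just an unordered set of equivalent sources.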

22

The data management service: DTM

Architecture advantages

Communication occurs between DIET and DTM components: low bandwidth consumption for data management

Update operations are limited to sub-trees: again, low bandwidth consumption for data management

DTM minimizes the number of data copy operations (CORBA), which is crucial for large data

23

The data management service: DTM

The end-user point of view

Only end users know the application they submit, so only they know which data must be managed

The persistence mode: it lets the user choose whether data must be persistent or not

The data handle: end users do not need to know where data are stored

The API: based on the profile concept (problem name + data or data handle + persistence mode)
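A profile made of a problem name, data or data handles, and a persistence mode can be modelled as a small data structure. This is a minimal sketch: `Persistence`, `DataHandle`, `Argument` and `Profile` are hypothetical Python names and do not reproduce DIET's actual C API or its exact set of persistence modes.

```python
# Hypothetical Python model of a request profile: problem name, then for each
# argument either the data itself or a data handle, plus a persistence mode.
# Names are illustrative only and do not reproduce DIET's C API.
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Union


class Persistence(Enum):
    VOLATILE = "volatile"      # used for this call only, then discarded
    PERSISTENT = "persistent"  # kept on the platform, reusable through its handle


@dataclass(frozen=True)
class DataHandle:
    key: str  # the user never needs to know where this item is stored


@dataclass
class Argument:
    value: Union[bytes, DataHandle]
    mode: Persistence = Persistence.VOLATILE


@dataclass
class Profile:
    problem: str
    args: List[Argument] = field(default_factory=list)

    def upload_size(self) -> int:
        """Bytes the client must send: handles cost nothing, raw data is shipped."""
        return sum(len(a.value) for a in self.args if isinstance(a.value, bytes))

    def kept_on_platform(self) -> List[Argument]:
        return [a for a in self.args if a.mode is Persistence.PERSISTENT]


profile = Profile("ANTENNA", [
    Argument(DataHandle("img3"), Persistence.PERSISTENT),
    Argument(b"params", Persistence.VOLATILE),
])
```

The end user states only what must persist and which handles to reuse. Placement and transfers remain the service's responsibility.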

24

Outline

Introduction: the NES context
Related work
Motivations and issues
The data management service
Experimental results
Conclusion and future work

25

Experimental results

Description

Previous experiments showed the good scalability and low overhead of DTM

The following tests show the relevance of the data persistency approach and the performance of the data replication policy

Platform: DTM deployed over two laboratories 100 km apart

26

Experimental results

Data persistence benefits

1 Master Agent (MA), 2 Local Agents (LA) and 2 servers, locally interconnected (100 Mbit/s)

1 client on the remote site (16 Mbit/s)

Linear algebra application
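Idealized arithmetic with the two link speeds of this setup shows why avoiding repeated client uploads matters. This is a minimal sketch: it ignores latency and protocol overhead, and the 100 MB input and the 5 calls are hypothetical values, not figures from the experiments.

```python
# Idealized arithmetic (no latency, no protocol overhead) using the link speeds
# of the setup above; the 100 MB data size and 5 calls are hypothetical
# examples, not values from the experiments.
CLIENT_LINK = 16.0   # Mbit/s, remote client
LOCAL_LINK = 100.0   # Mbit/s, agents and servers on the local network


def transfer_seconds(size_mbyte: float, bandwidth_mbit_s: float) -> float:
    return size_mbyte * 8 / bandwidth_mbit_s  # 1 byte = 8 bits


def client_link_seconds(size_mbyte: float, n_calls: int, persistent: bool) -> float:
    """Time spent sending one input of size_mbyte for n_calls requests that reuse it."""
    uploads = 1 if persistent else n_calls
    return uploads * transfer_seconds(size_mbyte, CLIENT_LINK)


print(transfer_seconds(100, CLIENT_LINK))             # 50.0
print(transfer_seconds(100, LOCAL_LINK))              # 8.0
print(client_link_seconds(100, 5, persistent=False))  # 250.0
print(client_link_seconds(100, 5, persistent=True))   # 50.0
```

Under these assumptions, a persistent input is paid for once on the slow client link instead of once per call.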

27

Experimental results

Data persistence benefits

28

Experimental results

Replication benefits

1 MA, 6 servers

Computing the number of occurrences of a letter in a file

Synchronous requests are sent to the platform

When data items are not present, they are replicated
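The request pattern of this experiment can be sketched as a toy platform: synchronous requests count a letter in a file, and a missing file is replicated before the request is served. This is a minimal sketch, assuming a round-robin choice in place of the agent's real server selection and assuming that all files start on one server. `ToyPlatform` and the file contents are hypothetical.

```python
# Toy model (hypothetical) of the replication experiment's logic: synchronous
# requests count a letter in a file; a round-robin choice stands in for the
# agent's server selection; a missing file is first replicated from a holder.
from itertools import cycle
from typing import Dict, List


class ToyPlatform:
    def __init__(self, n_servers: int, files: Dict[str, str]) -> None:
        self.stores: List[Dict[str, str]] = [{} for _ in range(n_servers)]
        self.stores[0].update(files)  # all files start on server 0
        self.replications = 0
        self._next_server = cycle(range(n_servers))

    def request(self, name: str, letter: str) -> int:
        store = self.stores[next(self._next_server)]
        if name not in store:
            # Data item not present: copy it from a server that holds it.
            source = next(s for s in self.stores if name in s)
            store[name] = source[name]
            self.replications += 1
        return store[name].count(letter)


platform = ToyPlatform(6, {"f.txt": "banana"})
answers = [platform.request("f.txt", "a") for _ in range(12)]
```

With 6 servers, the first round of requests triggers 5 replications. After that, every server holds a copy, so later requests run without any transfer.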

29

Experimental results

Replication benefits

30

Experimental results

Use case: Dividing Cubes

Medical imagery application

Input files (from 0.1 MB up to 500 MB)

Several extraction parameters are applied

Result = JPEG file

31

Experimental results

Use case: Dividing Cubes

32

Conclusion

Feasibility for NES environments

Fully implemented and integrated in DIET since version 1.1

Promising experimental results

Standardization proposal (GGF)

33

Future work

Finalization of the GGF proposal

Tests on the Grid'5000 platform

Fault tolerance

Integration of DTM in data grids