Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project
Flavia Donno (Former EDG WP2, LCG), Flavia.Donno@cern.ch
CHEP 2003, 24-28 March 2003
DataGrid is a project funded by the European Union
http://chep03.ucsd.edu/files/249.ppt


Page 1: Flavia Donno  (Former EDG WP2, LCG) Flavia.Donno@cern.ch

DataGrid is a project funded by the European Union CHEP 2003 – 24-28 March 2003 – Title – n° 1


Talk Outline

Introduction

Replication Tools

Architecture Overview

GDMP and edg-replica-manager details

History and Deployment

Summary and Future Work

Authors
Heinz Stockinger - CERN/EP, CMS
Flavia Donno - CERN/IT LCG and INFN Pisa
Erwin Laure, Shahzad Muzaffar - CERN/EP
Giuseppe Andronico - INFN Catania
Peter Kunszt - CERN/IT
Paul Millar - PPARC

Introduction

Data management: large amounts of data at distributed sites

[Figure: three Storage Elements (SE) at distributed sites]

Assumption: data is read-only

Replication is required between Storage Elements (SEs)

In a Grid environment:

File transfer from User Interface and Computing Nodes to Storage resources

Upload of files into Grid

Replication Tools

We have designed, developed and deployed two major replication packages:

GDMP - Grid Data Mirroring Package

edg-replica-manager

GDMP was a pioneering effort that started in the CMS collaboration and later became a joint project between EDG and PPDG. It mirrors data between Storage Elements through a host subscription method.

edg-replica-manager deals with point-to-point single file replication. The tool is built around the Globus Replica Manager and Replica Catalogue/Replica Location Service libraries.

GDMP in detail

[Figure: diagram labels: GDMP client; StorageElement1, StorageElement2, StorageElement3; Globus Replica Catalog or Replica Location Service]

Subscription Model

All the sites that subscribe to a particular site get notified whenever there is an update in its catalog.

[Figure: Site 1, Site 2 and Site 3, with subscriber lists and "subscribe" arrows between sites]
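The subscription model described on this slide can be illustrated with a minimal sketch. It is not GDMP code: the `Site` class, its attribute names, and the choice of which site subscribes to which are assumptions made for illustration only. Each site keeps a subscriber list, and publishing a file to the local catalog notifies every subscriber.

```python
class Site:
    """Hypothetical stand-in for a site with a local file catalog."""

    def __init__(self, name):
        self.name = name
        self.catalog = []        # files published at this site
        self.subscribers = []    # sites to notify when the catalog changes
        self.inbox = []          # (source site, file) notifications received

    def subscribe_to(self, producer):
        # Add this site to the producer's subscriber list (once).
        if self not in producer.subscribers:
            producer.subscribers.append(self)

    def publish(self, filename):
        # Update the local catalog, then notify every subscriber.
        self.catalog.append(filename)
        for site in self.subscribers:
            site.inbox.append((self.name, filename))


# Assumed topology: Site 2 and Site 3 subscribe to Site 1.
site1, site2, site3 = Site("Site 1"), Site("Site 2"), Site("Site 3")
site2.subscribe_to(site1)
site3.subscribe_to(site1)
site1.publish("fileA")
print(site2.inbox)  # [('Site 1', 'fileA')]
```

In this sketch a notification only records that a new file exists; a subscriber would then decide whether and when to fetch it.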

Architecture Overview

[Figure: architecture diagram. Labels: SE (three), GDMP, GridFTP, gdmp_replicate_get, fileA, Globus Replica Catalog or Replica Location Service, MSS, UI, WN, GDMP Client]

GDMP Pros
• Very stable and scalable architecture
• Reliable and robust replication
    • retries on error
    • file checksumming
    • complex logging
• Users can control file transfer via local catalogues
• Back-ends available for actions to be performed on replication (MSS hooks, automatic replication, post-replication actions, …)
• MSS interface

GDMP Cons
• Designed to handle mirroring among sites, not point-to-point replication
• Several steps involved for replication
• Configuration is difficult; it can be improved with the introduction of new Grid services
• No space management provided, since that is the responsibility of the SE service
• Error messages not always clear
• Recovery from errors sometimes requires manual intervention
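The "retries on error" and "file checksumming" items in the pros list can be sketched as follows. This is an illustrative sketch, not GDMP's implementation: the function names `crc32_of`, `replicate` and `flaky_copy` are hypothetical, a local `shutil.copyfile` stands in for a GridFTP transfer, and CRC-32 plus file size is assumed as the integrity check.

```python
import shutil
import tempfile
import zlib
from pathlib import Path


def crc32_of(path):
    """Return the CRC-32 of a file, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def replicate(source, destination, max_attempts=3, copy=shutil.copyfile):
    """Copy source to destination, retrying until size and CRC match."""
    source, destination = Path(source), Path(destination)
    expected = (source.stat().st_size, crc32_of(source))
    for attempt in range(1, max_attempts + 1):
        try:
            copy(source, destination)
            actual = (destination.stat().st_size, crc32_of(destination))
            if actual == expected:
                return attempt      # number of attempts that were needed
        except OSError:
            pass                    # treat I/O errors like a failed check
    raise RuntimeError(f"replication failed after {max_attempts} attempts")


# Demo: a transfer function (hypothetical) that fails on its first call.
calls = {"n": 0}


def flaky_copy(src, dst):
    calls["n"] += 1
    if calls["n"] == 1:
        raise OSError("simulated transfer error")
    shutil.copyfile(src, dst)


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "fileA"
    src.write_bytes(b"event data" * 1000)
    attempts = replicate(src, Path(tmp) / "fileA.replica", copy=flaky_copy)
    print(attempts)  # 2
```

Passing the transfer function as a parameter keeps the retry and verification logic independent of the transport, which is also what makes the failure case easy to simulate here.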

edg-replica-manager in detail

Extends the Globus replica manager

Client-side tool only

Allows for replication (copy) and registration of files in the Replica Catalog (RC)

Works with the LDAP-based Globus Replica Catalog and the Replica Location Service

Keeps the RC consistent with stored data

Uses GDMP’s staging interface to stage to MSS
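The copy-and-register behaviour listed on this slide can be sketched as below. This is an assumption-laden illustration, not the edg-replica-manager API: `ReplicaCatalog`, `copy_and_register` and the `lfn:` naming are hypothetical, the catalog is an in-memory dictionary rather than an LDAP service, and a local file copy stands in for the Grid transfer. The point shown is the ordering: the catalog entry is added only after the copy has succeeded, so the catalog never lists a replica that was not written.

```python
import shutil
import tempfile
from pathlib import Path


class ReplicaCatalog:
    """Hypothetical in-memory catalog: logical file name -> physical copies."""

    def __init__(self):
        self.entries = {}

    def register(self, lfn, pfn):
        self.entries.setdefault(lfn, set()).add(str(pfn))

    def replicas(self, lfn):
        return sorted(self.entries.get(lfn, ()))


def copy_and_register(catalog, lfn, source, destination):
    """Copy a file, then register the new copy only if the copy succeeded."""
    shutil.copyfile(source, destination)   # raises on failure, so nothing is registered
    catalog.register(lfn, destination)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "se1_fileB"
    original.write_text("payload")
    replica = Path(tmp) / "se2_fileB"

    rc = ReplicaCatalog()
    rc.register("lfn:fileB", original)
    copy_and_register(rc, "lfn:fileB", original, replica)
    replica_count = len(rc.replicas("lfn:fileB"))
    replica_exists = replica.exists()
    print(replica_count, replica_exists)  # 2 True
```

The sketch does not attempt a roll-back if registration itself fails after a successful copy.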

Architecture Overview

[Figure: architecture diagram. Labels: SE (three), GDMP, GridFTP, GDMP Client, Edg-replica-manager, edg-rm-creg, fileA, fileB, Globus Replica Catalog or Replica Location Service, MSS]

Edg-rm/edg-rc Pros
• User-friendly interface
• Functional
• Third-party transfer available
• GSI authorization available for RM and RC
• Easy configuration

Edg-rm/edg-rc Cons
• RM: Error messages not always clear
• RM: No roll-back; no transactions
• RM: No complete interface to schema
• RC: Performance deteriorates with the number of entries
• RC: Centralized, non-scalable
• RC: No high-level user CLI for browsing
• RC: Schema not flexible


GDMP vs edg-replica-manager

GDMP (client-server)
• Replicates sets of files
• Replication between SEs only
• Mass storage interface
• Logical file attributes (size, timestamp, etc.; extensible)
• Subscription model
• Event notification
• CRC file size check
• Support for Objectivity/DB
• Automatic retries
• Support for multiple VOs

Replica Manager (client side only)
• Replicates single files
• Replication between SEs, or from UI or CE to SE
• Uses GDMP's Mass Storage interface at the SE

History: Replication tool development

GDMP 1.x (September 2000)
• First prototype of basic SE-SE replication of Objectivity files
• Based on Globus 1.1.3

GDMP 2.x (October 2001)
• General file replication tools (not only Objectivity files)
• Uses GridFTP + Globus Replica Catalog
• Full Mass Storage support

GDMP 3.x (April 2002)
• Split into client-side and server-side tools
• Improved server functionality/security
• Support for multiple VOs

Edg-replica-manager 1.x (May 2002)
• Based on globus-replica-management and globus-replica-catalog libs

Edg-replica-manager 2.x (December 2002)
• Several improvements; Replica Location Service binding

GDMP 3.2.x (October 2002)
• RLS + several improvements

GDMP 4.0 (October 2002)
• Globus 2.2.4 + RH 7.3 gcc 2.95.2 + gcc 3.2

Deployment

GDMP first used for High Level Trigger studies (“production”) of HEP experiments in 2000/2001

Replication between SEs

Later also introduced in the European DataGrid testbed, where the requirements changed:

All user commands needed to be executable from a User Interface machine or from the Worker Nodes of a Computing Element

This caused some redesign

Both tools (GDMP and edg-replica-manager) are used in European and US testbeds

EDG: ATLAS, CMS, Alice and LHCb stress tests

WorldGrid: first transatlantic testbed, with interoperable tools

LCG-0: deployed and interoperable with WorldGrid and GLUE testbeds

We thank our user community for valuable feedback

Summary and Future Work

The first generation of EDG replica management tools satisfies the basic use cases and requirements

Client-only tools are simple to use but provide no server-side logging

Limitations of certain services became apparent: Globus and EDG are working together to design and implement new tools

A lot of experience gained: new software tools are under development (see the talk "Next-Generation EU DataGrid Data Management Services")

Thanks to the EU and our national funding agencies for their support of this work