DataGrid is a project funded by the European Union
CHEP 2003 – 24-28 March 2003
Grid Data Management in Action: Experience in Running and Supporting Data Management Services in the EU DataGrid Project
Flavia Donno (Former EDG WP2, LCG)
http://chep03.ucsd.edu/files/249.ppt
Talk Outline
Introduction
Replication Tools
Architecture Overview
GDMP and edg-replica-manager details
History and Deployment
Summary and Future Work
Authors
Heinz Stockinger – CERN/EP, CMS
Flavia Donno – CERN/IT LCG and INFN Pisa
Erwin Laure, Shahzad Muzaffar – CERN/EP
Giuseppe Andronico – INFN Catania
Peter Kunszt – CERN/IT
Paul Millar – PPARC
Introduction
Data management: large amounts of data at distributed sites
[Figure: three Storage Elements (SE)]
Assumption: data is read-only
Replication is required between Storage Elements (SEs)
In a Grid environment:
File transfer from User Interface and Computing Nodes to Storage resources
Upload of files into Grid
Replication Tools
We have designed, developed and deployed two major replication packages:
GDMP - Grid Data Mirroring Package
edg-replica-manager
GDMP was a pioneering effort, started initially in the CMS collaboration. It later became a joint project between EDG and PPDG. It allows for mirroring of data between Storage Elements through a host subscription method.
edg-replica-manager deals with point-to-point single file replication. The tool is built around the Globus Replica Manager and Replica Catalogue/Replica Location Service libraries.
GDMP in detail
[Figure: diagram with labels StorageElement1, StorageElement2, StorageElement3, GDMP client, and Globus Replica Catalog or Replica Location Service]
Subscription Model
All the sites that subscribe to a particular site get notified whenever there is an update in its catalog.
[Figure: Site 1, Site 2 and Site 3, with the labels "subscribe" and "Subscriber list"]
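The subscription model can be illustrated with a minimal in-memory sketch. The `Site` class and its method names are hypothetical and are not GDMP's interface; the sketch only shows the idea that each site keeps a subscriber list and notifies every subscriber when its catalog is updated.

```python
class Site:
    """Hypothetical stand-in for a site; not the GDMP API."""

    def __init__(self, name):
        self.name = name
        self.catalog = []        # files published by this site
        self.subscribers = []    # sites to notify on catalog updates
        self.notifications = []  # (source site, file) pairs received

    def subscribe_to(self, producer):
        # Add this site to the producer's subscriber list (only once).
        if self not in producer.subscribers:
            producer.subscribers.append(self)

    def publish(self, filename):
        # Update the local catalog, then notify every subscriber.
        self.catalog.append(filename)
        for site in self.subscribers:
            site.notify(self.name, filename)

    def notify(self, source, filename):
        # A real subscriber could now decide whether to fetch the file.
        self.notifications.append((source, filename))


site1, site2, site3 = Site("Site 1"), Site("Site 2"), Site("Site 3")
site2.subscribe_to(site1)
site3.subscribe_to(site1)
site1.publish("fileA")
print(site2.notifications)
```

Here Site 2 and Site 3 both learn about `fileA`, while Site 1, which subscribed to nobody, receives nothing.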
Architecture Overview
[Figure: architecture diagram with labels SE (three), GDMP, GridFTP, gdmp_replicate_get, fileA, Globus Replica Catalog or Replica Location Service, MSS, UI, WN, GDMP Client]
GDMP Pros
• Very stable and scalable architecture
• Reliable and robust replication
  • retries on error
  • file checksumming
  • complex logging
• Users can control file transfer via local catalogues
• Back-ends available for actions to be performed on replication (MSS hooks, automatic replication, post-replication actions, …)
• MSS interface
GDMP Cons
• Designed to handle mirroring among sites, not point-to-point replication
• Several steps involved for replication
• Configuration is difficult; it can be improved with the introduction of new Grid services
• No space management provided, since that is the responsibility of the SE service
• Error messages not always clear
• Recovery from errors sometimes requires manual intervention
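The retry and checksum behaviour listed under the pros can be sketched as follows. This is an illustrative local-file example only: it assumes a plain file copy in place of a GridFTP transfer and CRC32 as the checksum, and `crc32_of` and `replicate_with_retries` are hypothetical helper names, not GDMP functions.

```python
import os
import shutil
import tempfile
import zlib


def crc32_of(path, chunk_size=65536):
    """Return the CRC32 of a file, read in chunks."""
    crc = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def replicate_with_retries(source, destination, max_attempts=3,
                           copy=shutil.copyfile):
    """Copy source to destination, checking size and CRC32 after each try.

    Returns the number of attempts used; raises IOError if all fail.
    """
    expected_size = os.path.getsize(source)
    expected_crc = crc32_of(source)
    for attempt in range(1, max_attempts + 1):
        try:
            copy(source, destination)
            if (os.path.getsize(destination) == expected_size
                    and crc32_of(destination) == expected_crc):
                return attempt
        except OSError:
            pass  # transfer error: fall through and retry
    raise IOError("replication failed after %d attempts" % max_attempts)


with tempfile.TemporaryDirectory() as workdir:
    src = os.path.join(workdir, "fileA")
    dst = os.path.join(workdir, "fileA.replica")
    with open(src, "wb") as handle:
        handle.write(b"event data" * 1000)
    attempts_used = replicate_with_retries(src, dst)
    replica_ok = crc32_of(src) == crc32_of(dst)
```

The `copy` parameter is there so that a different transfer function (or a deliberately failing one, for testing) can be swapped in without changing the retry logic.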
edg-replica-manager in detail
Extends the Globus replica manager
Client-side tool only
Allows for replication (copy) and registration of files in the Replica Catalog (RC)
Works with the LDAP-based Globus Replica Catalog and the Replica Location Service
Keeps the RC consistent with stored data
Uses GDMP’s staging interface to stage files to the MSS
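The copy-and-register behaviour can be sketched with an in-memory catalogue. This is an illustration only: `ReplicaCatalog`, `copy_and_register`, and the `lfn:` names are hypothetical, a local copy stands in for the Grid transfer, and the real tool talks to the Globus Replica Catalog or Replica Location Service instead. The sketch registers a replica only after the copy has succeeded, which is one simple way to keep the catalogue in step with the stored data.

```python
import os
import shutil
import tempfile


class ReplicaCatalog:
    """Hypothetical in-memory catalogue: logical name -> physical locations."""

    def __init__(self):
        self._entries = {}

    def register(self, logical_name, physical_path):
        locations = self._entries.setdefault(logical_name, [])
        if physical_path not in locations:
            locations.append(physical_path)

    def replicas(self, logical_name):
        return list(self._entries.get(logical_name, []))


def copy_and_register(catalog, logical_name, source, destination):
    """Copy a file, then register the new replica only if the copy worked."""
    shutil.copyfile(source, destination)  # raises OSError on failure
    catalog.register(logical_name, destination)


catalog = ReplicaCatalog()
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "fileA")
    replica = os.path.join(workdir, "fileA.copy")
    with open(original, "w") as handle:
        handle.write("payload")
    catalog.register("lfn:fileA", original)
    copy_and_register(catalog, "lfn:fileA", original, replica)
    replica_count = len(catalog.replicas("lfn:fileA"))

    # A failed copy (missing source) leaves the catalogue unchanged.
    try:
        copy_and_register(catalog, "lfn:fileB",
                          os.path.join(workdir, "missing"), replica)
    except OSError:
        pass
    missing_registered = catalog.replicas("lfn:fileB")
```

Note that this ordering only prevents registering a replica that was never written; it does not provide roll-back or transactions.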
Architecture Overview
[Figure: architecture diagram with labels SE (three), GDMP, GridFTP, GDMP Client, Edg-replica-manager, Globus Replica Catalog or Replica Location Service, fileA, fileB, edg-rm-creg, MSS]
Edg-rm/edg-rc Pros
• User-friendly interface
• Functional
• Third-party transfer available
• GSI authorization available for RM and RC
• Easy configuration
Edg-rm/edg-rc Cons
• RM: Error messages not always clear
• RM: No roll-back; no transactions
• RM: No complete interface to schema
• RC: Performance deterioration with number of entries
• RC: Centralized, non-scalable
• RC: No high-level user CLI for browsing
• RC: Schema not flexible
GDMP vs edg-replica-manager
GDMP (client-server)
• Replicates sets of files
• Replication between SEs only
• Mass storage interface
• Logical file attributes (size, timestamp, etc.; extensible)
• Subscription model
• Event notification
• CRC file size check
• Support for Objectivity/DB
• Automatic retries
• Support for multiple VOs
Replica Manager (client side only)
• Replicates single files
• Replication between SEs, or from UI or CE to SE
• Uses GDMP’s Mass Storage interface at the SE
History: Replication tool development
GDMP 1.x
September 2000
First prototype of basic SE-SE replication of Objectivity files
Based on Globus 1.1.3
GDMP 2.x
October 2001
General file replication tools (not only Objectivity files)
Uses GridFTP + Globus Replica Catalog
Full Mass Storage Support
GDMP 3.x
April 2002
Split into client and server side tool
Improved server functionality/security
Support for multiple VOs
Edg-replica-manager 1.x
May 2002
Based on globus-replica-management and globus-replica-catalog libs
Edg-replica-manager 2.x
December 2002
Several improvements, including Replica Location Service binding
GDMP 3.2.x
October 2002
RLS + several improvements
GDMP 4.0
October 2002
Globus 2.2.4 + RH 7.3 gcc 2.95.2 + gcc 3.2
Deployment
GDMP first used for High Level Trigger studies (“production”) of HEP experiments in 2000/2001
Replication between SEs
Later also introduced in the European DataGrid testbed, where requirements changed:
All user commands needed to be executed from a User Interface machine or from the Worker Nodes of a Computing Element
Caused some redesign
Both tools (GDMP and edg-replica-manager) are used in European and US testbeds
EDG: ATLAS, CMS, Alice and LHCb stress tests
WorldGrid: first transatlantic testbed, with interoperable tools
LCG-0: deployed and interoperable with WorldGrid and GLUE testbeds
We thank our user community for valuable feedback
Summary and Future Work
The first generation of EDG replica management tools satisfies basic use cases and requirements
Client-only tools are simple to use but provide no server-side logging
Limitations of certain services became evident: Globus and EDG are working together to design and implement new tools
A lot of experience gained: new software tools are under development (see talk “Next-Generation EU DataGrid Data Management Services”)
Thanks to the EU and our national funding agencies for their support of this work