
Grid Computing, Xiao Nong (肖侬): Globus Data Management Services



http://www.globus.org/

Data Management Services

• Data transfer and access
  • GASS: provides services mainly intended for use with GRAM (file staging, I/O redirection)
  • GridFTP: provides high-performance, reliable data transfer for modern WANs
• Data replication and management
  • Replica Catalog: provides a catalog service for keeping track of replicated datasets
  • Replica Management: provides services for creating and managing replicated datasets

Globus: GridFTP

Purpose of a Common Data Access Protocol

• Existing distributed data storage systems
  • DPSS, HPSS: focus on high-performance access, utilize parallel data transfer, striping
  • DFS: focus on high-volume usage, dataset replication, local caching
  • SRB: connects heterogeneous data collections, uniform client interface, metadata queries
• Problems (protocols are incompatible and not public; each system has its own client)
  • Incompatible protocols and features
    • Each requires a custom client
    • Partitions available data sets and storage devices
  • Each protocol offers only part of the desired functionality

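The Need for a Common, Secure, Efficient Data Access Protocol

• A common, extensible transfer protocol
  • A common protocol means systems can interoperate
• Decouple the low-level transfer mechanism from the storage service
• Advantages:
  • New, specialized storage systems are automatically compatible with existing systems
  • Existing systems gain richer data transfer functionality
• Interface to many storage systems
  • HPSS, DPSS, file systems
  • Plan for SRB integration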

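Globus Proposes a Unified Protocol: GridFTP

• Based on FTP
  • Compatible with a large number of existing tools
  • Already supports many of the features a data grid needs, and is easy to extend
  • Widely accepted, understood, and supported
• Existing specifications
  • RFC 959: File Transfer Protocol
  • RFC 2228: FTP Security Extensions
  • RFC 2389: Feature Negotiation for the File Transfer Protocol
• What does GridFTP comprise?
  • The protocol
  • A set of tools that implement the protocol
• GridFTP > FTP: GridFTP is a superset of FTP and is not limited to file transfer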

GridFTP: Basic Approach

• Start with the most commonly used subset
  • Standard FTP: get/put etc., 3rd-party transfer
• Implement features that are standardized but not often used
  • GSS binding, extended directory listing, simple restart
• Extend in several directions while preserving interoperability with existing servers
  • Striped/parallel data channels, partial file, automatic & manual TCP buffer setting, progress monitoring, extended restart

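Features of GridFTP

• Interoperable and extensible
• Two separations
  • The low-level data transfer mechanism is separated from the data storage service
  • The control channel is separated from the data channel
• Inherits the generality and ubiquity of FTP
  • FTP is defined by several IETF (Internet Engineering Task Force) RFCs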

GridFTP Extensions to FTP

• Security
  • GSI (public-key-based Grid Security Infrastructure) or Kerberos support (via GSS-API)
  • GridFTP provides this capability by implementing the GSS-API authentication mechanisms defined by RFC 2228, “FTP Security Extensions”.
• Third-party control of data transfer
  • Allows a “third-party” user or application at one site to initiate, monitor and control a data transfer operation between two other “parties”
• Parallel data transfer
  • Using multiple TCP streams in parallel (even between the same source and destination) can improve aggregate bandwidth over using a single TCP stream.
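To make the idea of parallel streams concrete, here is a minimal Python sketch that divides a file into contiguous byte ranges, one per TCP stream. It does not call any Globus API: `split_ranges` is a hypothetical helper, and the file size and stream count are made-up values chosen only for illustration.

```python
def split_ranges(file_size: int, streams: int) -> list[tuple[int, int]]:
    """Divide [0, file_size) into `streams` contiguous (offset, length) ranges."""
    if streams < 1:
        raise ValueError("need at least one stream")
    base, extra = divmod(file_size, streams)
    ranges = []
    offset = 0
    for i in range(streams):
        # Spread the remainder over the first `extra` streams.
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    # Hypothetical 1,000,000-byte file sent over 4 parallel streams.
    for offset, length in split_ranges(file_size=1_000_000, streams=4):
        print(f"stream carries bytes {offset}..{offset + length - 1}")
```

Each range could then be sent on its own stream and reassembled by offset at the receiver; how a real GridFTP implementation assigns blocks to streams may differ.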

GridFTP Extensions to FTP (continued)

• Striped data transfer
  • Use multiple TCP streams to transfer data that is partitioned among multiple servers. Striped transfers provide further bandwidth improvements over those achieved with parallel transfers.
• Partial file transfer
  • Transferring portions of files rather than complete files.
• Automatic negotiation of TCP buffer/window sizes
  • Using optimal settings for TCP buffer/window sizes can have a dramatic impact on data transfer performance.
• Reliable data transfer
  • Fault recovery methods for handling transient network failures and server outages, and for restarting failed transfers
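The slides do not say how an "optimal" TCP buffer size is chosen. A common rule of thumb from general networking practice (an assumption here, not something taken from the slides) is the bandwidth-delay product: the amount of data that must be in flight to keep the path full. The sketch below computes it; the link speed and round-trip time are hypothetical.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes: data in flight on a fully used path."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


if __name__ == "__main__":
    # Hypothetical path: 1 Gbit/s with a 50 ms round-trip time.
    print(bdp_bytes(1e9, 0.05), "bytes")
```

A buffer much smaller than this value limits a single stream to a fraction of the available bandwidth, which is one reason buffer tuning and parallel streams both matter on wide-area links.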

GridFTP Protocol Implementation

• globus_ftp_control_library
  • Control channel API
    • Managing a GridFTP connection, including authentication, creation of control and data channels, and reading and writing data over data channels
  • Separate control and data channels
• globus_ftp_client_library
  • GridFTP client API (provides higher-level client features on top of the globus_ftp_control library)
    • Complete file get and put operations
    • Calls to set the level of parallelism for parallel data transfers
    • Partial file transfer operations
    • Third-party transfers
    • Calls to set TCP buffer sizes

GridFTP at SC’2000: Long-Running Dallas-Chicago Transfer

[Figure: plot of the long-running transfer. Annotations: SciNet power failure; other demos starting up (congestion); parallelism increases (demos); backbone problems on the SC floor; DNS problems; transition between files (not zero due to averaging).]

Striped GridFTP Server

[Figure: architecture diagram. Labels: GridFTP client; GridFTP control channel; GridFTP server master; mpirun; control socket; GridFTP server parallel backend (several nodes, each with Plug-in and Control components); MPI (Comm_World); MPI (Sub-Comm); MPI-IO; parallel file system (e.g. PVFS, PFS, etc.); GridFTP data channels to the client or another striped GridFTP server.]

Test Environment

Test Results

Striped servers at source location             8
Striped servers at destination location        8
Maximum simultaneous TCP streams per server    4
Maximum simultaneous TCP streams overall       32
Peak transfer rate over 0.1 seconds            1.55 Gbits/sec
Peak transfer rate over 5 seconds              1.03 Gbits/sec
Sustained transfer rate over 1 hour            512.9 Mbits/sec
Total data transferred in 1 hour               230.8 Gbytes
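As a quick consistency check on these figures, the following Python sketch recomputes the overall stream count and the one-hour total from the per-server and sustained-rate values. It assumes decimal units (1 Gbyte = 1000 Mbytes); the function names are illustrative only.

```python
def total_streams(servers: int, streams_per_server: int) -> int:
    """Overall simultaneous streams if every server uses its maximum."""
    return servers * streams_per_server


def gbytes_transferred(rate_mbit_s: float, seconds: float) -> float:
    """Data moved at a sustained rate, in decimal gigabytes."""
    megabits = rate_mbit_s * seconds
    return megabits / 8 / 1000  # Mbit -> Mbyte -> Gbyte


if __name__ == "__main__":
    print(total_streams(8, 4))                       # 8 servers x 4 streams
    print(round(gbytes_transferred(512.9, 3600), 1)) # one hour at 512.9 Mbit/s
```

Both results agree with the table: 32 streams overall, and about 230.8 Gbytes in one hour.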

GridFTP Bandwidth Test

Globus: GASS File Access

GASS (Global Access to Secondary Storage)

• GASS, Global Access to Secondary Storage, is part of the Globus Toolkit
• Remote I/O and staging
  • GRAM can fetch remote executables through GASS
  • Access files remotely
  • Connect stdin/stdout/stderr to remote locations

Global Access to Secondary Storage

(a) GASS file access API
  • Replace open/close with globus_gass_open/close; read/write calls can then proceed directly
(b) RSL extensions
  • URLs used to name executables, stdout, stderr
(c) Remote cache management utility
(d) Low-level APIs for specialized behaviors

GASS Architecture

[Figure: architecture diagram. Labels: GRAM; cache; GASS server; HTTP server; FTP server; plus the four components listed below.]

(a) GASS file access API: main( ) { fd = globus_gass_open(…) … read(fd,…) … globus_gass_close(fd) }
(b) RSL extensions: &(executable=https://…)
(c) Remote cache management: % globus-gass-cache
(d) Low-level APIs for customizing cache & GASS server

GASS File Naming

• From the program's point of view, the GASS file open and close calls are almost identical to the corresponding standard Unix I/O calls, except that a URL takes the place of the file name.
• URL encoding of resource names
  https://quad.mcs.anl.gov:9991/~bester/myjob
  (parts, left to right: protocol, server address, file name)
• Other examples
  https://pitcairn.mcs.anl.gov/tmp/input_dataset.1
  https://pitcairn.mcs.anl.gov:2222/./output_data
• Supports http & https, ftp & gridftp.
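The protocol / server address / file name split can be illustrated with Python's standard `urllib.parse` module. This is only a sketch of the naming scheme; it is not how GASS itself parses URLs, and the `describe` helper is a hypothetical name.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Split a GASS-style URL into the three parts named on the slide."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "server": parts.hostname,
        "port": parts.port,      # None when the URL gives no explicit port
        "file": parts.path,
    }


if __name__ == "__main__":
    for url in (
        "https://quad.mcs.anl.gov:9991/~bester/myjob",
        "https://pitcairn.mcs.anl.gov/tmp/input_dataset.1",
        "https://pitcairn.mcs.anl.gov:2222/./output_data",
    ):
        print(describe(url))
```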

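Three Access Modes

• Read: reads an entire file that contains stable data; the whole file is cached locally. Several users may read at the same time.
• Write: writes a single file. Operations go to the local cached copy, and the file is written back to the remote side only when it is closed. Several users may write at the same time; the last writer wins.
• Append: appends to a file by operating directly on the remote file, so the remote copy changes immediately. Several users may append at the same time; concurrent writes are interleaved line by line.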

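File Cache

• “File open” transfers the remote file into the local cache
  • Avoids fetching the same file again when several local processes open it
• The cache is per user, and the user can manage it with local resource management tools
  • Programs access the file cache through the cache API
• Users can manage the cache remotely through GRAM
  • A user may own several caches. Each cached file has an entry that records how many times it is open; each close decrements the count, and when the count reaches 0 the cached file is deleted.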

Three Authentication Modes

• Plain anonymous ftp and http, i.e. no authentication
• GSI authentication between processes
• In the future, ftp or http access with SSL-based authentication

Workflow

[Figure: workflow diagram. Labels: local machine running globus-job-run, with GRAM Client (GSI), DUROC, GASS Server, and a Parse step that turns a multi-RSL request into single RSL requests; two remote machines, each with GRAM Gatekeeper (GSI), GRAM Job Manager, GASS Client, and App (Nexus).]


GASS RSL Extensions

• executable, stdin, stdout, stderr can be local files or URLs

• executable and stdin loaded into local cache before job begins (on front-end node)

• stdout, stderr handled via GASS append mode

• Cache cleaned after job completes


GASS/RSL Example

&(executable=https://quad:1234/~/myexe)
 (stdin=https://quad:1234/~/myin)
 (stdout=/home/bester/output)
 (stderr=https://quad:1234/dev/stdout)
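The request above mixes URLs and a local path. The Python sketch below shows one way to pull the attributes out of such a flat request and tell which values are URLs (and so would go through GASS) and which are local files. It handles only this simple `&(name=value)…` shape and is not a full RSL parser; `parse_simple_rsl` and `is_url` are hypothetical helpers.

```python
import re

RSL = (
    "&(executable=https://quad:1234/~/myexe)"
    "(stdin=https://quad:1234/~/myin)"
    "(stdout=/home/bester/output)"
    "(stderr=https://quad:1234/dev/stdout)"
)


def parse_simple_rsl(rsl: str) -> dict[str, str]:
    """Extract name=value pairs from a flat '&(name=value)...' request."""
    return dict(re.findall(r"\(\s*(\w+)\s*=\s*([^()]*?)\s*\)", rsl))


def is_url(value: str) -> bool:
    """True when the value starts with a scheme such as https://."""
    return re.match(r"^[a-z][a-z0-9+.-]*://", value) is not None


if __name__ == "__main__":
    for name, value in parse_simple_rsl(RSL).items():
        kind = "URL" if is_url(value) else "local file"
        print(f"{name:<10} {kind:<10} {value}")
```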


GASS API

• GLOBUS GASS_CACHE API

• GLOBUS GASS FILE ACCESS API

• GLOBUS GASS SERVER API

• GLOBUS GASS SERVER_EZ API

• GLOBUS GASS CLIENT API

GASS File Access API

• Minimum changes to application
• globus_gass_open(), globus_gass_close()
  • Same as open(), close() but use URLs instead of filenames
  • Caches URL in case of multiple opens
  • Return descriptors to files in local cache or sockets to remote server
• globus_gass_fopen(), globus_gass_fclose()

globus_gass_open()/close()

[Flowchart, summarized:]
• globus_gass_open(): URL in cache? If no, download the file into the cache. Then open the cached file and add a cache reference.
• globus_gass_close(): Modified? If yes, upload the changes. Then remove the cache reference.
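The open/close flow can be sketched as a small reference-counted cache in Python. This is an illustration of the flowchart only, not the Globus implementation: the class, its methods, and the `fetch`/`upload` callables are all hypothetical, and evicting an entry when its count reaches zero follows the description on the file cache slide.

```python
class GassCacheSketch:
    """Toy model of the open/close flow: download once, count references,
    upload on close if modified, evict when the count drops to zero."""

    def __init__(self, fetch, upload):
        self._fetch = fetch      # callable: url -> bytes
        self._upload = upload    # callable: (url, bytes) -> None
        self._data = {}          # url -> bytearray (the cached copy)
        self._refs = {}          # url -> number of current opens
        self._dirty = set()      # urls whose cached copy was modified

    def open(self, url):
        if url not in self._data:            # "URL in cache?" -> no
            self._data[url] = bytearray(self._fetch(url))
            self._refs[url] = 0
        self._refs[url] += 1                 # add cache reference
        return self._data[url]

    def mark_modified(self, url):
        self._dirty.add(url)

    def close(self, url):
        if url in self._dirty:               # "Modified?" -> yes
            self._upload(url, bytes(self._data[url]))
            self._dirty.discard(url)
        self._refs[url] -= 1                 # remove cache reference
        if self._refs[url] == 0:
            del self._data[url]
            del self._refs[url]

    def cached_urls(self):
        return sorted(self._data)


if __name__ == "__main__":
    remote = {"https://example.org/data": b"abc"}   # stand-in for a remote server
    cache = GassCacheSketch(remote.__getitem__, remote.__setitem__)
    buf = cache.open("https://example.org/data")
    buf.extend(b"d")
    cache.mark_modified("https://example.org/data")
    cache.close("https://example.org/data")
    print(remote)
```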

globus_gass_transfer

• Common API for transferring remote files/data over various protocols
  • http and https currently supported
  • ftp will be supported in a future release
• Supports put and get operations on a URL
• Allows for efficient transfer to/from files or direct to/from memory
• Allows any application to easily add customized file/data service capabilities

globus_gass_copy

• Simple API for copying data from a source to a destination
  • URL used for source and destination
  • http(s), (gsi)ftp, file
  • When transferring from ftp to ftp, it uses 3rd-party transfer (i.e. client-mediated, direct server-to-server transfer)
• The globus-url-copy program is a simple wrapper around the globus_gass_copy API

globus-gass-server

• Simple file server
  • Run by user wherever necessary
  • Secure https protocol, using GSI
  • APIs for embedding server into other programs
• Example: globus-gass-server -r -w -t
  • -r: Allow files to be read from this server
  • -w: Allow files to be written to this server
  • -t: Tilde expand (~/… → $(HOME)/…)
  • -help: For list of all options

globus_gass_server_ez

• Very simple API for adding file service to any application
  • Wrapper around globus_gass_transfer
• globusrun uses this module to support executable staging, stdout/err redirection, and remote file access

GRAM & GASS: Putting It Together

1. Derive contact string
2. Build RSL string
3. Start up GASS server
4. Submit the request
5. Return output

[Figure: diagram of the five steps. Labels: globus-job-run; host name; contact string; command-line args; RSL string; GASS server; gatekeeper; jobmanager; program; stdout.]

Globus Components In Action

[Figure: component diagram. Labels: local machine with grid-proxy-init (X509 user cert, user proxy cert), mpirun, globusrun, machines, RSL string, RSL parser, RSL multi-request, DUROC, RSL single request, GRAM Client (GSI), GASS Server; two remote machines, each with GRAM Gatekeeper (GSI), GRAM Job Manager, GASS Client, App (Nexus) and MPI, one running AIX with PBS and the other running Solaris with Unix fork.]

Globus: Replica Management

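Replica Management Functions

Logical file name -> physical location

• Create/delete replicas (of all or part of an existing dataset)
• Register (new datasets -> Replica Catalog)
• Query (users or applications look up the replica information for a specific file or collection of files)
• Select (which replica is most suitable? Information services supply storage and network information)
• Uses the Replica Catalog and GridFTP to carry out the data transfer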

Replica Management Approach

• Replica Catalog
  • Stores information about locations, files, and collections
  • Translates logical file names into physical file locations
  • Records which logical files belong to a collection
• Replica Management
  • A set of services for registering files in the replica catalog, publishing files to locations, and adding/removing replicas at other locations
  • Locate and select replicas of files
  • Uses Replica Catalog and GridFTP

Globus Replica Catalog

• Key feature: replica management is kept separate from metadata management (which makes replica management simpler)
  • Supports different metadata catalogs
  • Metadata management supplies the detailed information that replication relies on
• Specific functions
  • Registration
    • Registering a list of files as a logical collection
    • Registering the physical location of a complete or partial replica of a logical collection
    • Registering information about a particular logical file in a logical collection
  • Creation and modification
    • Modifying the contents of registered entries in the catalog
    • Creating new copies of a complete or partial collection of files
  • Query
    • Find all physical locations for a set of logical files in a logical collection
    • List all the descriptive attributes associated with a registered logical collection, location or logical file

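Role of the Replica Catalog

• Tracks the multiple physical copies of a logical file, mapping each logical file to its physical files
• Maintains collections: groups of logical file names
• Locations: map a unique logical file name to multiple physical locations
• Logical file entries: store information about an individual logical file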

Replica Management

• Maintain a mapping between logical names for files and collections and one or more physical locations
• Important for many applications
  • Example: CERN HLT data
    • Multiple petabytes of data per year
    • Copy of everything at CERN (Tier 0)
    • Subsets at national centers (Tier 1)
    • Smaller regional centers (Tier 2)
    • Individual researchers will have copies

Globus Replica Management

• Identify replica cataloging and reliable replication as two fundamental services
  • Layer on other Grid services: GSI, transport, information service
  • Use LDAP as catalog format and protocol, for consistency
  • Use as a building block for other tools
• Advantage
  • These services can be used in a wide variety of situations

Replica Manager Components

• Replica catalog definition
  • LDAP object classes for representing logical-to-physical mappings in an LDAP catalog
• Low-level replica catalog API
  • globus_replica_catalog library
  • Manipulates replica catalog: add, delete, etc.
• High-level reliable replication API
  • globus_replica_manager library
  • Combines calls to file transfer operations and calls to low-level API functions: create, destroy, etc.

Replica Catalog Structure: A Climate Modeling Example

[Figure: catalog tree, reconstructed from its labels]

Replica Catalog
  Logical Collection: CO2 measurements 1998
    Filename: Jan 1998
    Filename: Feb 1998
    …
    Location: jupiter.isi.edu
      Filename: Mar 1998
      Filename: Jun 1998
      Filename: Oct 1998
      Protocol: gsiftp
      UrlConstructor: gsiftp://jupiter.isi.edu/nfs/v6/climate
    Location: sprite.llnl.gov
      Filename: Jan 1998
      …
      Filename: Dec 1998
      Protocol: ftp
      UrlConstructor: ftp://sprite.llnl.gov/pub/pcmdi
    Logical File Parent
      Logical File: Jan 1998
      Logical File: Feb 1998
      (attribute shown on a logical file entry: Size: 1468762)
  Logical Collection: CO2 measurements 1999
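The logical-to-physical mapping in this example can be modeled with plain Python dictionaries. The sketch below is an illustration only: the real catalog is an LDAP directory, the dictionary layout and `physical_urls` helper are hypothetical, and building a physical URL by appending the percent-encoded file name to the UrlConstructor is an assumption about how the constructor is used. Only the file names actually shown in the figure are listed; the elided entries are left out.

```python
from urllib.parse import quote

# Data taken from the figure; months hidden behind "…" are deliberately not filled in.
CATALOG = {
    "CO2 measurements 1998": {
        "locations": {
            "jupiter.isi.edu": {
                "files": {"Mar 1998", "Jun 1998", "Oct 1998"},
                "url_constructor": "gsiftp://jupiter.isi.edu/nfs/v6/climate",
            },
            "sprite.llnl.gov": {
                "files": {"Jan 1998", "Dec 1998"},
                "url_constructor": "ftp://sprite.llnl.gov/pub/pcmdi",
            },
        },
    },
}


def physical_urls(collection: str, logical_file: str) -> list[str]:
    """Return one URL per location that holds a replica of the logical file."""
    urls = []
    for location in CATALOG[collection]["locations"].values():
        if logical_file in location["files"]:
            # Assumed rule: constructor + "/" + encoded logical file name.
            urls.append(f'{location["url_constructor"]}/{quote(logical_file)}')
    return sorted(urls)


if __name__ == "__main__":
    print(physical_urls("CO2 measurements 1998", "Jan 1998"))
    print(physical_urls("CO2 measurements 1998", "Jun 1998"))
```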

Replica Catalog Services as Building Blocks: Examples

• Combine with information service to build replica selection services
  • E.g. “find best replica” using performance info from NWS and MDS
  • Use of LDAP as common protocol for info and replica services makes this easier
• Combine with application managers to build data distribution services
  • E.g., build new replicas in response to frequent accesses
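A "find best replica" service can be sketched as a simple selection over predicted performance. In the Python sketch below, the bandwidth predictions stand in for whatever an information service such as NWS or MDS would report; the numbers, the URLs, and the `best_replica` helper are hypothetical, and choosing by highest predicted bandwidth is just one possible policy.

```python
from typing import Optional


def best_replica(replica_urls: list[str],
                 predicted_mbit_s: dict[str, float]) -> Optional[str]:
    """Pick the replica with the highest predicted bandwidth, if any is known."""
    candidates = [url for url in replica_urls if url in predicted_mbit_s]
    if not candidates:
        return None
    return max(candidates, key=lambda url: predicted_mbit_s[url])


if __name__ == "__main__":
    replicas = ["gsiftp://site-a.example.org/data/f1",
                "ftp://site-b.example.org/pub/f1"]
    predictions = {replicas[0]: 80.0, replicas[1]: 35.5}  # made-up forecasts
    print(best_replica(replicas, predictions))
```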

Relationship to Metadata Catalogs

• Metadata services describe data contents
  • Have defined a simple set of object classes
• Must support a variety of metadata catalogs
  • MCAT being one important example
  • Others include LDAP catalogs, HDF
• Community metadata catalogs
  • Agree on set of attributes
  • Produce names needed by replica catalog:
    • Logical collection name
    • Logical file name

Implementation of the Replica Catalog

• Currently implemented with a Lightweight Directory Access Protocol (LDAP) directory
• May be implemented with a database in the future

Replica Catalog Directions

• Many data grid applications do not require tight consistency semantics
  • At any given time, you may not be able to discover all copies
  • When a new copy is made, it may not be immediately recognized as available
• Allows for much more scalable design
  • Distributed catalogs: local catalogs which maintain their own LFN -> PFN mapping
  • Soft-state updates as basis for building various configurations of global catalogs
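The soft-state idea can be illustrated with a toy global index in Python: local catalogs periodically announce which logical file names (LFNs) they hold, and each announcement expires unless it is refreshed. This is a sketch of the general technique under assumed details; the class name, the time-to-live value, and the update format are all hypothetical and not taken from any Globus design.

```python
class SoftStateIndex:
    """Global index that remembers which local catalog holds which LFN,
    but only until the entry's time-to-live runs out."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self._expiry = {}  # (lfn, catalog) -> time at which the entry expires

    def update(self, catalog: str, lfns: list[str], now: float) -> None:
        """A local catalog announces (or refreshes) the LFNs it can resolve."""
        for lfn in lfns:
            self._expiry[(lfn, catalog)] = now + self.ttl_s

    def lookup(self, lfn: str, now: float) -> list[str]:
        """Return the catalogs whose announcement for this LFN is still fresh."""
        return sorted(cat for (name, cat), expires in self._expiry.items()
                      if name == lfn and expires > now)


if __name__ == "__main__":
    index = SoftStateIndex(ttl_s=60)
    index.update("catalog-a", ["lfn-1"], now=0)
    print(index.lookup("lfn-1", now=30))   # still fresh
    print(index.lookup("lfn-1", now=61))   # expired without a refresh
```

The relaxed consistency described above shows up directly: a new copy is invisible until its catalog sends an update, and a stale entry disappears on its own if updates stop.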

Data Transfer APIs

• The globus_ftp_control API provides access to low-level GridFTP control and data channel operations.
• The globus_ftp_client API provides typical GridFTP client operations.
• The globus_gass_copy API provides the ability to start and manage multiple data transfers using GridFTP, HTTP, local file, and memory operations.
  • The globus-url-copy program is a thin wrapper around this API


Replica Management APIs

• The globus_replica_catalog API provides basic Replica Catalog operations.

• The globus_replica_management API (under development) combines GridFTP and the Replica Catalog to manage replicated datasets.