
Page 1

Forschungszentrum Jülich in der Helmholtz-Gesellschaft

Grid Computing at NIC

September 2005

Achim Streit + Team, a.streit@fz-juelich.de

Page 2

Grid Projects at FZJ

UNICORE 08/1997 – 12/1999
UNICORE Plus 01/2000 – 12/2002
EUROGRID 11/2000 – 01/2004
GRIP 01/2002 – 02/2004
OpenMolGRID 09/2002 – 02/2005

VIOLA 05/2004 – 04/2007
DEISA 05/2004 – 04/2009
UniGrids 07/2004 – 06/2006
NextGrid 09/2004 – 08/2007
CoreGRID 09/2004 – 08/2008
D-Grid 09/2005 – 02/2008

Page 3

UNICORE
a vertically integrated Grid middleware system
provides seamless, secure, and intuitive access to distributed resources and data
used in production and projects worldwide

Features: intuitive GUI with single sign-on, X.509 certificates for authentication/authorization (AA) and job/data signing, only one opened port required in the firewall, a workflow engine for complex multi-site/multi-step workflows, matured job monitoring, extensible application support with plug-ins, interactive access with UNICORE-SSH, integrated secure data transfer, resource management while full control of the resources remains with the sites, production quality, ...
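To make the single sign-on model concrete: the client authenticates with its X.509 certificate over one TLS/SSL connection to the single open gateway port. The following is a minimal Python sketch using only the standard ssl module; the gateway hostname, port number, and certificate file names are hypothetical placeholders, not UNICORE defaults.

    # Sketch: a client authenticating with an X.509 certificate over a single
    # TLS connection, as in UNICORE's "one open gateway port" model.
    # Hostname, port, and file paths are hypothetical placeholders.
    import socket
    import ssl

    GATEWAY_HOST = "gateway.example.org"   # hypothetical UNICORE gateway
    GATEWAY_PORT = 4433                    # the single port opened in the firewall

    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.load_cert_chain(certfile="user-cert.pem", keyfile="user-key.pem")
    context.load_verify_locations(cafile="ca-cert.pem")

    with socket.create_connection((GATEWAY_HOST, GATEWAY_PORT)) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=GATEWAY_HOST) as tls_sock:
            # After the handshake, both sides have verified X.509 certificates;
            # the same identity is reused for every site behind the gateway
            # (single sign-on), so no further passwords are needed.
            print("negotiated protocol:", tls_sock.version())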

Page 4

Architecture

(Diagram) The Client connects via SSL through a Gateway (authentication; only one port to open in an optional firewall) to a Usite; a Usite contains one or more Vsites, and multi-site jobs can span several Usites and Vsites. Within a Vsite, the NJS (Network Job Supervisor) performs authorization against the UUDB (similar to /etc/grid-security/grid-mapfile) and incarnates abstract jobs into non-abstract ones using the IDB (Incarnation Database). The TSI (Target System Interface, similar to the Globus jobmanager) passes the incarnated job to the local RMS: fork, LoadLeveler, (Open)PBS(Pro), CCS, LSF, NQE/NQS, ..., CONDOR, GT 2.4.

Functional areas: Workflow Engine, Resource Management, Job Monitoring, File Transfer, User Management, Application Support
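For illustration, the UUDB's authorization step, which like Globus' /etc/grid-security/grid-mapfile maps the distinguished name (DN) of a user's X.509 certificate to a local login, can be sketched as a simple lookup table. The DNs and logins below are invented examples, not the real UUDB format.

    # Sketch of a UUDB-style mapping from certificate DNs to local logins,
    # analogous to /etc/grid-security/grid-mapfile. All entries are hypothetical.
    USER_DATABASE = {
        "CN=Jane Scientist,OU=NIC,O=FZ Juelich,C=DE": "jscient",
        "CN=John Developer,OU=ZAM,O=FZ Juelich,C=DE": "jdevel",
    }

    def authorize(certificate_dn: str) -> str:
        """Return the local login for a certificate DN, or raise if not authorized."""
        try:
            return USER_DATABASE[certificate_dn]
        except KeyError:
            raise PermissionError(f"no local login mapped for {certificate_dn!r}")

    if __name__ == "__main__":
        print(authorize("CN=Jane Scientist,OU=NIC,O=FZ Juelich,C=DE"))  # -> jscient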

Page 5

UNICORE Client

Page 6

UNICORE-SSH

uses standard UNICORE security mechanisms to open an SSH connection through the standard SSH port

UNICORE-SSH button

Page 7

Automate, integrate, and speed up drug discovery in the pharmaceutical industry

University of Ulster: Data Warehouse

University of Tartu: Compute Resources

FZ Jülich: Grid Middleware

ComGenex Inc.: Data, User

Istituto di Ricerche Farmacologiche “Mario Negri”: User

Workflow Automation & Speed-up

(Diagram) 280 2D structures downloaded from the EPA ECOTOX database run through the same pipeline (3D structure generation, descriptor calculation, QSAR modelling, output): executed manually the workflow takes more than 5 days, with the automated Grid workflow it takes less than 2 hours.

Page 8

Workflow Automation & Speed-up

automatic split-up of data-parallel tasks
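A minimal sketch of such a split-up: the list of input structures is partitioned into chunks, one per available target system, so the descriptor calculations can run in parallel. The structure names and site names are made up for illustration.

    # Sketch: split a data-parallel task (e.g. descriptor calculation for many
    # molecular structures) into chunks, one per available target system.
    # Input data and site names are hypothetical.
    def split_task(structures, sites):
        """Distribute structures round-robin over the given sites."""
        chunks = {site: [] for site in sites}
        for i, structure in enumerate(structures):
            chunks[sites[i % len(sites)]].append(structure)
        return chunks

    structures = [f"molecule_{n:03d}" for n in range(280)]   # e.g. 280 2D structures
    sites = ["site-a", "site-b", "site-c"]
    for site, chunk in split_task(structures, sites).items():
        print(site, len(chunk), "structures")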

Page 9

Open Source under BSD license, supported by FZJ

Integration of own results and of results from other projects

Release management, problem tracking, CVS, mailing lists, documentation, assistance

Viable basis for many projects: DEISA, VIOLA, UniGrids, D-Grid, NaReGI

http://unicore.sourceforge.net


Page 10

From Testbed to Production

Success factor: vertical integration

From the testbed in 2002 to production in 2005: different communities, different computing resources (supercomputers, clusters, …), know-how in Grid middleware

Page 11

National high-performance computing centre “John von Neumann Institute for Computing”

About 650 users in 150 research projects

Access via UNICORE to:
IBM p690 eSeries cluster (1312 CPUs, 8.9 TFlops)
IBM BlueGene/L (2048 CPUs, 5.7 TFlops)
Cray XD1 (72+ CPUs)

116 active UNICORE users (72 external, 44 internal)

Resource usage (CPU-hours): Dec 18.4%, Jan 30.4%, Feb 30.5%, Mar 27.1%, Apr 29.7%, May 39.1%, Jun 22.3%, Jul 20.2%, Aug 29.0%

Production

Page 12

Grid Interoperability: UNICORE – Globus Toolkit

Uniform Interface to Grid Services: OGSA-based UNICORE/GS

WSRF-Interoperability

Page 13

Architecture: UNICORE jobs on GLOBUS resources

(Diagram) The UNICORE Client submits through the Gateway to the NJS (with UUDB, IDB, and Uspace). On the UNICORE side the TSI is complemented by a GRAM client and a GridFTP client, which talk to the Globus 2 services on the resource: the GRAM gatekeeper and GRAM job manager pass the job to the local RMS, files are transferred via the GridFTP server, and resource information comes from MDS.

Page 14

Consortium

Research Center Jülich (project manager)

Consorzio Interuniversitario per il Calcolo Automatico dell’Italia Nord Orientale

Fujitsu Laboratories of Europe

Intel GmbH

University of Warsaw

University of Manchester

T-Systems SfR

Funded by EU grant: IST-2002-004279

Page 15

Unicore/GS Architecture

(Diagram; the legend distinguishes Unicore components, new components, and Web Services interfaces) Components shown: Unicore Client, Unicore Gateway, Network Job Supervisor, TSI, User Database, Resource Database, Resource Broker, UniGrids Portal, OGSA Client, OGSA Server A, OGSA Server B.

Access Unicore components as Web Services; integrate Web Services into the Unicore workflow.

Page 16

Atomic Services: UNICORE basic functions

Site Management (TSF/TSS): Compute Resource Factory; Submit, Resource Information
Job Management (JMS): Start, Hold, Abort, Resume
Storage Management (SMS): List directory, Copy, Make directory, Rename, Remove
File Transfer (FTS): File import, file export

Standardization: the JSDL WG was revitalized by UniGrids and NAREGI; the Atomic Services are input to the OGSA-BES WG

(Diagram) TSF, TSS, JMS, SMS, FTS, each exposed through a WSRF interface
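As a conceptual illustration of the operations listed above, the sketch below renders them as abstract interfaces in Python. The class and method names are hypothetical; they do not reproduce the actual UNICORE/GS WSDL or Java code, only the set of operations named on this slide.

    # Conceptual sketch of the atomic service operations; names are hypothetical
    # and do not correspond to the real UNICORE/GS interfaces.
    from abc import ABC, abstractmethod

    class JobManagement(ABC):          # JMS
        @abstractmethod
        def start(self, job_id: str) -> None: ...
        @abstractmethod
        def hold(self, job_id: str) -> None: ...
        @abstractmethod
        def abort(self, job_id: str) -> None: ...
        @abstractmethod
        def resume(self, job_id: str) -> None: ...

    class StorageManagement(ABC):      # SMS
        @abstractmethod
        def list_directory(self, path: str) -> list[str]: ...
        @abstractmethod
        def copy(self, source: str, target: str) -> None: ...
        @abstractmethod
        def make_directory(self, path: str) -> None: ...
        @abstractmethod
        def rename(self, source: str, target: str) -> None: ...
        @abstractmethod
        def remove(self, path: str) -> None: ...

    class TargetSystem(ABC):           # TSS, created by the TSF factory
        @abstractmethod
        def submit(self, job_description: dict) -> str: ...
        @abstractmethod
        def resource_information(self) -> dict: ...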

Page 17

Three levels of interoperability

Level 1: Interoperability between WSRF services

UNICORE/GS passed the official WSRF interop test

GPE and JOGSA hosting environments successfully tested against UNICORE/GS and other endpoints

WSRF specification will be finalized soon! Currently: UNICORE/GS uses WSRF 1.3, GTK uses WSRF 1.2 draft 1

(Layer diagram) WSRF hosting environments: JOGSA-HE, GPE-HE, GTK4-HE, UNICORE/GS-HE; WSRF service APIs: JOGSA, GTK4, UNICORE/GS, GPE; atomic services: CGSP, GTK4, UNICORE/GS; advanced services: GPE-Workflow, UoM-Broker, GPE-Registry.

Page 18

Three levels of interoperability

Level 2: Interoperability between atomic service implementations

Client API hides details about WSRF hosting environment

Client code works with different WSRF implementations and WSRF versions, although at the moment different stubs have to be used
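A toy sketch of this layering: client code is written against one client API, and the hosting-environment-specific details live in interchangeable backends. The backend classes below are illustrative stand-ins, not the real GPE client library.

    # Sketch: client code written against one client API works unchanged with
    # different WSRF hosting environments. Backends are illustrative stand-ins.
    class AtomicServiceClient:
        """Common client API; hides stubs and WSRF version differences."""
        def __init__(self, backend):
            self._backend = backend
        def submit(self, job_description: dict) -> str:
            return self._backend.submit(job_description)

    class UnicoreGSBackend:
        def submit(self, job_description: dict) -> str:
            return "unicore-gs-job-001"        # would call a UNICORE/GS endpoint

    class GTK4Backend:
        def submit(self, job_description: dict) -> str:
            return "gtk4-job-001"              # would call a GTK4 endpoint

    job = {"executable": "/bin/date"}
    for backend in (UnicoreGSBackend(), GTK4Backend()):
        print(AtomicServiceClient(backend).submit(job))   # same client code for both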

(Layer diagram) The layers from the previous slide, now with the Atomic Service Client API (GPE) added on top of the atomic services (CGSP, JOGSA, GTK4, UNICORE/GS); clients and higher-level services (Portal, Visit, Apps, Expert) are built against this client API.

Page 19


Three levels of interoperability

Level 3: GridBeans working on top of different Client implementations

Independent of atomic service implementations

Independent of specification versions being used

GridBeans run on GTK or UNICORE/GS without modifications

GridBeans survive version changes in the underlying layers and are easy to maintain
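GridBeans are GPE plugins written in Java; purely to illustrate the layering, the Python sketch below shows a plugin that only builds an abstract job description and therefore does not depend on which client, atomic service implementation, or specification version executes it. All names and fields are hypothetical, not the real GPE GridBean API.

    # Sketch of the GridBean idea: an application plugin only builds an abstract
    # job description; the client API underneath decides how to run it.
    class PovrayGridBean:
        name = "POVRay"
        def build_job(self, scene_file: str, width: int, height: int) -> dict:
            return {
                "application": "povray",
                "arguments": [f"+W{width}", f"+H{height}", scene_file],
                "imports": [scene_file],          # staged in before execution
                "exports": ["output.png"],        # staged out afterwards
            }

    job = PovrayGridBean().build_job("scene.pov", 800, 600)
    # 'job' can now be handed to any client implementation (e.g. the sketch on the
    # previous slide), independent of the atomic services or specification version.
    print(job)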

(Layer diagram) GridBeans (POVRay, PDBSearch, Compiler, Gaussian, CPMD) plug into the GridBean API (GPE) used by the clients and higher-level services (Portal, Visit, Apps, Expert); underneath sit the Atomic Service Client API, the atomic services (CGSP, GTK4, UNICORE/GS), the WSRF service APIs, the WSRF hosting environments (GPE-HE, JOGSA-HE, GTK4-HE, UNICORE/GS-HE), and the advanced services (GPE-Workflow, UoM-Broker, GPE-Registry).

Page 20

Consortium

DEISA is a consortium of leading national supercomputer centers in Europe

IDRIS – CNRS, France

FZJ, Jülich, Germany

RZG, Garching, Germany

CINECA, Bologna, Italy

EPCC, Edinburgh, UK

CSC, Helsinki, Finland

SARA, Amsterdam, The Netherlands

HLRS, Stuttgart, Germany

BSC, Barcelona, Spain

LRZ, Munich, Germany

ECMWF (European Organization), Reading, UK

Funded by: European Union FP6. Grant period: May 1st, 2004 – April 30th, 2008

Page 21

DEISA objectives

To enable Europe’s terascale science by the integration of Europe’s most powerful supercomputing systems.

Enabling scientific discovery across a broad spectrum of science and technology is the only criterion for success

DEISA is a European Supercomputing Service built on top of existing national services.

DEISA deploys and operates a persistent, production quality, distributed, heterogeneous supercomputing environment with continental scope.

Page 22

Basic requirements and strategies for the DEISA Research Infrastructure

Fast deployment of a persistent, production quality, grid empowered supercomputing infrastructure with continental scope.

A European supercomputing service built on top of existing national services requires reliability and non-disruptive behavior.

User and application transparency

Top-down approach: technology choices result from the business and operational models of our virtual organization. DEISA technology choices are fully open.

Page 23

The DEISA supercomputing Grid: a layered infrastructure

Inner layer: a distributed super-cluster resulting from the deep integration of similar IBM AIX platforms at IDRIS, FZ-Jülich, RZG-Garching and CINECA (phase 1), then CSC (phase 2). To external users it appears as a single supercomputing platform.

Outer layer: a heterogeneous supercomputing Grid:
IBM AIX super-cluster (IDRIS, FZJ, RZG, CINECA, CSC), close to 24 Tf
BSC: IBM PowerPC Linux system, 40 Tf
LRZ: Linux cluster (2.7 Tf), moving to an SGI ALTIX system (33 Tf in 2006, 70 Tf in 2007)
SARA: SGI ALTIX Linux cluster, 2.2 Tf
ECMWF: IBM AIX system, 32 Tf
HLRS: NEC SX8 vector system, close to 10 Tf

Page 24

Logical view of the phase 2 DEISA network

DFN

RENATER

GARR

GÉANT

SURFnet

UKERNA

RedIRIS

FUnet

Page 25

AIX Super-Cluster, May 2005

(Map of participating sites, including CSC and ECMWF)

Services:

High-performance data grid via GPFS: access to remote files uses the full available network bandwidth (see the sketch after this list)

Job migration across sites: used to load-balance the global workflow when a huge partition is allocated to a DEISA project at one site

Common Production Environment
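Because the global GPFS is mounted like a local file system at every site, applications reach remote data with ordinary POSIX I/O and need no explicit transfer step. A minimal sketch; the mount point and file path are hypothetical.

    # Sketch: reading a file that physically lives at another DEISA site through
    # the globally mounted GPFS. The mount point and path are hypothetical.
    from pathlib import Path

    remote_file = Path("/deisa/projects/demo/input.dat")   # hypothetical GPFS path

    if remote_file.exists():
        data = remote_file.read_bytes()        # plain POSIX read; GPFS fetches the
        print(len(data), "bytes read")         # blocks over the network transparently
    else:
        print("GPFS path not mounted on this system")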

Page 26

Service Activities

SA1 – Network Operation and Support (FZJ): deployment and operation of a gigabit-per-second network infrastructure for a European distributed supercomputing platform; network operation and optimization during project activity.

SA2 – Data Management with Global File Systems (RZG): deployment and operation of global distributed file systems, as basic building blocks of the “inner” super-cluster and as a way of implementing global data management in a heterogeneous Grid.

SA3 – Resource Management (CINECA): deployment and operation of global scheduling services for the European super-cluster, as well as for its heterogeneous Grid extension.

SA4 – Applications and User Support (IDRIS): enabling the adoption by the scientific community of the distributed supercomputing infrastructure as an efficient instrument for the production of leading computational science.

SA5 – Security (SARA): providing administration, authorization and authentication for a heterogeneous cluster of HPC systems, with special emphasis on single sign-on.

Page 27

DEISA Supercomputing Grid services

Workflow management: based on UNICORE plus further extensions and services coming from DEISA’s JRA7 and other projects (UniGrids, …)

Global data management: a well defined architecture implementing extended global file systems on heterogeneous systems, fast data transfers across sites, and hierarchical data management at a continental scale.

Co-scheduling: needed to support Grid applications running on the heterogeneous environment.

Science Gateways and portals: specific Internet interfaces that hide complex supercomputing environments from end users and facilitate access for new, non-traditional scientific communities.

Page 28

Workflow Application with UNICORE / Global Data Management with GPFS

(Diagram) Five sites, each with CPU resources and GPFS, connected via the NRENs. The client submits a job workflow that runs in sequence at 1) FZJ, 2) CINECA, 3) RZG, 4) IDRIS, 5) SARA, while the data is shared through the global GPFS.
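A toy sketch of the pattern in the diagram: one step per site, executed in order, with every step reading and writing under a shared GPFS path instead of copying data between sites. The step bodies are placeholders; only the chaining is illustrated.

    # Toy sketch of a sequential multi-site workflow sharing data via global GPFS.
    # The step bodies are placeholders; only the chaining pattern is shown.
    SHARED_DIR = "/deisa/projects/demo"        # hypothetical global GPFS directory

    def run_step(site: str, step: int) -> str:
        # In the real system this would be a UNICORE job submitted to 'site';
        # here we only record what would happen.
        result = f"{SHARED_DIR}/step{step}_{site.lower()}.out"
        print(f"step {step} at {site}: writing {result}")
        return result

    workflow = ["FZJ", "CINECA", "RZG", "IDRIS", "SARA"]
    previous_output = None
    for step, site in enumerate(workflow, start=1):
        # Each step can read the previous step's output directly from GPFS.
        previous_output = run_step(site, step)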

Page 29

Workflow Application with UNICORE / Global Data Management with GPFS

Page 30

Usage in other Projects

NaReGI:
UNICORE as basic middleware for research and development
Development of the UNICONDORE interoperability layer (UNICORE ↔ CONDOR)
Access to about 3000 CPUs with approx. 17 TFlops peak in the NaReGI testbed

D-Grid Integration Project:
UNICORE is used in the Core D-Grid infrastructure
Development of tools for (even) easier installation and configuration of client and server components

Page 31

Summary

UNICORE:
establishes seamless access to Grid resources and data
is designed as a vertically integrated Grid middleware
provides matured workflow capabilities
is used in production at NIC and in the DEISA infrastructure
is available as Open Source from http://unicore.sourceforge.net
is used in research projects worldwide
is continuously enhanced by an international expert team of Grid developers
is currently being transformed into the Web Services world towards OGSA and WSRF compliance

Page 32

UNICORE Summit 2005: October 11–12, 2005, ETSI Headquarters, Sophia Antipolis, France

http://summit.unicore.org/2005

In conjunction with Grids@work: Middleware, Components, Users, Contest and Plugtests

http://www.etsi.org/plugtests/GRID.htm
