G. Alonso, W. Bausch, C. Pautasso
Information and Communication Systems Research Group, Department of Computer Science, Swiss Federal Institute of Technology, Zurich (ETHZ)
©Gustavo Alonso. Information and Communication Systems Research Group. ETHZ.
Who are we? The Information and Communication Systems Research Group.
Our expertise:
- High-performance, high-availability systems (e.g., data replication, backup architectures, parallel data processing)
- Information systems architecture (mainframe- or cluster-based)
- Large data repositories (e.g., the HEDC data warehouse for the HESSI satellite)
- Large distributed systems (e.g., enterprise middleware, workflow, internet electronic commerce)
Our interests: system architectures for applications with unconventional requirements:
- performance: cluster computing and parallel processing
- availability: long-term fault tolerance, hot backups
- communications: ubiquitous computing, mobile ad-hoc networks, internet
What do we do? Until recently, our work has been in the area of high-performance/cluster/parallel computing in the IT world:
- high-performance transaction processing systems (throughput, availability)
- very large data repositories (scalability, evolution)
- data processing applications (large-scale data mining)
In recent years we have started several projects in scientific applications:
- HEDC: a multi-terabyte data repository for NASA with the capability to store and process astrophysics data. The raw data production rate is 1 GB per day; data goes online 24 hours after delivery from the satellite, fully preprocessed, cleaned, catalogued, and indexed, with relevant events identified (solar flares, gamma-ray bursts, non-solar events, etc.) and an extended catalogue of analyses for the events of interest. Users can browse, produce more analyses, request batch-processing jobs, perform low-resolution analyses, retrieve and store data, etc.
- BioOpera: a process-based development and run-time environment for parallel computations over computer clusters, for bioinformatics applications
Cluster computing
Question: "Will cluster computing penetrate the IT market?" (next talk ...)
Our answer: it did a long time ago, but we do not benchmark our clusters in terms of Gflops and there is no Top500 list. Examples:
- eBay: a cluster of several hundred nodes for high-performance transaction processing, fully duplicated for availability purposes
- Google: several thousand nodes used for parallel data processing (4 weeks to retrieve, sort, classify, index, and store the entire contents of the web)
- TPC-C results, Compaq ProLiant 8500-700-192P:
  - Server: 24 nodes (8 x Pentium III Xeon 700 MHz, 8 GB RAM)
  - Clients: 48 nodes (2 x Pentium III 800 MHz, 512 MB RAM)
  - Disks: > 2,500 SCSI drives
  - Total storage capacity: > 42 TB
Grid computing
Similarly, we have an IT version of (data or computing) grids: virtual business processes.
[Figure: building a virtual enterprise process with IvyFrame. Enterprises X, Y, ... post their activities (behaviour, price) to the WWW catalogue of a trading community; from these postings a process is defined, compiled, and enacted on the WISE engine, which provides persistence, security, QoS, execution guarantees, availability, and an awareness model.]
Complex computing environments
[Figure: dataflow of an erosion study. Inputs (elevation samples, soil samples, satellite images, vegetation samples, rainfall records) feed intermediate steps: digital elevation reconstruction and topo-map slope analysis yield slope length and slope angle; a vegetation model yields vegetation cover; a spatial interpolation model yields spatial rainfall data and, via a precipitation model, estimated rainfall. These drive the erosion model, vegetation evolution model, and storm and discharge models, producing soil erosion, vegetation change, and hydrograph outputs.]
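The dependency structure of a computation like the one above can be captured as a graph and executed in dependency order. A minimal sketch (model names taken from the figure; the graph and ordering code are illustrative, not the actual system):

```python
# Express the erosion-study dataflow as a dependency graph and derive one
# valid execution order with a topological sort (graphlib, Python 3.9+).
from graphlib import TopologicalSorter

# node -> set of predecessors it depends on
deps = {
    "Digital Elevation Reconstruction": {"Elevation Samples"},
    "Slope Analysis": {"Digital Elevation Reconstruction"},
    "Spatial Interpolation Model": {"Rainfall Records"},
    "Precipitation Model": {"Spatial Interpolation Model"},
    "Vegetation Model": {"Satellite Images", "Vegetation Samples"},
    "Erosion Model": {"Slope Analysis", "Soil Samples",
                      "Vegetation Model", "Precipitation Model"},
}

# static_order() lists every node with all of its inputs before it
order = list(TopologicalSorter(deps).static_order())
print(order)  # raw inputs first, "Erosion Model" last
```

Any scheduler for such an environment must respect exactly this partial order; independent branches (e.g., the vegetation and precipitation paths) can run in parallel.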
High-level components. Code granularity:
- Large grain (application level): programs, executed as tasks
- Medium grain (operating-system level): functions, executed as threads or processes
- Fine grain (compiler level): loops, parallelized by compilers
- Very fine grain (instruction level): handled by hardware
[Figure: tasks i-1, i, i+1 at the application level; func1(){...}, func2(){...}, func3(){...} as threads/processes; loop iterations a(0)=.., b(0)=.., a(1)=.., b(1)=.., a(2)=.., b(2)=.. handled by compilers; individual instructions (+, x, load) on the CPU.]
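A toy illustration (the functions are invented, not from the talk) of two of these levels: medium grain, where whole functions are the parallel units, versus fine grain, where independent loop iterations are:

```python
# Medium-grain vs fine-grain parallelism with a thread pool.
from concurrent.futures import ThreadPoolExecutor

def func1():
    return sum(range(100))  # one whole function = one medium-grain task

def func2():
    return max(range(100))

with ThreadPoolExecutor() as pool:
    # medium grain: submit entire functions as concurrent tasks
    f1, f2 = pool.submit(func1), pool.submit(func2)
    results = (f1.result(), f2.result())
    # fine grain: each iteration a(i) = i*i is an independent unit of work
    a = list(pool.map(lambda i: i * i, range(5)))

print(results, a)  # (4950, 99) [0, 1, 4, 9, 16]
```

Process management systems such as the ones discussed next operate at the coarsest of these levels, treating whole programs as the schedulable units.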
Process Management Systems
- Processes are the way to build distributed computations over clusters.
- Process: a description of resources (programs, applications, data) plus a specification of control and data flow.
- Process Management System: management of distributed resources (scheduling, load balancing, parallelism, data transfer, etc.). An operating system at a coarse granularity?
The use of databases reduces the code of large applications by about 50%. Processes can do the same for large-scale distributed systems based on heterogeneous components.
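A minimal sketch of this idea (the names and structure are assumptions for illustration, not the actual system's API): a process as a set of tasks with declared data-flow inputs, plus a tiny run-time that executes tasks once their inputs are ready — the coarse-grained "operating system" role described above.

```python
# A "process" = tasks + explicit data flow; the runtime resolves the order.
from collections import deque

class Task:
    def __init__(self, name, func, inputs=()):
        self.name, self.func, self.inputs = name, func, list(inputs)

def run_process(tasks):
    """Execute tasks in dependency order, passing outputs along data-flow edges."""
    done, results = set(), {}
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        if all(dep in done for dep in task.inputs):
            results[task.name] = task.func(*(results[d] for d in task.inputs))
            done.add(task.name)
        else:
            pending.append(task)  # inputs not ready yet; retry later
    return results

# Toy process: fetch -> clean -> index
tasks = [
    Task("fetch", lambda: [3, 1, 2]),
    Task("clean", lambda xs: sorted(xs), inputs=["fetch"]),
    Task("index", lambda xs: {x: i for i, x in enumerate(xs)}, inputs=["clean"]),
]
out = run_process(tasks)
print(out["index"])  # {1: 0, 2: 1, 3: 2}
```

The point of the slide is that, just as a database factors out storage concerns, a process management system factors out scheduling, distribution, and recovery so applications only declare the graph.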
OPERA: Open Process Engine for Reliable Activities
A kernel-based system instead of a complete application. Which functionality needs to be implemented in the kernel and which is application-specific is an open research area.
- Workflow services: modelling, scheduling, fault tolerance
- Distributed services: external applications, load balancing, process migration
- Advanced services: atomicity, recovery
- Expanded services: exception handling, event services
- Persistence: logging, query interfaces
OPERA (Open Process Engine for Reliable Activities)
[Figure: the OPERA kernel at the centre, surrounded by a modeling tool and compiler, interfaces for data and applications, metadata and tools, and application-specific extensions.]
BioOpera. Typical computing environments in bioinformatics:
- multiple-step, complex computations over heterogeneous applications
- algorithmic complexity (most problems are NP-hard or NP-complete)
- large input/output data sets (exponential growth is not uncommon)
- complex data dependencies and control flow
- long-lived computations over unreliable platforms
Current programming environments are very primitive:
- support only for single algorithms (complex computations put together by hand or with very low-level scripts)
- it is difficult to specify, run, keep track of, and reproduce computations
- no automated support; computations are manually maintained
[Figure: BioOpera sits between a process front-end (programming and monitoring tools; lineage, recovery, and awareness info; process output) and an application front-end (the DARWIN bioinformatics application: data visualization, similarity search), providing process enactment, process monitoring, and system awareness.]
BioOpera: A Process Support System
- Programming environment for multiple-step computations over heterogeneous applications: program invocations, control & data flow, parallelism (data, search space), failure behaviour
- Run-time environment for processes: navigation, distribution / load balancing, persistence / recovery
- Monitoring and interaction tools at the process level and at the task level: start, stop, resume, cancel, retry, modify parameters, etc.
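One piece of the failure behaviour above can be sketched as a per-task retry policy. This is an illustrative stand-in under assumed semantics, not BioOpera's actual mechanism:

```python
# Retry a failing task a bounded number of times; the last error propagates.
import time

def run_with_retry(task, retries=3, delay=0.0):
    """Run a task; on failure, back off and retry up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as exc:
            last = exc
            time.sleep(delay)  # back off before the next attempt
    raise last

# A task that fails twice (e.g., a node crash) and then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("node failed")
    return "done"

result = run_with_retry(flaky)
print(result)  # "done", on the third attempt
```

Combined with persistent state, a run-time like this can resume long-lived computations after crashes instead of restarting them from scratch.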
BioOpera: Architecture
[Figure: a process spec is translated by the OCR compiler into the template space. The BioOpera server's execution layer comprises a navigator (process interpreter, recovery) and a dispatcher (scheduling, distribution, load balancing, activity queue, load monitor); the data layer holds the template space, instance space, config space, and data space. Remote and local interface layers connect the process support system front end and BioOpera clients to execution clients (e.g., Darwin) on a cluster on a LAN.]
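The dispatcher's load-balancing role can be sketched as a greedy least-loaded assignment. All names here (node and activity labels, the `dispatch` helper) are invented for illustration; the real dispatcher also uses the load monitor's live measurements:

```python
# Greedy least-loaded placement of activities onto execution clients.
import heapq

def dispatch(activities, nodes):
    """Assign each (name, cost) activity to the currently least-loaded node."""
    heap = [(0, n) for n in nodes]        # (accumulated load, node)
    heapq.heapify(heap)
    placement = {n: [] for n in nodes}
    for name, cost in activities:
        load, node = heapq.heappop(heap)  # least-loaded node so far
        placement[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return placement

work = [("align-1", 5), ("align-2", 3), ("align-3", 4), ("merge", 1)]
placement = dispatch(work, ["node-a", "node-b"])
print(placement)  # {'node-a': ['align-1', 'merge'], 'node-b': ['align-2', 'align-3']}
```

The heap keeps node selection O(log n) per activity, which matters when thousands of activities are queued against a large cluster.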
The All vs. All problem: compute a complete cross-comparison of the SwissProt v38 protein database.
- SwissProt v38: 80,000 entries; average sequence length: 400 residues
- A complete cross-comparison (using state-of-the-art methods) takes 2 years of CPU time on a single PC
- Currently, doing this on a much smaller scale (just updates) takes several months
[Figure: the process as an execution unit. Task 1 (user input: input params) feeds Task 2 (parallel pre-processing) and Task 3 (all vs. all, run n times over raw and reference data, producing output files), followed by Task 4 (merge), which yields the master result (database, output file).]
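The structure of the figure — partition the pairwise comparisons, compute chunks independently, merge — can be sketched as follows. The sequences, the toy `score` function, and the chunking helper are illustrative assumptions, not the actual alignment method or BioOpera spec:

```python
# All-vs-all cross-comparison split into independent chunks, then merged.
from itertools import combinations

def score(a, b):
    # stand-in for a real sequence-alignment score
    return sum(x == y for x, y in zip(a, b))

seqs = ["ACGT", "ACGG", "TCGT", "AAAA"]
pairs = list(combinations(range(len(seqs)), 2))  # n*(n-1)/2 comparisons

# Task 2: partition the pair list, one chunk per cluster node
def chunk(pairs, n_nodes):
    return [pairs[i::n_nodes] for i in range(n_nodes)]

# Task 3: each node computes its chunk independently ("n times" in the figure)
partial = [{(i, j): score(seqs[i], seqs[j]) for i, j in c}
           for c in chunk(pairs, n_nodes=2)]

# Task 4: merge the partial results into the master result
master = {}
for p in partial:
    master.update(p)
print(len(master))  # 6 pairwise scores for 4 sequences
```

At SwissProt scale the same structure holds, with ~3.2 billion pairs instead of 6, which is why the chunking and the per-chunk failure handling dominate the design.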
The All vs. All lifecycle
[Figure: chart of the number of processors (0-30) from Dec 17 to Jan 21, plotting available processors, running jobs, successful jobs, failed jobs, and jobs on CPU. Annotated events along the run: another user needs the cluster, disk space shortage, cluster busy with other jobs, server down for maintenance, cluster failure, all jobs manually killed.]
Conclusions. BioOpera offers:
- a better way to program complex computations over clusters
- persistent computations and recoverability
- monitoring and querying facilities
- a scalable architecture that can be applied both in clusters and in grids
BioOpera is not:
- a parallel programming language or a message-passing library (too low level)
- a scripting language (too inflexible)
- a workflow tool (too oriented to human activities)
BioOpera is extensible:
- it accepts different process programming models
- monitoring and querying facilities can be expanded and changed
- BioOpera servers can be combined to form clusters of clusters