G. Alonso, W. Bausch, C. Pautasso
Information and Communication Systems Research Group, Department of Computer Science, Swiss Federal Institute of Technology, Zurich (ETHZ)
©Gustavo Alonso. Information and Communication Systems Research Group. ETHZ.
Who are we? The Information and Communication Systems Research Group.
Our expertise:
- High-performance, high-availability systems (e.g., data replication, backup architectures, parallel data processing)
- Information systems architecture (mainframe- or cluster-based)
- Large data repositories (e.g., the HEDC data warehouse for the HESSI satellite)
- Large distributed systems (e.g., enterprise middleware, workflow, internet electronic commerce)
Our interests: system architectures for applications with unconventional requirements:
- performance: cluster computing and parallel processing
- availability: long-term fault tolerance, hot backups
- communications: ubiquitous computing, mobile ad-hoc networks, internet
What do we do? Until recently, our work has been in the area of high-performance/cluster/parallel computing in the IT world:
- high-performance transaction processing systems (throughput, availability)
- very large data repositories (scalability, evolution)
- data processing applications (large-scale data mining)
In recent years we have started several projects in scientific applications:
- HEDC: a multi-terabyte data repository for NASA with the capability to store and process astrophysics data. The raw data production rate is 1 GB per day; data goes online 24 hours after delivery from the satellite, fully preprocessed, cleaned, catalogued, and indexed, with relevant events identified (solar flares, gamma-ray bursts, non-solar events, etc.) and an extended catalogue of analyses for the events of interest. Users can browse, produce more analyses, request batch-processing jobs, perform low-resolution analyses, retrieve and store data, etc.
- BioOpera: a process-based development and run-time environment for parallel computations over computer clusters, for bioinformatics applications
Cluster computing
Question: "Will cluster computing penetrate the IT market?" (next talk ...)
Our answer: it did a long time ago, but we do not benchmark our clusters in terms of Gflops and there is no Top500 list. Examples:
- eBay: a cluster of several hundred nodes for high-performance transaction processing, fully duplicated for availability purposes
- Google: several thousand nodes used for parallel data processing (4 weeks to retrieve, sort, classify, index, and store the entire contents of the web)
- TPC-C results, Compaq ProLiant 8500-700-192P:
  - Server: 24 nodes (8 x Pentium III Xeon 700 MHz, 8 GB RAM)
  - Clients: 48 nodes (2 x Pentium III 800 MHz, 512 MB RAM)
  - Disks: > 2,500 SCSI drives
  - Total storage capacity: > 42 TB
Grid computing
Similarly, we have an IT version of (data or computing) grids: virtual business processes.
[Figure: building a virtual enterprise process with IvyFrame. Enterprises X, Y, ... post their activities (behaviour, price) to the WWW catalogue of a trading community; from these postings a process is defined, compiled, and enacted on the WISE engine, which provides persistence, security, QoS, execution guarantees, availability, and an awareness model.]
Complex computing environments
[Figure: dataflow of an erosion study. Inputs (elevation samples, soil samples, satellite images, vegetation samples, rainfall records) feed intermediate steps: digital elevation reconstruction and topo-map slope analysis yield slope length and slope angle; a vegetation model yields vegetation cover; a spatial interpolation model yields spatial rainfall data and, via a precipitation model, estimated rainfall. These drive the erosion model, vegetation evolution model, and storm and discharge models, producing soil erosion, vegetation change, and hydrograph outputs.]
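The dependency structure of a computation like the one above can be captured as a graph and executed in dependency order. A minimal sketch (model names taken from the figure; the graph and ordering code are illustrative, not the actual system):

```python
# Express the erosion-study dataflow as a dependency graph and derive one
# valid execution order with a topological sort (graphlib, Python 3.9+).
from graphlib import TopologicalSorter

# node -> set of predecessors it depends on
deps = {
    "Digital Elevation Reconstruction": {"Elevation Samples"},
    "Slope Analysis": {"Digital Elevation Reconstruction"},
    "Spatial Interpolation Model": {"Rainfall Records"},
    "Precipitation Model": {"Spatial Interpolation Model"},
    "Vegetation Model": {"Satellite Images", "Vegetation Samples"},
    "Erosion Model": {"Slope Analysis", "Soil Samples",
                      "Vegetation Model", "Precipitation Model"},
}

# static_order() lists every node with all of its inputs before it
order = list(TopologicalSorter(deps).static_order())
print(order)  # raw inputs first, "Erosion Model" last
```

Any scheduler for such an environment must respect exactly this partial order; independent branches (e.g., the vegetation and precipitation paths) can run in parallel.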
High-level components. Code granularity:
- Large grain (application level): programs, executed as tasks
- Medium grain (operating-system level): functions, executed as threads or processes
- Fine grain (compiler level): loops, parallelized by compilers
- Very fine grain (instruction level): handled by hardware
[Figure: tasks i-1, i, i+1 at the application level; func1(){...}, func2(){...}, func3(){...} as threads/processes; loop iterations a(0)=.., b(0)=.., a(1)=.., b(1)=.., a(2)=.., b(2)=.. handled by compilers; individual instructions (+, x, load) on the CPU.]
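A toy illustration (the functions are invented, not from the talk) of two of these levels: medium grain, where whole functions are the parallel units, versus fine grain, where independent loop iterations are:

```python
# Medium-grain vs fine-grain parallelism with a thread pool.
from concurrent.futures import ThreadPoolExecutor

def func1():
    return sum(range(100))  # one whole function = one medium-grain task

def func2():
    return max(range(100))

with ThreadPoolExecutor() as pool:
    # medium grain: submit entire functions as concurrent tasks
    f1, f2 = pool.submit(func1), pool.submit(func2)
    results = (f1.result(), f2.result())
    # fine grain: each iteration a(i) = i*i is an independent unit of work
    a = list(pool.map(lambda i: i * i, range(5)))

print(results, a)  # (4950, 99) [0, 1, 4, 9, 16]
```

Process management systems such as the ones discussed next operate at the coarsest of these levels, treating whole programs as the schedulable units.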
Process Management Systems
- Processes are the way to build distributed computations over clusters.
- Process: a description of resources (programs, applications, data) plus a specification of control and data flow.
- Process Management System: management of distributed resources (scheduling, load balancing, parallelism, data transfer, etc.). An operating system at a coarse granularity?
The use of databases reduces the code of large applications by about 50%. Processes can do the same for large-scale distributed systems based on heterogeneous components.
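A minimal sketch of this idea (the names and structure are assumptions for illustration, not the actual system's API): a process as a set of tasks with declared data-flow inputs, plus a tiny run-time that executes tasks once their inputs are ready — the coarse-grained "operating system" role described above.

```python
# A "process" = tasks + explicit data flow; the runtime resolves the order.
from collections import deque

class Task:
    def __init__(self, name, func, inputs=()):
        self.name, self.func, self.inputs = name, func, list(inputs)

def run_process(tasks):
    """Execute tasks in dependency order, passing outputs along data-flow edges."""
    done, results = set(), {}
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        if all(dep in done for dep in task.inputs):
            results[task.name] = task.func(*(results[d] for d in task.inputs))
            done.add(task.name)
        else:
            pending.append(task)  # inputs not ready yet; retry later
    return results

# Toy process: fetch -> clean -> index
tasks = [
    Task("fetch", lambda: [3, 1, 2]),
    Task("clean", lambda xs: sorted(xs), inputs=["fetch"]),
    Task("index", lambda xs: {x: i for i, x in enumerate(xs)}, inputs=["clean"]),
]
out = run_process(tasks)
print(out["index"])  # {1: 0, 2: 1, 3: 2}
```

The point of the slide is that, just as a database factors out storage concerns, a process management system factors out scheduling, distribution, and recovery so applications only declare the graph.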
OPERA: Open Process Engine for Reliable Activities
A kernel-based system instead of a complete application. Which functionality needs to be implemented in the kernel and which is application-specific is an open research area.
- Workflow services: modelling, scheduling, fault tolerance
- Distributed services: external applications, load balancing, process migration
- Advanced services: atomicity, recovery
- Expanded services: exception handling, event services
- Persistence: logging, query interfaces
OPERA (Open Process Engine for Reliable Activities)
[Figure: the OPERA kernel at the centre, surrounded by a modeling tool and compiler, interfaces for data and applications, metadata and tools, and application-specific extensions.]
BioOpera. Typical computing environments in bioinformatics:
- multiple-step, complex computations over heterogeneous applications
- algorithmic complexity (most problems are NP-hard or NP-complete)
- large input/output data sets (exponential growth is not uncommon)
- complex data dependencies and control flow
- long-lived computations over unreliable platforms
Current programming environments are very primitive:
- support only for single algorithms (complex computations put together by hand or with very low-level scripts)
- it is difficult to specify, run, keep track of, and reproduce computations
- no automated support; computations are manually maintained
[Figure: BioOpera sits between a process front-end (programming and monitoring tools; lineage, recovery, and awareness info; process output) and an application front-end (the DARWIN bioinformatics application: data visualization, similarity search), providing process enactment, process monitoring, and system awareness.]
BioOpera: A Process Support System
- Programming environment for multiple-step computations over heterogeneous applications: program invocations, control & data flow, parallelism (data, search space), failure behaviour
- Run-time environment for processes: navigation, distribution / load balancing, persistence / recovery
- Monitoring and interaction tools at the process level and at the task level: start, stop, resume, cancel, retry, modify parameters, etc.
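One piece of the failure behaviour above can be sketched as a per-task retry policy. This is an illustrative stand-in under assumed semantics, not BioOpera's actual mechanism:

```python
# Retry a failing task a bounded number of times; the last error propagates.
import time

def run_with_retry(task, retries=3, delay=0.0):
    """Run a task; on failure, back off and retry up to `retries` attempts."""
    for attempt in range(1, retries + 1):
        try:
            return task()
        except Exception as exc:
            last = exc
            time.sleep(delay)  # back off before the next attempt
    raise last

# A task that fails twice (e.g., a node crash) and then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("node failed")
    return "done"

result = run_with_retry(flaky)
print(result)  # "done", on the third attempt
```

Combined with persistent state, a run-time like this can resume long-lived computations after crashes instead of restarting them from scratch.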
BioOpera: Architecture
[Figure: a process spec is translated by the OCR compiler into the template space. The BioOpera server's execution layer comprises a navigator (process interpreter, recovery) and a dispatcher (scheduling, distribution, load balancing, activity queue, load monitor); the data layer holds the template space, instance space, config space, and data space. Remote and local interface layers connect the process support system front end and BioOpera clients to execution clients (e.g., Darwin) on a cluster on a LAN.]
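The dispatcher's load-balancing role can be sketched as a greedy least-loaded assignment. All names here (node and activity labels, the `dispatch` helper) are invented for illustration; the real dispatcher also uses the load monitor's live measurements:

```python
# Greedy least-loaded placement of activities onto execution clients.
import heapq

def dispatch(activities, nodes):
    """Assign each (name, cost) activity to the currently least-loaded node."""
    heap = [(0, n) for n in nodes]        # (accumulated load, node)
    heapq.heapify(heap)
    placement = {n: [] for n in nodes}
    for name, cost in activities:
        load, node = heapq.heappop(heap)  # least-loaded node so far
        placement[node].append(name)
        heapq.heappush(heap, (load + cost, node))
    return placement

work = [("align-1", 5), ("align-2", 3), ("align-3", 4), ("merge", 1)]
placement = dispatch(work, ["node-a", "node-b"])
print(placement)  # {'node-a': ['align-1', 'merge'], 'node-b': ['align-2', 'align-3']}
```

The heap keeps node selection O(log n) per activity, which matters when thousands of activities are queued against a large cluster.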
The All vs. All problem: compute a complete cross-comparison of the SwissProt v38 protein database.
- SwissProt v38: 80,000 entries; average sequence length: 400 residues
- A complete cross-comparison (using state-of-the-art methods) takes 2 years of CPU time on a single PC
- Currently, doing this on a much smaller scale (just updates) takes several months
[Figure: the process as an execution unit. Task 1 (user input: input params) feeds Task 2 (parallel pre-processing) and Task 3 (all vs. all, run n times over raw and reference data, producing output files), followed by Task 4 (merge), which yields the master result (database, output file).]
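The structure of the figure — partition the pairwise comparisons, compute chunks independently, merge — can be sketched as follows. The sequences, the toy `score` function, and the chunking helper are illustrative assumptions, not the actual alignment method or BioOpera spec:

```python
# All-vs-all cross-comparison split into independent chunks, then merged.
from itertools import combinations

def score(a, b):
    # stand-in for a real sequence-alignment score
    return sum(x == y for x, y in zip(a, b))

seqs = ["ACGT", "ACGG", "TCGT", "AAAA"]
pairs = list(combinations(range(len(seqs)), 2))  # n*(n-1)/2 comparisons

# Task 2: partition the pair list, one chunk per cluster node
def chunk(pairs, n_nodes):
    return [pairs[i::n_nodes] for i in range(n_nodes)]

# Task 3: each node computes its chunk independently ("n times" in the figure)
partial = [{(i, j): score(seqs[i], seqs[j]) for i, j in c}
           for c in chunk(pairs, n_nodes=2)]

# Task 4: merge the partial results into the master result
master = {}
for p in partial:
    master.update(p)
print(len(master))  # 6 pairwise scores for 4 sequences
```

At SwissProt scale the same structure holds, with ~3.2 billion pairs instead of 6, which is why the chunking and the per-chunk failure handling dominate the design.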
The All vs. All lifecycle
[Figure: chart of the number of processors (0-30) from Dec 17 to Jan 21, plotting available processors, running jobs, successful jobs, failed jobs, and jobs on CPU. Annotated events along the run: another user needs the cluster, disk space shortage, cluster busy with other jobs, server down for maintenance, cluster failure, all jobs manually killed.]
Conclusions. BioOpera offers:
- a better way to program complex computations over clusters
- persistent computations and recoverability
- monitoring and querying facilities
- a scalable architecture that can be applied both in clusters and in grids
BioOpera is not:
- a parallel programming language or a message-passing library (too low level)
- a scripting language (too inflexible)
- a workflow tool (too oriented to human activities)
BioOpera is extensible:
- it accepts different process programming models
- monitoring and querying facilities can be expanded and changed
- BioOpera servers can be combined to form clusters of clusters