Apache Web Services in the Real World, an E-Science Perspective Srinath Perera Architect, WSO2 Inc. Member, Apache Software Foundation Lanka Software Foundation


Page 1: Apache Web Services in the Real World, an E-Science Perspective

Apache Web Services in the Real World, an E-Science Perspective

Srinath Perera

Architect, WSO2 Inc. Member, Apache Software Foundation

Lanka Software Foundation

Page 2: Apache Web Services in the Real World, an E-Science Perspective

Outline

● Linked Environments for Atmospheric Discovery (LEAD) Project, the Use Case.

● LEAD Architecture, using SOA to build a Large Scale E-Science Project.

● History: LEAD and Apache Web Service Projects.

● Apache as a Sustainability Model for Academic Projects.

Page 3: Apache Web Services in the Real World, an E-Science Perspective

E-Science

● Continuation of High Performance Computing, Parallel Computing, and Grid.

● Cyber-infrastructures to support Scientific Research.

● Built around “Computation” as the third Pillar of Science (along with Analysis and Experimentation).

● Characterized by a wide range of computing (CPU minutes to CPU years) and data (a few KB to PBs of data) requirements.

● Based on real-life use cases.

Page 4: Apache Web Services in the Real World, an E-Science Perspective

Reality is Harder than Fiction

● E-Science joins Theory with real-life data.

● Real-life applications often go beyond our experiences.

● Most weather models are calculated at much lower than ideal resolutions; otherwise a 24-hour forecast takes more than 24 hours!

● Physics use cases (e.g. the Large Hadron Collider), telescopes, and genome analysis generate terabytes of data in days if not hours, and moving 1 TB takes hours even on the 10 Gb/s networks of TeraGrid.

● Scale, geographical distribution of resources, and heterogeneity make these use cases complex.

Page 5: Apache Web Services in the Real World, an E-Science Perspective

Linked Environments for Atmospheric Discovery (LEAD)

● U.S. NSF funded, 10+ Universities, $11M, 5 Years.

● Used for U.S. National Weather forecasts by NOAA.

● Presented to the U.S. Congress as an example to justify scientific research spending by the U.S. NSF.

● Has brought state-of-the-art forecasting capabilities to a wider audience, ranging from hardcore scientists to high school students.

Page 6: Apache Web Services in the Real World, an E-Science Perspective

LEAD: Dynamic Weather Analysis at U.S.-Wide Scale

Page 7: Apache Web Services in the Real World, an E-Science Perspective

Why is it Hard?

● Geographically Distributed Sensors, Computing Power, Storage, and Expertise.

● Handling Failures and Recovery.

● Long Running Jobs (> 1 Hour).

● Large Scale Jobs (10-1000+ processors).

● Large Sized Data (KBs to GBs of data).

● Need to serve many Parallel Users.

● Usage Spikes.

Page 8: Apache Web Services in the Real World, an E-Science Perspective

LEAD as an Example

● Assume a hurricane develops, and 1000 scientists across the U.S. come to the LEAD portal to run forecasts.

● Let's assume:

– Each user runs 3 workflows.

– Each workflow has 6 services, generates about 300 notifications, moves 50 files of 100 MB each, generates 50 files of 100 MB each, and runs for one hour.

– Each service needs 5 CPU hours.

Page 9: Apache Web Services in the Real World, an E-Science Perspective

Which Means

● 3000 parallel workflows.

● Need 90,000 CPUs for the hour (3000 workflows × 6 services × 5 CPU hours).

● 250 TPS for the messaging system.

● Move 8 GB/sec through the network.

● Generate 15 TB of data per hour.

LEAD cannot handle these numbers yet, but they give us an idea of the challenge.
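The figures on this slide can be rechecked with a short Python sketch. It assumes that all workflows run concurrently for the full hour, that units are decimal (1 TB = 1,000,000 MB), and that the 8 GB/sec figure counts both the moved and the generated files. The slides do not state that last point, so treat it as an assumption. All names in the code are illustrative.

```python
# Back-of-the-envelope load estimate for the hurricane scenario.
# Inputs come from the "LEAD as an Example" slide; names are illustrative.
USERS = 1000
WORKFLOWS_PER_USER = 3
SERVICES_PER_WORKFLOW = 6
CPU_HOURS_PER_SERVICE = 5
NOTIFICATIONS_PER_WORKFLOW = 300
FILES_MOVED_PER_WORKFLOW = 50
FILES_GENERATED_PER_WORKFLOW = 50
FILE_SIZE_MB = 100
RUNTIME_SECONDS = 3600  # each workflow runs for one hour


def estimate_load():
    """Return the aggregate load, assuming every workflow runs at once."""
    workflows = USERS * WORKFLOWS_PER_USER
    cpu_hours = workflows * SERVICES_PER_WORKFLOW * CPU_HOURS_PER_SERVICE
    notifications_per_sec = (
        workflows * NOTIFICATIONS_PER_WORKFLOW / RUNTIME_SECONDS
    )
    # Decimal units: 1 TB = 1,000,000 MB.
    moved_tb = workflows * FILES_MOVED_PER_WORKFLOW * FILE_SIZE_MB / 1e6
    generated_tb = (
        workflows * FILES_GENERATED_PER_WORKFLOW * FILE_SIZE_MB / 1e6
    )
    # Assumption: moved and generated files both cross the network.
    network_gb_per_sec = (moved_tb + generated_tb) * 1000 / RUNTIME_SECONDS
    return {
        "parallel_workflows": workflows,
        "cpu_hours": cpu_hours,
        "notifications_per_sec": notifications_per_sec,
        "moved_tb": moved_tb,
        "generated_tb": generated_tb,
        "network_gb_per_sec": network_gb_per_sec,
    }


if __name__ == "__main__":
    for name, value in estimate_load().items():
        print(f"{name}: {value:,.2f}")
```

Under these assumptions the network rate comes out slightly above 8 GB/sec; counting only the moved files would give roughly half of that.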

Page 10: Apache Web Services in the Real World, an E-Science Perspective

SOA, E-Science and LEAD

● E-Science infrastructures are Distributed, Complex, and Heterogeneous.

● SOA is designed to handle exactly that.

● LEAD is based on many SOA Specs:

– WSDL, SOAP, WS-Addressing for Communication

– WS-BPEL for Workflows

– WS-Eventing for Messaging

– WSDM for Service Management

● LEAD people have worked closely with, and contributed to, Web Services, pushing their limits to apply them to LEAD.
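To make the communication specs concrete, here is a minimal Python sketch of a SOAP 1.1 request that carries WS-Addressing 1.0 headers. The `wsa:ReplyTo` header is what lets a service send its response later to a separate endpoint, which is the basis of asynchronous messaging. This is not LEAD's code: the endpoint URLs, the action URI, and the `runForecast` payload are hypothetical, and only the two namespace URIs come from the specifications.

```python
import uuid
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1
WSA_NS = "http://www.w3.org/2005/08/addressing"        # WS-Addressing 1.0

ET.register_namespace("soapenv", SOAP_NS)
ET.register_namespace("wsa", WSA_NS)


def build_request(to, action, reply_to, payload_tag, payload_text):
    """Build a SOAP envelope whose headers say where to send the reply."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{WSA_NS}}}To").text = to
    ET.SubElement(header, f"{{{WSA_NS}}}Action").text = action
    # A unique ID lets the caller correlate the later response.
    message_id = f"urn:uuid:{uuid.uuid4()}"
    ET.SubElement(header, f"{{{WSA_NS}}}MessageID").text = message_id
    # The response goes to this address, not back on the same connection.
    reply = ET.SubElement(header, f"{{{WSA_NS}}}ReplyTo")
    ET.SubElement(reply, f"{{{WSA_NS}}}Address").text = reply_to
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    ET.SubElement(body, payload_tag).text = payload_text
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical endpoints and operation, for illustration only.
message = build_request(
    to="http://example.org/services/forecast",
    action="http://example.org/forecast/run",
    reply_to="http://example.org/callback",
    payload_tag="runForecast",
    payload_text="hurricane-scenario",
)
print(message)
```

A real service would describe the operation and its message types in WSDL; the sketch only shows the addressing headers.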

Page 11: Apache Web Services in the Real World, an E-Science Perspective

LEAD Architecture

Page 12: Apache Web Services in the Real World, an E-Science Perspective

Workflow Subsystem

Page 13: Apache Web Services in the Real World, an E-Science Perspective

Data Subsystem

Page 14: Apache Web Services in the Real World, an E-Science Perspective

Messaging Subsystem

Page 15: Apache Web Services in the Real World, an E-Science Perspective

LEAD & Apache WS History

● A few people from LEAD have been major contributors to Apache Axis, and then Axis2.

● LEAD is not based on Axis2.

● LEAD is older than Axis2, and it forked off in the Axis era, mainly for the sake of async messaging support.

● Five years ago LEAD implemented many tools (e.g. Registries, Async Messaging, Workflow Engine) that are hot topics now.

● Towards the end, LEAD started looking at Axis2 and other Apache projects from a sustainability perspective.

● Most parts are already converted; others are being converted.

Page 16: Apache Web Services in the Real World, an E-Science Perspective

LEAD with Apache Projects

● LEAD switched to Apache ODE for workflow execution more than a year ago.

● LEAD data subsystems switched to Axis2 about a year ago.

● Job Submission was switched to an Axis2-based solution a few months back.

● Service Factory is being converted to Axis2 right now.

● Conversion of the Messaging System is in progress (through an Indiana University and LSF collaboration).

Page 17: Apache Web Services in the Real World, an E-Science Perspective

Apache as a Sustainability model for Research projects

● Industry values “People”, we (open source) value “Code”, and Academia values “Ideas”.

● Most NSF grants now ask for a Sustainability Model as part of proposals.

● One option is a commercial spin-off.

● Doing it in an open source way, building a community and users around a project, is also a potential solution.

● Many challenges: ownership, the need to renounce control, and active engagement of the community are the key.

● “Source Open” is not good enough!

● “Dump and Run” does not work either.

Page 18: Apache Web Services in the Real World, an E-Science Perspective

Pros & Cons

Advantages

● Reach a wider audience. A healthy user community: the world debugs your project for you.

● Potential long lifetime, and a self-sustaining community if successful.

● Take advantage of the Apache process throughout the project life cycle (Releases, SVN, Jira, Wiki, Culture).

● Better chances of attracting external developers and more input. Better chance of avoiding “source open”.

● Take advantage of Apache infrastructure.

Disadvantages

● You have to let go of the ownership, at least to some extent.

● The need for community consent might slow you down.

● You have to learn to listen and explain. Some arguments are harder to make on a mailing list.

● Have to time publications.

Page 19: Apache Web Services in the Real World, an E-Science Perspective

Conclusion

● Wanted to share a real-life, large-scale SOA use case.

● Wanted to show LEAD-Apache interactions as a real-life case study of interactions between Apache and an academic project.

● Wanted to showcase Apache as a sustainability mechanism, if it is done right.

Page 20: Apache Web Services in the Real World, an E-Science Perspective

Questions?