
The Horus and Ensemble Projects: Accomplishments and Limitations

Ken Birman, Robert Constable, Mark Hayden, Jason Hickey, Christoph Kreitz, Robbert van Renesse, Ohad Rodeh, Werner Vogels

Department of Computer Science, Cornell University

January 2000, Cornell Presentation at DISCEX

Reliable Distributed Computing: Increasingly urgent, still unsolved

• Distributed computing has swept the world
• Impact has become revolutionary
• Vast wave of applications migrating to networks
• Already as critical a national infrastructure as water, electricity, or telephones
• Yet distributed systems remain:
  – Unreliable, prone to inexplicable outages
  – Insecure, easily attacked
  – Difficult (and costly) to program, bug-prone


A National Imperative

• Potential for catastrophe cited by:
  – DARPA ISAT study commissioned by Anita Jones (1995; I briefed the findings, which became the basis for refocusing much of ITO under Howard Frank)
  – PCCIP report, PITAC, NAS study of trust in cyberspace
• Need a quantum improvement in technologies, packaged in easily used, practical forms


Quick Timeline

Cornell has developed 3 generations of reliable group communication technology:
• Isis Toolkit: 1987-1990
• Horus System: 1990-1994
• Ensemble System: 1994-1999

Today we are engaged in a new effort reflecting a major shift in emphasis:
• Spinglass Project: 1999-


Questions to consider

• Have these projects been successful?
• What did we do?
• How can impact be quantified?
• What limitations did we encounter?
• How is industry responding?
• What next?


Timeline

Isis → Horus → Ensemble

• Introduced reliability into group computing
• Virtual synchrony execution model
• Elaborate, monolithic, but adequate speed
• Many transition successes:
  – New York and Swiss Stock Exchanges
  – French Air Traffic Control console system
  – Southwestern Bell Telephone network mgt.
  – Hiper-D (next generation AEGIS)


Virtual Synchrony Model

[Figure: time-space diagram of processes p, q, r, s, t moving through views G0={p,q}, G1={p,q,r,s}, G2={q,r,s}, G3={q,r,s,t}. Events: r and s request to join; r and s are added, with state transfer; p fails (crash); t requests to join; t is added, with state transfer.]


Why a “model”?

• Models can be reduced to theory: we can prove the properties of the model, and can decide if a protocol achieves it
• Enables rigorous application-level reasoning
• Otherwise, the application must guess at possible misbehaviors and somehow overcome them


Virtual Synchrony

• Became widely accepted: the basis of literally dozens of research systems and products worldwide
• Seems to be the only way to solve problems based on replication
• Very fast in small systems, but faces scaling limitations in large ones


How Do We Use The Model?

• Makes it easy to reason about the state of a distributed computation
• Allows us to replicate data or computation for fault-tolerance, or because multiple users share the same data (see the sketch below)
• Can also replicate security keys, do load-balancing, synchronization…
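To make the replication pattern concrete, here is a minimal sketch of a replicated counter under virtual synchrony. The event stream stands in for what a group toolkit would deliver; the types and names are hypothetical, not the actual Isis/Horus/Ensemble interface. Because every member sees the same sequence of views and delivers each multicast in the same view and order, applying updates deterministically keeps all replicas identical.

  (* Hypothetical event types; a real toolkit would deliver these as callbacks. *)
  type view = { view_id : int; members : string list }

  type event =
    | View_change of view        (* same view sequence at every member *)
    | Deliver of string * int    (* sender, increment; same order everywhere *)

  (* Replicated counter: deterministic application of a common event
     sequence keeps every replica in the same state. *)
  let apply state = function
    | View_change v ->
        Printf.printf "install view %d: {%s}\n"
          v.view_id (String.concat ", " v.members);
        state
    | Deliver (sender, n) ->
        Printf.printf "deliver +%d from %s\n" n sender;
        state + n

  let () =
    let trace = [
      View_change { view_id = 0; members = ["p"; "q"] };
      Deliver ("p", 1);
      View_change { view_id = 1; members = ["p"; "q"; "r"; "s"] };  (* join + state transfer *)
      Deliver ("q", 2);
      View_change { view_id = 2; members = ["q"; "r"; "s"] };       (* p crashes *)
    ] in
    Printf.printf "replica state: %d\n" (List.fold_left apply 0 trace)

Running the same trace at every member yields the same final state, which is exactly the reasoning the model licenses.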


French ATC system (simplified)

[Figure: simplified system architecture connecting controller workstations, an air traffic database (flight plans, etc.), an X.500 directory, radar, and onboard systems.]


A center contains...

• Perhaps 50 “teams” of 3-5 controllers each
• Each team supported by a workstation cluster
• Cluster-style database server has flight plan information
• Radar server distributes real-time updates
• Connections to other control centers (40 or so in all of Europe, for example)


Process groups arise here:

• Cluster of servers running critical database server programs
• Cluster of controller workstations supports ATC by teams of controllers
• Radar must send updates to the relevant group of control consoles
• Flight plan updates must be distributed to the “downstream” control centers


Role For Virtual Synchrony?

• The French government knows the requirements for safety in the ATC application
• With our model, we can reduce their need to a formal set of statements
• This lets us establish that our solution will really be safe in their setting
• Contrast with the usual ad-hoc methodologies...


More Isis Users

• New York Stock Exchange
• Swiss Stock Exchange
• Many VLSI fabrication facilities
• Many telephony control applications
• Hiper-D, an AEGIS rebuild prototype
• Various NSA and military applications
• Architecture contributed to SC-21/DD-21


Timeline

Isis → Horus → Ensemble

• Simpler, faster group communication system
• Uses a modular layered architecture; layers are “compiled,” headers compressed for speed
• Supports dynamic adaptation and real-time apps
• Partitionable version of virtual synchrony
• Transitioned primarily through the Stratus Computer Phoenix product, for telecommunications


Layered Microprotocols in Horus

• Interface to Horus is extremely flexible
• Horus manages the group abstraction
• Group semantics (membership, actions, events) are defined by a stack of modules, as sketched below

[Figure: a stack of plug-and-play microprotocol modules (encrypt, filter, sign, vsync, ftol); Ensemble stacks such modules to give design flexibility to the developer.]
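As an illustration of the layering idea, here is a minimal sketch (hypothetical; not the real Horus interface) in which each microprotocol transforms a message on the way down and inverts the transform on the way up. A stack is then just a composition of layers, which is what makes the modules plug-and-play.

  (* A layer transforms outgoing messages (down) and undoes the
     transform on incoming ones (up); the layers here are toys. *)
  type layer = { down : string -> string; up : string -> string }

  let encrypt = {   (* toy substitution cipher, stands in for real crypto *)
    down = String.map (fun c -> Char.chr ((Char.code c + 1) land 255));
    up   = String.map (fun c -> Char.chr ((Char.code c - 1) land 255));
  }

  let sign = {      (* toy trailer, stands in for a real signature *)
    down = (fun m -> m ^ "|sig");
    up   = (fun m -> String.sub m 0 (String.length m - 4));
  }

  (* Sending pushes a message down through every layer; delivery pops
     back up in reverse order, so identical stacks at all members agree. *)
  let send stack msg    = List.fold_left (fun m l -> l.down m) msg stack
  let deliver stack msg = List.fold_left (fun m l -> l.up m) msg (List.rev stack)

  let () =
    let stack = [sign; encrypt] in            (* plug-and-play composition *)
    assert (deliver stack (send stack "hello") = "hello");
    print_endline "round trip ok"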



Group Members Use Identical Multicast Protocol Stacks

[Figure: three group members, each running the identical stack encrypt / vsync / ftol.]


With Multiple Stacks, Multiple Properties

[Figure: members run several different stacks (encrypt / vsync / ftol) side by side, one per group, so each group gets the properties its own stack provides.]


Timeline

Isis → Horus → Ensemble

• Horus-like stacking architecture, equally fast
• Includes a group-key mechanism for secure group multicast and key management
• Uses a high-level language and can be formally proved: an unexpected and major success
• Many early transition successes:
  – DD-21, Quorum via collaboration with BBN
  – Nortel, STC: commercial users
  – Discussions with MS (COM+): could be the basis of standards


Proving Ensemble Correct

• Unlike Isis and Horus, Ensemble is coded in a language with strong semantics (ML)
• So we took a specification of virtual synchrony from MIT’s IOA group (Nancy Lynch)
• And we are actually able to prove that our code implements the spec, and that the spec captures the virtual synchrony property (sketched below)!
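For flavor, here is our informal rendering (not the IOA specification itself) of one core property the proofs are about: members that share a view deliver the same messages while in that view.

  (* Toy checker for one virtual synchrony property: all members of a
     view deliver the same set of messages in that view. A history is
     a list of (view_id, message_id) delivery events. *)
  module ISet = Set.Make (Int)

  let delivered_in history v =
    List.fold_left
      (fun acc (view, msg) -> if view = v then ISet.add msg acc else acc)
      ISet.empty history

  let agree_in_view histories v =
    match List.map (fun h -> delivered_in h v) histories with
    | [] -> true
    | s :: rest -> List.for_all (ISet.equal s) rest

  let () =
    let p = [ (0, 1); (0, 2); (1, 3) ] in
    let q = [ (0, 1); (0, 2); (1, 3) ] in
    assert (agree_in_view [p; q] 0 && agree_in_view [p; q] 1);
    print_endline "histories satisfy same-view delivery"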


Why is this important?

• If we use Ensemble to secure keys, our proof is a proof of security of the group keying protocol…
• And the proof extends not just to the algorithm but also to the actual code implementing it
• These are the largest formal proofs ever undertaken!


Why is this feasible?

• Power of the NuPRL system: a fifth-generation theorem-proving technology
• Simplifications gained through modularity: compositional code inspires a style of compositional proof
• Ensemble itself is unusually elegant; protocols are spare and clear


Other Accomplishments

An automated optimization technology:
• Often, a simple modular protocol becomes complex when optimized for high performance
• Our approach automates optimization: the basic protocol is coded only once, and we work with a single, simple, clear version
• The optimizer works almost as well as hand-optimization and can be invoked at runtime


Optimization

[Figure: a layered stack (encrypt / vsync / ftol) before and after optimization.]

• The original code is simple but inefficient
• The optimized code is provably the same, yet the inefficiencies are eliminated (see the sketch below)
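A toy rendering of what “provably the same” means here (illustrative only; the real optimizer works on protocol layers, not strings): the layered path calls one handler per layer, while the fused path computes the identical result in a single pass.

  (* Each layer prepends its header; the fused version is extensionally
     equal, so the rewrite is behavior-preserving while removing the
     per-layer traversals. *)
  let add_header tag m = tag ^ ":" ^ m
  let layered m = add_header "ftol" (add_header "vsync" (add_header "encrypt" m))
  let fused m = "ftol:vsync:encrypt:" ^ m
  let () =
    assert (layered "hello" = fused "hello");
    print_endline "layered and fused paths agree"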


Other Accomplishments

Real-Time Fault-Tolerant Clusters:
• Problem originated in the AEGIS tracking server
• Need a scalable, fault-tolerant parallel server with rapid real-time guarantees


AEGIS Problem

[Figure: emulate a single tracking server with a cluster of tracking servers, subject to a 100ms deadline.]


Other Accomplishments

Real-Time Fault-Tolerant Clusters:
• Problem originated in the AEGIS tracking server
• Need a scalable, fault-tolerant parallel server with rapid real-time guarantees
• With Horus, we achieved 100ms response time even when nodes crash, scalability to 64 nodes or more, load balancing, and linear speedups
• Our approach emerged as one of the major themes in SC-21, which became DD-21


Other Accomplishments

A flexible, object-oriented toolkit:
• Standardizes the sorts of things programmers do most often
• Programmers are able to work with high-level abstractions rather than being forced to reimplement common tools, like replicated data, each time they are needed
• Embedding into the NT COM architecture


Security Architecture

• Group key management
• Fault-tolerant, partitionable
• Currently exploring a very large scale configuration that would permit rapid key refresh and revocation even with millions of users (see the sketch below)
• All provably correct
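A simplified sketch of why rekeying on membership change gives revocation (an illustration of the idea, not Ensemble's actual keying protocol): each new view gets a fresh group key that is handed only to current members, so a process dropped from the view never learns later keys.

  module SMap = Map.Make (String)

  (* A counter stands in for fresh random key material. *)
  let key_counter = ref 0
  let fresh_key () = incr key_counter; !key_counter

  (* Installing a view picks a fresh key and records, per member, the
     newest key that member holds. *)
  let install_view members keys =
    let k = fresh_key () in
    (k, List.fold_left (fun m p -> SMap.add p k m) keys members)

  let () =
    let k1, keys = install_view ["p"; "q"; "r"] SMap.empty in
    let k2, keys = install_view ["p"; "q"] keys in  (* r is dropped and revoked *)
    assert (SMap.find "r" keys = k1);               (* r holds only the old key *)
    assert (SMap.find "r" keys <> k2);              (* ... useless for new traffic *)
    Printf.printf "view 2 key %d; r's stale key %d\n" k2 (SMap.find "r" keys)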


Transition Paths?

• Through collaboration with BBN, delivered to the DARPA QUOIN effort
• Part of the DD-21 architecture
• Strong interest in industry, with good prospects for “major vendor” offerings within a year or two
• A good success for Cornell and DARPA


What Next?

Continue some work with Ensemble:
• Research focuses on proof of the replication stack
• Otherwise, keep it alive, support and extend it
• Play an active role in transition
• Assist standards efforts

But shift focus to a completely new effort:
• Emphasize adaptive behavior, extreme scalability, robustness against local disruption
• Fits the “Intrinsically Survivable Systems” initiative


Throughput Stability: Achilles Heel of Group Multicast

• When scaled to even modest environments, the overheads of virtual synchrony become a problem
• One serious challenge involves management of group membership information
• But multicast throughput also becomes unstable with high data rates and large system size
• A problem in every protocol we’ve studied, including other “scalable, reliable” protocols


Throughput Scenario

Most members are healthy… but one is slow


[Figure: throughput as one member of a multicast group is “perturbed” by forcing it to sleep for varying amounts of time. Virtually synchronous Ensemble multicast protocols; x-axis: degree of slowdown (0 to 0.9); y-axis: group throughput at the healthy members (0 to 250); curves for 32, 64, and 96 group members.]


Bimodal Multicast in Spinglass

• A new family of protocols with stable throughput: extremely scalable, with fixed and low overhead per process and per message (see the sketch below)
• Gives tunable probabilistic guarantees
• Includes a membership protocol and a multicast protocol
• Ideal match for small, nomadic devices
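A minimal sketch of the gossip-style dissemination underlying these protocols (simplified; Bimodal Multicast pairs an unreliable multicast with gossip-based repair): in each round every process pulls missing messages from one randomly chosen peer, so the per-process cost is constant and the group converges in roughly O(log n) rounds with high probability.

  module ISet = Set.Make (Int)

  (* One anti-entropy round: each process contacts one random peer and
     pulls any message ids it is missing; constant fan-out per process. *)
  let gossip_round procs =
    Array.iteri
      (fun i _ ->
        let j = Random.int (Array.length procs) in
        if j <> i then procs.(i) <- ISet.union procs.(i) procs.(j))
      procs

  let () =
    Random.self_init ();
    let n = 64 in
    let procs = Array.make n ISet.empty in
    procs.(0) <- ISet.of_list [1; 2; 3];   (* the initial multicast reaches one member *)
    for _ = 1 to 20 do gossip_round procs done;
    let all = Array.for_all (fun s -> ISet.cardinal s = 3) procs in
    Printf.printf "all %d members caught up after 20 rounds: %b\n" n all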


[Figure: throughput with 25% slow processes. Two panels plot average throughput (0 to 200) against slowdown (0.1 to 0.9).]


Spinglass: Summary of objectives

• Radically different approach yields stable, scalable protocols with steady throughput
• Small footprint, tunable to match conditions
• Completely asynchronous, hence demands a new style of application development
• But opens the door to a new lightweight reliability technology supporting large, autonomous environments that adapt


Conclusions

• Cornell: a leader in reliable distributed computing
• High impact on important DoD problems, such as AEGIS (DD-21), QUOIN, NSA intelligence gathering, and many other applications
• Demonstrated modular plug-and-play protocols that perform well and can be proved correct
• Transition into standard, off-the-shelf operating systems
• Spinglass: the next major step forward