© Neeraj Suri, EU-NSF ICT, March 2006
Dependable Embedded Systems & SW Group, www.deeds.informatik.tu-darmstadt.de
HP: Hybrid Paxos for WANs
Dan Dobre, Matthias Majuntke, Marco Serafini and Neeraj Suri
TU Darmstadt, Germany
EDCC, Valencia, May 18, 2010, Matthias Majuntke
Resilience of Critical Services
[Figure: with a single server, client requests get no reply once the server fails; with SMR, clients send requests to n ≥ 2f+1 replicas and still get replies.]
Safety-critical systems
• Resilience against catastrophic failures
State Machine Replication
• Illusion of a single server that never fails
Wide-area replication
• Large and unpredictable delays in WANs → latency-optimal protocol needed
Which Consensus Protocol?
State Machine Replication (SMR)
• Clients propose commands to replicas
• Agreement on the sequence of commands → replicas are in a consistent state when executing the command sequence
• A consensus protocol is needed
Latency-optimal protocols
• Latency: number of message delays between the moment a client proposes a command and the moment the command is learned by a learner (to be executed)
Two protocols by Lamport
Classic Paxos (CP): Client → Leader → Acceptors → Client
• 3 message delays (during normal operation)
• Majority quorum for recovery
Fast Paxos (FP): Client → Acceptors → Client
• 2 message delays (during normal operation)
• 2 + 4 message delays in the presence of collisions
• Larger quorum for recovery
Paxos vs. Fast Paxos
Compared latency in PlanetLab experiments
• Simulation of the CP and FP message patterns (different topologies)
• FP is not always faster than CP
• Some clients prefer CP, some prefer FP
• A single crash can reverse which protocol is faster
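To illustrate why FP is not always faster than CP, here is a minimal latency model of the two message patterns. It is a sketch under stated assumptions, not the simulation used in the experiments: one-way delays are fixed and symmetric, processing time is zero, there are no collisions, and the topology, node names (`A`, `B`, `C`, `c1`, `c2`) and delay values are made up. With n = 3, a classic quorum is a majority of 2, and we assume a fast quorum of all 3 acceptors.

```python
# Toy latency model for the CP and FP message patterns (a sketch).
# Assumptions: fixed, symmetric one-way delays (ms); the leader is one of the
# acceptors; zero processing time; no collisions. Topology values are made up.
from typing import Dict, List, Tuple

Delays = Dict[Tuple[str, str], float]


def delay(d: Delays, a: str, b: str) -> float:
    """Symmetric one-way delay between two nodes (0 for a node to itself)."""
    if a == b:
        return 0.0
    return d[(a, b)] if (a, b) in d else d[(b, a)]


def cp_latency(d: Delays, client: str, leader: str,
               acceptors: List[str], quorum: int) -> float:
    """Client -> leader (propose) -> acceptors (2a) -> client (2b).
    The client learns when the quorum-th 2b message arrives."""
    to_leader = delay(d, client, leader)
    arrivals = sorted(to_leader + delay(d, leader, a) + delay(d, a, client)
                      for a in acceptors)
    return arrivals[quorum - 1]


def fp_latency(d: Delays, client: str,
               acceptors: List[str], fast_quorum: int) -> float:
    """Client -> acceptors (propose) -> client (2bfast), collision-free run.
    The client learns when the fast_quorum-th 2bfast message arrives."""
    arrivals = sorted(delay(d, client, a) + delay(d, a, client) for a in acceptors)
    return arrivals[fast_quorum - 1]


# Hypothetical topology: acceptors A (leader), B, C; A and B are close,
# C is remote. Client c1 is close to A and B; c2 is equidistant from all.
ACCEPTORS = ["A", "B", "C"]
D: Delays = {
    ("A", "B"): 5.0, ("A", "C"): 100.0, ("B", "C"): 100.0,
    ("c1", "A"): 10.0, ("c1", "B"): 10.0, ("c1", "C"): 100.0,
    ("c2", "A"): 50.0, ("c2", "B"): 50.0, ("c2", "C"): 50.0,
}

for client in ("c1", "c2"):
    cp = cp_latency(D, client, "A", ACCEPTORS, quorum=2)   # majority of 3
    fp = fp_latency(D, client, ACCEPTORS, fast_quorum=3)   # all 3 acceptors
    print(client, "CP:", cp, "FP:", fp)
```

In this hypothetical topology, `c1` learns in 25 ms with CP but waits 200 ms for the remote acceptor under FP, while `c2` is slightly faster with FP (100 ms vs. 105 ms).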
Motivation for a Hybrid Protocol
No clear winner between CP and FP with respect to latency
Hybrid protocol: Hybrid Paxos (HP)
• Runs CP and FP in parallel
• Chooses the quickest outcome of the two protocols
• Implements Generalized Consensus
  ◦ Commuting commands may be chosen in any order
• Does not negatively affect throughput
  ◦ FP mode is switched off when not beneficial
Outline of the Talk
• Contribution
• System Model
• Background on Paxos and Generalized Consensus
• Hybrid Paxos protocol
• Evaluation
• Discussion
• Conclusion
Contribution
Hybrid Paxos (HP): CP with an additional “fast mode”
• Fast learning (2 message delays) in the absence of collisions
• 3 message delays, as in CP, in the presence of collisions
• Latency-optimal
• 2f+1 servers, of which f may crash (optimal)
• Linear number of messages (optimal)
First efficient implementation of Generalized Consensus
Experiments using Emulab
• HP reaches the theoretical minimum latency
• HP does not negatively affect throughput
System Model
Distributed system
• n servers
• Any number of clients (may crash)
• Communication via reliable FIFO channels
• Crash-stop model
• At most a minority of servers fails: n ≥ 2f+1, where f is the number of crashes
• Asynchronous system with the Ω failure detector (eventually outputs the same correct leader)
Generalized Consensus
• Command history: equivalence class of command sequences
• Sequences c1 and c2 are equivalent iff executing them produces the same outputs and the same state (commuting commands)
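The equivalence of command sequences can be made concrete with the banking state machine used in the evaluation. The sketch below is an illustration under assumptions not on the slide: a single account, `deposit` always succeeds, and `withdraw` is rejected when the balance is insufficient; the functions `run` and `equivalent` are hypothetical. Two sequences are treated as equivalent when every command produces the same output and the final balance is the same.

```python
# Equivalence of command sequences for a toy banking state machine (a sketch).
# Assumptions: one account, deposit(x) always succeeds, withdraw(x) is rejected
# if it exceeds the balance; every command has a unique id.
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]          # (id, "deposit" | "withdraw", amount)
Outputs = Dict[str, bool]               # per-command result: accepted or not


def run(seq: List[Command], balance: int = 0) -> Tuple[Outputs, int]:
    """Execute a command sequence; return per-command outputs and final balance."""
    outputs: Outputs = {}
    for cid, op, amount in seq:
        if op == "deposit":
            balance += amount
            outputs[cid] = True
        elif amount <= balance:         # withdraw with sufficient funds
            balance -= amount
            outputs[cid] = True
        else:                           # withdraw rejected
            outputs[cid] = False
    return outputs, balance


def equivalent(s1: List[Command], s2: List[Command], balance: int = 0) -> bool:
    """Same output for every command and same final state."""
    return run(s1, balance) == run(s2, balance)


d1: Command = ("d1", "deposit", 50)
d2: Command = ("d2", "deposit", 30)
w1: Command = ("w1", "withdraw", 60)

print(equivalent([d1, d2], [d2, d1]))            # deposits commute
print(equivalent([d1, d2, w1], [d1, w1, d2]))    # withdraw order matters here
```

Deposits always commute in this model, whereas moving a withdraw across a deposit can change whether the withdraw succeeds, which is why such pairs must be ordered.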
Background on Generalized Consensus
The protocol operates on command histories (equivalence classes of command sequences)
Terms on histories
• Prefix relation on histories: h ⊑ g
• glb of histories: largest common prefix (intersection)
• lub of histories: smallest common extension (union)
• h and h′ are compatible iff there exists g such that h ⊑ g and h′ ⊑ g
Definition of Generalized Consensus
• Consistency: every two learned histories are compatible.
• Nontriviality: if a history is chosen, then all commands it contains have been proposed.
• Conservatism: if a history h is learned, then h was chosen.
• Progress: if a command c is proposed, eventually a history containing c is learned.
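The history operations above can be sketched in a trace model, under assumptions not stated on the slide: commands are unique, and two sequences are equivalent iff they contain the same commands and order every conflicting pair the same way. The conflict relation (withdraws conflict with everything, deposits commute) is a hypothetical stand-in for the banking example. `compatible` builds one candidate common extension and checks it; glb is omitted for brevity.

```python
# Prefix, compatibility, and lub on command histories (a sketch).
# Assumptions: commands are unique strings; a history is represented by any
# sequence in its equivalence class, where two sequences are equivalent iff they
# contain the same commands and order every conflicting pair the same way.
# Hypothetical conflict relation: deposits ("d...") commute with each other,
# a withdraw ("w...") conflicts with every other command.
from typing import List

History = List[str]


def conflict(a: str, b: str) -> bool:
    return a != b and (a.startswith("w") or b.startswith("w"))


def is_prefix(h: History, g: History) -> bool:
    """h ⊑ g: g is equivalent to h followed by the commands of g not in h."""
    if not set(h) <= set(g):
        return False
    pos_g = {c: i for i, c in enumerate(g)}
    pos_h = {c: i for i, c in enumerate(h)}
    for c in h:
        for d in g:
            if not conflict(c, d):
                continue
            if d in pos_h:
                # Both in h: the conflicting pair must be ordered as in g.
                if (pos_h[c] < pos_h[d]) != (pos_g[c] < pos_g[d]):
                    return False
            elif pos_g[d] < pos_g[c]:
                # d is only in g but must run before c: h cannot come first.
                return False
    return True


def _extend(h: History, h2: History) -> History:
    """h followed by the commands of h2 that h lacks (in h2's order)."""
    seen = set(h)
    return h + [c for c in h2 if c not in seen]


def compatible(h: History, h2: History) -> bool:
    """A common extension exists iff _extend(h, h2) also extends h2."""
    return is_prefix(h2, _extend(h, h2))


def lub(h: History, h2: History) -> History:
    """Smallest common extension (union) of two compatible histories."""
    if not compatible(h, h2):
        raise ValueError("incompatible histories have no common extension")
    return _extend(h, h2)


print(is_prefix(["d2"], ["d1", "d2"]))          # deposits commute
print(compatible(["d1", "w1"], ["d2", "w1"]))   # d2 and w1 cannot be reconciled
print(lub(["d1", "d2"], ["d2", "d3"]))
```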
Background on Paxos Family
The following holds for CP, FP, and HP:
• Clients are proposers and learners
• Servers are acceptors
  ◦ They cooperate to choose a single command history
  ◦ Acceptors query Ω and elect a leader among themselves
  ◦ A unique leader is needed for progress only
Paxos* protocols operate in rounds
• Each leader is preassigned a set of round numbers
Operation modes
• Recovery, to change rounds (must ensure consistency)
• Normal operation
Quorums of acceptors
• CP: any two quorums intersect
• FP: requires larger fast quorums
  ◦ The intersection of a quorum Q and a fast quorum FQ is larger than n − |FQ|, i.e. |Q ∩ FQ| ≥ n − |FQ| + 1
[Figure: n acceptors split into a fast quorum of |FQ| acceptors and the remaining n − |FQ|]
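The fast-quorum condition can be checked by brute force. This sketch assumes classic quorums are majorities and searches for the smallest fast-quorum size whose worst-case overlap with any classic quorum exceeds n − |FQ|; the helper names are hypothetical.

```python
# Smallest fast quorum satisfying |Q ∩ FQ| > n - |FQ| (a sketch).
# Assumption: classic quorums are majorities (floor(n/2) + 1 acceptors), as in CP.
from itertools import combinations


def min_overlap(n: int, q: int, fq: int) -> int:
    """Smallest |Q ∩ FQ| over all quorums of size q and fast quorums of size fq."""
    acceptors = range(n)
    return min(len(set(a) & set(b))
               for a in combinations(acceptors, q)
               for b in combinations(acceptors, fq))


def min_fast_quorum(n: int) -> int:
    q = n // 2 + 1                      # classic majority quorum
    for fq in range(q, n):
        if min_overlap(n, q, fq) > n - fq:
            return fq
    return n                            # fq = n always works: overlap q > 0


for n in (3, 5, 7):
    print("n =", n, "classic:", n // 2 + 1, "fast:", min_fast_quorum(n))
```

For n = 3, 5 and 7 this yields fast quorums of 3, 4 and 6 acceptors, compared with classic quorums of 2, 3 and 4.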
CP and FP Message Patterns
[Figure: message diagrams over client (cl), leader (ld) and acceptors (acc). Recovery (all protocols): Phase 1 (1a, 1b) and Phase 2 (2a, 2b). Normal operation of CP: propose, 2a, 2b, chosen. Normal operation of FP: fast mode (propose, 2bfast, chosen) and recovery from a collision (1a, 1b, 2a, 2b).]
Ideas behind Message Patterns
Normal operation of CP
• Client sends a proposal (command) to the leader
• Leader appends the command to its history and sends the history to the acceptors (2a)
• Acceptors accept the history as their local history
• Acceptors send the history back to the client (2b)
Normal operation of FP
• Client sends its proposal directly to the acceptors
• Acceptors append the command to their local fast history (optimistically)
• Acceptors send the history back to the client (and the leader) (2bfast)
• Collision (conflicting commands arrive at acceptors in different orders): recovery is triggered by the leader
Recovery (to start a new round): the core of the protocol
• Phase 1: initiated by the new leader (1a); acceptors send their local histories to the leader (1b); the leader determines the chosen history
• Phase 2: the leader synchronizes the acceptors to the chosen history (2a); replies go to the clients (2b)
Combining the two protocols
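[Figure: HP message pattern over cl, ld and acc: the client's proposal goes to the leader and to the acceptors; the CP messages (2a, 2b) and the FP messages (2bfast) run in parallel until the history is chosen.]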
Execute the CP and FP patterns in parallel: CP with an additional FP mode
• Acceptors locally maintain a fast history and a classic history
  ◦ The history from the leader (ld) becomes the classic history
  ◦ Commands from clients (cl) are appended to the fast history
Not a naïve combination: clients learn either by receiving
• a quorum of equal 2b messages (learn), or
• a fast quorum of equal 2bfast messages and one 2b message (hybrid learn)
  ◦ The 2b message is needed also in FP for speculative execution
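The two learning rules can be written as a predicate over the 2b and 2bfast messages a client has received. This is a sketch with assumptions: histories are plain tuples compared for equality (rather than equivalence classes), the message format and `try_learn` are hypothetical, and for hybrid learn the single 2b message is assumed to carry the same history as the fast quorum.

```python
# Client-side learning in HP: "learn" and "hybrid learn" (a sketch).
# Assumptions: histories are tuples compared for equality; a majority is the
# classic quorum; the fast-quorum size is a parameter; for hybrid learn, the
# single 2b message is assumed to carry the same history as the 2bfast quorum.
# The message format (kind, acceptor, history) is hypothetical.
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

History = Tuple[str, ...]
Message = Tuple[str, str, History]      # ("2b" | "2bfast", acceptor id, history)


def try_learn(msgs: Iterable[Message], n: int, fast_quorum: int) -> Optional[History]:
    latest: Dict[Tuple[str, str], History] = {}
    for kind, acceptor, hist in msgs:   # keep the latest message per (kind, acceptor)
        latest[(kind, acceptor)] = hist
    classic = Counter(h for (k, _), h in latest.items() if k == "2b")
    fast = Counter(h for (k, _), h in latest.items() if k == "2bfast")

    # learn: a quorum of equal 2b messages
    for hist, count in classic.items():
        if count >= n // 2 + 1:
            return hist
    # hybrid learn: a fast quorum of equal 2bfast messages plus one equal 2b
    for hist, count in fast.items():
        if count >= fast_quorum and classic[hist] >= 1:
            return hist
    return None


h = ("d1", "w1")
fast_only = [("2bfast", a, h) for a in ("a1", "a2", "a3")]
print(try_learn(fast_only, n=3, fast_quorum=3))                       # None
print(try_learn(fast_only + [("2b", "a1", h)], n=3, fast_quorum=3))   # ('d1', 'w1')
```

Without the 2b message, even a full fast quorum of matching 2bfast messages is not enough to learn in this sketch, which mirrors the hybrid learn rule above.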
Hybrid Recovery
Same message pattern; acceptors maintain separate histories
• Classic history
• Fast history
The leader performs CP-like and FP-like recoveries in parallel
• Determines history fh from the FP recovery
• Determines history h from the CP recovery
• Neither h nor fh alone is sufficient; goal: lub of h and fh
Problem: h and fh might be incompatible (no common extension)
• Determine the largest prefix pfh of fh that is compatible with h
• Pick the lub of pfh and h (smallest common extension)
Why is this correct (sufficient for Consistency)?
• To show: any history lh learned by hybrid learn is a prefix of pfh
• lh ⊑ fh, and all prefixes of fh compatible with h are prefixes of pfh
• Sufficient to show: lh is compatible with h
• By hybrid learning, some acceptor holds lh as its classic history
• lh and h have both been sent by the leader, so lh and h are compatible
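The leader's hybrid recovery step can be sketched in the same kind of trace model (unique commands; sequences are equivalent iff they order every conflicting pair the same way), again with a hypothetical conflict relation in which withdraws conflict with everything. `largest_compatible_prefix` scans fh in order and keeps a command only if no conflicting earlier command was dropped and the result stays compatible with h; this greedy scan is our assumption about how pfh could be computed, not the authors' algorithm.

```python
# Hybrid recovery at the leader: pick lub(pfh, h) (a sketch).
# Assumptions: unique commands; two sequences are equivalent iff they order every
# conflicting pair the same way. Hypothetical conflict relation: withdraws
# ("w...") conflict with everything. pfh is computed by a greedy scan of fh.
from typing import List

History = List[str]


def conflict(a: str, b: str) -> bool:
    return a != b and (a.startswith("w") or b.startswith("w"))


def is_prefix(h: History, g: History) -> bool:
    """h ⊑ g in the trace model."""
    if not set(h) <= set(g):
        return False
    pg = {c: i for i, c in enumerate(g)}
    ph = {c: i for i, c in enumerate(h)}
    for c in h:
        for d in g:
            if conflict(c, d):
                if d in ph and (ph[c] < ph[d]) != (pg[c] < pg[d]):
                    return False            # conflicting pair ordered differently
                if d not in ph and pg[d] < pg[c]:
                    return False            # g needs d before c, but d is not in h
    return True


def extend(h: History, h2: History) -> History:
    seen = set(h)
    return h + [c for c in h2 if c not in seen]


def compatible(h: History, h2: History) -> bool:
    return is_prefix(h2, extend(h, h2))


def largest_compatible_prefix(fh: History, h: History) -> History:
    """Largest prefix pfh of fh that is compatible with h (greedy scan)."""
    pfh: History = []
    skipped: List[str] = []
    for c in fh:
        # c may join only if no conflicting earlier command was skipped
        if any(conflict(c, s) for s in skipped) or not compatible(pfh + [c], h):
            skipped.append(c)
        else:
            pfh.append(c)
    return pfh


def hybrid_recover(h: History, fh: History) -> History:
    """h from the CP-like recovery, fh from the FP-like recovery."""
    pfh = largest_compatible_prefix(fh, h)
    return extend(h, pfh)                   # lub of h and pfh (they are compatible)


h = ["d1", "d2"]                            # classic history
fh = ["d2", "d3", "w1", "d4"]               # fast history: w1 ran without d1
print(largest_compatible_prefix(fh, h))
print(hybrid_recover(h, fh))
```

Here fh lets w1 run without d1, which contradicts h, so pfh stops short of w1 (and of d4, which must follow w1); the recovered history contains d1, d2 and d3.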
Implementation Optimization
Optimization 1 (message complexity)
• The leader does not send its entire history to the acceptors (2a); FIFO channels make sending only the new commands sufficient
Optimization 2 (execution)
• The state machine is implemented at the servers
• Only the leader executes commands (speculatively), which prevents rollbacks at acceptors
• Clients receive history digests plus the result
Optimization 3 (latency)
• Diverging fast and classic histories during normal mode prevent hybrid learning
• Periodically, acceptors locally align fh to h (as in hybrid recovery)
Optimization 4 (throughput)
• FP mode is switched off during high load; the leader monitors the load
Also true for FP
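Optimization 1 relies on FIFO channels: if every 2a message carries only the commands appended since the previous one, acceptors can rebuild the leader's history by appending in arrival order. A minimal sketch, with in-process queues standing in for network channels; the classes `FifoChannel`, `Leader` and `Acceptor` are hypothetical.

```python
# Optimization 1 (a sketch): 2a messages carry only the commands the leader has
# not sent yet; FIFO delivery lets each acceptor rebuild the full history by
# appending. Class and method names are hypothetical.
from collections import deque
from typing import Deque, List


class FifoChannel:
    """Reliable FIFO channel: messages are received in the order they were sent."""

    def __init__(self) -> None:
        self._queue: Deque[List[str]] = deque()

    def send(self, msg: List[str]) -> None:
        self._queue.append(list(msg))

    def receive_all(self) -> List[List[str]]:
        msgs = list(self._queue)
        self._queue.clear()
        return msgs


class Leader:
    def __init__(self, channels: List[FifoChannel]) -> None:
        self.history: List[str] = []
        self.sent = 0                       # length of the prefix already sent
        self.channels = channels

    def append(self, cmd: str) -> None:
        self.history.append(cmd)

    def send_2a(self) -> None:
        delta = self.history[self.sent:]    # only the new suffix
        for ch in self.channels:
            ch.send(delta)
        self.sent = len(self.history)


class Acceptor:
    def __init__(self) -> None:
        self.classic: List[str] = []

    def on_2a(self, delta: List[str]) -> None:
        self.classic.extend(delta)          # FIFO order makes appending safe


channels = [FifoChannel() for _ in range(3)]
acceptors = [Acceptor() for _ in range(3)]
leader = Leader(channels)
leader.append("d1")
leader.send_2a()                            # 2a carries ["d1"]
leader.append("d2")
leader.append("w1")
leader.send_2a()                            # 2a carries ["d2", "w1"] only

sizes = []
for ch, acc in zip(channels, acceptors):
    for msg in ch.receive_all():
        sizes.append(len(msg))
        acc.on_2a(msg)
print([acc.classic for acc in acceptors])
```

If the channels could reorder messages, an acceptor might append ["d2", "w1"] before ["d1"], which is why the delta encoding depends on FIFO delivery.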
Evaluation
Experimental setting
• Banking system with two operations: deposit and withdraw
• Deposit operations commute (Generalized Consensus)
• Emulab testbed
• 20 ms link delay between clients and servers, 100 Mbps
• Topology similar to the “Europe” topology from the beginning of the presentation
• Servers: 600 MHz PCs, Fedora 6
Latency
[Figure: latency of HP with varying withdraw rate (= probability of collisions)]
[Figure: latency vs. throughput, with and without batching]
Throughput
[Figure: throughput with increasing number of clients]
[Figure: throughput with increasing f]
Related Work
• [Lamport: ACM TOCS 1998] The Part-Time Parliament
• [Lamport: Dist. Comp. 2006] Fast Paxos
• [Lamport: TR 2005] Generalized Consensus and Paxos
• [Dobre, Suri: DSN 2006] One-step Consensus with Zero-Degradation
• [Charron-Bost, Schiper: PRDC 2006] Improving Fast Paxos: Being Optimal with No Overhead
  ◦ Minimum latency of FP and CP only in failure-free runs
• [Camargos, Schmidt, Pedone: NCA 2008] Multicoordinated Agreement Protocols for Higher Availability
  ◦ Improved availability of CP through multiple leaders; collision resolution required
• [Zielinski: DISC 2005] Optimistic Generic Broadcast
  ◦ Parallel execution of CP and FP; not resilience-optimal; quadratic message complexity
• [Mao, Junqueira, Marzullo: OSDI 2008] Mencius: Building Efficient Replicated State Machines for WANs
  ◦ Based on CP; partitions consensus instances among several leaders (throughput)
  ◦ Each client has a LAN connection to one leader (latency)
  ◦ Perfect failure detector needed
Discussion
Comparison to CP
• Implements CP
• Never worse than CP
• FP mode is switched off when the leader is highly loaded
Comparison to FP
• HP and FP need 2 message delays in the absence of collisions
• In the presence of collisions, HP needs 3 message delays and FP needs 6
• Experiments: the collision rate grows faster than the server utilization
  ◦ Servers are underutilized when the hybrid learning rate is below 50%
  ◦ FP would spend more than 50% of the time recovering from collisions
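The delay counts above can be combined into a back-of-the-envelope model: if a command collides with probability p, its expected number of message delays is 2(1 − p) + 3p for HP and 2(1 − p) + 6p for FP, versus a constant 3 for CP. This idealized sketch ignores load, batching, and network variance; `expected_delays` is a hypothetical helper.

```python
# Expected message delays per command vs. collision probability p (a sketch).
# Idealized assumptions: each command either sees no collision (probability 1 - p)
# or one collision; per-case delays are the ones stated for CP, HP, and FP;
# load, batching, and real network variance are ignored.
from typing import Dict


def expected_delays(p: float) -> Dict[str, float]:
    return {
        "CP": 3.0,                      # 3 delays regardless of collisions
        "HP": 2 * (1 - p) + 3 * p,      # fast learn, else classic learn
        "FP": 2 * (1 - p) + 6 * p,      # fast learn, else 2 + 4 with recovery
    }


for p in (0.0, 0.25, 0.5, 1.0):
    print(p, expected_delays(p))
```

Under this model FP beats CP only for p < 0.25, while HP never needs more than the 3 delays of CP.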
Optimizations
• Batching is possible, increasing throughput by an order of magnitude
Summary
HP: Hybrid Paxos
• Idea: add fast learning to Paxos
• Generalized Consensus protocol
• First protocol with 2 message delays in the absence of collisions and 3 message delays otherwise
• Optimal latency, resilience, and number of messages
Generalized Consensus is a practical approach for WAN replication
• HP can outperform state-of-the-art protocols
HP is a Generalized Consensus protocol that features minimal latency and maximum throughput in most situations!
Thank you for your attention!
Questions?