1
A Modular Approach to Fault-Tolerant Broadcasts
and Related Problems
Authors: Vassos Hadzilacos and Sam Toueg
Distributed Systems: 526 U1580
Professor: Ching-Chi Hsu
ftp://ftp.db.toronto.edu/pub/vassos/fault.tolerant.broadcasts.dvi.Z
2
Overview
An earlier version appears in “Fault-Tolerant Broadcasts and Related Problems”, in chapter 5 of “Distributed Systems”, edited by Sape Mullender, Addison-Wesley Publishing Co., 1993
Introduction and Preliminaries
Broadcast Specifications
Broadcast Algorithms
Consensus
Terminating Reliable Broadcast
Multicast Specifications
3
Introduction
The communication primitives available are too weak, e.g., no reliable broadcast primitive
Fault-tolerant broadcasts are communication primitives that facilitate the development of fault-tolerant applications
Another paradigm: Consensus
The literature is not coherent
Primary goal of this paper: develop the material on fault-tolerant broadcasts and consensus in a coherent way
4
Preliminaries
Focus on message-passing models only
The chief characteristics of a message-passing model: the type of communication network, the model of process and communication failures, and the synchrony of the system
Types of communication networks
  Point-to-point and broadcast channel
  Many of the results in this paper are independent of the type of communication network
  When needed, only point-to-point networks are considered
Point-to-point networks
  Communication primitives: send and receive
5
Preliminaries
Outgoing message buffer and incoming message buffer
Every process executes an infinite sequence of steps
Failure types
  Process failures
    Crash failure
    Send-omission failure
    Receive-omission failure
    Arbitrary (Byzantine or malicious) failure
  Link failures
    Omission failure
6
Preliminaries
Synchronous and Asynchronous Networks
A point-to-point network is synchronous if:
  There is a known upper bound on the time required to execute a step
  Local clocks have a known bounded rate of drift with respect to real time
  There is a known upper bound on message delay (which consists of the time to send, transport and receive a message)
Asynchronous: no timing assumptions
Clock and Performance Failures in Synchronous Networks
  Clock failure of a process: the clock drift rate exceeds the bound
  Performance failure of a process: the completion time of a step exceeds the bound
7
Preliminaries
Performance failure of a link: the link takes longer than the bound to transport some message
Classification of Failures and Terminology
  Omission failures: crash, send-omission and receive-omission failures of processes, and omission failures of links
  Timing failures: omission, clock and performance failures
  Benign failures: synonymous with omission failures in asynchronous networks and with timing failures in synchronous networks
Causal Precedence
Properties of clocks
  Clock Monotonicity: the clock never decreases or skips values, and for any time c, the clock eventually reaches c
8
Preliminaries
Logical Clocks: for any processes p and q, and any steps e and f that occur at p and q respectively, if e → f (e causally precedes f) then C_p(e) < C_q(f)
Synchronized Clocks: the clock values of processes at real time t differ by at most a known constant
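The Logical Clocks condition can be illustrated with Lamport's well-known timestamping rule. The following is a minimal sketch, not taken from the slides: the class `LamportClock` and its method names are hypothetical, and the sketch covers only the Logical Clocks condition (the clock may jump on a receive, so it does not by itself illustrate Clock Monotonicity).

```python
class LamportClock:
    """Illustrative logical clock for one process (hypothetical helper)."""

    def __init__(self):
        self.time = 0

    def step(self):
        # Every local step (including a send) advances the clock by one.
        self.time += 1
        return self.time

    def receive(self, sender_time):
        # A receive is ordered after the matching send: jump past its timestamp.
        self.time = max(self.time, sender_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
e = p.step()        # step e at p: sends a message stamped with e
q.step()            # unrelated local step at q
f = q.receive(e)    # step f at q: receives that message, so e -> f
print(e, f)         # the timestamp of e is smaller than that of f
```

Because the receiver always moves past the timestamp carried by the message, any chain of sends and receives from e to f yields strictly increasing clock values.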
9
Broadcast Specification
Assume benign failures
Reliable Broadcast: two primitives, broadcast and deliver
Assume each message is tagged with its sender’s id and a sequence number
Specification of Reliable Broadcast
  Validity: If a correct process broadcasts a message m, then it eventually delivers m
  Agreement: If a correct process delivers a message m, then all correct processes eventually deliver m
  Integrity: For any message m, every correct process delivers m at most once, and only if m was previously broadcast by sender(m)
If the sender of a message m is faulty, the specification allows
10
Broadcast Specifications
two possible outcomes: either m is delivered by all correct processes or by none
FIFO Broadcast
  FIFO Order: If a process broadcasts a message m before it broadcasts a message m’, then no correct process delivers m’ unless it has previously delivered m
Causal Broadcast
  Causal Order: If the broadcast of a message m causally precedes the broadcast of a message m’, then no correct process delivers m’ unless it has previously delivered m
11
Broadcast Specifications
Faulty specifications (from the literature)
  If the broadcast of m causally precedes the broadcast of m’, then every correct process that delivers both messages must deliver m before m’
  Messages that are causally related are delivered in the causal order
Local Order: If a process broadcasts a message m and a process delivers m before broadcasting m’, then no correct process delivers m’ unless it has previously delivered m
Theorem: Causal Order is equivalent to FIFO Order and Local Order
12
Broadcast Specifications
Atomic Broadcast
  Total Order: If correct processes p and q both deliver messages m and m’, then p delivers m before m’ if and only if q delivers m before m’
FIFO Atomic Broadcast
Causal Atomic Broadcast
Timed Broadcasts
  Elapsed time can be interpreted in two different ways: real time or local time
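The Total Order property stated on this slide can be checked mechanically on recorded delivery logs. The sketch below is illustrative only: the function name `satisfies_total_order` and the log format (a dict from process id to the list of messages in delivery order) are assumptions, and each message is assumed to appear at most once per log, as Integrity requires.

```python
from itertools import combinations


def satisfies_total_order(logs):
    """Return True if every pair of processes delivers common messages in the same order."""
    for p, q in combinations(logs, 2):
        pos_q = {m: i for i, m in enumerate(logs[q])}
        # Messages delivered by both p and q, listed in p's delivery order.
        common = [m for m in logs[p] if m in pos_q]
        for m1, m2 in combinations(common, 2):
            # p delivered m1 before m2, so q must do the same.
            if pos_q[m1] > pos_q[m2]:
                return False
    return True


ok_logs = {"p": ["a", "b", "c"], "q": ["a", "c"]}
bad_logs = {"p": ["a", "b"], "q": ["b", "a"]}
print(satisfies_total_order(ok_logs), satisfies_total_order(bad_logs))
```

Note that the check only constrains messages that both processes deliver; it says nothing about whether a message is delivered at all, which is the job of Agreement.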
13
Broadcast Specifications
Real-Time Δ-Timeliness: There is a known constant Δ such that if a message m is broadcast at real time t, then no correct process delivers m after real time t + Δ
Assume each message m contains a timestamp ts(m) denoting the local time at which m was broadcast according to the sender’s clock
Local-Time Δ-Timeliness: There is a known constant Δ such that no correct process p delivers a message m after local time ts(m) + Δ on p’s clock
14
Broadcast Specifications
Place restrictions on the messages delivered by faulty processes
Uniform Agreement: If a process (whether correct or faulty) delivers a message m, then all correct processes eventually deliver m
Uniform Integrity: For any message m, every process (whether correct or faulty) delivers m at most once, and only if m was previously broadcast by sender(m)
Uniform Real-Time Δ-Timeliness: There is a known constant Δ such that if a message m is broadcast at real time t, then no process (whether correct or faulty) delivers m after real time t + Δ
15
Broadcast Specifications
Uniform Local-Time Δ-Timeliness: There is a known constant Δ such that no process p (whether correct or faulty) delivers a message m after local time ts(m) + Δ on p’s clock
Uniform FIFO Order, Uniform Local Order, Uniform Causal Order, Uniform Total Order
Broadcast Specifications for Arbitrary Failures
16
Relationship Among Broadcast Primitives
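Figure: the six broadcast primitives, with each arrow labeled by the order property it adds
  Reliable Broadcast --(Total Order)--> Atomic Broadcast
  FIFO Broadcast --(Total Order)--> FIFO Atomic Broadcast
  Causal Broadcast --(Total Order)--> Causal Atomic Broadcast
  Reliable Broadcast --(FIFO Order)--> FIFO Broadcast
  Atomic Broadcast --(FIFO Order)--> FIFO Atomic Broadcast
  FIFO Broadcast --(Causal Order)--> Causal Broadcast
  FIFO Atomic Broadcast --(Causal Order)--> Causal Atomic Broadcast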
17
Inconsistency and Contamination
The traditional specifications of most broadcasts, including Uniform broadcasts, allow the inconsistency of faulty processes, and the subsequent contamination of correct processes
Example: Atomic Broadcast
It is possible to prevent the inconsistency of faulty processes, or at least the contamination of correct ones
18
Amplification of Failures
Broadcast primitives are usually built on top of lower-level communication primitives
A broadcast algorithm is likely to amplify the severity of failures that occur at the lower level
Even if processes are subject only to crash failures, we cannot assume that the message deliveries a process makes before crashing are always correct
Example: a coordinator-based atomic broadcast algorithm. Even if a faulty process behaves correctly until it crashes, it may still deliver messages out of order before it crashes!
Crash failures by themselves do not guarantee reasonable behavior at the broadcast/delivery level
19
Broadcast Algorithms I -- Methodology
Start with any given Reliable Broadcast algorithm, and show how to achieve each of the three order properties (FIFO, Causal and Total Order) by a corresponding algorithmic transformation
Three transformations: one adds FIFO Order, one adds Causal Order and one adds Total Order
None of the transformations requires assumptions on the type or synchrony of the underlying network, and all of them work for any type and number of benign failures
All transformations preserve Uniform Agreement and, under certain assumptions, both versions of Δ-Timeliness
20
Broadcast Algorithms II -- Transformations
Achieving total order
Achieving FIFO order
Achieving causal order
All transformations preserve Uniform Agreement and, under some conditions, both versions of Δ-Timeliness
All transformations work for any type and number of benign failures, regardless of the type or synchrony of the network
All broadcasts considered here satisfy Uniform Integrity
21
Achieving Total Order
A transformation that converts a Reliable, FIFO or Causal Broadcast that satisfies Local-Time Δ-Timeliness into its Atomic counterpart
This transformation preserves Validity, Agreement, Integrity, FIFO Order and Causal Order (and their uniform counterparts)
22
Preserving Total Order
Algorithm
Every process p executes the following:
To execute broadcast(BA, m):
    broadcast(B, m)
deliver(BA, m) occurs as follows:
    upon deliver(B, m) do
        schedule deliver(BA, m) at time ts(m) + Δ
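The scheduling rule above can be simulated in a few lines. This is a sketch under stated assumptions rather than the paper's code: `AtomicLayer`, `on_deliver_B`, `advance_clock` and the constant `DELTA` (standing for the known timeliness bound of the underlying broadcast B) are hypothetical names, local clocks are modeled as integers passed in by the caller, and ties between equal delivery times are broken by sender id, which is an added assumption.

```python
import heapq

DELTA = 5  # assumed known bound: B delivers m no later than local time ts(m) + DELTA


class AtomicLayer:
    """One process's view of the timestamp-based total order transformation."""

    def __init__(self):
        self.pending = []     # heap of (delivery_time, sender, seq, payload)
        self.delivered = []   # payloads in deliver(BA, -) order

    def on_deliver_B(self, ts, sender, seq, payload):
        # upon deliver(B, m): schedule deliver(BA, m) at local time ts(m) + DELTA
        heapq.heappush(self.pending, (ts + DELTA, sender, seq, payload))

    def advance_clock(self, now):
        # Perform every scheduled delivery whose time has been reached.
        while self.pending and self.pending[0][0] <= now:
            _, _, _, payload = heapq.heappop(self.pending)
            self.delivered.append(payload)


p, q = AtomicLayer(), AtomicLayer()
m1 = (3, 1, 0, "m1")   # (ts, sender, seq, payload)
m2 = (2, 2, 0, "m2")
p.on_deliver_B(*m1); p.on_deliver_B(*m2)   # p gets m1 first from B
q.on_deliver_B(*m2); q.on_deliver_B(*m1)   # q gets m2 first from B
p.advance_clock(10)
q.advance_clock(7); q.advance_clock(8)
print(p.delivered, q.delivered)
```

Both processes end up delivering in timestamp order even though B handed them the messages in different orders. The argument relies on B's timeliness: a message must reach the layer before its scheduled delivery time passes.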
23
Achieving FIFO Order
An algorithm that transforms any Reliable Broadcast algorithm into a FIFO Broadcast algorithm that satisfies Uniform FIFO Order
Preserves (Uniform) Total Order
Assumes a sequence number is attached to every message
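One common way to add FIFO Order on top of a Reliable Broadcast is to hold back each message until all earlier messages from the same sender have been delivered. The sketch below illustrates that idea; it is not claimed to be the exact algorithm from the paper. `FifoLayer` and `on_deliver_R` are hypothetical names, and sequence numbers are assumed to start at 0 for every sender.

```python
from collections import defaultdict


class FifoLayer:
    """Per-process buffer that releases messages in per-sender sequence order."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # sender -> next sequence number to deliver
        self.buffer = {}                   # (sender, seq) -> payload held back
        self.delivered = []                # payloads in FIFO delivery order

    def on_deliver_R(self, sender, seq, payload):
        # Called when the underlying Reliable Broadcast delivers a message.
        self.buffer[(sender, seq)] = payload
        # Release as many consecutive messages from this sender as possible.
        while (sender, self.next_seq[sender]) in self.buffer:
            key = (sender, self.next_seq[sender])
            self.delivered.append(self.buffer.pop(key))
            self.next_seq[sender] += 1


layer = FifoLayer()
layer.on_deliver_R("s", 1, "second")   # arrives early, must wait
early = list(layer.delivered)
layer.on_deliver_R("s", 0, "first")    # unblocks both messages
print(early, layer.delivered)
```

The sketch relies on the Integrity property of the underlying broadcast: since each message is delivered at most once, no duplicate can linger in the buffer.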
24
Achieving Causal Order
Two algorithms transform a FIFO Broadcast into a Causal Broadcast: one is blocking and the other is non-blocking
Both require that the given FIFO Broadcast algorithm satisfy Uniform FIFO Order
Non-Blocking Transformation: preserves Total Order, but not Uniform Total Order
If the given FIFO Broadcast satisfies Uniform Agreement, the transformation preserves both versions of Δ-Timeliness
25
Achieving Causal Order
26
Achieving Causal Order
Blocking Transformation
  Advantage: uses shorter messages
  Uses vector timestamps
  Preserves (Uniform) Total Order
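To show how vector timestamps can block a delivery until its causal predecessors have arrived, here is a sketch of the standard vector-timestamp delivery rule. It is a generic illustration and may differ in detail from the blocking transformation in the paper; `CausalLayer` and its methods are hypothetical names, and processes are assumed to be numbered 0..n-1.

```python
class CausalLayer:
    """One process's causal delivery buffer based on vector timestamps."""

    def __init__(self, n, me):
        self.me = me
        self.vc = [0] * n     # vc[k]: number of broadcasts of process k delivered here
        self.pending = []     # messages blocked on missing causal predecessors
        self.delivered = []

    def broadcast(self, payload):
        # The sender delivers its own message at once and stamps it with its vector.
        self.vc[self.me] += 1
        self.delivered.append(payload)
        return (self.me, list(self.vc), payload)

    def _deliverable(self, sender, ts):
        # Next message from sender, and nothing it depends on is still missing.
        return ts[sender] == self.vc[sender] + 1 and all(
            ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender)

    def on_receive(self, msg):
        if msg[0] == self.me:
            return            # own message was delivered at broadcast time
        self.pending.append(msg)
        progress = True
        while progress:       # keep releasing messages until nothing changes
            progress = False
            for m in list(self.pending):
                sender, ts, payload = m
                if self._deliverable(sender, ts):
                    self.pending.remove(m)
                    self.vc[sender] = ts[sender]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalLayer(3, i) for i in range(3))
m1 = p0.broadcast("m1")
p1.on_receive(m1)
m2 = p1.broadcast("m2")      # m1 causally precedes m2
p2.on_receive(m2)            # blocked: p2 has not delivered m1 yet
blocked = list(p2.delivered)
p2.on_receive(m1)            # releases m1, then m2
print(blocked, p2.delivered)
```

The blocking is visible at p2: m2 waits in `pending` until m1 arrives, which is the price paid for sending only a vector instead of longer messages.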
27
Point-to-Point Networks
Model of Point-to-Point Networks
Primitives send and receive satisfy:
  Validity: If p sends m to q, and p, q and the link from p to q are all correct, then q eventually receives m
  Uniform Integrity: For any message m, q receives m at most once from p, and only if p previously sent m to q
All Reliable Broadcast algorithms given here rely on two assumptions:
  Benign Failures
  No Partitioning
28
Reliable Broadcast
Algorithm
Every process p executes the following:
To execute broadcast(R, m):
    send(m) to p
deliver(R, m) occurs as follows:
    upon receive(m) do
        if p has not previously executed deliver(R, m)
        then
            send(m) to all neighbors
            deliver(R, m)
The algorithm satisfies Validity, Agreement, and Uniform Integrity
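The relay-then-deliver rule of this algorithm can be simulated on a small graph. The sketch below is a simplification under stated assumptions: the function `diffuse` and the adjacency-dict network format are hypothetical, links are assumed correct, and a crashed process is modeled as crashed from the start (it never relays or delivers).

```python
from collections import deque


def diffuse(neighbors, sender, crashed=frozenset()):
    """Return the set of processes that deliver the sender's message."""
    delivered = set()
    inbox = deque([sender])            # broadcast(R, m): the sender sends m to itself
    while inbox:
        p = inbox.popleft()            # p receives m
        if p in crashed or p in delivered:
            continue                   # crashed: takes no steps; otherwise deliver at most once
        inbox.extend(neighbors[p])     # send(m) to all neighbors ...
        delivered.add(p)               # ... then deliver(R, m)
    return delivered


# A ring a-b-c-d-a: with b crashed, the correct processes stay connected.
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(diffuse(ring, "a", crashed={"b"}))

# A line a-b-c: with b crashed, the network is partitioned and c never delivers.
line = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(diffuse(line, "a", crashed={"b"}))
```

The second run shows why the No Partitioning assumption matters: without a path of correct processes and links, Agreement cannot be guaranteed.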
29
Reliable Broadcast
Additional property of send and receive primitives
  Uniform FIFO Order: If p sends m to q before it sends m’ to q, then q does not receive m’ unless it has previously received m
Theorem: If the send and receive primitives satisfy Uniform FIFO Order, the Reliable Broadcast algorithm satisfies Uniform Causal Order
Additional property of send and receive primitives
  Strong Validity: If a process p (whether correct or not) completes the sending of a message m to a correct process q, and the link from p to q is correct, then q eventually receives m
30
Reliable Broadcast
Theorem: Consider a network such that: (1) processes do not commit send-omission failures, and (2) every process p (whether correct or faulty) is connected to every correct process via a path consisting entirely of correct processes and links (with the possible exception of p itself). Then the Reliable Broadcast algorithm satisfies Uniform Agreement
Model of Synchronous Point-to-Point Networks
31
Consensus
Two primitives: propose and decide
The consensus problem requires that if each correct process proposes a value, then the following hold:
  Termination: Every correct process eventually decides exactly one value
  Agreement: If a correct process decides v, then all correct processes eventually decide v
  Integrity: If a correct process decides v, then v was previously proposed by some process
Agreement and Integrity can be strengthened to their Uniform counterparts
32
Consensus
Relating Consensus and Atomic Broadcast
Transforming Atomic Broadcast into Consensus
Every process p executes the following:
To execute propose(v):
    broadcast(A, v)
decide occurs as follows:
    upon deliver(A, u) do
        if p has not previously executed deliver(A, -)
        then decide(u)
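The reduction above amounts to "decide on the first value the Atomic Broadcast delivers". The sketch below simulates it; the Atomic Broadcast is replaced by a single global delivery sequence shared by all processes, which is a simplifying assumption, and `ConsensusProcess` and `run` are hypothetical names.

```python
class ConsensusProcess:
    """Decides on the first value delivered by the (simulated) Atomic Broadcast."""

    def __init__(self):
        self.decided = False
        self.decision = None

    def on_deliver_A(self, u):
        # Only the first deliver(A, -) triggers decide(u).
        if not self.decided:
            self.decided = True
            self.decision = u


def run(proposals):
    """proposals: dict process id -> proposed value; returns dict process id -> decision."""
    # Stand-in for Atomic Broadcast: every process sees this same sequence.
    total_order = list(proposals.values())
    procs = {pid: ConsensusProcess() for pid in proposals}
    for u in total_order:
        for proc in procs.values():
            proc.on_deliver_A(u)
    return {pid: proc.decision for pid, proc in procs.items()}


decisions = run({"p": 3, "q": 7, "r": 5})
print(decisions)
```

Total Order gives Agreement (everyone sees the same first message), and Integrity of the broadcast gives Integrity of the decision (the first message was proposed by some process).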
33
Consensus
Transforming Reliable Broadcast and Consensus to Atomic Broadcast
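One well-known way to build Atomic Broadcast from Reliable Broadcast and Consensus is to run a sequence of Consensus instances, each agreeing on the next batch of messages to deliver. The sketch below illustrates that idea only and is not necessarily the construction shown on the original slide. `fake_consensus` is a stand-in for a real Consensus instance, and the round-based input format is an assumption made to keep the simulation short.

```python
def fake_consensus(proposals):
    # Stand-in (assumption): all processes decide the proposal of the smallest
    # process id, so Agreement and Integrity hold trivially in this simulation.
    return proposals[min(proposals)]


def atomic_broadcast_rounds(r_delivered_by_round):
    """r_delivered_by_round: list of dicts, process id -> set of messages R-delivered so far."""
    a_delivered = {p: [] for p in r_delivered_by_round[0]}
    for seen in r_delivered_by_round:
        # Each process proposes what it has R-delivered but not yet A-delivered.
        proposals = {p: frozenset(seen[p]) - set(a_delivered[p]) for p in seen}
        batch = fake_consensus(proposals)
        for p in a_delivered:
            # Everyone delivers the decided batch in the same deterministic order.
            a_delivered[p].extend(sorted(batch))
    return a_delivered


rounds = [
    {"p": {"a"}, "q": {"b"}},            # R-deliveries reach p and q in different orders
    {"p": {"a", "b"}, "q": {"a", "b"}},
]
result = atomic_broadcast_rounds(rounds)
print(result)
```

Because every process delivers the same decided batches in the same order, the resulting delivery sequences are identical even though the Reliable Broadcast imposed no order.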
34
Terminating Reliable Broadcast
With Reliable Broadcast, processes have no knowledge of impending broadcasts
TRB allows the delivery of a special message F_s ∉ M (where M is the set of messages that can be broadcast), indicating that the sender s is faulty
With TRB for sender s, s can broadcast any message and the following hold:
  Termination: Every correct process eventually delivers exactly one message
  Validity: If s is correct and broadcasts a message m, then it eventually delivers m
  Agreement: If a correct process delivers a message m, then all correct processes eventually deliver m
  Integrity: If a correct process delivers a message m, then sender(m) = s. If m ≠ F_s, then m was previously broadcast by s
35
Terminating Reliable Broadcast
In some synchronous point-to-point networks, Consensus is equivalent to TRB
In asynchronous systems, the two problems are not equivalent