Uploaded by vaidas-brundza
Introduction to the design choices behind a partition-tolerant distributed pub/sub system
Focus on:
◦ fault tolerance
◦ reliability
Based on:
◦ a tree overlay
◦ neighborhood knowledge
◦ δ, a configuration parameter
[Figure: a broker's 1-, 2- and 3-neighborhoods in the tree overlay]
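The figure shows nested 1-, 2- and 3-neighborhoods. The sketch below assumes that a broker's δ-neighborhood is the set of brokers within δ hops in the tree overlay. It also assumes the overlay is given as an adjacency-list dict, which is a hypothetical representation. Under those assumptions, a bounded breadth-first search computes the neighborhood:

```python
from collections import deque
from typing import Dict, List


def neighborhood(adj: Dict[str, List[str]], broker: str, delta: int) -> Dict[str, int]:
    """Return each broker within `delta` hops of `broker`, mapped to its hop distance.

    `adj` is a hypothetical adjacency-list view of the tree overlay.
    The broker itself is not part of the result.
    """
    dist = {broker: 0}
    queue = deque([broker])
    while queue:
        node = queue.popleft()
        if dist[node] == delta:
            continue  # brokers on the delta boundary are not expanded further
        for nbr in adj.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    del dist[broker]
    return dist


if __name__ == "__main__":
    # A chain A-B-C-D-E-F, like the broker paths drawn in the figures.
    path = ["A", "B", "C", "D", "E", "F"]
    adj: Dict[str, List[str]] = {b: [] for b in path}
    for u, v in zip(path, path[1:]):
        adj[u].append(v)
        adj[v].append(u)
    for delta in (1, 2, 3):
        print(delta, sorted(neighborhood(adj, "C", delta)))
```

Because the search stops expanding at distance δ, its cost grows with the size of the neighborhood rather than the size of the whole network.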
◦ An “island”:
◦ A “barrier”:
◦ Partition identifier (PID) = (pd, i, pnodes)
[Figure: brokers A–F on the path between source and destination (publisher P, subscriber S), illustrating an island and a barrier]
A subscription is accepted when it is added to the routing tables.
This requires acknowledgments from every broker in the outgoing set.
[Figure: subscription s propagates hop by hop between subscriber S and publisher P through brokers A–E; every hop is acknowledged with a conf]
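Here is a minimal sketch of the acceptance rule for a single broker. Several names are hypothetical: `Broker`, `Pending`, `outbox`, and the message tuples. The sketch also assumes that the outgoing set of a subscription is every neighbor except the one it came from. The subscription enters the routing table only after every broker in that set has confirmed it. Only then is a conf sent upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Pending:
    upstream: Optional[str]  # neighbor s arrived from (None at the subscriber's own broker)
    awaiting: Set[str] = field(default_factory=set)  # outgoing brokers that have not confirmed yet


class Broker:
    """Hypothetical broker: s is added to the routing table only once every
    broker in its outgoing set has confirmed it; then a conf goes upstream."""

    def __init__(self, name: str, neighbors: Set[str]):
        self.name = name
        self.neighbors = neighbors
        self.pending: Dict[str, Pending] = {}
        self.routing_table: Set[str] = set()
        self.outbox: List[Tuple[str, str, str]] = []  # (kind, sub_id, destination)

    def on_subscription(self, sub_id: str, sender: Optional[str]) -> None:
        outs = self.neighbors - {sender}  # assumed outgoing set for this subscription
        self.pending[sub_id] = Pending(sender, set(outs))
        for out in sorted(outs):
            self.outbox.append(("sub", sub_id, out))
        self._try_accept(sub_id)  # a leaf broker has nobody to wait for

    def on_conf(self, sub_id: str, sender: str) -> None:
        entry = self.pending.get(sub_id)
        if entry is not None:
            entry.awaiting.discard(sender)
            self._try_accept(sub_id)

    def _try_accept(self, sub_id: str) -> None:
        entry = self.pending[sub_id]
        if entry.awaiting:
            return  # still waiting for some confirmations
        del self.pending[sub_id]
        self.routing_table.add(sub_id)  # the subscription is now accepted
        if entry.upstream is not None:
            self.outbox.append(("conf", sub_id, entry.upstream))
```

For example, a broker B with neighbors A and C receives s from A and forwards it to C. B accepts s only when C's conf arrives, and then confirms to A. A leaf broker accepts s and confirms at once.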
Broker B's failure detector (FD) detects a partition, and B connects to the first alive broker along the path.
B removes the identified nodes from its Outs list and sends a confirmation, tagged with the PID of the partition, to the upstream brokers.
The subscription is accepted when ACK messages have been received from all brokers in the Outs list.
[Figure: subscription s propagating between S and P while B detects a partition involving C and D; confirmations marked conf* carry the PID]
* conf is tagged with the PID; the PID tag is also stored along with s.
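The following sketch extends the acceptance rule to the partition case, under several loudly stated assumptions. First, `pnodes` is taken to hold the failed brokers the FD identified, while `pd` and `i` are kept opaque. Second, `first_alive` is supplied by the caller from neighborhood knowledge. Third, PID tags arriving on a downstream conf are merged into the broker's own. All class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class PID:
    """Partition identifier (pd, i, pnodes). Assumption: pnodes lists the failed
    brokers identified by the FD; pd and i are kept opaque."""
    pd: str
    i: int
    pnodes: FrozenSet[str]


@dataclass
class Pending:
    upstream: Optional[str]
    outs: Set[str]
    pids: Set[PID] = field(default_factory=set)


class Broker:
    def __init__(self, name: str, neighbors: Set[str]):
        self.name = name
        self.neighbors = neighbors
        self.pending: Dict[str, Pending] = {}
        self.routing_table: Dict[str, FrozenSet[PID]] = {}  # sub_id -> PID tags stored with s
        self.outbox: List[tuple] = []

    def on_subscription(self, sub_id: str, sender: Optional[str]) -> None:
        outs = self.neighbors - {sender}
        self.pending[sub_id] = Pending(sender, set(outs))
        for out in sorted(outs):
            self.outbox.append(("sub", sub_id, out))
        self._try_accept(sub_id)

    def on_partition(self, sub_id: str, pid: PID, first_alive: Optional[str] = None) -> None:
        """Called when the FD reports the brokers in pid.pnodes as failed."""
        entry = self.pending.get(sub_id)
        if entry is None:
            return
        entry.outs -= pid.pnodes  # remove identified nodes from the Outs list
        entry.pids.add(pid)
        if first_alive is not None:  # connect to the first alive broker along the path
            entry.outs.add(first_alive)
            self.outbox.append(("sub", sub_id, first_alive))
        self._try_accept(sub_id)

    def on_conf(self, sub_id: str, sender: str, pids: FrozenSet[PID] = frozenset()) -> None:
        entry = self.pending.get(sub_id)
        if entry is None:
            return
        entry.outs.discard(sender)
        entry.pids |= pids  # keep PID tags reported from downstream
        self._try_accept(sub_id)

    def _try_accept(self, sub_id: str) -> None:
        entry = self.pending[sub_id]
        if entry.outs:
            return  # ACKs still missing from some broker in Outs
        del self.pending[sub_id]
        tags = frozenset(entry.pids)
        self.routing_table[sub_id] = tags  # the PID tag is stored along with s
        if entry.upstream is not None:
            self.outbox.append(("conf", sub_id, entry.upstream, tags))  # conf*
```

With C and D reported as failed, B stops waiting for C and, if E is known to be alive, forwards s to E instead. It accepts s once E confirms, and sends a conf tagged with the PID towards the subscriber.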
Forwarding comprises five steps:
◦ Queuing
◦ Barrier checking
◦ Matching
◦ Routing
◦ Cleanup
Forwarding only uses subscriptions that brokers have accepted. Steps in forwarding a publication p:
◦ Identify the brokers of accepted subscriptions that match p
◦ Determine the active connections towards the matching subscriptions' brokers
◦ Send p on those active connections and wait for confirmations
◦ If there are local matching subscribers, deliver p to them
◦ If no downstream matching subscriber exists, issue a confirmation towards P
◦ Once the confirmations arrive, discard p and send a conf towards P
[Figure: forwarding of p between P and S over brokers A–E; p is delivered to local subscribers and each hop returns a conf]
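The forwarding steps can be sketched for one broker as follows. The sketch assumes that `route` already maps each destination broker to the active connection currently used to reach it. Maintaining that map, for instance from neighborhood knowledge, is out of scope. `Subscription`, `Forwarder` and the `matches` filter are hypothetical names. Only accepted subscriptions take part in matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class Subscription:
    sub_id: str
    broker: str                      # broker the subscriber is attached to
    matches: Callable[[dict], bool]  # hypothetical content filter
    accepted: bool = False


class Forwarder:
    """Hypothetical per-broker forwarding of publications."""

    def __init__(self, name: str, subs: List[Subscription], route: Dict[str, str]):
        self.name = name
        self.subs = subs
        self.route = route  # destination broker -> active connection used to reach it
        self.in_flight: Dict[str, Tuple[str, Set[str]]] = {}  # pub_id -> (upstream, awaited confs)
        self.delivered: List[Tuple[str, str]] = []            # (sub_id, pub_id)
        self.outbox: List[Tuple[str, str, str]] = []          # (kind, pub_id, destination)

    def on_publication(self, pub_id: str, content: dict, upstream: str) -> None:
        # 1. Brokers of accepted subscriptions that match p.
        matching = [s for s in self.subs if s.accepted and s.matches(content)]
        # 2. Active connections towards those brokers (never back upstream).
        targets = {self.route[s.broker] for s in matching if s.broker != self.name}
        targets.discard(upstream)
        # 3. Send p on those connections and wait for their confirmations.
        if targets:
            self.in_flight[pub_id] = (upstream, set(targets))
            for t in sorted(targets):
                self.outbox.append(("pub", pub_id, t))
        # 4. Deliver to local matching subscribers.
        for s in matching:
            if s.broker == self.name:
                self.delivered.append((s.sub_id, pub_id))
        # 5. No downstream matching subscriber: confirm towards P right away.
        if not targets:
            self.outbox.append(("conf", pub_id, upstream))

    def on_conf(self, pub_id: str, sender: str) -> None:
        entry = self.in_flight.get(pub_id)
        if entry is None:
            return
        upstream, waiting = entry
        waiting.discard(sender)
        if not waiting:
            # 6. All confirmations arrived: discard p and send a conf towards P.
            del self.in_flight[pub_id]
            self.outbox.append(("conf", pub_id, upstream))
```

Keeping p in `in_flight` until every downstream conf arrives is what allows a broker to retransmit after a failure. The retransmission itself is not modelled here.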
Key forwarding invariant to ensure reliability: no publication stream is delivered to a subscriber after being forwarded by brokers that have not accepted its subscription.
[Figure: p is forwarded towards C over a newly established link; ☑ marks brokers that have accepted s]
Depending on when this link has been established, either recovery or subscription propagation ensures that C accepts s before receiving p.
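One way to make the invariant concrete is to check it against an event trace. The trace format here is a hypothetical list of `(timestamp, broker, kind)` tuples. In it, `"accept_s"` means a broker accepted subscription s, and `"forward_p"` means it forwarded p towards that subscriber. The check reports the first broker that forwarded p before accepting s.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical trace format: (timestamp, broker, kind),
# kind is "accept_s" or "forward_p".
Event = Tuple[int, str, str]


def first_violation(events: Iterable[Event]) -> Optional[str]:
    """Return the first broker that forwarded p before accepting s, or None."""
    accepted: Set[str] = set()
    # Tuples sort by timestamp first; for the same timestamp and broker,
    # "accept_s" sorts before "forward_p", so acceptance is counted first.
    for _, broker, kind in sorted(events):
        if kind == "accept_s":
            accepted.add(broker)
        elif kind == "forward_p" and broker not in accepted:
            return broker
    return None
```

In the scenario from the slides, the check passes when C's acceptance of s precedes every forwarding step that brings p to C's subscriber. It reports C otherwise.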
Recovery is initiated upon activation of a new session.
It has five steps:
◦ Notify the peer about the active session
◦ Reply by sending a summary of subscriptions
◦ Compare the summary to the local list; missing subscriptions are transferred
◦ R accepts the transferred subscriptions and sends them to its downstream network
◦ Update partition information within distance 2δ
[Figure: new session between brokers X and R (path A–E): subscriptions si are transferred to R, and confirmations csi and ACK messages are returned]
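The summary exchange in steps 1 to 4 can be sketched as a set difference. The sketch assumes three things: X notifies R, R's summary is simply the set of subscription ids it knows, and R's downstream forwarding is recorded in a list. `SessionEnd` and `recover_session` are hypothetical names. Step 5, updating partition information within distance 2δ, is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass
class SessionEnd:
    name: str
    subscriptions: Dict[str, str] = field(default_factory=dict)  # sub_id -> filter (opaque)
    downstream: List[str] = field(default_factory=list)          # subs sent on downstream

    def summary(self) -> FrozenSet[str]:
        # Hypothetical summary format: the set of known subscription ids.
        return frozenset(self.subscriptions)


def recover_session(x: SessionEnd, r: SessionEnd) -> List[str]:
    """Run recovery steps 1-4 when a session between x and r becomes active."""
    # Step 1: x notifies r about the active session (modelled by this call).
    # Step 2: r replies with a summary of its subscriptions.
    summary = r.summary()
    # Step 3: x compares the summary with its local list and transfers what r lacks.
    missing = sorted(set(x.subscriptions) - summary)
    # Step 4: r accepts the transferred subscriptions and sends them downstream.
    for sub_id in missing:
        r.subscriptions[sub_id] = x.subscriptions[sub_id]
        r.downstream.append(sub_id)
    return missing
```

Sending only a summary first keeps the exchange cheap when both sides already agree. A repeated call after a successful recovery transfers nothing.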
This procedure is required for crashed brokers that have been restarted.
A restarted node should be able to recover its neighborhood by:
◦ Restoring its (δ+1)-neighborhood from stable storage
◦ Querying a network management service that is aware of neighborhood information
Further steps:
◦ Activating links with neighbors
◦ Initiating partial recovery
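The restart path might look like the sketch below. Several details are assumptions. The stable copy is taken to be a JSON file mapping broker to hop distance. The management service is modelled as a plain callable. Trying storage before the service is a design choice of this sketch, not something the slides prescribe. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional


def restore_neighborhood(state_file: Path,
                         query_service: Optional[Callable[[], Dict[str, int]]] = None
                         ) -> Dict[str, int]:
    """Return the (delta+1)-neighborhood as {broker: hop distance}."""
    try:
        # Preferred source in this sketch: the copy kept on stable storage.
        return json.loads(state_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Fallback: ask a network management service that knows the neighborhood.
        if query_service is None:
            raise RuntimeError("no stable copy and no management service available")
        return query_service()


def links_to_activate(hood: Dict[str, int]) -> List[str]:
    """Direct neighbors (distance 1), whose links are re-activated before partial recovery."""
    return sorted(b for b, d in hood.items() if d == 1)
```

After the links to direct neighbors are active again, each new session would trigger the partial recovery described earlier.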
[Figure: size of brokers' neighborhoods as a function of Δ (curves for Δ = 1, 2, 3, 4)]
• Network size of 1000
• Broker fanout of 3
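For orientation, the sketch below computes an analytical upper bound on neighborhood size, not the measured values in the plot. It assumes every broker has degree fanout + 1, meaning one parent plus `fanout` children. Under that assumption, a Δ-neighborhood holds at most 4 + 12 + 36 + ... brokers for fanout 3, capped by the network size. Roots and leaves have fewer neighbors, so real neighborhoods are smaller.

```python
def max_neighborhood_size(delta: int, fanout: int = 3, network_size: int = 1000) -> int:
    """Upper bound on brokers within `delta` hops, excluding the broker itself.

    Assumption: every broker has degree fanout + 1 (one parent, `fanout` children).
    """
    degree = fanout + 1
    total, frontier = 0, degree  # `frontier` = brokers first reached at the current hop
    for _ in range(delta):
        total += frontier
        frontier *= degree - 1   # each newly reached broker adds degree - 1 new brokers
    return min(total, network_size - 1)


if __name__ == "__main__":
    print([max_neighborhood_size(d) for d in (1, 2, 3, 4)])  # [4, 16, 52, 160]
```

The bound grows by roughly a factor of fanout per extra hop. This is why larger Δ values increase the neighborhood state each broker must keep.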
Impact of failures on end-to-end broker reachability
– Overlay setup:
• Network size of 1000 brokers with fanout = 3
– Failure injection:
• Failures: up to 100 brokers
• We randomly marked a given number of nodes as failed
– Measurements:
• We counted the end-to-end broker pairs whose intermediate path in the primary tree contains Δ consecutive failed brokers in a chain.
[Figure: end-to-end broker reachability under failures, curves for Δ = 1, 2, 3, 4]
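The measurement can be sketched for a simulated overlay. The sketch makes several assumptions. The primary tree is a complete fanout-ary tree, where the parent of broker i is (i - 1) // fanout. Only pairs of alive brokers are counted. A pair is counted when its intermediate path contains a chain of at least Δ consecutive failed brokers. All function names are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set


def build_tree(n: int, fanout: int = 3) -> List[int]:
    """Complete fanout-ary tree: parent[i] = (i - 1) // fanout; root 0 has parent -1."""
    return [-1] + [(i - 1) // fanout for i in range(1, n)]


def depths(parent: List[int]) -> List[int]:
    depth = [0] * len(parent)
    for i in range(1, len(parent)):
        depth[i] = depth[parent[i]] + 1  # parents always have smaller indices
    return depth


def intermediate_path(u: int, v: int, parent: List[int], depth: List[int]) -> List[int]:
    """Brokers strictly between u and v on their tree path, in order from u to v."""
    up_u, up_v = [u], [v]
    a, b = u, v
    while a != b:  # climb the deeper side until both meet at the lowest common ancestor
        if depth[a] >= depth[b]:
            a = parent[a]
            up_u.append(a)
        else:
            b = parent[b]
            up_v.append(b)
    path = up_u + list(reversed(up_v[:-1]))  # both lists end at the ancestor; keep it once
    return path[1:-1]


def longest_failed_run(nodes: Iterable[int], failed: Set[int]) -> int:
    best = run = 0
    for x in nodes:
        run = run + 1 if x in failed else 0
        best = max(best, run)
    return best


def count_cut_pairs(n: int, fanout: int, failed: Set[int], delta: int) -> int:
    """Pairs of alive brokers whose intermediate path has >= delta consecutive failed brokers."""
    parent = build_tree(n, fanout)
    depth = depths(parent)
    alive = [i for i in range(n) if i not in failed]
    return sum(
        1
        for u, v in combinations(alive, 2)
        if longest_failed_run(intermediate_path(u, v, parent, depth), failed) >= delta
    )
```

To approximate one point of the described setup, call `count_cut_pairs(1000, 3, failed, delta)` with `failed` drawn as `set(random.Random(seed).sample(range(1000), 100))`. The complete tree and the random sample are stand-ins, so no particular numbers are implied.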
Impact of failures on publication delivery
500 brokers deployed on 8-core machines in a cluster:
• Network setup: overlay fanout = 3
• We measured the aggregate publication delivery count over an interval of 120 s
• The Expected bar is the number of publications that must be delivered despite failures (this excludes traffic to/from failed brokers).
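The Expected bar excludes traffic to and from failed brokers, which can be expressed as a filter over deliveries. The sketch assumes each would-be delivery is represented as a hypothetical `(publisher_broker, subscriber_broker)` pair taken from a failure-free run. The ratio helper is an illustrative addition for comparing a measured count against that bar.

```python
from typing import Iterable, Set, Tuple


def expected_deliveries(deliveries: Iterable[Tuple[str, str]], failed: Set[str]) -> int:
    """Count deliveries whose publisher broker and subscriber broker are both alive."""
    return sum(1 for pub, sub in deliveries if pub not in failed and sub not in failed)


def delivery_ratio(measured: int, expected: int) -> float:
    """Fraction of the Expected bar actually delivered (1.0 means no loss)."""
    return measured / expected if expected else 1.0
```

A reliable system in the sense of these slides should keep the measured count equal to the Expected count for every failure scenario it is configured to tolerate.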
Snoeren – publications are forwarded redundantly on multiple disjoint paths between subscribers and publishers
XNET – provides a crash/failover scheme similar to this work's when δ = 1
Gryphon – based on a replication scheme in which routing information is replicated across multiple physical machines
Developed a reliable P/S system that tolerates concurrent broker and link failures:
◦ The configuration parameter δ determines the level of resiliency against failures (in the worst case).
◦ Dissemination trees are augmented with neighborhood knowledge.
◦ Neighborhood knowledge allows brokers to maintain network connectivity and make forwarding decisions despite failures.