Upload
moiz-raja
View
426
Download
0
Embed Size (px)
Citation preview
▪Abhishek Kumar▪Basheeruddin Ahmed▪Colin Dixon▪Harman Singh▪Kamal Rameshan▪Robert Varga▪Tony Tkacik
My Collaborators
3
Tom Pantelis
▪Luis Gomez▪Phillip Shea▪Radhika Hirannaiah▪and many more…
Subsystems
6
member-1
member-2
member-3
Distributed Data Store
member-1 member-2
Remote RPC Connector
Actor Systems
8
Distributed Data Store Remote RPC Connector
Actor Hierarchy
Configuration
Dispatchers
Data Synchronization
9
Data storeSynchronized Data TreeRaft for Distributed Consensus
Remote RPCSynchronized RPC RegistryGossip for data distribution
HA
17
member-2
member-3
inventory – follower -1
inventory – follower - 2
Client
member-1
DistributedDataStore
inventory – leader
Raft Distributed Consensus
18
disc
over
s no
de w
ith h
ighe
r ter
m Follower
Candidate
Leader
starts up/ recovers
times out, starts elections
receives votesfrom majority of nodes
times out, restarts elections
follower-2follower-1
leader
appe
nd en
tries
append entries
Election Replication/Consensus
Journal replication
19
leader
follower-1
follower-2
transaction-1
transaction-2
transaction-3
transaction-4
transaction-1
transaction-2
transaction-3
transaction-4
transaction-1
transaction-2
transaction-3
transaction-4
RPC Registry Replication - Gossip
26
version=1
version=2
modify
changeversion
Local bucket updates change version
m1,v1
m2,v5
m3,v7
All buckets and theirversions known to allmembers
Every 1 second memberssend all known bucket versions to any one peer
m1
m2 m3status
statusstat
us
m2 m3
m1
status
upda
telocal versions higher – send updatelocal versions lower – send status to sender
Modules
28
sal-clustering-commons
sal-akka-raft sal-remoterpc-connector
sal-distributed-datastore
sal-clustering-config
sal-akka-raft-example
sal-dummy-distributed-datastore
clustering-test-app
▪Some common messages▪Actor base classes▪The Protobuf messages used in Helium▪The Protobuf NormalizedNode serialization code▪The NormalizedNode streaming code▪Other miscellaneous utility classes
sal-clustering-commons
29
▪Implementation of the Raft Algorithm on top of akka▪Uses akka-persistence for durability▪Provides a base class called RaftActor which when can be extended
by anyone who wants to replicate state▪See sal-akka-raft-example which provides a simple implementation of
a replicated HashMap
sal-akka-raft
30
▪ConcurrentDOMDataBroker▪DistributedDataStore▪Implementation of the DOMStore SPI▪Shard built on top of RaftActor▪Creates Shards based on Sharding strategy▪Code for a client to interact with the Shard Leader
sal-distributed-datastore
31
▪RemoteRpcProvider▪Default RPC Provider. Invoked when an RPC is not found in the local
MD-SAL registry.▪Code for BucketStore which provides a mechanism to replicate state
based on Gossip▪Code for RpcBroker which allows invoking a remote rpc
sal-remoterpc-connector
32
Startup
34
DistributedConfigDataStoreProviderModule
DistributedDataStore
ShardManager
Shard1 Shard Shard3 Shard4
createInstance
ActorContextwaitTillReadyLatch
create & waitTillReady
Recovery
35
Shard1 Shard Shard3 Shard4
ShardManager
read last known state from disk
ready
waitTillReadyLatch
countDown
▪Recovery must be complete▪All Shard Leaders must be known▪Three messages are monitored by ShardManager
▪Cluster.MemberStatusUp▪Used to figure out the address of a cluster member
▪LeaderStateChanged▪Used to figure out if a Follower has a different Leader
▪ShardRoleChanged▪Use to figured out any changes in a Shard’s Role
▪Waiting is not infinite, by default it lasts only 90 seconds but is configurable
▪Will block config sub-system
Waiting for Ready
36
First Operation
38
ActorContext.findPrimary
PrimaryCache.lookup/ShardManager.findPrimary
Found?
LocalTransactionContextRemoteTransactionContext
NoOpTransactionContext
TransactionProxywrite(“inventory”, node)
Local?
N
Y
N Y
Transactions
39
Client
DistributedDataStore
inventory – leader
Client
DistributedDataStore
inventory – leader
Local Transaction Remote Transaction
mem
ber-
1 mem
ber-
1m
embe
r-2
Local Transaction Optimization
40
LocalTransactionContext Shard - Leader
write
merge
delete
ready
member-1
Remote Transaction Optimization
41
RemoteTransactionContext Shard Leader
write
merge
delete
ready
write mod
merge mod
delete mod
member-1 member-2
Transaction Rate Limiting
42
rate-limit = 100 Tx/Sec
TxCohort
Shard Leader
member-2
20ms
TxCohort
50ms
TxCohort
15ms
after rate-limit/2 transactions done….new-rate-limit = 25 Tx/Sec
Operation Limiting
43
RemoteTransactionContext Shard Leader
write
merge
delete
write mod
merge mod
delete mod
member-1 member-2
……
block
Commit Coordination
44
Shard Leader
member-2
ShardCommitCoordinator
Tx1 - ready
Tx2 - ready
Tx3 - ready
Tx1 - commit
Tx3 - commit
Tx3 - abort
Tx2 - commit
Tx1
Tx2
Tx3
Managing the in-memory journalReplicated To All
45
Client leader
follower-1 follower-2
commit transaction
txn
txn txn
Managing the in-memory journalCluster member unavailable
46
Client leader
follower-1
commit transaction
txn
txn
txntxntxn
txntxntxn
Data Change Notifications
47
Client leader
follower-1 follower-2
commit transaction
txn
txn txn
notify
Startup
49
RemoteRpcBrokerModulecreateInstance
RpcManager
RemoteRpcProvider
RpcBroker RpcRegistry RemoteRpcImpl RpcListener
Default RPC Delegate
50
RpcManager SchemaContext
DOMRpcProviderService
read all rpc definitions
registerImplementation(remoteRpcImpl)
RPC Registered
51
RpcProviderRegistryaddRoutedRpcImpl
RoutedRpcRegistrationregisterPath
RpcListener
RpcRegistry
Invoking a Remote RPC
52
RemoteRpcImplinvokeRpc
RpcRegistry
Route found?
RpcBroker
ExecuteRpc
FooService
throw Exception
Invoking a Remote RPC
53
RemoteRpcImpl
Consumer
Provider
member-1 member-2
RpcBroker
RpcRegistry
invokeRpc
invokeRpcfindRoute
ExecuteRpc
Transaction Tracing
55
Created txn member-2-txn-9400 of type READ_WRITE on chain member-2-txn-chain-13
Client
Server
Tx member-2-txn-9400 read /(urn:opendaylight:inventory?...
member-3-shard-inventory-operational: Creating transaction : shard-member-2-txn-9400
Tx member-2-txn-9400 Readying 1 transactions for commit
Tx member-2-txn-9400 commit
member-3-shard-inventory-operational: Readying transaction member-2-txn-9400
member-3-shard-inventory-operational: Committing transaction member-2-txn-9400
Tx member-2-txn-9400: commit succeeded
Cluster MemberInitiator
Counter
Transaction Type
Module
Data store type
Replication Tracing
56
Leader
Sending AppendEntries to follower member-2-shard-topology-operational: AppendEntries [term=2, leaderId=member-1-shard-topology-operational, prevLogIndex=520, prevLogTerm=2, entries=[Entry{index=521, term=2}], leaderCommit=520, replicatedToAllIndex=-1]
Follower
handleAppendEntries: AppendEntries [term=2, leaderId=member-2-shard-topology-operational, prevLogIndex=520, prevLogTerm=2, entries=[Entry{index=521, term=2}], leaderCommit=520, replicatedToAllIndex=-1]
handleAppendEntries returning : AppendEntriesReply [term=2, success=true, logLastIndex=521, logLastTerm=2, followerId=member-1-shard-topology-operational]
handleAppendEntriesReply from member-2-shard-topology-operational: applying to log – commitIndex: 521, lastAppliedIndex: 520
handleAppendEntriesReply - FollowerLogInformation for member-2-shard-topology-operational updated: matchIndex: 521, nextIndex: 522
Shard MBean
57
org.opendaylight.controller:type=DistributedOperationalDataStore,Category=Shards,name=member-1-shard-inventory-operational
Operational
Config
member-1
member-2
member-3
default
inventory
topology
operational
config
AttributesAbortTransactionsCount CommitIndex CommittedTransactionsCount CurrentTerm FailedTransactionsCount
FollowerInfo FollowerInitialSyncStatus
InMemoryJournalDataSize
InMemoryJournalLogSize LastApplied
LastCommittedTransactionTime LastIndex LastTerm Leader RaftState
ReadOnlyTransactionCount
ReadWriteTransactionCount WriteOnlyTransactionCount
VotedFor and more….
ShardManager MBean
58
org.opendaylight.controller:type=DistributedOperationalDataStore,Category=ShardManager,name=shard-manager-operational
Operational
Config
operational
config
Attributes
• LocalShards• SyncStatus
Data store GeneralRuntimeInfo MBean
59
org.opendaylight.controller:type=DistributedConfigDatastore,name=GeneralRuntimeInfo
Operational
Config
Attributes
• TransactionCreationRateLimit
Transaction Commit RateMBean
60
org.opendaylight.controller.cluster.datastore:name=distributed-data-store.config.commit.rate
Attributes
• 50thPercentile• 75thPercentile• 90thPercentile• and so on…
operational
config
• Count• Min• Max• StdDev
Data store GeneralRuntimeInfo MBean
61
org.opendaylight.controller:type=DistributedConfigDatastore,name=GeneralRuntimeInfo
Operational
Config
Attributes
• TransactionCreationRateLimit
Message Statistics MBean
62
org.opendaylight.controller.actor.metric:name=/user/shardmanager-config.msg-rate.ActorInitialized
Attributes
• 50thPercentile• 75thPercentile• 90thPercentile• and so on…
operational
config
• Count• Min• Max• StdDev
Message Name
RemoteRpcBroker MBean
64
org.opendaylight.controller:type=RemoteRpcBroker,name=RemoteRpcRegistry
Attributes
• BucketVersions• GlobalRpc• LocalRegisteredRoutedRpc
Operations
• findRpcByName• findRpcByRoute
Message Statistics MBean
65
org.opendaylight.controller.actor.metric:name=/user/rpc/registry.msg-rate.AddOrUpdateRoutes
Attributes
• 50thPercentile• 75thPercentile• 90thPercentile• and so on…
• Count• Min• Max• StdDev
Message Name
▪Deploy a cluster▪Run clustering integration tests▪Write an application that works in the cluster▪Write bugs to report features which you find missing▪Try running dsBenchMark on a cluster▪Test out replication using the dummy data store▪Check out the code▪Send email to [email protected] with questions
Suggested Next Steps…
67