Upload
others
View
9
Download
0
Embed Size (px)
Citation preview
An SNMP based failure detection service
Matthias Wiesmann, Péter Urbán, Xavier Défago
FN NFSSwiss National Science Foundation
北陸先端科學技術大學院大學
1
An SNMP based failure detection service
Overview
• Failure Detection
• SNMP
• SNMP-FD Service
• Conclusion & Future Work
2
2
An SNMP based failure detection service
Failure detection
• Basic abstraction for distributed systems
• Needed for solving fundamental problems:
• Consensus, Atomic broadcast, Atomic commitment
• Formal definitions: P, ♢S, Ω, φ, etc.
• Also outside distributed system community
• System, Network, Cluster management, Middleware, Grid systems
• Common problem→ common solution
3
3
Failure detection service
An SNMP based failure detection service
•Factorize-out failure detection
• Shared, separate implementation
• Complex, adaptive policies
• Proposed in the literature
• Implemented by middleware systems, grid systems
• Problem: no standard
•Every system has its own service and interfaces
4
4
An SNMP based failure detection service
Standard Needed
• External standard
• Connect clients of the service to the service• Application, Middleware, Monitoring and administration tools
• Suspicion notifications, configuration
• Internal standard
• Use standard for heartbeat messages, internal queries
• We need some form of Middleware
5
5
An SNMP based failure detection service
Overview
• Failure Detection
• SNMP
• SNMP-FD Service
• Conclusion & Future Work
6
6
An SNMP based failure detection service
Why SNMP ?
• Simple Network Management Protocol
• Present in many network attached devices now
• Routers, Smart Switches, UPS, Printers…
• Simple Middleware
• Offers basic features
• Relatively lightweight
• Many applicable standard defined
• Monitoring, fault reporting.
7
7
Network
EquipmentNetwork
Management
StationSNMP
Agent
Monitored
ObjectMIB
An SNMP based failure detection service
SNMP Architecture
• Network Management Station
• Manages equipment
• Equipment’s agent
• Exposes Management Information Base (MIB)
• Sends asynchronous messages (traps) to NMS.
8
8
An SNMP based failure detection service
Overview
• Failure Detection
• SNMP
• SNMP-FD Service
• Conclusion & Future Work
9
9
!"#$%&'
!"#$%&'S
()*$'B!"#$%"&'()&"*'++,
p1B
-,."+%,&'+"/&*'+,!01
-,2"%$!*3%$"#,4,53&6'%,!01
-,.'3&%7'3%+,!01
!"#$%&'&($%)*$+,-&.%*,+/01,
!"#$%"&'()&"*'++,
p2B
!"#$%"&'()&"*'++,
p3B
-,!01,
23(4&56,%*
()*$'A
7,*
8+09:
2,*&;&7,*
<,0+*=,0*&
8+09:!"#$%&'&()%&$*"+",)-./%"012
!"34)-5"012
-,.'3&%7'3%,8#"9:'(6',!01
-,)&"*'++,8#"9:'(6',!01
-,;3$:/&',<'%'*%$"#,!01,
!"#$%&'&($%)*$+)%6&.%*,+/01,&
!"#$%"&$#6)&"*'++
pA
3$*)!10*)$%
2,*&;&7,*
6%)*7)-7"012
*/+%"=,!01
()
(*
)>?+$*3:,@$#A
!'++36'+
B/'&$'+,C,D&$%'+
An SNMP based failure detection service
SNMP-FD service
10
10
An SNMP based failure detection service
Service Architecture
• Distributed architecture
• Dæmon runs on each host
• Monitor local processes
• Exchange heartbeats
• Export MIB information
• Can interact with equipment
• Similar to other failure detection service architectures
11
!"#$%&'
!"#$%&'S
()*$'B!"#$%"&'()&"*'++,
p1B
-,."+%,&'+"/&*'+,!01
-,2"%$!*3%$"#,4,53&6'%,!01
-,.'3&%7'3%+,!01
!"#$%&'&($%)*$+,-&.%*,+/01,
!"#$%"&'()&"*'++,
p2B
!"#$%"&'()&"*'++,
p3B
-,!01,
23(4&56,%*
()*$'A
7,*
8+09:
2,*&;&7,*
<,0+*=,0*&
8+09:!"#$%&'&()%&$*"+",)-./%"012
!"34)-5"012
-,.'3&%7'3%,8#"9:'(6',!01
-,)&"*'++,8#"9:'(6',!01
-,;3$:/&',<'%'*%$"#,!01,
!"#$%&'&($%)*$+)%6&.%*,+/01,&
!"#$%"&$#6)&"*'++
pA
3$*)!10*)$%
2,*&;&7,*
6%)*7)-7"012
*/+%"=,!01
()
(*
)>?+$*3:,@$#A
!'++36'+
B/'&$'+,C,D&$%'+
11
An SNMP based failure detection service
Features
• Uses many standard MIBs
• Host Resource MIB → Process state
• Target & Notification MIB → Subscriber description
• Alarm MIB → Current suspicion description
• Lightweight messaging:
• Heartbeat are UDP traps.
• Leverages SNMP
• Configuration, Security, Interoperability with devices12
❏✔
12
An SNMP based failure detection service
New MIBs
• Heartbeat MIB
• Describes heartbeats
• Failure detection MIB
• Describes failure detector configuration
• Described using standard SNA syntax file
• Accessible using standard tools
13
SNMP table: SnmpFD::kbhbTable
index interval lastbeat lasttime count lostcount mean stddev min max kt-dhcp23.jaist.ac.jp 100 2678 0:0:04:38.49 2671 0 99.9955 28.3417 0 356
13
An SNMP based failure detection service
Implementation
• Validate the architecture
• Implemented in Java
• Open source libraries
• SNMP4J, SNMP4J Agent, Apache Libraries
• Available for download
• http://ddsg.jaist.ac.jp//en/projects/snmp-fd/
14
14
An SNMP based failure detection service
Implementation
• Validate the architecture
• Implemented in Java
• Open source libraries
• SNMP4J, SNMP4J Agent, Apache Libraries
• Available for download
• http://ddsg.jaist.ac.jp//en/projects/snmp-fd/
14
14
• Simple failure detector
• Stable performance
• Consistent with expectations
• Validates approach
• Issues
• High latency
• High CPU load on receiver
An SNMP based failure detection service
Performance
15
0.001
0.01
0.1
1
0 1 2 3 4 5 6 7 8 9 10 11 12 13
Timeout (ms)
Mis
takes
per
sec
ond
99.85%
99.9%
99.95%
100.0%
Quer
y
Acc
ura
cyMistake Rate
Query Accuracy
15
• Simple failure detector
• Stable performance
• Consistent with expectations
• Validates approach
• Issues
• High latency
• High CPU load on receiver
An SNMP based failure detection service
Performance
15
0.001
0.01
0.1
1
0 1 2 3 4 5 6 7 8 9 10 11 12 13
Timeout (ms)
Mis
takes
per
sec
ond
99.85%
99.9%
99.95%
100.0%
Quer
y
Acc
ura
cyMistake Rate
Query Accuracy
}SNMP4JIssue, code spends 95% of time there.
15
An SNMP based failure detection service
Overview
• Failure Detection
• SNMP
• SNMP-FD Service
• Conclusion & Future Work
16
16
An SNMP based failure detection service
Conclusion
• Failure detection service
• Standard based approach desirable
• No need for new standard → use existing SNMP
• Approach validated by prototype
• Can be extended with more advance FD techniques
• Hierarchical FD, adaptive FD, etc.
• Can interact with network devices
• For instance link-down traps
17
17
An SNMP based failure detection service
Future work
• Look into performance issues
• Optimizations needed in the SNMP4J library.
•Port the framework to the latest version of SNMP4J
• Support for AgentX
• Implement framework as sub-agent
• Implement more advanced failure detectors
• Handle network equipment data
18
18