EVA Community, Ji-won Kim, 2013.8.24
Intent: People make mistakes and are slow. To minimize downtime, the system should take care of itself without human intervention.
The cause of failures (from a study of the US telephone network [Kuh97])
Human: 50% or more
Hardware: 25%
Software: 25%
[Figure: Recognition/Report, Fix/Modification, Analysis, Destruction, Test, Planning, Operation]
[Figure: Network, Computing Machine, Math, Switch]
Humans are not machines
Software & Hardware: always identical, procedural
Humans: become bored and inattentive with routine, monotonous tasks
Control: self recovery
Automatic error handling: detection -> processing
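The detection -> processing loop of self recovery can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's design: the helper name run_with_self_recovery, the recover callback, and the max_attempts limit are all hypothetical. Detection is modeled as catching an exception, and processing as a callback that repairs state before the task is retried; only when every attempt fails does the error leave the automatic path.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_self_recovery(
    task: Callable[[], T],
    recover: Callable[[Exception], None],
    max_attempts: int = 3,
) -> T:
    """Hypothetical helper: run `task`, handling its errors automatically.

    Detection  = catching the exception raised by `task`.
    Processing = calling `recover` to repair state, then retrying.
    After `max_attempts` failures the last error is re-raised, i.e. it leaves
    the automatic path and would need outside (human) attention.
    """
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return task()  # normal operation
        except Exception as err:  # detection: the failure is noticed automatically
            last_error = err
            recover(err)  # processing: the system repairs itself
    if last_error is None:  # only happens when max_attempts < 1
        raise ValueError("max_attempts must be at least 1")
    raise last_error


# Usage: a link that is down until the recovery step reconnects it.
state = {"connected": False, "log": []}


def send() -> str:
    if not state["connected"]:
        raise ConnectionError("link down")
    return "sent"


def reconnect(err: Exception) -> None:
    state["log"].append(f"recovering from {err}")
    state["connected"] = True


print(run_with_self_recovery(send, reconnect))  # sent
print(state["log"])  # ['recovering from link down']
```

The retry limit is a design choice in this sketch: it keeps the automatic path bounded so a persistent fault is surfaced instead of being retried forever.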
The risk of procedural errors
Imperfect humans can produce an incorrect system.
[Figure: Requirements, Operator, Whole System]
[Figure: a Monitoring System with Component1, Component2, and Component3, three Fault Observers, a Recovery System, and an Operator]
From the book, originally from ©iStockphoto.com/Don Bayley
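The monitoring arrangement in the figure can be sketched as follows. This is a minimal Python sketch under stated assumptions, not code from the book: the FaultObserver and FaultReport classes, the escalation rule (notify the operator only when automatic recovery returns False), and the "timeout"-only recovery policy in the usage are all hypothetical. It shows the intent of minimizing human intervention: every fault is recorded, the recovery system is tried first, and the operator sees only what the system could not fix by itself.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class FaultReport:
    component: str
    error: str


@dataclass
class FaultObserver:
    """Hypothetical Fault Observer: collects error reports from components,
    hands them to an automatic recovery routine, and tells the operator
    only about faults that recovery could not handle."""

    recover: Callable[[FaultReport], bool]  # returns True if the fault was handled
    notify_operator: Callable[[FaultReport], None]
    history: List[FaultReport] = field(default_factory=list)

    def report(self, fault: FaultReport) -> None:
        self.history.append(fault)  # every fault is recorded for later analysis
        if not self.recover(fault):  # automatic recovery is tried first
            self.notify_operator(fault)  # a human is involved only as a last resort


# Usage: assumed policy where only transient "timeout" errors are recoverable.
operator_inbox: List[FaultReport] = []


def recovery_system(fault: FaultReport) -> bool:
    return fault.error == "timeout"


observer = FaultObserver(recover=recovery_system, notify_operator=operator_inbox.append)
observer.report(FaultReport("Component1", "timeout"))
observer.report(FaultReport("Component3", "disk failure"))
print(len(observer.history))  # 2
print([f.component for f in operator_inbox])  # ['Component3']
```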
Minimize Human Intervention
Detection
• Fault Observer
• Audible Alarm
Recovery
• Recovery Blocks
• Error Handlers
• Maximize Human Participation
Management
• Maintenance Interfaces
• IO Triage
Prevention & Correction
• Reintegration
• Revise Procedure
Helpful patterns: An Input and Output Pattern Language [HS00]
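Recovery Blocks, listed under Recovery, can be illustrated with the classic recovery-block scheme: save a checkpoint, run a primary routine, check its result with an acceptance test, and on failure roll back and try an alternate. The Python below is a minimal sketch of that general technique, not the book's code; recovery_block, RecoveryBlockFailure, and the sorting example are hypothetical.

```python
import copy
from typing import Any, Callable, Dict, List, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when no alternate passes the acceptance test."""


def recovery_block(
    state: Dict[str, Any],
    alternates: Sequence[Callable[[Dict[str, Any]], Any]],
    acceptance_test: Callable[[Any], bool],
) -> Any:
    """Try each alternate in order; return the first result that passes the test.

    The state is checkpointed before the first attempt and restored after each
    failed one, so a rejected alternate leaves no trace.
    """
    checkpoint = copy.deepcopy(state)  # recovery point
    for alternate in alternates:
        try:
            result = alternate(state)
            if acceptance_test(result):
                return result  # accepted: keep the new state
        except Exception:
            pass  # a crash is treated like a failed acceptance test
        state.clear()  # roll back to the checkpoint before the next alternate
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("no alternate passed the acceptance test")


# Usage: a buggy fast primary and a trusted but simpler alternate.
def fast_sort(s: Dict[str, Any]) -> List[int]:
    s["data"] = sorted(s["data"])[:-1]  # deliberate bug: drops an element
    return s["data"]


def simple_sort(s: Dict[str, Any]) -> List[int]:
    s["data"] = sorted(s["data"])
    return s["data"]


data_state: Dict[str, Any] = {"data": [3, 1, 2]}
expected_len = len(data_state["data"])


def is_acceptable(result: List[int]) -> bool:
    # Acceptance test: nothing lost, and the output is in order.
    return len(result) == expected_len and all(
        a <= b for a, b in zip(result, result[1:])
    )


print(recovery_block(data_state, [fast_sort, simple_sort], is_acceptable))  # [1, 2, 3]
```

Here the primary fails the acceptance test, the state is rolled back to [3, 1, 2], and the alternate produces the accepted result without any operator involvement.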