Fault tolerant 4_5

EVA Community Ji-won Kim(logicarchitect@ieee.com) 2013.8.24

Intent: People make mistakes and are slow; to minimize downtime, the system should take care of itself, without human intervention.

The cause of failures (from a study of the US telephone network [Kuh97])

H: 50%~

Hardware: 25%

Software: 25%

Recognition/Report

Fix/Modification, Analysis, Destruction

Test/Planning, Operation

Network

Computing Machine

Math

Switch

Human is not a machine

Software & Hardware: always identical, procedural

Humans become bored and inattentive with routine, monotonous tasks

The risk of Procedural Errors

An incorrect system results from imperfect human requirements

Operator

Whole System

Fault Observer

[Diagram labels: Monitoring System; Component1, Component2, Component3; Fault Observer; Recovery System; Operator]
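
The Fault Observer arrangement in the diagram can be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: it assumes components report faults to a single observer, which records them and forwards them to a recovery system and an operator console. All class and function names (FaultObserver, Component, RecoverySystem, demo) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FaultReport:
    component: str
    description: str


@dataclass
class FaultObserver:
    """Records fault reports and forwards each one to every subscriber."""
    listeners: List[Callable[[FaultReport], None]] = field(default_factory=list)
    log: List[FaultReport] = field(default_factory=list)

    def subscribe(self, listener: Callable[[FaultReport], None]) -> None:
        self.listeners.append(listener)

    def report(self, component: str, description: str) -> None:
        fault = FaultReport(component, description)
        self.log.append(fault)
        for listener in self.listeners:
            listener(fault)


class Component:
    def __init__(self, name: str, observer: FaultObserver) -> None:
        self.name = name
        self.observer = observer
        self.healthy = True

    def fail(self, description: str) -> None:
        # The component only reports; it does not decide who reacts.
        self.healthy = False
        self.observer.report(self.name, description)


class RecoverySystem:
    def __init__(self, components: Dict[str, Component]) -> None:
        self.components = components

    def handle(self, fault: FaultReport) -> None:
        # Assumption: a stand-in for real recovery (restart, failover, ...).
        self.components[fault.component].healthy = True


def demo():
    observer = FaultObserver()
    names = ("Component1", "Component2", "Component3")
    components = {name: Component(name, observer) for name in names}
    operator_console: List[str] = []

    # Recovery acts automatically; the operator is only informed.
    observer.subscribe(RecoverySystem(components).handle)
    observer.subscribe(
        lambda f: operator_console.append(f"{f.component}: {f.description}")
    )

    components["Component2"].fail("heartbeat missed")
    return components, operator_console, observer
```

In this sketch the operator is notified but does not have to act, which matches the stated intent of letting the system take care of itself.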

From Book, originally from © iStockphoto.com/Don Bayley

Minimize Human Intervention

Detection

• Fault Observer

• Audible Alarm

Recovery

• Recovery Blocks

• Error Handlers

• Maximize Human Participation
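
As an illustration of the Recovery Blocks item above, here is a minimal sketch assuming the classic formulation: run a primary variant, check its result with an acceptance test, and fall back to an alternate variant from the same starting state if the check fails. The names recovery_block, primary_sort, alternate_sort, and make_acceptance_test are hypothetical and chosen only for this example.

```python
import copy
from typing import Any, Callable, Sequence


class RecoveryBlockFailed(Exception):
    """Raised when no variant produces an acceptable result."""


def recovery_block(
    state: Any,
    variants: Sequence[Callable[[Any], Any]],
    acceptance_test: Callable[[Any], bool],
) -> Any:
    for variant in variants:
        # Each variant works on its own copy, so a failed attempt
        # cannot corrupt the state seen by the next one.
        checkpoint = copy.deepcopy(state)
        try:
            result = variant(checkpoint)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if acceptance_test(result):
            return result
    raise RecoveryBlockFailed("all variants failed the acceptance test")


def primary_sort(data):
    data.pop()  # deliberate bug for the demo: loses an element
    return sorted(data)


def alternate_sort(data):
    return sorted(data)


def make_acceptance_test(original):
    def accept(result):
        same_length = len(result) == len(original)
        ordered = all(a <= b for a, b in zip(result, result[1:]))
        return same_length and ordered
    return accept


values = [3, 1, 2]
result = recovery_block(
    values, [primary_sort, alternate_sort], make_acceptance_test(values)
)
```

The faulty primary is rejected by the acceptance test and the alternate supplies the result, with no operator involved. The acceptance test here is deliberately cheap; it checks plausibility, not full correctness.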

Management

• Maintenance Interfaces

• IO Triage

Prevention & Correction

• Reintegration

• Revise Procedure

Helpful Patterns: An Input and Output Pattern Language [HS00]
