Page 1: Fault tolerant 4_5

EVA Community, Ji-won Kim ([email protected]), 2013.8.24

Intent: People make mistakes and are slow. To minimize downtime, the system should take care of itself, without human intervention.

Page 2: Fault tolerant 4_5


Page 3: Fault tolerant 4_5

The cause of failures

H: 50%~
Hardware: 25%
Software: 25%

(From a study of the US telephone network[Kuh97])

Page 4: Fault tolerant 4_5

Recognition/Report
Fix/Modification
Analysis
Destruction
Test/Planning
Operation

Page 5: Fault tolerant 4_5

Network
Computing Machine
Math
Switch

Human is not a machine

Software & Hardware: always identical, procedural
Human: becomes bored and inattentive with routine, monotonous tasks

Page 7: Fault tolerant 4_5


The Risk of Procedural Errors

Incorrect system from imperfect human requirements

Operator

Page 8: Fault tolerant 4_5

Whole System

Monitoring System
Component 1, Component 2, Component 3 (each with a Fault Observer)
Recovery System
Operator
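
The diagram on this slide could be sketched roughly as follows. This is a minimal illustration, not the author's or the book's implementation: the class names mirror the diagram labels, but the wiring (one Fault Observer per component, each forwarding to the Monitoring System, which logs for the Operator and calls the Recovery System) and every method name are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class FaultReport:
    component: str
    description: str


class FaultObserver:
    """Receives fault reports for one component and forwards them to listeners."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[FaultReport], None]] = []

    def subscribe(self, listener: Callable[[FaultReport], None]) -> None:
        self._listeners.append(listener)

    def report(self, fault: FaultReport) -> None:
        for listener in self._listeners:
            listener(fault)


class Component:
    def __init__(self, name: str, observer: FaultObserver) -> None:
        self.name = name
        self.observer = observer
        self.healthy = True

    def fail(self, description: str) -> None:
        # Hypothetical hook: a real component would detect its own errors.
        self.healthy = False
        self.observer.report(FaultReport(self.name, description))


class RecoverySystem:
    def __init__(self, components: Dict[str, Component]) -> None:
        self.components = components

    def recover(self, fault: FaultReport) -> None:
        # Stand-in for a real restart or failover (assumption).
        self.components[fault.component].healthy = True


class MonitoringSystem:
    def __init__(self, recovery: RecoverySystem) -> None:
        self.recovery = recovery
        self.operator_log: List[str] = []  # what the Operator gets to see

    def on_fault(self, fault: FaultReport) -> None:
        # Inform the Operator, but recover without waiting for them.
        self.operator_log.append(f"{fault.component}: {fault.description}")
        self.recovery.recover(fault)


def build_system(names: List[str]) -> Tuple[Dict[str, Component], MonitoringSystem]:
    components: Dict[str, Component] = {}
    monitor = MonitoringSystem(RecoverySystem(components))
    for name in names:
        observer = FaultObserver()
        observer.subscribe(monitor.on_fault)
        components[name] = Component(name, observer)
    return components, monitor


components, monitor = build_system(["Component1", "Component2", "Component3"])
components["Component2"].fail("heartbeat lost")
```

In this sketch the Operator is only notified through the log; the recovery action itself runs without human intervention, which is the point of the slide.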

Page 9: Fault tolerant 4_5

From the book; originally from © iStockphoto.com/Don Bayley

Page 10: Fault tolerant 4_5

Minimize Human Intervention

Detection
• Fault Observer
• Audible Alarm

Recovery
• Recovery Blocks
• Error Handlers
• Maximize Human Participation

Management
• Maintenance Interfaces
• IO Triage

Prevention & Correction
• Reintegration
• Revise Procedure

Helpful Patterns: An Input and Output Pattern Language [HS00]
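
As an illustration of one item in the Recovery column, Recovery Blocks, here is a minimal sketch of the general technique: try a primary implementation, check its result with an acceptance test, and fall back to the next alternate if the test fails. The function and exception names are hypothetical and chosen for this example only. The sketch also assumes the alternates have no side effects, so there is no state to roll back between attempts; a fuller version would restore a checkpoint before each retry.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RecoveryBlockFailed(Exception):
    """Raised when no alternate produces an acceptable result."""


def recovery_block(
    alternates: Sequence[Callable[[], T]],
    acceptance_test: Callable[[T], bool],
) -> T:
    # Try each alternate in order; the first one is the primary.
    for alternate in alternates:
        try:
            result = alternate()
        except Exception:
            continue  # a crashing alternate counts as a failed attempt
        if acceptance_test(result):
            return result
    raise RecoveryBlockFailed("no alternate passed the acceptance test")


def faulty_sqrt_of_nine() -> float:
    return -3.0  # deliberately wrong primary, for illustration


def safe_sqrt_of_nine() -> float:
    return 9 ** 0.5


def is_sqrt_of_nine(r: float) -> bool:
    return r >= 0 and abs(r * r - 9) < 1e-9


result = recovery_block([faulty_sqrt_of_nine, safe_sqrt_of_nine], is_sqrt_of_nine)
```

Here the primary's result is rejected by the acceptance test and the second alternate is used instead, with no operator involved.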

Page 11: Fault tolerant 4_5
