Lecture 1/2
Lecture outline
• Course information
– Examination: project
• What is a “safety-critical embedded system”?
– Embedded systems
– Real-time systems
– Safety-critical systems
• Fundamental concepts of dependability
– The “dependability” concept
– Threats: fault, error, failure
– Attributes: reliability, availability
Course information
• Contact
– Paul Pop, course leader and examiner
• Email: [email protected]
• Phone: 4525 3732
• Office: building 322, office 228
• Webpage
– All the information is on CampusNet
Course information, cont.
• Textbook: Israel Koren and C. Mani Krishna, Fault-Tolerant Systems, Morgan Kaufmann
• Full text available online, see the link on CampusNet
Course information, cont.
• Lectures
– Language: English
– 12 lectures
• Lecture notes: available on CampusNet as a PDF file the day before
• Dec. 1 is used for the project
• Two invited lectures, from Novo Nordisk and Danfoss
• Examination
– Project: 70% report + 30% presentation
• 7.5 ECTS points
Project
• Milestones
– End of September: Group registration and topic selection
• Email to [email protected]
– End of October: Project report draft
• Upload draft to CampusNet
– End of November: Report submission
• Upload final report to CampusNet
– Last lecture: Project presentation and oral opposition
• Upload presentation to CampusNet
Project, cont.
• Project registration
– E-mail Paul Pop, [email protected]
• Subject: 02228 registration
• Body:
– Name student #1, student ID
– Name student #2, student ID
– Project title
– Project details
• Notes
– Groups of max. 3 persons
– Project approval
Project, cont.
• Topic categories
1. Literature survey
• See the “references” and “further reading” in the course literature
2. Tool case-study
• Select a commercial or research tool and use it on a case-study
3. Software implementation
• Implement a technique, e.g., an error detection or fault-tolerance technique
– Suggested topics available on CampusNet
Project, cont.
• Examples of last years’ projects
– ARIANE 5: Flight 501 Failure
– Hamming Correcting Code Implementation in a Transmitting System
– Application of Fault Tolerance to a Wind Turbine
– Guaranteed Service in Fault-Tolerant Network-on-Chip
– Fault-tolerant digital communication
– Resilience in Mobile Multi-hop Ad-hoc Networks
– Fault-tolerant ALU
– Reliable message transmission in CAN, TTP and FlexRay
Project deliverables
1. Literature survey
– Written report
• Structure
– Title, authors
– Abstract
– Introduction
– Body
– Conclusions
– References
2. Tool case-study
– Case-study files
– Report
• Document your work
3. Software implementation
– Source code with comments
– Report
• Document your work
Deadline for draft: End of October
Deadline for final version: End of November
Project presentation &amp; opposition
• Poster presentation of project
– 15 min. + 5 min. questions
• Note!
– During the presentation you might be asked general questions that relate to any course topic
Deadline: Last lecture
Embedded systems
• Computing systems are everywhere
• Most of us think of “desktop” computers
– PCs
– Laptops
– Mainframes
– Servers
• But there’s another type of computing system
– Far more common...
Embedded systems, cont.
• Embedded computing systems
– Computing systems embedded within electronic devices
– Hard to define. Nearly any computing system other than a desktop computer
– Billions of units produced yearly, versus millions of desktop units
– Perhaps 50 per household and per automobile
What is an embedded system?
• Definition
– An embedded system is a special-purpose computer system, part of a larger system which it controls.
• Notes
– A computer is used in such devices primarily as a means to simplify the system design and to provide flexibility.
– Often the user of the device is not even aware that a computer is present.
Characteristics of embedded systems
• Single-functioned
– Dedicated to perform a single function
• Complex functionality
– Often have to run sophisticated algorithms or multiple algorithms.
• Cell phone, laser printer.
• Tightly-constrained
– Low cost, low power, small, fast, etc.
• Reactive and real-time
– Continually reacts to changes in the system’s environment
– Must compute certain results in real-time without delay
• Safety-critical
– Must not endanger human life or the environment
Functional vs. non-functional requirements
• Functional requirements
– Output as a function of input
• Non-functional requirements
– Time required to compute output
– Reliability, availability, integrity, maintainability, dependability
– Size, weight, power consumption, etc.
Real-time systems
• Time
– The correctness of the system behavior depends not only on the logical results of the computations, but also on the time at which these results are produced.
• Real
– The reaction to outside events must occur during their evolution. The system time must be measured using the same time scale used for measuring time in the controlled environment.
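The point about timeliness can be sketched in a few lines of Python (not from the course material; the 10 ms deadline and the function names are illustrative assumptions):

```python
import time

DEADLINE_S = 0.010  # hypothetical 10 ms deadline, for illustration only

def compute_response(sensor_value):
    # Stand-in for the actual control computation.
    return sensor_value * 2

def real_time_step(sensor_value):
    """Return (result, on_time). A logically correct result that is
    produced after the deadline still counts as a timing failure."""
    start = time.monotonic()
    result = compute_response(sensor_value)
    elapsed = time.monotonic() - start
    return result, elapsed <= DEADLINE_S
```

The pair returned by `real_time_step` mirrors the definition: correctness requires both the right value and its delivery within the deadline.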
Safety-critical systems
• Definitions
– Safety is a property of a system that it will not endanger human life or the environment.
– A safety-related system is one by which the safety of the equipment or plant is ensured.
• A safety-critical system is:
– a safety-related system, or
– a high-integrity system
System integrity
• Definition
– The integrity of a system is its ability to detect faults in its own operation and to inform the human operator.
• Notes
– The system will enter a failsafe state if faults are detected
– High-integrity system
• Failure could result in large financial loss
• Examples: telephone exchanges, communication satellites
Failsafe operation
• Definition
– A system is failsafe if it adopts “safe” output states in the event of failure and inability to recover.
• Notes
– Example of failsafe operation
• Railway signaling system: failsafe corresponds to all the lights on red
– Many systems are not failsafe
• Fly-by-wire system in an aircraft: the only safe state is on the ground
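The railway example can be sketched as a minimal failsafe controller in Python (an illustration only; the safe state, signal names, and aspect checks are assumptions, not part of the course material):

```python
# Hypothetical safe state for a two-signal section: every light on red.
ALL_RED = {"signal_1": "red", "signal_2": "red"}

def control_signals(compute_aspects):
    """Run the normal signaling logic; on failure and inability to
    recover, adopt the safe output state (all lights on red)."""
    try:
        aspects = compute_aspects()
        if not all(a in ("red", "yellow", "green") for a in aspects.values()):
            raise ValueError("invalid signal aspect")
        return aspects
    except Exception:
        # Failsafe: a failed computation never leaves a green light up.
        return dict(ALL_RED)

def broken_logic():
    raise RuntimeError("sensor fault")  # simulated internal failure
```

Calling `control_signals(broken_logic)` returns the all-red state: the failure is still a loss of utility, but not a hazard.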
Preliminary topics
• Fundamental concepts of dependability
• Means of achieving dependability
• Hazard and risk analysis
• Reliability analysis
• Hardware redundancy
• Information and time redundancy
• Software redundancy
• Checkpointing
• Fault-tolerant networks
Dependability: an integrating concept
• Dependability is a property of a system that justifies placing one’s reliance on it.
• Attributes: availability, reliability, safety, confidentiality, integrity, maintainability
• Means: fault prevention, fault tolerance, fault removal, fault forecasting
• Threats: faults, errors, failures
Threats: Faults, Errors &amp; Failures
• Fault: cause of error (and failure)
• Error: unintended internal state of a subsystem
• Failure: deviation of actual service from intended service
Threats: Faults, Errors & Failures, cont.
• Fault
– Physical defect, imperfection, or flaw that occurs within some hardware or software component.
– Examples
• Shorts between electrical conductors
• Physical flaws or imperfections in semiconductor devices
• Program loop that, when entered, can never be exited
– Primary cause of an error (and, perhaps, a failure)
• Does not necessarily lead to an error, e.g., a bit in memory flipped by radiation
– can cause an error if the next operation on the memory cell is a “read”
– causes no error if the next operation on the memory cell is a “write”
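The read-vs-write distinction can be demonstrated with a toy memory model in Python (illustrative only; the dictionary-as-memory and the function name are assumptions):

```python
def flip_bit(memory, addr, bit):
    """Model a radiation-induced fault: one bit of a cell is flipped."""
    memory[addr] ^= 1 << bit

# Fault followed by a read: the corrupted value is used -> error.
mem = {0: 0b0000}
flip_bit(mem, 0, 0)        # fault occurs
value = mem[0]             # next operation is "read"
assert value == 0b0001     # incorrect value: an error has occurred

# Fault followed by a write: the flipped bit is overwritten -> no error.
mem = {0: 0b0000}
flip_bit(mem, 0, 0)        # fault occurs
mem[0] = 0b0000            # next operation is "write": fault is masked
assert mem[0] == 0b0000    # the fault never becomes an error
```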
Threats: Faults, Errors & Failures, cont.
• Error
– An incorrect internal state of a computer
• Deviation from accuracy or correctness
– Example
• A physical short results in a line in the circuit permanently being stuck at logic 1. The physical short is a fault in the circuit. If the line is required to transition to logic 0, the value on the line will be in error.
– The manifestation of a fault
– May lead to a failure, but does not have to
Threats: Faults, Errors & Failures, cont.
• Failure
– Denotes a deviation between the actual service and the specified or intended service
– Example
• A line in a circuit is responsible for turning a valve on or off: a logic 1 turns the valve on and a logic 0 turns the valve off. If the line is stuck at logic 1, the valve is stuck on. As long as the user of the system wants the valve on, the system will be functioning correctly. However, when the user wants the valve off, the system will experience a failure.
– The failure is an event (i.e. it occurs at some time instant, if ever) caused by an error
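The stuck-at-1 valve example can be traced in a small Python sketch (an illustration, not from the slides; the function names are assumptions). It shows how the same error is invisible while the request matches the faulty value, and only surfaces as a failure when it does not:

```python
STUCK_AT_1 = True  # the fault: the control line is stuck at logic 1

def line_value(requested):
    """The control line; the stuck-at fault overrides the request."""
    return 1 if STUCK_AT_1 else requested

def valve_state(requested):
    # Logic 1 turns the valve on, logic 0 turns it off.
    return "on" if line_value(requested) == 1 else "off"

# User wants the valve on: the erroneous line value happens to match,
# so no failure is observable at the service interface.
assert valve_state(1) == "on"

# User wants the valve off: the error now becomes a failure, a
# deviation of the actual service from the intended service.
assert valve_state(0) == "on"
```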
The pathology of failure
Three-universe model
1. Physical universe: where faults occur
– Physical entities: semiconductor devices, mechanical elements, displays, printers, power supplies
– A fault is a physical defect or alteration of some component in the physical universe
2. Informational universe: where errors occur
– Units of information: bits, data words
– An error has occurred when some unit of information becomes incorrect
3. External (user’s) universe: where failures occur
– The user sees the effects of faults and errors
– A failure is any deviation from the desired or expected behavior
Causes of faults
• Problems at any stage of the design process can result in faults within the system.
Causes of faults, cont.
• Specification mistakes
– Incorrect algorithms, architectures, or hardware or software design specifications
• Example: the designer of a digital circuit incorrectly specified the timing characteristics of some of the circuit’s components
• Implementation mistakes
– Implementation: the process of turning the hardware and software designs into physical hardware and actual code
– Poor design, poor component selection, poor construction, software coding mistakes
• Examples: a software coding error; a printed circuit board constructed such that adjacent lines of a circuit are shorted together
Causes of faults, cont.
• Component defects
– Manufacturing imperfections, random device defects, component wear-out
– Most commonly considered causes of faults
• Examples: bonds breaking within the circuit, corrosion of the metal
• External disturbances
– Radiation, electromagnetic interference, operator mistakes, environmental extremes, battle damage
• Example: lightning
Failure modes
Failure modes, cont.
• Failure domain
– Value failures: incorrect value delivered at the interface
– Timing failures: right result at the wrong time (usually late)
• Failure consistency
– Consistent failures: all nodes see the same, possibly wrong, result
– Inconsistent failures: different nodes see different results
• Failure consequences
– Benign failures: essentially loss of utility of the system
– Malign failures: significantly more than loss of utility of the system; catastrophic, e.g. an airplane crash
• Failure oftenness (failure frequency and persistency)
– Permanent failure: the system ceases operation until it is repaired
– Transient failure: the system continues to operate
• Frequently occurring transient failures are called intermittent
Failure modes, cont.
• Consistent failures
– Fail-silent
• the system produces correct results or remains quiet (no delivery)
– Fail-crash
• the system produces correct results or stops quietly
– Fail-stop
• the system produces correct results or stops (and the stop is made known to others)
• Inconsistent failures
– Two-faced failures, malicious failures, Byzantine failures
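Fail-stop behavior can be sketched as a small Python wrapper (illustrative only; the acceptance-check mechanism and all names are assumptions, not part of the course material):

```python
class FailStopError(Exception):
    """Raised so that the stop is made known to other components."""

def fail_stop(operation, acceptance_check):
    """Fail-stop semantics: either deliver a result that passes the
    acceptance check, or stop in a way others can observe; an
    incorrect result is never delivered."""
    result = operation()
    if not acceptance_check(result):
        raise FailStopError("component stopped")
    return result
```

A fail-silent component would instead suppress the exception and simply deliver nothing; the defining difference of fail-stop is that the stop itself is announced.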
Proportion of failures
Dependability attributes
• Availability: readiness for correct service
• Reliability: continuity of correct service
• Safety: absence of catastrophic consequences on the user(s) and the environment
• Confidentiality: absence of unauthorized disclosure of information
• Integrity: absence of improper system alterations
• Maintainability: ability to undergo modifications and repairs
• Security: the concurrent existence of (a) availability for authorized users only, (b) confidentiality, and (c) integrity, with ‘improper’ taken as meaning ‘unauthorized’.