Saturday, March 3, 2012

Failure Perception

A fail-silent failure is one in which the failing unit either presents the correct result or no result at all. Fail-silent failures are the easiest type of failure to tolerate, because the only observed symptom is that the failing unit has stopped working. The reason for the failure may be unknown, but the failing element is identified, and the failure is contained rather than spread to other parts of the system.

A crash-failure is one where the unit stops after the first fail-silent failure. 

A fail-stop failure is a crash-failure that is visible to the rest of the system.
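The three definitions above build on one another. The sketch below shows one way they could fit together. It assumes a hypothetical pair-and-compare design, in which two replicas compute the same function and the unit treats any disagreement as a failure. The unit suppresses the untrusted result (fail-silent), stops answering after the first failure (crash), and exposes a `stopped` flag that the rest of the system can read (fail-stop). The class name, the `stopped` flag and the injected fault are all illustrative assumptions, not part of any particular system.

```python
from typing import Callable, Optional


class SelfCheckingUnit:
    """Hypothetical fail-stop unit: two replicas compute the same function
    and the unit halts the first time their results disagree."""

    def __init__(self, primary: Callable[[int], int], shadow: Callable[[int], int]) -> None:
        self.primary = primary
        self.shadow = shadow
        self.stopped = False  # fail-stop: the halt is visible to the rest of the system

    def request(self, x: int) -> Optional[int]:
        if self.stopped:
            return None  # crash failure: no output after the first failure
        a, b = self.primary(x), self.shadow(x)
        if a != b:
            self.stopped = True  # stop at the first detected failure
            return None          # fail-silent: never emit a result we cannot trust
        return a


# Hypothetical fault: the primary replica miscomputes for x == 3.
unit = SelfCheckingUnit(lambda x: 7 if x == 3 else 2 * x, lambda x: 2 * x)
print(unit.request(1))  # 2
print(unit.request(3))  # None (disagreement detected, unit stops)
print(unit.request(1))  # None (unit has crashed)
print(unit.stopped)     # True
```

Pair-and-compare only detects faults that make the replicas disagree. If both replicas produce the same wrong answer, this unit would pass it on, and the failure would no longer be fail-silent.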

Consistent failures are seen as the same kind of failure by all observers in the system. Inconsistent failures are ones that appear different to different observers. These are also called two-faced failures, malicious failures or Byzantine failures, and they are the most difficult to isolate or correct. An example of a consistent failure is a system that reports 1 in answer to every question. An example of an inconsistent failure is a system that reports 1 to user 1 and 2 to all other users, or one that routes all traffic to a certain network address and none to another.
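The two examples above can be modelled as functions from an observer to the value that observer receives. This is a minimal sketch under that assumption; the function names and observer names are hypothetical. Note that `is_consistent` needs a global view of every observer's answer, which no single observer has in a real system. Each observer sees only its own value, which is part of why two-faced failures are hard to isolate.

```python
from typing import Callable, Iterable

# Hypothetical model: a unit maps an observer's name to the value that observer receives.
Unit = Callable[[str], int]


def consistent_unit(observer: str) -> int:
    return 1  # every observer gets the same answer, right or wrong


def two_faced_unit(observer: str) -> int:
    return 1 if observer == "user1" else 2  # the answer depends on who asks


def is_consistent(unit: Unit, observers: Iterable[str]) -> bool:
    """True if every observer sees the same value from the unit."""
    return len({unit(o) for o in observers}) == 1


observers = ["user1", "user2", "user3"]
print(is_consistent(consistent_unit, observers))  # True
print(is_consistent(two_faced_unit, observers))   # False
```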

A common design principle in fault tolerance is to assume only one failure at any one time. However, many failures have occurred because this assumption turned out to be invalid.

Tolerating n fail-silent failures requires n+1 units. Tolerating n consistent failures requires 2n+1 units. Tolerating n malicious failures requires 3n+1 units. The Space Shuttle's computer system is designed to tolerate 2 simultaneous failures that must be consistent but need not be fail-silent; with n = 2, this requires 2n+1 = 5 computers.
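The counts above can be checked with a few lines of arithmetic, and the consistent case can be illustrated with a simple majority voter. This is a sketch, not the Shuttle's actual voting scheme. The `REQUIRED_UNITS` table and `majority_vote` helper are hypothetical names. The intuition: a fail-silent unit never emits a wrong value, so one surviving unit out of n+1 is enough. With consistent failures, the n+1 correct units must outvote n identical wrong ones. The malicious case needs a Byzantine agreement protocol between the units, which this sketch does not attempt.

```python
from collections import Counter
from typing import Optional, Sequence

# Redundancy needed to tolerate n simultaneous failures of each kind.
REQUIRED_UNITS = {
    "fail-silent": lambda n: n + 1,
    "consistent": lambda n: 2 * n + 1,
    "malicious": lambda n: 3 * n + 1,
}


def majority_vote(outputs: Sequence[int]) -> Optional[int]:
    """Return the value reported by a strict majority of units, or None."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


n = 2
print({kind: f(n) for kind, f in REQUIRED_UNITS.items()})
# {'fail-silent': 3, 'consistent': 5, 'malicious': 7}

# Five computers, two of which fail consistently with the same wrong value.
print(majority_vote([42, 42, 42, 17, 17]))  # 42
# Four computers are not enough: two good units cannot outvote two bad ones.
print(majority_vote([42, 42, 17, 17]))      # None
```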
