If n different events each have the same mean time m, then the mean time to the first of these events is m/n.
Theorem 1:
The mean time to event A is MT(A) = 1/P(A), where P(A) is the probability that A occurs in a given unit of time.
Theorem 2:
P(A or B) = P(A) + P(B) - P(A and B)
Assuming A and B are independent:
P(A or B) = P(A) + P(B) - P(A) * P(B)
          ≈ P(A) + P(B) (if P(A) and P(B) are very small)
Theorem 3:
If independent events A and B have mean times MT(A) and MT(B), then the mean time to the first of them is approximately 1/(P(A) + P(B)) = 1/(1/MT(A) + 1/MT(B)), assuming P(A) and P(B) are very small.
Proof:
If p is the probability of an event in a given unit of time, then its mean time is m = 1/p (Theorem 1).
If there are n such independent events and p is very small, then the probability that one of them occurs is approximately n * p (Theorem 2).
Therefore, the mean time to the first of these events is 1/(n * p) = m/n.
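The approximation in the proof can be checked numerically. The sketch below is illustrative only: the function name and the sample values (m = 1000 time units, n = 10) are assumptions, not part of the notes. It compares the exact mean time, using P(at least one event) = 1 - (1 - p)^n, with the approximate value m/n.

```python
def mean_time_to_first(p: float, n: int) -> tuple[float, float]:
    """Return (exact, approximate) mean time to the first of n independent events.

    p is the probability of each event in one unit of time (assumed identical
    for all n events).
    """
    p_any_exact = 1.0 - (1.0 - p) ** n   # P(at least one event in a time unit)
    p_any_approx = n * p                 # Theorem 2 with the product terms dropped
    # Theorem 1: mean time = 1 / probability
    return 1.0 / p_any_exact, 1.0 / p_any_approx


m = 1000.0   # hypothetical mean time of each single event
n = 10       # hypothetical number of events
exact, approx = mean_time_to_first(1.0 / m, n)
print(f"exact = {exact:.2f}, approximate (m/n) = {approx:.2f}")
```

The two values stay close as long as p is small; for large p the dropped product terms matter and m/n underestimates the mean time.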
Fault Tolerance Strategy:
- Fail-vote:
Use two or more modules and compare their outputs; stop if no majority of the outputs agree. With duplication it fails twice as often, but it gives clean failure semantics.
- Fail-fast:
Similar to fail-vote, except that the system senses which modules are available and then uses the majority of the available modules.
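The difference between the two voting strategies can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, module outputs are modeled as integers, and `None` is an assumed marker for a module that has been sensed as unavailable.

```python
from collections import Counter
from typing import Optional, Sequence


def _majority(outputs: Sequence[int], needed: int) -> Optional[int]:
    """Return the value reported by at least `needed` modules, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= needed else None


def fail_vote(outputs: Sequence[Optional[int]]) -> Optional[int]:
    """Fail-vote: the majority is taken over ALL modules, available or not."""
    available = [o for o in outputs if o is not None]
    return _majority(available, len(outputs) // 2 + 1)


def fail_fast_vote(outputs: Sequence[Optional[int]]) -> Optional[int]:
    """Fail-fast: the majority is taken over the available modules only."""
    available = [o for o in outputs if o is not None]
    return _majority(available, len(available) // 2 + 1)


# Three modules, two of them unavailable (None); a None result means "stop".
print(fail_vote([7, None, None]))
print(fail_fast_vote([7, None, None]))
```

In this sketch a triplexed system with two unavailable modules stops under fail-vote but keeps running on the one remaining module under fail-fast.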
Improving software reliability:
- Periodic transfer of data: The primary process does all the work until it fails; a second process, called the backup, then takes over from the primary and continues.
- Checkpoint-restart: The primary records its state on a duplexed storage module. At takeover, the backup reads the primary's state from the duplexed storage and resumes the application.
- Checkpoint messages: The primary sends its state changes as messages to the backup. At takeover the backup gets its current state from the most recent checkpoint message.
- Persistent: The backup restarts in the null state and lets the transaction mechanism clean up all uncommitted transactions. This is the approach taken by most database systems.
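As an illustration of the checkpoint-message style of takeover, here is a minimal in-process sketch. The class and method names are hypothetical, state is modeled as a dictionary, and message delivery is modeled as a direct method call; a real process pair would run on separate processors and detect failure of the primary.

```python
class Backup:
    def __init__(self):
        # State rebuilt from the checkpoint messages received so far.
        self.state = {}

    def receive_checkpoint(self, changes: dict) -> None:
        """Apply one checkpoint message (a set of state changes)."""
        self.state.update(changes)

    def take_over(self) -> "Primary":
        """Become the new primary, resuming from the last checkpointed state."""
        return Primary(backup=None, state=dict(self.state))


class Primary:
    def __init__(self, backup=None, state=None):
        self.state = state if state is not None else {}
        self.backup = backup

    def apply(self, key, value) -> None:
        """Do the work, then send the state change to the backup."""
        self.state[key] = value
        if self.backup is not None:
            self.backup.receive_checkpoint({key: value})


backup = Backup()
primary = Primary(backup=backup)
primary.apply("balance", 100)
primary.apply("balance", 80)

# Simulated failure of the primary: the backup resumes from its last checkpoint.
new_primary = backup.take_over()
print(new_primary.state)
```

Sending only the changes keeps each message small; the cost is that the backup's state is only as current as the last message it received.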
Highly available storage:
- Write to several storage modules.
- Store a checksum with the data so that, with very high probability, any data read is correct.
- Disk mirroring is an example of this.
- Shadowing is another mirroring technique which allows atomic write operations.
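The first two points (writing to several modules and verifying a checksum on read) can be sketched as below. This is an assumption-laden illustration: storage modules are modeled as in-memory dictionaries, the class name `MirroredStore` is hypothetical, and CRC-32 from Python's standard `zlib` module stands in for whatever checksum a real system would use. Shadowing and atomic writes are not modeled.

```python
import zlib
from typing import Optional


class MirroredStore:
    def __init__(self, copies: int = 2):
        # Each "storage module" maps a block id to (checksum, data).
        self.modules = [dict() for _ in range(copies)]

    def write(self, block: int, data: bytes) -> None:
        """Write the block and its checksum to every storage module."""
        record = (zlib.crc32(data), data)
        for module in self.modules:
            module[block] = record

    def read(self, block: int) -> Optional[bytes]:
        """Return the first copy whose checksum verifies, or None if none does."""
        for module in self.modules:
            record = module.get(block)
            if record is not None:
                checksum, data = record
                if zlib.crc32(data) == checksum:
                    return data
        return None


store = MirroredStore(copies=2)
store.write(0, b"hello")

# Simulate corruption of the copy on the first module.
checksum, _ = store.modules[0][0]
store.modules[0][0] = (checksum, b"hellx")

print(store.read(0))  # served from the second, intact copy
```

The read succeeds as long as at least one copy passes its checksum, which is what makes the mirrored pair more available than a single module.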
Highly available processes:
- process pairing
- transaction based restart
- checkpoint restart