Lecture #13: Recovery
These topics are from Chapter 12 (Recovery) in Advanced
Concepts in OS.
Topics in this Chapter
- fault recovery: terminology & background
- single-process recovery approaches
- problems of multi-process & distributed recovery
- consistent set of local checkpoints
Terminology
- A system consists of a set of hardware and software components
and is designed to provide a specified service.
- Failure of a system occurs when the system does not perform its
services in the manner specified.
- An erroneous state of the system is a state that could lead to a system failure through a sequence of valid state transitions.
- A fault is an anomalous physical condition.
- An error is a manifestation of a fault in a system, which can lead to
system failure.
Recovery
- Failure recovery is a process that restores an erroneous state
to an error-free state.
(after a failure, restoring system to its "normal" state)
Failure Classification
- process failure
- system failure
- secondary storage failure
- communication medium failure
What are some causes for each?
Tolerating Process Failures
- signal process to recover internally
- restart process from a prior state
- abort process
What are some situations where each is appropriate?
Recovering from System Failures
- amnesia -- restart in predefined state
- partial amnesia -- reset part of the state to predefined
- pause -- roll back to before failure
- halting -- give up
Tolerating Secondary Storage Failures
- archiving (periodic backup)
- mirroring (continuous)
- activity logging
Tolerating Communication Medium Failures
- ack & resend
- more complex fault-tolerant algorithms
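The "ack & resend" idea can be sketched as a stop-and-wait sender that retransmits until an acknowledgment arrives or it gives up. This is a minimal illustration; the channel simulation and all names are illustrative, not from the text.

```python
# Sketch of "ack & resend": keep retransmitting until acked or out of retries.
def send_reliably(message, channel, max_retries=3):
    """Send over an unreliable channel; return True once an ack is received."""
    for attempt in range(max_retries):
        if channel(message):      # channel returns True when ack received
            return True           # acked -- done
    return False                  # gave up; caller must handle the failure

# A simulated channel that drops the first two transmissions:
drops = [True, True, False]
def flaky_channel(msg):
    return not drops.pop(0)

assert send_reliably("hello", flaky_channel)        # succeeds on 3rd try
assert not send_reliably("hi", lambda m: False)     # never acked: give up
```

Note that bounding the retries is what separates this from livelock: after `max_retries` failures the sender reports the failure upward instead of resending forever.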
Backward versus Forward Error Recovery
- forward: skip ahead to a new correct state
  - requires contextual knowledge of what "forward" is
- backward: go back to a previous correct state
  - overhead: takes time to save and restore state
  - fault may repeat (=> cycling)
  - recovery may be impossible
Backward Error Recovery
- based on recovery points
- two approaches:
- operation-based recovery
- state-based recovery
System Model
Stable Storage
- does not lose information in the event of system failure
- is used to keep logs & recovery points
- algorithms in this chapter assume an underlying stable storage
system already exists
Two Approaches to Fault Tolerance
- operation based
- record a log (audit trail) of the operations performed
- restore previous state by reversing steps
- state based
- record a snapshot of the state (checkpoint)
- restore state by reloading snapshot (rollback)
Practical systems employ a combination of the two approaches,
e.g., logging with periodic full-DB snapshots for archive.
Fundamental Issues in Crash Recovery
- disk writes are only atomic by sector
- updates and commits require multiple writes
- a crash may occur between writes
- log contains record of updates, commits, and aborts
- data is written to disk asynchronously
- DB is cached
- log is buffered
The textbook jumps right into the problem of supporting crash
recovery, without first reviewing any basic transaction models.
The following are two more basic models than those mentioned in
the text.
Basic Deferred-Update Model
- save a transaction's updates as it runs,
in temporary storage
- use the saved updates to update the database
when the transaction commits
- update: record a redo record
(e.g. the new value of the item being updated)
in an intention list
- read: combine the intention list and the database to determine
the desired value
- commit: update the database by applying the intention list
in order, starting with the first operation done by the
transaction
- abort: discard the transaction's intention list
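The deferred-update operations above can be sketched in a few lines. This is a minimal illustration (class and variable names are mine, not the text's): updates go only to an intention list, reads combine the list with the database, commit applies the list in order, and abort simply discards it.

```python
# Sketch of the basic deferred-update model.
class DeferredUpdateTxn:
    def __init__(self, db):
        self.db = db
        self.intentions = []          # (key, new_value) redo records

    def update(self, key, value):
        self.intentions.append((key, value))   # DB itself is untouched

    def read(self, key):
        # combine intention list (latest update wins) with the database
        for k, v in reversed(self.intentions):
            if k == key:
                return v
        return self.db[key]

    def commit(self):
        for k, v in self.intentions:  # apply in original order
            self.db[k] = v
        self.intentions = []

    def abort(self):
        self.intentions = []          # nothing to undo in the database

db = {"x": 1}
t = DeferredUpdateTxn(db)
t.update("x", 2)
assert t.read("x") == 2 and db["x"] == 1   # DB unchanged until commit
t.commit()
assert db["x"] == 2
```

Because the database is never touched before commit, abort is trivial; the cost moves to reads, which must search the intention list.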
Basic Update-In-Place Model
- update the DB as transaction runs
- undo the updates if the transaction aborts
- update: record an undo
record (e.g., the old value of the item being updated)
to an undo log, and then update the database
- read: read the current value directly from the database
(the DB already reflects the transaction's updates)
- commit: discard the transaction's undo log
- abort: use the undo records in the transaction's undo log
to back out the transaction's updates, by backing out the
operations in the reverse of the order in which they were
originally done
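The update-in-place operations can likewise be sketched (names are illustrative): log the old value before each in-place write, discard the log on commit, and replay it in reverse on abort.

```python
# Sketch of the basic update-in-place model with an undo log.
class UpdateInPlaceTxn:
    def __init__(self, db):
        self.db = db
        self.undo_log = []            # (key, old_value) undo records

    def update(self, key, value):
        self.undo_log.append((key, self.db[key]))  # record old value first
        self.db[key] = value                       # then update in place

    def read(self, key):
        return self.db[key]           # DB already holds current values

    def commit(self):
        self.undo_log = []            # discard the undo log

    def abort(self):
        for k, old in reversed(self.undo_log):     # back out in reverse order
            self.db[k] = old
        self.undo_log = []

db = {"x": 1}
t = UpdateInPlaceTxn(db)
t.update("x", 2)
t.update("x", 3)
t.abort()
assert db["x"] == 1                   # both updates backed out
```

Here the cost profile is inverted relative to deferred update: reads are cheap, but abort must do real work.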
What provides for disk crash recovery?
Extended Update-In-Place Model
- update: modify the online DB
and record both undo
and redo records, including:
- name/location of object
- old state/value of object (for undo)
- new state/value of object (for redo)
in a safe order
- read: read the current value and apply the transaction's undo log
- commit: discard/invalidate the transaction's undo log
- abort: use the undo records in the transaction's undo log
to back out the transaction's updates
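The "safe order" in the update step can be sketched as: the combined undo/redo record reaches the log before the in-place database write. The append-only `log` list below stands in for the log on stable storage; the record fields mirror the three items listed above, and all names are illustrative.

```python
# Sketch of a logged in-place update in the extended model.
log = []                           # stands in for the log on stable storage
db = {"x": 100}

def logged_update(txn, key, new):
    log.append({"txn": txn,
                "obj": key,        # name/location of the object
                "old": db[key],    # old value (for undo)
                "new": new})       # new value (for redo)
    db[key] = new                  # DB update only after the log record

logged_update(1, "x", 150)
assert log[0]["old"] == 100 and db["x"] == 150
```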
Where does the stable storage fit in?
Crash Recovery with Update-In-Place
We now have a way to reconstruct the DB system in event of a
crash, starting from an archived snapshot and the subsequent log:
- transactions not logged as committed are treated as aborted
- back out active or aborted transactions, using undo records
- do DB updates that may have been in cache, using redo records
If we are starting with a snapshot, why do we need to worry
about active, uncommitted, and aborted transactions?
Problem: DB write before log write
There is a defect in the above scheme:
- if the cached new value of X is written to the DB on disk
- and the system then crashes before the old value of X is written to the log
then there is no undo record with which to back out X.
How to solve?
Solution: Write-Ahead-Log
Before a block is written to DB disk,
make sure the corresponding undo record is completely written
to the log disk.
The log must be forced to disk as part
of committing a transaction.
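The write-ahead rule can be sketched as a check in the buffer manager: before a dirty page goes to the DB disk, force the log buffer so the page's undo records are already durable. The structures below (a buffered log and simple sequence numbers) are illustrative stand-ins.

```python
# Minimal sketch of the write-ahead-log rule.
log_buffer, log_disk = [], []      # buffered vs. forced log records
db_disk = {}

def log_append(rec):
    log_buffer.append(rec)
    return len(log_disk) + len(log_buffer)   # this record's sequence number

def force_log():
    log_disk.extend(log_buffer)    # push buffered records to the log disk
    log_buffer.clear()

def flush_page(key, value, page_lsn):
    # WAL rule: all log records up to page_lsn must be on disk first
    if page_lsn > len(log_disk):
        force_log()
    db_disk[key] = value           # only now may the DB page be written

lsn = log_append({"obj": "x", "old": 1, "new": 2})
flush_page("x", 2, lsn)
assert log_disk and db_disk["x"] == 2   # undo record hit the log disk first
```

Commit would similarly call `force_log()` after appending the commit record, so that a transaction is durable exactly when its commit record is on the log disk.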
Crash Recovery with Write-Ahead-Log
- redo phase: redo all the updates in the log,
including undo operations of aborted transactions
- undo phase: abort all transactions that have
no commit or abort record in the log, using the usual
undo records in the log
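The two recovery phases can be sketched over a simple log of records (the record format here is illustrative): first replay every update in log order, then back out, in reverse order, every transaction that has no commit or abort record.

```python
# Sketch of redo-then-undo crash recovery from a write-ahead log.
def recover(log):
    db = {}
    # redo phase: reapply all updates in log order
    for rec in log:
        if rec["type"] == "update":
            db[rec["obj"]] = rec["new"]
    # find transactions that neither committed nor aborted
    ended = {r["txn"] for r in log if r["type"] in ("commit", "abort")}
    # undo phase: back out the rest, in reverse log order
    for rec in reversed(log):
        if rec["type"] == "update" and rec["txn"] not in ended:
            db[rec["obj"]] = rec["old"]
    return db

log = [
    {"type": "update", "txn": 1, "obj": "x", "old": 0, "new": 1},
    {"type": "update", "txn": 2, "obj": "y", "old": 0, "new": 5},
    {"type": "commit", "txn": 1},
    # crash here: txn 2 never committed
]
assert recover(log) == {"x": 1, "y": 0}   # txn 1 kept, txn 2 backed out
```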
State Based Approach
- based on checkpoints of entire state of process
- recovery does rollback to checkpointed state
- use of shadow pages can reduce size of checkpoints
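One way shadow pages shrink checkpoints can be sketched as copy-on-write bookkeeping: only pages dirtied since the last checkpoint need to be saved, not the whole state. The class below is an illustrative sketch, not an implementation from the text.

```python
# Sketch of checkpointing only the pages changed since the last checkpoint.
class ShadowPagedState:
    def __init__(self, pages):
        self.pages = dict(pages)   # current page table
        self.dirty = set()         # pages changed since last checkpoint

    def write(self, page_no, data):
        self.pages[page_no] = data
        self.dirty.add(page_no)    # remember which pages need saving

    def checkpoint(self):
        delta = {p: self.pages[p] for p in self.dirty}  # dirty pages only
        self.dirty.clear()
        return delta               # checkpoint size = number of dirty pages

s = ShadowPagedState({0: "a", 1: "b"})
s.write(1, "B")
assert s.checkpoint() == {1: "B"}  # only the modified page is saved
```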
Problems in Distributed/Concurrent Systems
- communicating processes must coordinate checkpoints & rollbacks
- lost messages
- orphan messages
- livelocks
Orphan Messages
Note domino effect if Z is rolled back
Lost Messages
What is the difference between a lost message
and an orphan message?
Livelock
These all motivate need for coordinating checkpoints & recovery
Strongly Consistent Set of Checkpoints
There is no information flow between any processes in the set during
the time interval spanned by the checkpoints, i.e., no messages in
transit.
Consistent Set of Checkpoints
There may be information flow between the processes, but each
message recorded as received should be recorded as sent. That is,
there are no orphan messages.
What is the remaining problem here?
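The consistency condition can be sketched as a check over the checkpoints' message records (the representation here, sets of message ids per checkpoint, is illustrative): the set is consistent iff every message recorded as received is also recorded as sent.

```python
# Sketch of the no-orphans test for a set of local checkpoints.
def is_consistent(checkpoints):
    sent = set().union(*(c["sent"] for c in checkpoints))
    recv = set().union(*(c["received"] for c in checkpoints))
    return recv <= sent            # orphan = recorded received, not sent

cp_ok = [{"sent": {"m1"}, "received": set()},
         {"sent": set(), "received": {"m1"}}]
cp_orphan = [{"sent": set(), "received": set()},     # sender rolled back
             {"sent": set(), "received": {"m2"}}]    # m2 is an orphan
assert is_consistent(cp_ok)
assert not is_consistent(cp_orphan)
```

Note that `cp_ok` passes even though `m1` was sent but not yet recorded as received: a lost message still satisfies this test, which is exactly the weakness of mere consistency.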
Observations
- This is an application of concepts in Ch 5 for global state
- Strong consistency does not allow lost messages
- Consistency alone still allows lost messages
- Strong consistency is more costly to achieve
- Cost: blocking during checkpoint