Recovery

In the event of a workstation or network crash, our system needs to recover its data and continue operating. If the local workstation crashes and reboots, it reads all of its "seemingly lost" data from the remote memory, stores it safely on disk, and continues normally. If the remote workstation crashes, the local transaction manager detects the failure after a timeout period. After the timeout, the local manager may either search for another remote memory server, or stop using remote memory and commit transactions to disk as usual. If the network crashes, the local workstation stops using remote memory and commits all transactions to disk. In all circumstances, the system recovers within a few seconds, even in the worst case. The reason is that two copies of the log data exist at all times: if one copy is lost in a crash, the system can quickly switch to the other.
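The failure cases above can be illustrated with a minimal, in-process sketch. It is not the system's actual implementation: the names `RemoteMemory`, `RemoteUnavailable`, `TransactionManager`, and `recover_after_local_crash` are hypothetical, a raised exception stands in for the timeout, and plain Python lists stand in for remote memory and the disk. The sketch also assumes that, when a remote server is lost, the records it was mirroring are first flushed from local memory to disk, so that a second copy still exists.

```python
class RemoteUnavailable(Exception):
    """Stands in for a timeout on a crashed server or a broken network."""


class RemoteMemory:
    """Hypothetical in-process stand-in for a remote memory server."""

    def __init__(self):
        self.log = []
        self.alive = True

    def append(self, record):
        if not self.alive:
            raise RemoteUnavailable("timeout")
        self.log.append(record)

    def read_all(self):
        if not self.alive:
            raise RemoteUnavailable("timeout")
        return list(self.log)


class TransactionManager:
    """Keeps two copies of each log record: local memory plus remote memory,
    or falls back to disk when no remote server is reachable (assumed policy)."""

    def __init__(self, remote, spares=()):
        self.remote = remote
        self.spares = list(spares)
        self.local_log = []  # volatile copy of records mirrored remotely
        self.disk = []       # stand-in for stable storage

    def commit(self, record):
        while self.remote is not None:
            try:
                self.remote.append(record)
                self.local_log.append(record)
                return "remote"
            except RemoteUnavailable:
                # The remote copy is gone: save the local copy to disk,
                # then try another server, or give up on remote memory.
                self.disk.extend(self.local_log)
                self.local_log.clear()
                self.remote = self.spares.pop(0) if self.spares else None
        self.disk.append(record)
        return "disk"


def recover_after_local_crash(remote, disk):
    """After a reboot, copy the surviving remote log records to disk."""
    recovered = remote.read_all()
    disk.extend(recovered)
    return recovered
```

In this sketch, `commit` reports where the second copy of a record ended up (`"remote"` or `"disk"`), which makes the switch between the two modes visible to a caller. A network crash is treated the same way as a remote crash with no spare servers available.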

Evangelos Markatos
Fri Apr 11 14:07:02 EET DST 1997