Reliable Main Memory

The performance of transaction-based systems is usually limited by slow disk accesses. During its lifetime, a transaction makes a number of disk accesses to read its data (if the data have not been cached in main memory), performs a few computations on the data, writes its results back (via a log file), and then, if all goes well, commits. Although disk reads may be reduced with the help of large main memory caches (or even network main memory caches [1, 15]), disk writes at transaction commit time are difficult to avoid. The transaction's modified data and metadata have to reach stable storage before the transaction can commit; otherwise, a system crash could leave the data repository in an inconsistent state. Several current transaction-based systems use a magnetic disk as stable storage and force all dirty data to it using the fsync(2) system call [6, 25]. Magnetic disks can usually survive power and software failures, thereby providing a stable medium for data that must survive crashes.
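As a minimal sketch of the synchronous commit-time write described above, the following Python appends a redo record and a commit marker to a log file and forces them to disk with `os.fsync`, the standard-library wrapper around fsync(2). The append-only log, the record format, and the function name `append_commit_record` are hypothetical illustrations, not the design of the systems cited.

```python
import os
import tempfile


def append_commit_record(log_path: str, txn_id: int, redo: bytes) -> None:
    """Append a redo record and a commit marker, then force both to disk.

    The transaction may be reported as committed only after this returns:
    os.fsync() blocks until the file's modified data and metadata have
    been flushed to the storage device.
    """
    record = f"TXN {txn_id} ".encode() + redo + f"\nCOMMIT {txn_id}\n".encode()
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(record)
        while view:                    # os.write may perform a short write
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                   # the synchronous disk write at commit
    finally:
        os.close(fd)
    # For a newly created log file, a careful implementation would also
    # fsync the containing directory so that the file's entry is durable.


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "txn.log")
    append_commit_record(path, 1, b"x=42")
    with open(path, "rb") as f:
        print(f.read())                # b'TXN 1 x=42\nCOMMIT 1\n'
```

The `os.fsync` call is the step the rest of this section proposes to replace: every commit waits for it, so its latency bounds commit throughput.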

We believe, however, that in Networks of Workstations the collective main memory of all workstations in the system can be made reliable enough to survive power outages and software failures, and can thus become a viable alternative to disk storage for sensitive transaction data. We believe that there are two main sources of system crashes that may lead to data loss: (i) software failures, and (ii) power loss. We deal with each of them in turn:

Software failures are the result of software malfunctions, operating system crashes, and the like. When the operating system crashes and reboots, it may destroy the contents of main memory, thereby losing all data that have not been written to stable storage. Data that must survive software failures are therefore replicated in the main memories of (at least) two workstations that are connected to two different power supplies (e.g., one on a UPS and the other on the main power supply). Since different workstations run different copies of the operating system, they will most likely crash (due to software errors) independently of each other. Thus, if the data that must survive software crashes are replicated on two workstations, they will survive software failures with high probability.

Power losses are the result of malfunctions in the power supply system. To cope with power losses, we assume the existence of two power supplies: one could be the main power supply, and the second could be provided by an uninterruptible power supply (UPS). A UPS for a workstation can cost less than $100.00, a small percentage of the cost of a workstation. Workstations connected to UPSs retain their main memory contents even after a power loss. Moreover, once a power loss is detected, the UPS gives the workstations plenty of time to save their sensitive data to magnetic disk.
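The two mechanisms above can be sketched in a few lines of Python. This is a minimal in-process model under stated assumptions: the class `RemoteMemory` and the functions `mirrored_commit` and `recover` are hypothetical names, the "network" is a direct method call, and `on_power_loss` stands in for whatever handler a UPS monitoring daemon would invoke. A commit is reported only after both mirrors, on different power supplies, acknowledge their copy.

```python
import json
import os
from dataclasses import dataclass, field


@dataclass
class RemoteMemory:
    """Hypothetical model of one remote workstation's main memory."""
    name: str
    power_supply: str                  # e.g. "UPS" or "mains"
    alive: bool = True
    pages: dict = field(default_factory=dict)

    def store(self, key: str, data: bytes) -> bool:
        """Keep a copy in memory; the return value plays the role of an ACK."""
        if not self.alive:
            return False
        self.pages[key] = data
        return True

    def crash(self) -> None:
        """Model an OS crash and reboot: main memory contents are lost."""
        self.alive = False
        self.pages.clear()

    def on_power_loss(self, path: str) -> None:
        """Hypothetical UPS power-loss handler: while running on battery,
        save the in-memory copies to local disk."""
        with open(path, "w") as f:
            json.dump({k: v.hex() for k, v in self.pages.items()}, f)
            f.flush()
            os.fsync(f.fileno())


def mirrored_commit(key: str, data: bytes, replicas: list) -> bool:
    """Report the commit as durable only if every mirror acknowledged it."""
    if len({r.power_supply for r in replicas}) < 2:
        raise ValueError("mirrors must sit on different power supplies")
    # A real system would send these in parallel over the network.
    acks = [r.store(key, data) for r in replicas]
    return all(acks)


def recover(key: str, replicas: list):
    """Fetch a committed record from any mirror that is still up."""
    for r in replicas:
        if r.alive and key in r.pages:
            return r.pages[key]
    return None
```

For example, after `mirrored_commit` succeeds on a mirror attached to a UPS and one attached to mains power, crashing either mirror still lets `recover` return the record from the other. If a mirror fails to acknowledge, `mirrored_commit` returns `False`, and in this sketch the caller must abort or re-replicate before reporting the commit.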

Based on this description, we argue that, using mirroring and UPSs, (remote) main memory can be made a storage medium as reliable as magnetic disk. Thus, sensitive data that would otherwise have to be written synchronously to disk can instead be written synchronously to remote main memory with the same level of reliability. The main memory system we describe suffers data loss about once every several years, which is the same level of reliability that current magnetic disks provide.
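The reliability argument can be made slightly more concrete with the standard back-of-the-envelope model for a mirrored pair, the same one commonly used for disk mirroring. As a simplifying assumption, suppose each of the two workstations holding a copy loses its memory independently, with exponentially distributed failures and mean time to failure MTTF, and that after one copy is lost the data are re-mirrored or flushed to disk within a mean time MTTR (both symbols are introduced here for illustration). Data are lost only if the second copy fails during that window, so the mean time to data loss (MTTDL) is approximately:

```latex
% Assumes independent, exponentially distributed failures and MTTR << MTTF.
\begin{align*}
  \text{mean time to first failure of either mirror} &= \frac{\mathrm{MTTF}}{2},\\
  \Pr[\text{other mirror fails within the repair window}] &\approx \frac{\mathrm{MTTR}}{\mathrm{MTTF}},\\
  \mathrm{MTTDL} &\approx \frac{\mathrm{MTTF}/2}{\mathrm{MTTR}/\mathrm{MTTF}}
                  = \frac{\mathrm{MTTF}^2}{2\,\mathrm{MTTR}}.
\end{align*}
```

The quadratic dependence on MTTF holds only if the two failures are independent. A correlated failure, such as a single power outage hitting both mirrors, breaks that assumption, which is what placing the mirrors on separate power supplies, one of them a UPS, is meant to prevent.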

Evangelos Markatos
Fri Apr 11 14:07:02 EET DST 1997