The performance of transaction-based systems is usually limited by slow disk accesses. During its lifetime, a transaction makes a number of disk accesses to read its data (if the data have not been cached in main memory), makes a few calculations on the data, writes its results back (via a log file), and then, if all goes well, commits. Although disk read operations may be reduced with the help of large main memory caches (or even network main memory caches [1, 15]), disk write operations at transaction commit time are difficult to avoid: the transaction's modified data and meta-data have to reach stable storage before the transaction is able to commit, since otherwise a system crash would leave the data repository in an inconsistent state. Several current transaction-based systems use a magnetic disk as the stable storage, and force all dirty data to it using the fsync(2) system call [6, 25]. Magnetic disks can usually survive power and software failures, thereby providing a stable medium to store data that must survive crashes.
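To make the commit-time cost concrete, the following is a minimal sketch of a disk-based commit path: a log record is appended and then forced to stable storage before the commit returns. The function name `commit` and the log layout are illustrative assumptions, not the interface of any system cited here; `os.fsync` is Python's wrapper around fsync(2).

```python
import os


def commit(log_path: str, record: bytes) -> int:
    """Append one log record and force it to disk before returning.

    Hypothetical helper for illustration: the transaction is considered
    committed only once os.fsync() has returned.
    """
    fd = os.open(log_path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o600)
    try:
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(record):
            written += os.write(fd, record[written:])
        # Block until the operating system reports the data as on stable storage.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

The synchronous `os.fsync` call is the step whose latency the rest of this section seeks to avoid.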
We believe, however, that in Networks of Workstations the collective main memory of all workstations in the system can be made reliable enough to survive power outages and software failures, and thus become a viable alternative to disk storage for sensitive transaction data. We believe there are two main sources of system crashes that may lead to data loss: (i) software failures, and (ii) power loss. We deal with each of them in turn:
Power losses are the result of malfunctions in the power supply system. To cope with power losses we assume the existence of two power supplies: one could be the main power supply, and the second could be provided by an uninterruptible power supply (UPS). A UPS for a workstation can cost less than $100.00, making it a small percentage of the cost of a workstation. If workstations are connected to UPSs, they will retain their main memory contents even after a power loss. Moreover, once a power loss is detected, the UPS gives workstations ample time to save their sensitive data to magnetic disks.
Based on this description, we argue that by using mirroring and UPSs we can make (remote) main memory a storage medium as reliable as the magnetic disk. Thus, sensitive data that would otherwise need to be synchronously written to disk can instead be (synchronously) written to remote main memory with the same level of reliability. The main memory system we describe suffers data loss once every several years, which is the same level of reliability that current magnetic disks provide.
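The idea of a synchronous, mirrored write to remote main memory can be sketched as a small in-process simulation. Everything here is a hypothetical stand-in: `MemoryNode` models one workstation's memory, `mirrored_write` returns only after every mirror has acknowledged, and `recover` reads from any surviving copy. The sketch assumes that mirroring means keeping a full copy on each of two workstations, and it uses no real network.

```python
class MemoryNode:
    """Hypothetical stand-in for the main memory of one workstation."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}
        self.alive = True

    def write(self, key: str, value: bytes) -> bool:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store[key] = value
        return True  # acknowledgement back to the writer

    def read(self, key: str) -> bytes:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.store[key]

    def crash(self) -> None:
        # A crash loses the volatile contents of this node.
        self.store.clear()
        self.alive = False


def mirrored_write(nodes, key: str, value: bytes) -> None:
    """Return only after every mirror has acknowledged the write."""
    for node in nodes:
        node.write(key, value)


def recover(nodes, key: str) -> bytes:
    """Read the value from the first mirror that still holds it."""
    for node in nodes:
        try:
            return node.read(key)
        except (ConnectionError, KeyError):
            continue
    raise LookupError(f"no surviving copy of {key!r}")
```

In this simulation the data survive the crash of either node alone and are lost only if both mirrors fail before the data are saved elsewhere, which is the failure case the UPSs are meant to make rare.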