The performance of transaction-based systems
is usually limited by slow disk accesses.
During its lifetime,
a transaction makes a number of disk accesses to read its data
(if the data have not been cached in main memory),
performs a few calculations on the data, writes its results
back (via a log file), and then, if all goes well, commits.
Although disk read operations may be reduced with the help of
large main memory caches (or even network main memory caches [1, 15]),
disk write operations at transaction
commit time are difficult to avoid, since the transaction's
modified data and meta-data have to reach stable
storage before the transaction
is able to commit; otherwise, a system crash could leave
the data repository in an inconsistent state.
Several current transaction-based systems use
a magnetic disk as the stable storage, and force all dirty data
to it using the fsync(2) system call [6, 25].
Magnetic disks can usually survive power and software failures,
thereby providing a stable medium to store data that must survive crashes.
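
As an illustration of the disk-based commit path described above, the following is a minimal sketch of appending a commit record to a log file and forcing it to disk. The function name `commit_to_log`, the log path, and the record format are assumptions made for this example and are not taken from any particular transaction system; Python's `os.fsync` is used as a wrapper around the fsync(2) call.

```python
import os
import tempfile


def commit_to_log(log_path: str, record: bytes) -> None:
    """Append one commit record and force it to stable storage.

    Hypothetical helper for illustration only.
    """
    # O_APPEND makes every write land at the current end of the log.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # os.write may write fewer bytes than requested, so loop.
        view = memoryview(record)
        while view:
            written = os.write(fd, view)
            view = view[written:]
        # Block until the operating system reports that the file's
        # dirty data has been pushed to the storage device.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        commit_to_log(os.path.join(tmp, "txn.log"), b"T1 commit\n")
```

The transaction may report success only after `os.fsync` returns, which is why every commit pays at least one synchronous disk write on this path.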
We believe, however, that in Networks of Workstations the collective main memory of all workstations in the system can be made reliable enough to survive power outages and software failures, and can thus become a viable alternative to disk storage for sensitive transaction data. We see two main sources of system crashes that may lead to data loss: (i) software failures, and (ii) power loss. We deal with each of them in turn:
Power losses are the result of malfunctions in the power
supply system. To cope with power losses we assume the existence of
two power supplies: one could be the main power supply, and
the second could be provided by an uninterruptible power supply (UPS).
A UPS for a workstation can cost less than $100.00, making it
a small percentage of the cost of a workstation.
If workstations are connected to UPSs, they will retain their main
memory contents even after a power loss. Moreover, once a power loss is detected,
the UPS gives workstations ample time to save their
sensitive data to magnetic disk.
Based on this description, we argue that mirroring and UPSs together can make (remote) main memory a storage medium as reliable as the magnetic disk. Thus, sensitive data that would otherwise be written synchronously to disk can instead be written synchronously to remote main memory with the same level of reliability. The main memory system we describe loses data about once every several years, which is the same level of reliability that current magnetic disks provide.
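
To make the idea of a synchronous, mirrored write to remote main memory more concrete, the following is a minimal in-process sketch. It is a simulation under stated assumptions, not the system's actual protocol: `MemoryNode`, `mirrored_commit`, and `recover` are hypothetical names, network transfer is replaced by a method call, and a commit is assumed to succeed only when every mirror acknowledges the write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MemoryNode:
    """Hypothetical stand-in for one workstation's main memory."""
    name: str
    alive: bool = True
    store: Dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, value: bytes) -> bool:
        # The return value plays the role of an acknowledgement;
        # a crashed node never acknowledges.
        if not self.alive:
            return False
        self.store[key] = value
        return True


def mirrored_commit(key: str, value: bytes, mirrors: List[MemoryNode]) -> bool:
    """Return True only if every mirror acknowledged the write."""
    # Assumption: the caller treats False as "not committed" and retries
    # or aborts, even if some mirrors already hold the data.
    acks = [node.put(key, value) for node in mirrors]
    return all(acks)


def recover(key: str, mirrors: List[MemoryNode]) -> Optional[bytes]:
    """Read the value back from any surviving mirror, or None."""
    for node in mirrors:
        if node.alive and key in node.store:
            return node.store[key]
    return None


if __name__ == "__main__":
    a, b = MemoryNode("ws-a"), MemoryNode("ws-b")
    print(mirrored_commit("T1", b"new balance", [a, b]))
    a.alive = False  # simulate a software crash on one workstation
    print(recover("T1", [a, b]))
```

In this sketch, data committed to two mirrors remains readable after one of them crashes, which is the property that mirroring is meant to contribute; surviving a power loss is left to the UPSs.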