Next: Remote-Memory-based Transaction Systems Up: On using Network Memory Previous: On using Network Memory

Introduction

Transactions have been valued for their atomicity and recoverability properties that are useful to several systems, ranging from CAD environment to large-scale databases. Unfortunately, adding transaction support to an existing data repository was traditionally thought to be expensive, mostly due to the fast that the performance of transaction-based systems is usually limited by the performance of the magnetic disks that are used to hold the data repository. A major challenge in transaction-based systems is to decouple the performance of transaction management from the performance of the disks.

In this paper we describe a novel way to improve the performance of transaction management by using the collective main memory (hereafter called remote memory) in a Network of Workstations (NOW) [2, 4, 16]. The main idea behind our approach is to reduce the number of disk accesses by substituting them with (remote) main memory accesses. There are two main areas where remote memory can be used to improve performance of a transaction-based system:

Speeding up Read Accesses: The collective main memory in a NOW can be used as a large cache of the transaction-based system. This cache is larger than any single workstation can provide, and thus can be used to hold large amounts of data. Reading data from remote main memory (over a high speed interconnection network), was shown to be significantly faster than reading data from a (local) magnetic disk [10]. This is due to the fact that current high-speed networks provide higher throughput and lower latency than current high-speed disks. Architecture trends suggest that this disparity between magnetic disks and interconnection networks will continue to increase with time [10]. Thereby, the use of remote memory instead of magnetic disks (whenever possible) will continue to result in everincreasing performance improvements.
Speeding up Synchronous Write Operations to Reliable Storage: Transaction-based systems frequently use synchronous write operations to force all modified data to disk at transaction commit time. This is done so that the system will be able to recover in a consistent state after a hardware or a software failure. We advocate (and show in this paper) that the set of remote main memories in a NOW has comparable reliability to a magnetic disk, and thus can be used to hold sensitive data that must survive a system crash. Synchronous write operations are dominated by the latency of the medium where the write is performed. Remote memory latency is dictated by network latency and is usually significantly lower than disk latency. Moreover, remote memory latency is mostly induced by communication protocol processing (e.g. TCP/IP), which improves with processor speed. On the other hand, disk latency is mostly induced by mechanical arm movements, which does not improve at as fast. Thus, data can be synchronously written to remote memories much faster than they can be written to a magnetic disk. This performance disparity will continue to increase, according to current architecture trends.

The first of the above issues (reading from remote memory) has been somewhat explored in the areas of file systems [1, 18, 21], paging [22] and global memory databases for workstation clusters [15, 17]. All previous work suggests that the use of remote main memory as a large file (database) cache results in significant performance improvements. The thrust of this paper is on exploring the second issue: using remote memory to speed up synchronous disk write operations. We believe that transaction-based systems make lots of small synchronous write operations to stable storage, and thus they are going to benefit significantly from any improvements to synchronous disk write operations.

Recent architecture trends in the area of interconnection networks, and Networks of Workstations (NOWs) make it more attractive than ever to use the main memory of remote workstations (within the same workstation cluster) to speedup synchronous disk I/O operations, because the latency of Local Area Networks has significantly decreased over the last few years. Traditional interconnection networks (like Ethernet and FDDI) have latency in the range of several hundred microseconds. ATM networks have latency in the range of a few hundred microseconds [26], while more recent networks (like SCI [24] and Memory Channel [16]) have latency in the range of few microseconds. At the same time, disk latency has remained in the order of a few milliseconds for several years now, and is not expected to improve at a significant rate. Thus, operations dominated by disk latency (i.e. synchronous disk write operations) will remain in the millisecond range. On the contrary, operations dominated by network latency (e.g. synchronous remote memory write operations) may complete within microseconds.

Based on the current architecture trends, we believe that transaction-based systems should make use of the remote main memory of a NOW, in order to avoid (synchronous) disk data transfers and substitute them with (synchronous) network data transfers. To demonstrate our approach, we implemented our approach within two existing transaction-based systems: The EXODUS storage manager [6], and the RVM (Recoverable Virtual Memory) System [25]. Section 2 describes the design and the implementation of our systems. We have run several benchmarks on top of the modified transaction systems and have observed performance improvements up to two orders of magnitude. We report our performance results in section 3. Section 4 places our work in context by surveying previous work and comparing it with our approach. Finally section 5 concludes the paper.

Next: Remote-Memory-based Transaction Systems Up: On using Network Memory Previous: On using Network Memory

Evangelos Markatos
Fri Apr 11 14:07:02 EET DST 1997