Next: Conclusions Up: On using Network Memory Previous: OO7

Related Work

Using remote main memory to improve the performance and reliability of I/O in a Network of Workstations (NOW) has been explored previously in the literature. For example, several file systems [1, 8, 18, 21] use the collective main memory of several clients and servers as a large file system cache. Paging systems may also use remote main memory in a workstation cluster to improve application performance [14, 19, 20, 22]. Even Distributed Shared Memory systems can exploit the remote main memory in a NOW [9, 13] for increased performance and reliability.

The closest of these systems to our research is the Harp file system [21]. Harp uses replicated file servers to tolerate a single server failure. Each file server is equipped with a UPS to tolerate power failures and to speed up synchronous write operations. Although RRVM and REX use similar approaches (redundant power supplies and information replication) to survive both hardware and software failures, there are several differences between our work and Harp:

Data Granularity: Our work is concerned mostly with transaction-based systems that perform many small read and write operations. Being able to read and write a small amount of data efficiently is particularly important for the performance of these systems. In contrast, Harp (being a file system) cannot address data at a granularity finer than a file block. Thus, to read or write even a single byte of data, Harp has to read or write an entire block, which leads to significant performance degradation. Our performance results suggest that, on a workstation cluster connected via Ethernet, RRVM is able to sustain several hundred (short) transaction operations per second (see figure 1). Published results for Harp [21] suggest that it is able to sustain several tens of NFS operations per second. This implies that, if each transaction needs at least one NFS operation, the number of transactions per second that Harp can sustain is an order of magnitude lower than that of our RRVM system.
Open User Level Implementation: RRVM is linked with user applications as a library, outside the operating system kernel. Thus, it is portable and easily modifiable; any ordinary computer user can link RRVM to their program and run it. In contrast, Harp runs inside the operating system kernel, which makes it difficult to port and install. Users can benefit from Harp only if the file system installed by the system administrators is Harp or a Harp derivative. Currently there are very few file systems (if any) that provide functionality similar to Harp. Moreover, Harp adds significant overhead to applications that do not need the reliability Harp offers but are forced to use it and pay for it. In contrast, in our systems, data replication (for reliability) is done only for database applications that need it and are willing to bear its overhead. All other applications run without any intervention from our systems.
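
The data-granularity argument above can be illustrated with a small calculation. The following is a minimal sketch, not code from RRVM or Harp: the function names are hypothetical, and the 8 Kbyte block size used in the usage note is an assumption chosen only for illustration. It computes how many bytes a block-granularity store must move to satisfy a small write.

```python
def block_bytes_moved(offset, length, block_size):
    """Bytes a block-granularity store must transfer for one write.

    Hypothetical helper: every block touched by the byte range
    [offset, offset + length) is transferred in full.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return 0
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    return (last_block - first_block + 1) * block_size


def amplification(offset, length, block_size):
    """Ratio of bytes moved to bytes the application actually wrote."""
    return block_bytes_moved(offset, length, block_size) / length
```

Under the assumed 8192-byte block, `amplification(0, 1, 8192)` is 8192.0: a one-byte update moves a full block. A write that straddles a block boundary, such as `block_bytes_moved(8191, 2, 8192)`, moves two blocks.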

In summary, Harp is a kernel-level file system that survives hardware and software failures, while our approach leads to open, portable, flexible, and lightweight user-level transaction-based systems.

The Rio file system changes the operating system to avoid destroying its main memory contents in case of a crash [7]. Thus, if a workstation is equipped with a UPS and the Rio file system, it can survive all failures: power failures do not happen (due to the UPS), and software failures do not destroy the contents of the main memory. Systems like Rio may simplify the implementation of our approach significantly. Unfortunately, few file systems (if any) follow Rio's approach (although they should). However, even Rio may lose data if the UPS malfunctions. In such cases, our approach, which keeps two copies of sensitive data on two workstations connected to two different power supplies, is able to avoid data loss.
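
The two-copy argument above can be sketched as a toy model. This is an illustration under stated assumptions, not the RRVM or REX implementation: the `Node` class, the function names, and the supply labels are all hypothetical, and workstation main memory is modelled as a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A workstation holding data in volatile main memory (hypothetical model)."""
    name: str
    power_supply: str
    memory: dict = field(default_factory=dict)
    up: bool = True


def mirrored_write(primary, mirror, key, value):
    """Store value on both nodes; refuse pairs that share a power supply."""
    if primary.power_supply == mirror.power_supply:
        raise ValueError("replicas must use different power supplies")
    primary.memory[key] = value
    mirror.memory[key] = value


def fail_power_supply(nodes, supply):
    """Simulate a supply failure: every node on it loses its memory contents."""
    for node in nodes:
        if node.power_supply == supply:
            node.memory.clear()
            node.up = False


def recover(nodes, key):
    """Return the value from any surviving node, or None if all copies are lost."""
    for node in nodes:
        if node.up and key in node.memory:
            return node.memory[key]
    return None
```

In this model, a value written with `mirrored_write` can still be found by `recover` after one of the two supplies fails. It is lost only if both supplies fail.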

Network file systems like Sprite [23] and xfs [1, 11] can also be used to store replicated data and build a reliable network main memory. However, our approach would still result in better performance, because all file systems are forced to transfer data in units of at least one block. Moreover, our approach would be more widely portable: being user-level, it can run on top of any operating system, while several file systems are implemented inside the operating system kernel.

Franklin, Carey and Livny have proposed the use of remote main memory in a NOW as a large database cache [15]. They validate their approach using simulation and report very encouraging results. Griffioen et al. proposed the DERBY storage manager, which exploits remote memory and UPSs to reliably store a transaction's data [17]. They simulate the performance of their system and report encouraging results. Although our approach is related to the DERBY system, there are significant differences: (i) we provide a full-fledged implementation of our approach on two independent transaction-based systems, (ii) we demonstrate the performance improvements of our system using the same benchmarks that demonstrated the performance of the original RVM and EXODUS systems, (iii) DERBY places the burden of data reliability on the clients of the database, while we place it on the transaction managers, which have better knowledge of how to manage the various resources (memory, disks) in the system.

Feeley et al. proposed a generalized memory management system, where the collective main memory of all workstations in a cluster is handled by the operating system [12]. Their experiments suggest that generalized memory management results in performance improvements. For example, OO7 on top of their system runs up to 2.5 times faster than it does on top of a standard UNIX system. We believe that our approach complements this work, in the sense that both [15] and [12] improve the performance of read accesses (by providing large caches), while our approach improves the performance of synchronous write accesses. Thus, if the two are used together, they should improve the performance of database applications even further.

To speed up database and file system write performance, several researchers have proposed using special hardware. For example, Wu and Zwaenepoel have designed and simulated eNVy [27], a large non-volatile main memory storage system built primarily with FLASH memory. Their simulation results suggest that a 2 Gbyte eNVy system can support I/O rates corresponding to 30,000 transactions per second. To avoid frequent writes to FLASH memory, eNVy uses about 24 Mbytes of battery-backed SRAM per Gbyte of FLASH memory. Although the cost of eNVy is comparable to the cost of a DRAM system of the same size, eNVy realizes its cost effectiveness only for very large configurations, on the order of hundreds of Mbytes. Furthermore, although the chip cost of eNVy may be low, its market price will probably be much higher unless it is mass-produced and sold in volume. Thus, eNVy would be used only for expensive, high-performance database servers, and not for ordinary workstations. As another example, Baker et al. have proposed the use of battery-backed SRAM to improve file system performance [3]. Through trace-driven simulation they have shown that even a small amount of SRAM reduces disk accesses by between 20% and 90%, even for write-optimized file systems such as log-based file systems.


Evangelos Markatos
Fri Apr 11 14:07:02 EET DST 1997