Parity Caching

To reduce the main memory requirements of mirroring and the long latency of parity, we developed the method of parity caching described in section 3. In summary, each client reserves a small number of local pages to hold parity frames. When a page is swapped in or out, its parity frame is swapped in as well (if it is not already among the client's parity frames), and the new parity is computed. This method increases the number of page-ins and page-outs because, besides program pages, parity frames are swapped in and out as well. To measure the additional overhead of parity caching and compare it to mirroring, we use execution-driven simulation on top of the same DEC Alpha 3000 model 300 workstation. We use ATOM [16], an object file rewriting tool, which executes each application while simultaneously simulating the reliability policy we want to evaluate. The policies we evaluate are:

NO_RELIABILITY: No redundant information is kept. When a server crashes, the application will not be able to continue its execution.
MIRRORING: When a page is swapped out, it is sent to two servers instead of one, so that when one of them fails, the other will still have all the information the application needs.
PARITY_CACHING: When a page is swapped in or out, its parity frame is swapped in (if not already there) and is XORed with the page. When a server crashes, those of its pages that are not in the client's main memory can be reconstructed by XORing the relevant pages of the other servers with the parity frames. In our experiments, we simulated 50 memory servers and a client that caches up to 8 parity frames locally. Pages are distributed round robin among the available servers.
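
The XOR bookkeeping behind PARITY_CACHING can be illustrated with a minimal sketch. It is not the simulator used in the experiments: the page size, the number of pages, and the function names `xor_pages`, `parity_of`, `update_parity`, and `reconstruct` are all assumptions chosen for illustration.

```python
from functools import reduce

PAGE_SIZE = 8  # bytes; deliberately tiny, an assumption for illustration only


def xor_pages(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized pages."""
    return bytes(x ^ y for x, y in zip(a, b))


def parity_of(pages: list[bytes]) -> bytes:
    """Parity frame of a group: the XOR of all pages in the group."""
    return reduce(xor_pages, pages, bytes(PAGE_SIZE))


def update_parity(parity: bytes, old_page: bytes, new_page: bytes) -> bytes:
    """Incremental update: cancel the old contents, then add the new ones."""
    return xor_pages(xor_pages(parity, old_page), new_page)


def reconstruct(surviving_pages: list[bytes], parity: bytes) -> bytes:
    """Recover the single missing page of a group after a server crash."""
    return reduce(xor_pages, surviving_pages, parity)


# Hypothetical group of four pages, one per server.
pages = [bytes([i] * PAGE_SIZE) for i in (1, 2, 3, 4)]
parity = parity_of(pages)

# Pretend the server holding pages[2] crashed and rebuild its page.
recovered = reconstruct(pages[:2] + pages[3:], parity)
```

Because XOR is its own inverse, the same operation serves both to maintain the parity frame and to rebuild a lost page, which is why the client only needs the parity frame and the surviving pages of the group.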

The applications we simulate are:

MVEC: Matrix vector multiplication of a 2000 x 2000 matrix.
GAUSS: Gaussian elimination on a 2000 x 2000 matrix.
SORT: Sorting of an array of 32 Mbytes, using the standard quicksort algorithm.

The architecture simulated is a DEC Alpha 3000 model 300 workstation with 16 Mbytes of main memory available to applications. Only eight pages of main memory were used to hold parity frames. A total of 50 servers were simulated for each client.

If all workstations are connected via a broadcast interconnection network, no extra page transfers are needed to implement a reliable policy. For example, in MIRRORING, each swapped-out page needs to be broadcast only once over the interconnection network to reach all servers. Similarly, PARITY_CACHING does not need extra parity frame transfers: if the workstations that keep the parity frames snoop on the interconnection network, they can intercept all swapped-in and swapped-out pages and update their parity records.

If, however, the interconnection network is not broadcast-based, then extra page transfers are needed for the reliable policies. For example, MIRRORING doubles the number of page transfers for all swapped-out pages, while PARITY_CACHING increases the number of page transfers by a factor that depends on the effectiveness of caching. The exact magnitude of this factor is studied in our simulations, where we measure the number of pages swapped in (including parity pages) and swapped out (including parity and mirror pages) by each policy. The results are plotted in Figures 5 and 6.

We see that the number of pages swapped in is the same for MIRRORING and NO_RELIABILITY, but the number of pages swapped out for MIRRORING is twice that of NO_RELIABILITY. For PARITY_CACHING, both the number of pages swapped in and the number swapped out are within 5% of those for NO_RELIABILITY. The reason is that all applications have some locality of reference. Thus, pages swapped out within a short time interval under some LRU policy will probably also be swapped in within a short time interval. Pages that were initially swapped out close to each other belong to the same parity frame. Thus, as long as these pages are swapped close together in time, their parity frame will reside in the client's cache, and no extra page transfers will be needed to move the parity.
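
The effect of locality on parity traffic can be sketched with a toy counter. This is not the ATOM-based simulator: the function `count_parity_transfers`, the mapping of a page to its parity group by integer division, the LRU replacement of cached parity frames, and the rule that every evicted frame is written back are all assumptions made for illustration.

```python
from collections import OrderedDict


def count_parity_transfers(page_trace, group_size=50, cache_frames=8):
    """Count extra transfers caused by parity frames for a trace of swapped pages.

    Assumptions (hypothetical, not taken from the experiments):
    - page p belongs to parity group p // group_size;
    - the client keeps at most cache_frames parity frames, replaced in LRU order;
    - a cached frame is always modified by the swap, so every eviction is a write-back.
    """
    cache = OrderedDict()  # parity group -> placeholder, ordered from LRU to MRU
    fetches = 0
    writebacks = 0
    for page in page_trace:
        group = page // group_size
        if group in cache:
            cache.move_to_end(group)  # hit: parity frame already at the client
            continue
        fetches += 1  # miss: the parity frame must be swapped in
        if len(cache) >= cache_frames:
            cache.popitem(last=False)  # evict the least recently used frame
            writebacks += 1
        cache[group] = True
    return fetches, writebacks


# A strictly sequential trace of 1000 pages touches each parity group once.
fetches, writebacks = count_parity_transfers(range(1000))
```

Under these assumptions the sequential trace needs 20 parity fetches and 12 write-backs for 1000 page swaps. A trace with poor locality would miss far more often, which is the dependence on caching effectiveness noted above.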

We see that reliability comes at little extra cost, from 0% to 5% depending on the nature of the interconnection network, the application, and the policy used. We believe that this small overhead is a low price to pay for the benefit provided.

Figure 5: Number of Pages swapped in.

Figure 6: Number of Pages swapped out.

Evangelos Markatos
Fri Mar 24 14:41:51 EET 1995