Although figure 3 suggests that
REMOTE_MEMORY performs significantly better than
DISK, the completion time
of an application even under REMOTE_MEMORY
may be unacceptably high.
We expect the performance of REMOTE_MEMORY
to improve once the Ethernet interconnection network is
replaced with a faster one (e.g. FDDI, ATM, or FCS).
To evaluate the performance of the
applications on top of faster networks or faster disks, we
make detailed performance
measurements that separate
the completion time of the application
into three factors:
(i) bandwidth-dependent blocking time,
(ii) useful user time, and
(iii) protocol-dependent systems overhead.
Using the time command provided by the system, we measure the elapsed time
of each application, which is the sum of factors (i)-(iii). The
same command also reports the user time (factor (ii)) and the
system time (factor (iii)).
Subtracting the user and system times from the elapsed time
gives the time the system was idle
waiting for pages to go through the interconnection network (factor (i)).
Dividing this idle time
by the number of page-ins plus page-outs gives the average time
the application waits for each page to go through
the interconnection network. Assuming that an interconnection
network X times faster reduces this waiting time by
a factor of X, we can predict the
completion time of the application on the faster network
by adding the measured user and system times to the
predicted blocking time.
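The prediction model described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the tooling used for the measurements; the function name and parameter names are hypothetical, and the model assumes, as the text does, that the per-page waiting time scales inversely with network speed while user and system times stay unchanged.

```python
def predict_completion_time(elapsed, user, system, page_ins, page_outs, speedup):
    """Predict completion time (seconds) on a network `speedup` times faster.

    Hypothetical helper: all names are illustrative, not from the paper.
    """
    transfers = page_ins + page_outs
    if transfers <= 0 or speedup <= 0:
        raise ValueError("need at least one page transfer and a positive speedup")

    # Factor (i): idle time spent waiting for pages on the measured network.
    blocking = elapsed - user - system
    # Average time the application waits for one page transfer.
    wait_per_page = blocking / transfers
    # Assumption: an X times faster network cuts the per-page wait by X.
    predicted_blocking = (wait_per_page / speedup) * transfers
    # User and system times are assumed to be unaffected by the network.
    return user + system + predicted_blocking
```

With a speedup of 1 the function returns the measured elapsed time, which is a quick sanity check on the decomposition.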
We made all these measurements on our FFT application, and predicted its performance on systems whose interconnection network is two and ten times as fast as the Ethernet (ETHERNET*2 and ETHERNET*10). We also predicted its completion time on a system with a disk twice as fast (DISK*2), and on a system that has enough memory to hold the entire working set of the application (ALL_MAIN_MEMORY). The predicted execution times, along with the measured execution times of DISK and REMOTE_MEMORY, are plotted in figure 4. We see that ETHERNET*10 performs very close to ALL_MAIN_MEMORY, and significantly better than both REMOTE_MEMORY and DISK.
To understand the results shown in figure 4,
we analyze the execution time of FFT
with 28 Mbytes of input. The measured elapsed time is 208 seconds,
consisting of
78.5 sec of useful user time,
5 sec of system time,
and 124 sec of network blocking time,
spent waiting for pages to go through the Ethernet.
During the same run, the application suffered
6520 page-outs and 7791 page-ins.
The average waiting time for a page transfer
(both page-ins and page-outs) on top of the Ethernet
is 124 / (6520 + 7791) sec, or about 8.6 ms. Using a ten times
faster interconnection network, the average waiting time would be reduced
to at most 0.86 ms. Thus, the total completion time of FFT would be
at most
78.5 + 5 + 12.4 = 95.9 sec,
divided as follows: 82% in user time, 5% in system time, and
13% in network blocking time.
We see that a 100 Mbit/sec interconnection network reduces the total
paging overhead to a mere 17% of the application's total execution time.
We believe that most users would be willing to pay such an overhead
in order to run an application that does not fit in main memory.
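The FFT figures quoted above can be re-derived with a short script. This is a sketch for checking the arithmetic only: the measured values are copied from the text, while the variable and function names are hypothetical and the scaling assumption (blocking time divided by the network speedup) is the same one the analysis makes.

```python
# Measured values for FFT with 28 Mbytes of input, as reported in the text.
user, system, blocking = 78.5, 5.0, 124.0   # seconds
page_ins, page_outs = 7791, 6520

# Average wait per page transfer on the Ethernet, in seconds.
wait_per_page = blocking / (page_ins + page_outs)


def breakdown(speedup):
    """Return the predicted total time and each factor's rounded percentage share."""
    predicted_blocking = blocking / speedup   # assumed to scale with network speed
    total = user + system + predicted_blocking
    parts = (("user", user), ("system", system), ("blocking", predicted_blocking))
    shares = {name: round(100 * t / total) for name, t in parts}
    return total, shares


total, shares = breakdown(10)
print(f"wait per page on Ethernet: {wait_per_page * 1000:.2f} ms")
print(f"predicted total on a 10x network: {total:.1f} s, shares (%): {shares}")
```

Calling `breakdown(2)` gives the corresponding estimate for a network twice as fast as the Ethernet.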
Figure 4: Performance of FFT for various architecture alternatives.
DISK is the measured completion time when paging to
a local disk.
REMOTE_MEMORY is the measured completion time when paging to
remote memory on top of the Ethernet.
ETHERNET*2 and ETHERNET*10 are the predicted completion times when
using remote memory as a paging device, on top of a network
that is, respectively, twice and ten times
as fast as the Ethernet interconnection network.
DISK*2 is the predicted completion time when using
a disk twice as fast for paging.
ALL_MAIN_MEMORY is the predicted completion time of FFT
when we use the same workstation but with enough memory to
hold its entire working set.