In this set of experiments we further investigate the effect of the size of the time interval on the performance of Top-10 prefetching. Recall that a client has to make a certain number of references within the time interval before it is allowed to start prefetching from a server. Intuitively, we expect a very small time interval to result in a low hit ratio, since few clients will make enough references to qualify for prefetching. A longer interval, on the other hand, allows more clients to make enough references to qualify, thus increasing both hit ratio and traffic. However, a very long time interval may result in a disproportionate increase in traffic, since the documents that are popular at the beginning of the interval may no longer be popular at the end of it.
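The qualification rule recalled above can be illustrated with a minimal sketch. It assumes the trace is a list of (client, document) pairs, that intervals are measured in number of accesses, and that a client qualifies once it reaches a fixed reference count inside an interval. The function name `qualified_clients`, the parameter `min_refs`, and the toy trace are hypothetical and are not taken from the simulator used in the experiments.

```python
from collections import Counter

def qualified_clients(trace, interval_size, min_refs):
    """For each interval of `interval_size` accesses, return the set of
    clients that made at least `min_refs` references in that interval.

    `min_refs` is an assumed stand-in for the "certain number of
    references" mentioned in the text; its real value is not given here.
    """
    result = []
    for start in range(0, len(trace), interval_size):
        window = trace[start:start + interval_size]
        # Count how many references each client made inside this window.
        counts = Counter(client for client, _doc in window)
        result.append({c for c, n in counts.items() if n >= min_refs})
    return result

# Toy trace (hypothetical): six accesses by three clients.
trace = [("a", "x"), ("a", "y"), ("b", "x"),
         ("a", "z"), ("b", "y"), ("c", "x")]

small = qualified_clients(trace, interval_size=2, min_refs=2)
large = qualified_clients(trace, interval_size=6, min_refs=2)
print(small)
print(large)
```

With the short interval only one client qualifies in the first window and none afterwards, while the single long interval lets two clients qualify, which matches the intuition that longer intervals enable more clients to prefetch.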
Figure 14: Hit Ratio as a function of the size of the TOP-10.
Figure 15: Traffic Increase as a function of the size of the TOP-10.
Figure 16: Hit Ratio as a function of the size of the TOP-10.
Figure 17: Traffic Increase as a function of the size of the TOP-10.
Figure 18: Hit Ratio as a function of the size of the TOP-10.
Figure 19: Traffic Increase as a function of the size of the TOP-10.
Figure 20: Hit Ratio as a function of the size of the TOP-10.
Figure 21: Traffic Increase as a function of the size of the TOP-10.
Figure 22: Hit Ratio as a function of the size of the TOP-10.
Figure 23: Traffic Increase as a function of the size of the TOP-10.
To investigate the influence of interval size, we ran the simulations again for intervals ranging from 50,000 accesses to 350,000 accesses, and plotted the results for each server in Figures 14 to 23. For each server we plot both the hit ratio and the traffic increase. We see that different servers achieve their best hit ratios for different time intervals. For example, FORTH achieves its best hit ratio for a time interval of 50,000 references, while FORTHnet achieves its best hit ratio for 350,000 references. The other servers achieve their best performance for time intervals of 200,000 references and larger. However, we should note that a large hit ratio may imply a significant traffic increase as well. In all figures we see that for large values of the time interval and the TOP-10, the traffic increase may be unacceptably high. Fortunately, when TOP-10 is less than 500, the traffic increase is always low, while the hit ratio is close to the hit ratio achieved for much higher values of TOP-10. Thus, a value of TOP-10 equal to 500 seems a reasonable choice in all cases.
The only exception to this rule seems to be NASA, which achieves a good balance between hit ratio and traffic increase for values of TOP-10 around 100. The reason for NASA's higher traffic increase lies in the size of the documents it provides to its clients: popular NASA documents are much bigger than popular documents of other servers. Figure 24 plots the size of the most popular files of all servers. We can easily see that NASA serves the largest files. For example, the 500 most popular documents of NASA are 22 MBytes long, while the 500 most popular documents of FORTHnet are only 2.2 MBytes long. The reason is that NASA provides many large images that are very popular. Thus, prefetching many documents from NASA may result in a high traffic increase. Moreover, unsuccessfully prefetched documents from NASA result in much higher traffic than unsuccessfully prefetched documents from any other server.
Figure 24: Size of the most popular documents.
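To make the size argument concrete, the following sketch works through the arithmetic behind the NASA and FORTHnet totals quoted above (22 MBytes and 2.2 MBytes for the 500 most popular documents). It assumes, purely for illustration, that all documents in a server's top 500 have the same size; the helper names and the count of 100 unused prefetched documents are hypothetical.

```python
def avg_size_mb(total_mb, n_docs):
    """Average document size in MBytes, assuming uniform sizes."""
    return total_mb / n_docs

def wasted_traffic_mb(total_mb, n_docs, n_unused):
    """Traffic spent on prefetched documents that are never requested,
    under the same uniform-size assumption."""
    return avg_size_mb(total_mb, n_docs) * n_unused

# Totals for the 500 most popular documents, as quoted in the text.
nasa_avg = avg_size_mb(22.0, 500)      # about 0.044 MBytes per document
forthnet_avg = avg_size_mb(2.2, 500)   # about 0.0044 MBytes per document

# Hypothetical scenario: 100 prefetched documents go unused on each server.
nasa_waste = wasted_traffic_mb(22.0, 500, 100)
forthnet_waste = wasted_traffic_mb(2.2, 500, 100)

print(f"NASA average: {nasa_avg:.4f} MB, FORTHnet average: {forthnet_avg:.4f} MB")
print(f"Wasted traffic ratio: {nasa_waste / forthnet_waste:.1f}x")
```

Under this simplification, each unsuccessful prefetch from NASA costs roughly ten times as much traffic as one from FORTHnet, which is consistent with a smaller TOP-10 value being preferable for NASA.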