
Previous Work

  The area of web prefetching is rather new and is being actively explored. Padmanabhan [13] suggested a server-initiated prefetching algorithm. He observed that web document requests have interdependencies: if a document A is requested from a web server by some client, then document B will probably also be requested by the same client within a small time interval w. Each web server keeps this dependency information in the form of a graph. The nodes of the graph are documents; an edge between documents A and B represents how probable it is that document B will be accessed after document A has been accessed. When a client requests document A, the server sends, along with document A, all the documents B that are likely to be accessed next. Alternatively, the server sends along with document A only the names of the documents B that are likely to be accessed, leaving the initiative for prefetching to the client. Padmanabhan validated his approach using trace-driven simulations that gave very encouraging results: for example, a 36% reduction in network latency can be achieved at the cost of a 40% increase in network traffic.
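Such a dependency graph can be sketched as follows (a minimal sketch; the class and method names, the edge-count representation, and the probability threshold are our assumptions, not Padmanabhan's implementation):

```python
from collections import defaultdict

class DependencyGraph:
    """Server-side dependency graph in the spirit of Padmanabhan's
    algorithm: nodes are documents, edge weights estimate how likely
    document B follows document A within the window w."""

    def __init__(self):
        self.follows = defaultdict(lambda: defaultdict(int))  # edge counts
        self.requests = defaultdict(int)                      # node counts

    def observe(self, doc_a, doc_b):
        """Record that the same client requested doc_b shortly
        (within the window w) after requesting doc_a."""
        self.requests[doc_a] += 1
        self.follows[doc_a][doc_b] += 1

    def candidates(self, doc_a, threshold=0.5):
        """Documents whose estimated probability of following doc_a
        exceeds the threshold; these are the documents (or document
        names) the server would send along with doc_a."""
        total = self.requests[doc_a]
        if total == 0:
            return []
        return [b for b, n in self.follows[doc_a].items()
                if n / total >= threshold]
```

The threshold trades latency savings against wasted traffic: a lower threshold prefetches more aggressively, which is exactly the 36%-latency-for-40%-traffic trade-off reported above.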

Bestavros has also proposed a similar approach for server-initiated prefetching [4, 3]. For every pair of documents A and B, Bestavros calculates the probability p[i,j] with which document B will be requested within time interval w after document A is requested. Based on these probabilities, a server can advise clients on which documents to prefetch. Bestavros conducted trace-driven simulations that provided very encouraging results: for example, using only 10% extra bandwidth, his approach can reduce the document miss rate by 23%.
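The probabilities p[i,j] can be estimated from a timestamped, per-client request trace roughly as follows (a sketch under our assumptions about the trace format; the windowing and de-duplication details are our reading of the scheme, not Bestavros's code):

```python
from collections import defaultdict

def pairwise_probabilities(trace, w):
    """Estimate p[i, j]: the probability that document j is requested
    within time w after document i by the same client.
    `trace` is a list of (client, time, doc) tuples sorted by time."""
    count_i = defaultdict(int)    # how often each document was requested
    count_ij = defaultdict(int)   # how often j followed i within w
    by_client = defaultdict(list)
    for client, t, doc in trace:
        by_client[client].append((t, doc))
    for events in by_client.values():
        for k, (t_i, d_i) in enumerate(events):
            count_i[d_i] += 1
            seen = set()          # count each follower once per request
            for t_j, d_j in events[k + 1:]:
                if t_j - t_i > w:
                    break
                if d_j != d_i and d_j not in seen:
                    seen.add(d_j)
                    count_ij[(d_i, d_j)] += 1
    return {pair: n / count_i[pair[0]] for pair, n in count_ij.items()}
```

A server would then advise a client that just fetched document i to prefetch every j with p[i,j] above some bandwidth-dependent cutoff.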

Contrary to the sophisticated prefetching algorithms proposed previously, our Top-10 approach uses a simple and easy-to-calculate metric. Most web servers routinely calculate their most popular documents, among other statistics; thus, no extra work is needed on behalf of the server to calculate which documents should be prefetched. Interestingly enough, Top-10 achieves results similar to (and sometimes better than) those of other prefetching heuristics. For example, Top-10 has been shown to achieve close to a 60% hit rate at only a 10% traffic increase (see figure 10), because Top-10 capitalizes on two invariants of web access patterns.
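The Top-10 metric itself amounts to little more than a frequency count over the server's access log (a minimal sketch; the function name and log representation are our assumptions):

```python
from collections import Counter

def top_n(access_log, n=10):
    """The server's n most frequently requested documents over the
    last measurement interval, straight from the access log --
    the same statistic most servers already compute routinely."""
    return [doc for doc, _ in Counter(access_log).most_common(n)]
```

This list is what the server publishes and what proxies prefetch, which is why no prediction machinery beyond standard log statistics is required.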

Gwertzman [10, 9] proposed a geographical push-caching approach to reduce a web server's load: when the load of a web server exceeds some limit, the server replicates (pushes) its most popular documents to other cooperating servers that are lightly loaded, so that clients can make future requests for these documents to those servers. Like Top-10, push-caching replicates only popular documents, but it replicates them on some other server, while Top-10 replicates them on a proxy close to the client. In push-caching, the client still needs to request the replicated documents from some, usually non-local, server, while in Top-10 clients request the documents from a local proxy. To make matters worse, in push-caching clients need to know which server to ask for the documents, which may involve a request to the original server, adding even more to the client's latency. Thus, although push-caching may off-load a busy server by replicating its most popular documents, it still requires the client to make one or more requests to non-local servers in order to find the replicated data. On the contrary, Top-10 replicates popular documents only to local proxies, so that clients always make local accesses to replicated data.
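The push-caching trigger described above might be sketched as follows (our reading of Gwertzman's scheme; the function name, inputs, and the choice of a single least-loaded target are assumptions for illustration):

```python
def push_targets(server_load, load_limit, peer_loads, top_docs):
    """When the server's load exceeds the limit, replicate its most
    popular documents to the least-loaded cooperating server;
    otherwise push nothing.  peer_loads maps peer name -> load."""
    if server_load <= load_limit:
        return {}
    target = min(peer_loads, key=peer_loads.get)
    return {target: list(top_docs)}
```

Note that this decides only where documents go; clients must still discover which server now holds a replica, which is the extra lookup cost discussed above.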

Wachsberg et al. [17] propose the use of prefetching to improve the performance of web browsing over low-bandwidth links. They propose a client-based approach in which each proxy keeps a list of documents needed by its clients and decides which of them to prefetch. However, they have reported no performance results yet.

The benefits of prefetching are also going to be investigated within the LowLat [16] project. In the LowLat approach, a preloader will be responsible for communicating with web servers and prefetching documents from them. The prefetching process will take into account several factors, including bandwidth, cache load and space, and server load. Unfortunately, the prefetching policies and their associated benefits had not been reported as of this writing (summer 1996).

Recently, there has been considerable work on web caching, that is, caching of popular documents close to clients [1, 5, 7, 6, 8, 11, 12, 14]. All this work aims at reducing both network traffic and server load by keeping data close to the clients that re-use them. Most web servers and proxies today support some form of caching. Our work complements the research in caching, since all the benefits of prefetching come in addition to those of caching: while caching provides fast access to a document from the second time it is accessed, prefetching provides fast access to a document the first time it is accessed. Moreover, the existing infrastructure for caching (proxies, etc.) can also be exploited for prefetching.
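The complementarity of caching and prefetching can be illustrated with a toy proxy cache (the class and method names are ours): prefetched documents hit on their very first request, while ordinarily cached documents hit only from the second request on.

```python
class PrefetchingCache:
    """Toy proxy cache: documents enter the store either by being
    prefetched (e.g. from a server's Top-10 list) or by being cached
    after a miss, so prefetching turns first accesses into hits."""

    def __init__(self):
        self.store = set()

    def prefetch(self, docs):
        self.store.update(docs)    # fetched ahead of any client request

    def request(self, doc):
        hit = doc in self.store
        self.store.add(doc)        # cache on miss for later re-use
        return hit
```

In this model the two mechanisms share the same store, mirroring the point above that the existing proxy infrastructure can serve both purposes.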

Summarizing, we believe that our Top-10 approach is a simple, easy-to-calculate algorithm that achieves effective prefetching of documents on the web.


Evangelos Markatos
Fri Nov 1 16:38:26 EET 1996