Figure 1: Top-10 prefetching operates in a client-proxy-server framework.
The Top-10 approach to prefetching relies on the cooperation of clients and servers to carry out successful prefetch operations. The server side is responsible for periodically calculating a list of its most popular documents (the Top-10) and serving it to its clients. In fact, quite a few servers today already calculate their 10 most popular documents regularly, among other statistics (e.g., see http://www.csd.uch.gr/usage/). Calculating more than the 10 most popular documents is a straightforward extension of this existing functionality.
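As an illustration of the server-side step, the following minimal sketch computes a popularity list from the document paths requested during one interval. The function name `top_documents`, the parameter `n`, and the sample paths are assumptions made for this example; the text does not prescribe how a server derives its list.

```python
from collections import Counter


def top_documents(requested_paths, n=10):
    """Return the n most requested document paths, most popular first.

    requested_paths: iterable of document paths, one entry per request
    (hypothetical input; a real server would extract these from its access log).
    """
    counts = Counter(requested_paths)
    # most_common() orders by descending request count.
    return [path for path, _ in counts.most_common(n)]


# Sample requests seen during one time interval (made-up data).
sample_requests = [
    "/index.html", "/a.html", "/index.html",
    "/b.html", "/a.html", "/index.html",
]
print(top_documents(sample_requests, n=2))
```

Raising `n` beyond 10 is all that is needed to produce a longer popularity list.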
To make sure that documents are prefetched only to clients that can potentially use them, Top-10 does not treat all clients equally. Time is divided into intervals, and prefetching from any server is activated only after the client has made a sufficient number of requests to that server (> ACCESS_THRESHOLD). Thus, no documents are prefetched to occasional clients, while frequent clients are identified and considered for prefetching.
Some clients, such as proxies, make many more requests than others, and a correctly designed algorithm should prefetch a different number of documents to different clients. For example, it makes no sense to prefetch the 500 most popular documents to a client that made 10 requests during the previous time interval. Taking the recent past as an indication of the near future, that client will make around 10 requests in the next time interval, and at most 10 of the prefetched documents will be used. On the other hand, a client that made 20,000 requests during the previous interval will benefit from prefetching the 500 most popular documents, and possibly even more. Top-10 prefetching therefore adjusts the amount of prefetching for each client based on the number of requests it made in the recent past. Along those lines, a client may not prefetch more documents than it accessed during the previous time interval.
Finally, to make sure that the Top-10 policy can be made more or less aggressive as needed, the TOP-10 parameter defines the maximum number of documents that can be prefetched during any time interval from any server. Thus, at any point, a client cannot prefetch more than TOP-10 documents, even if it accessed many documents during the previous interval. By choosing a large value for TOP-10, prefetching can be made very aggressive. On the other hand, small values of TOP-10 limit the extent of prefetching.
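The three rules described so far (the activation threshold, the cap by recent request count, and the TOP-10 limit) can be combined into a single per-client, per-server budget. The sketch below is only one possible reading of those rules: the function names and the default values for `ACCESS_THRESHOLD` and `TOP_10` are hypothetical, chosen for illustration rather than taken from the text.

```python
# Hypothetical parameter values, chosen only for illustration.
ACCESS_THRESHOLD = 5   # requests a client must exceed before prefetching starts
TOP_10 = 500           # maximum documents prefetched per interval per server


def prefetch_budget(requests_last_interval,
                    access_threshold=ACCESS_THRESHOLD,
                    top_10=TOP_10):
    """Number of documents a client may prefetch from one server."""
    # Occasional clients (at or below the threshold) prefetch nothing.
    if requests_last_interval <= access_threshold:
        return 0
    # Frequent clients prefetch no more than they requested in the
    # previous interval, and never more than TOP_10 documents.
    return min(requests_last_interval, top_10)


def documents_to_prefetch(popular_documents, requests_last_interval):
    """Select the head of the server's popularity list that fits the budget."""
    budget = prefetch_budget(requests_last_interval)
    return popular_documents[:budget]


print(prefetch_budget(3))       # occasional client
print(prefetch_budget(10))      # client that made 10 requests
print(prefetch_budget(20000))   # busy proxy
```

With these assumed values, the occasional client receives nothing, the client with 10 requests is limited to 10 documents, and the busy proxy is capped at `TOP_10`.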
In summary, the Top-10 approach to prefetching has two safeguards against letting prefetching get out of control: (i) ACCESS_THRESHOLD, which identifies occasional clients and does not prefetch documents to them, and (ii) TOP-10, which can practically deny prefetching even to very frequent clients. We believe these safeguards are enough to control the extent of prefetching.