ERCIM News No.25 - April 1996 - FORTH
Caching, Prefetching and Coherence in the World Wide Web
by Evangelos P. Markatos and Catherine E. Chronaki
Without caching, the World Wide Web will soon become a victim of its
own success. Experience suggests that Web servers become inaccessible as
soon as they become popular. As a result, popular Web servers appear slow,
and users become frustrated with their service. A scalable and effective
way to improve Web performance is the sophisticated use of caching and prefetching.
If a document is cached the first time it is accessed by a client, it
will be retrieved more efficiently on subsequent accesses by the same client
or by nearby clients. Prefetching improves performance even further, reducing both the server's load and the client's latency even the first time a Web document is accessed.
Although caching and prefetching are key ideas in the success of the Web,
a brute-force approach to them would lead to disastrous performance: quite simply, a Web server can neither cache nor prefetch the entire Web. At ICS-FORTH
we are currently developing sophisticated caching and prefetching policies,
which improve performance without saturating the Internet with useless traffic.
Throughout the Web, we envision a number of Web Brokers. A Web Broker is
a mediator between the client and the Web content. Web Brokers provide storage, processing, and communication capacity that is used to relieve clients, servers, and the network of unnecessary transfers. Instead of asking the relevant servers for the required documents directly, clients request documents from a Web Broker, which in turn may request them from other Web Brokers or from the relevant servers.
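To make this request path concrete, the following Python sketch shows how a Web Broker might serve a request: from its own cache if possible, otherwise from a peer Broker, and only as a last resort from the origin server. The class and helper names (WebBroker, fetch_from_origin) are hypothetical, chosen for illustration; they do not describe our actual implementation.

from urllib.request import urlopen

def fetch_from_origin(url):
    # Plain HTTP fetch from the origin server (no error handling, for brevity).
    with urlopen(url) as resp:
        return resp.read()

class WebBroker:
    def __init__(self, peers):
        self.cache = {}        # url -> document contents
        self.peers = peers     # other Web Brokers this Broker knows about

    def lookup(self, url):
        # Peer-to-peer query: return the cached copy, or None on a miss.
        return self.cache.get(url)

    def get(self, url):
        # 1. Serve the document from the local cache if possible.
        if url in self.cache:
            return self.cache[url]
        # 2. Otherwise ask peer Brokers that may already cache the document.
        for peer in self.peers:
            doc = peer.lookup(url)
            if doc is not None:
                self.cache[url] = doc
                return doc
        # 3. As a last resort, fetch from the origin server and cache the result.
        doc = fetch_from_origin(url)
        self.cache[url] = doc
        return doc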
To reduce network traffic, Web Brokers cache documents that they fetch on
behalf of their clients. They also keep extensive indices of other Web Brokers, so that they know approximately which documents each Broker caches. Cached documents are kept in a multi-level
storage hierarchy that consists of main memory, magnetic disks, and even
tapes. Small and popular documents should be kept at the upper levels of
the hierarchy to optimize for low latency. Large and less frequently accessed
documents should be kept at the lower levels of the hierarchy, which are organized to optimize for bandwidth.
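A minimal sketch of such a placement rule might look as follows; the size and popularity thresholds are purely illustrative assumptions, not figures from our work.

# Illustrative placement rule for the multi-level storage hierarchy.
MEMORY_LIMIT_BYTES = 64 * 1024     # "small" documents fit comfortably in RAM
POPULARITY_THRESHOLD = 10          # accesses observed so far

def choose_tier(size_bytes, access_count):
    # Small, popular documents live at the top of the hierarchy (low latency);
    # large or rarely accessed ones sink towards disk and tape, which offer
    # capacity and good bandwidth for large sequential transfers.
    if size_bytes <= MEMORY_LIMIT_BYTES and access_count >= POPULARITY_THRESHOLD:
        return "memory"
    if access_count >= POPULARITY_THRESHOLD // 2:
        return "disk"
    return "tape"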
To reduce network traffic even further, Web Brokers analyse document access patterns in order to identify clusters of documents that are likely to be requested together. For example, once a client requests a page from a book, it will
probably request more pages and pictures from the same book. Clustering
pages into chapters, and chapters into books, will allow Web Brokers to
prefetch large amounts of useful information with one request. Based on
the calculated access patterns, Web Brokers prefetch popular documents.
For example, Web Brokers may prefetch magazines, newspapers, etc., so that clients will find them in the cache when they need them. Based on its clients' reading habits, a Web Broker will soon 'learn' which newspapers, magazines, etc. to prefetch.
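Building on the hypothetical WebBroker sketch above, the following illustrates one simple way a Broker could 'learn' such patterns: count which documents tend to follow one another and prefetch the frequent followers. The counting scheme and threshold are assumptions made for the example, not the policies we are studying.

from collections import defaultdict

class PrefetchingBroker(WebBroker):
    PREFETCH_THRESHOLD = 3   # how often a follow-up must be seen before prefetching

    def __init__(self, peers):
        super().__init__(peers)
        # co_access[a][b] counts how often document b was requested right after a.
        self.co_access = defaultdict(lambda: defaultdict(int))
        self.last_url = None

    def get(self, url):
        # Learn the access pattern: 'url' tends to follow the previous request.
        if self.last_url is not None:
            self.co_access[self.last_url][url] += 1
        self.last_url = url
        doc = super().get(url)
        # Prefetch the documents that usually follow this one.
        for follower, count in self.co_access[url].items():
            if count >= self.PREFETCH_THRESHOLD and follower not in self.cache:
                self.cache[follower] = fetch_from_origin(follower)
        return doc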
To make sure that their data are not stale, Web Brokers take care of the
consistency of their cached documents. Periodically updated documents, such as newspapers, are regularly re-fetched into the cache. Other documents are kept up-to-date using a combination of update-based and invalidate-based protocols. Finally, there may even be some documents for which slightly out-of-date information can be tolerated in the interests of reduced network traffic.
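As an illustration, the sketch below combines a per-document expiry time (so that periodically updated documents such as newspapers are re-fetched regularly) with hooks for server-driven invalidations and updates. The class names and time-to-live values are assumptions made for the example.

import time

class CacheEntry:
    def __init__(self, doc, ttl_seconds):
        self.doc = doc
        self.expires = time.time() + ttl_seconds

class ConsistentCache:
    PERIODIC_TTL = 15 * 60        # e.g. a newspaper front page
    DEFAULT_TTL = 24 * 60 * 60    # documents that change rarely

    def __init__(self):
        self.entries = {}         # url -> CacheEntry

    def put(self, url, doc, periodic=False):
        ttl = self.PERIODIC_TTL if periodic else self.DEFAULT_TTL
        self.entries[url] = CacheEntry(doc, ttl)

    def get(self, url):
        entry = self.entries.get(url)
        if entry is None or time.time() > entry.expires:
            return None           # miss, or stale: the Broker must re-fetch
        return entry.doc

    def invalidate(self, url):
        # Invalidate-based protocol: the server tells us the copy is stale.
        self.entries.pop(url, None)

    def update(self, url, doc):
        # Update-based protocol: the server pushes the fresh copy directly.
        if url in self.entries:
            self.put(url, doc)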
We are currently evaluating various caching and prefetching policies to
be used by Web Brokers. For our evaluations, we use trace-driven simulation
based on real access traces of Web servers and Web clients. At the same
time, we investigate alternative schemes for integrating our results into
Web servers.
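As a rough illustration of the trace-driven approach, the sketch below replays a trace of requested URLs against a fixed-size cache and reports the hit rate. The trace format (one URL per line) and the least-recently-used replacement policy are assumptions for the example, not the policies under evaluation.

from collections import OrderedDict

def simulate(trace_file, cache_slots):
    # Replay one request per line of the trace against an LRU cache and
    # report the resulting hit rate.
    cache = OrderedDict()         # url -> None, ordered by recency of use
    hits = requests = 0
    with open(trace_file) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            requests += 1
            if url in cache:
                hits += 1
                cache.move_to_end(url)
            else:
                if len(cache) >= cache_slots:
                    cache.popitem(last=False)   # evict the least recently used
                cache[url] = None
    return hits / requests if requests else 0.0

# Example: print(simulate("server_access.trace", cache_slots=1000))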
Please contact:
Evangelos P. Markatos - ICS-FORTH
Tel. +30 81 391655
E-mail: markatos@ics.forth.gr