A C

C
OMMITTEE
:
Dr. Alan L. Cox, Chair
Associate Professor of Computer Science
Rice University
Dr. Willy Zwaenepoel
Professor and Dean
Faculty of Computer & Communications
Sciences
Swiss Federal Institute of Technology
Dr. Scott Rixner
Assistant Professor of Computer Science
Rice University
Dr. T. S. Eugene Ng
Assistant Professor of Computer Science
Rice University
H
OUSTON
, T
EXAS
J
UNE
, 2004 A Consistent and Transparent Solution for
Caching Dynamic Web Content
Sumit Mittal
Abstract
Caching is an effective means for reducing load on web servers, especially for those that
dynamically generate documents in dynamic web applications. While adding caching to
a web application can greatly reduce response times for requests, the logic to ensure con-
sistency with the backend database requires considerable effort to develop. Much of the
complexity is in minimizing unnecessary page invalidations, a key goal for improving the
cache hit rate and response times. In this thesis I explore a range of invalidation policies
that are progressively more precise. A policy is more precise than the other if it produces
less false positives (removal of valid pages). A contribution of this work is in achieving
precise invalidations at the application server layer automatically.
To explore these issues, I introduce AutoWebCache, a system for adding server-side
caching for dynamic content automatically to web applications having a back-end database.
To achieve automation, it uses aspect-oriented programming for injecting the cache code
into the application. Dependencies between the read and write requests are determined au-
tomatically, during run-time. Formulating the dependencies requires SQL query analysis
be performed at run-time, which is costly. I demonstrate how to reduce this dynamic analy-
sis overhead through effective caching of intermediate analysis results. In two e-commerce
benchmarks, RUBiS and TPC-W, I show my method can be highly effective, reducing the
response times, 63% and 97%, respectively. Acknowledgments
I am grateful to my advisors Dr. Alan L. Cox and Dr. Willy Zwaenepoel for their guidance,
constant encouragement and motivation. Their interest and condence in the work was
always a source of inspiration. Many thanks to my other committee members: Dr. Scott
Rixner and Dr. Eugene Ng, for their valuable feedback and suggestions.
I also thank Dr. Sara Bouchenak and Dr. Steven Dropsho who have worked with me
on this project from across the Atlantic. Their insight and contributions have played a
signicant part in this project. Special thanks goes to all my friends in the Systems group
at Rice: Ajay, Amit, Animesh, Anupam, Anwis, Atul, Johnny, Khaled, and Santa. All have
contributed to my thesis in unique ways. Aravind, while working from EPFL, was always
a help with his knowledge of the benchmarks and was gracious in the sharing of machines
we used for simulations.
Lastly, my family members deserve special credit for keeping me inspired and moti-
vated. I dedicate this work to their support. Contents
Abstract
ii
Acknowledgments
iii
List of Illustrations
vi
List of Tables
vii
1 Introduction
1
2 Background
5
2.1
Dynamic Web Applications . . . . . . . . . . . . . . . . . . . . . . . . . .
5
2.2
Structure of Database Queries
. . . . . . . . . . . . . . . . . . . . . . . .
6
2.3
Aspect-oriented Programming and AspectJ
. . . . . . . . . . . . . . . . .
9
2.3.1
Introduction to AspectJ . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Design of the AutoWebCache
12
3.1
Overview
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3.2
Consistency Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3
Cache Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4
Invalidation Strategies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.5
Query Analysis Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.1
Parsing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.5.2
Intersection Testing . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.5.3
Analysis Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5.4
Generation of Extra Queries . . . . . . . . . . . . . . . . . . . . . 25
3.6
Code Injection with AspectJ . . . . . . . . . . . . . . . . . . . . . . . . . 26 v
3.7
Limitations of Automatic Caching . . . . . . . . . . . . . . . . . . . . . . 29
3.7.1
The Hidden State Problem . . . . . . . . . . . . . . . . . . . . . . 29
3.7.2
Transparent Updates to Database . . . . . . . . . . . . . . . . . . . 30
3.7.3
Lack of Knowledge of Application Semantics . . . . . . . . . . . . 31
3.8
A General Invalidation Strategy
. . . . . . . . . . . . . . . . . . . . . . . 31
4 Evaluation and Results
34
4.1
Evaluation Environment
. . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.1
Benchmarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.1.2
Client Emulator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1.3
Software
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.1.4
Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.2.1
RUBiS
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.2.2
TPC-W . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2.3
Web Page Cache Size Requirement . . . . . . . . . . . . . . . . . 46
4.2.4
Query Analysis Caching . . . . . . . . . . . . . . . . . . . . . . . 47
5 Related Work
49
6 Conclusions and Future Work
52
Bibliography
55 Illustrations
2.1
Architecture of Dynamic Web Applications . . . . . . . . . . . . . . . . .
5
3.1
Design of the Automatic Caching System . . . . . . . . . . . . . . . . . . 14
3.2
Use of AspectJ to Achieve Transparency . . . . . . . . . . . . . . . . . . . 28
4.1
Response Times for RUBiS Browsing Mix . . . . . . . . . . . . . . . . . . 37
4.2
Cache Hit Rates for RUBiS Bidding Mix . . . . . . . . . . . . . . . . . . . 38
4.3
Response Times for RUBiS Bidding Mix
. . . . . . . . . . . . . . . . . . 38
4.4
Breakdown of Hits by Request Type in RUBiS . . . . . . . . . . . . . . . . 39
4.5
Breakdown of Overall vs Miss Response Time in RUBiS . . . . . . . . . . 41
4.6
Extra Query Overhead for RUBiS Bidding Mix . . . . . . . . . . . . . . . 41
4.7
Cache Hit Rates for TPC-W Shopping Mix
. . . . . . . . . . . . . . . . . 43
4.8
Response Times for TPC-W Shopping Mix . . . . . . . . . . . . . . . . . 43
4.9
Breakdown of Hits by Request Type in TPC-W . . . . . . . . . . . . . . . 44
4.10 Breakdown of Overall vs Miss Response Time in TPC-W . . . . . . . . . . 45
4.11 Extra Query Overhead for TPCW Shopping Mix
. . . . . . . . . . . . . . 46
4.12 Cache Size for RUBiS
. . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.13 Cache Size for TPCW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Tables
2.1
A Database Table
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
3.1
Cache entries table.
Uses the request URI as a hash index to lookup the
cached
web page for the request. . . . . . . . . . . . . . . . . . . . . . . . 17
3.2
Instances table.
Uses the full SQL query string (a.k.a, query type) as the
index to return the complete set of instances for the type. The
instance-specic information are the dynamic values used by the instance
and a link (the URI) to the cached web page. . . . . . . . . . . . . . . . . . 17
3.3
Write query analysis results table.
Uses the full SQL query string
(a.k.a, query type) as the index to return the summary of the querys
analysis, which is the internal representation of the query and the list of
read query types with which the write could intersect, and extra queries
needed for additional values (only for AC-extraQuery). . . . . . . . . . . . 18
3.4
Read query analysis results table.
Uses the full SQL query string (a.k.a,
query type) as the index to return the summary of the read querys
analysis, which is the internal representation of the query. The
dependency relationships with writes are kept in Table 3.3. . . . . . . . . . 18
4.1
Query analysis cache statistics for RUBiS @ 1000 clients and TPC-W @
400 clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 1
Chapter 1
Introduction
Dynamically generated web content represents a large portion of web requests, and the rate
at which dynamic documents are delivered is often orders of magnitude slower than static
content [5, 8]. Web sites for dynamic content are usually based on a multi-tier architec-
ture implemented using several middleware systems: an HTTP server as a web front-end
and provider of static content, an application server to execute the program generating the
dynamic content, and a database to store the non-emphemeral data required by the applica-
tion program. Dynamic content generation places a signicant burden on the servers, often
leading to performance bottlenecks. As a result, various techniques have been studied for
server-side acceleration of dynamic content web sites, including replication and clustering
of the tiers, and caching of content at various levels. The use of these techniques is ren-
dered more complicated by the dynamic nature of these services, requiring mechanisms to
maintain consistency between various cached or replicated copies of the data.
This thesis introduces AutoWebCache, a new method for server-side caching of dynam-
ically generated web pages. This method provides both consistency and transparency with
good performance. AutoWebCache maintains strong consistency, i.e., the cached pages
reect current va