TPOT: Translucent Proxying of TCP

Translucent Proxying of TCP (TPOT) overcomes this limitation by using TCP options and IP tunneling to ensure that
all IP packets belonging to a TCP connection will traverse the proxy that intercepted the first packet. This guarantee
allows the ad-hoc deployment of TPOT proxies anywhere within the network. No extra signaling support is required.
In addition to the advantages TPOT proxies offer at the application level, they also generally improve the throughput of
intercepted TCP connections. In this paper we discuss the TPOT protocol, explain how it enables various applications,
address deployment and scalability issues, and summarize the impact of TPOT on TCP performance.
1 Introduction and Related Work
Transparent proxies are commonly used when an application is to be proxied in a manner that is
completely oblivious to the client, without requiring any prior configuration. Recently, there has been a great deal of
activity in the area of transparent proxies for Web caching. Several vendors in the area of Web proxy caching have
announced dedicated Web proxy switches and appliances [1, 2, 7, 10].
In the simplest scenario, a transparent proxy intercepts all TCP connections that are routed through it. This may
be refined by having the proxy intercept TCP connections destined only for specific ports (e.g., 80 for HTTP), or for a
specific set of destination addresses. The proxy responds to the client request, masquerading as the remote web server.
Scalability is achieved by partitioning client requests into separate hash buckets based on the destination address,
effectively mapping web servers to multiple caches attached to the proxy.
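The interception filter and the hash-bucket partitioning described above can be sketched as follows. The sketch assumes a hypothetical proxy that intercepts only port 80 and spreads destination addresses over four caches. The constants `INTERCEPT_PORTS` and `NUM_CACHES` and the choice of SHA-1 as the stable hash are illustrative and are not taken from any of the cited products.

```python
import hashlib
import ipaddress

# Hypothetical configuration: intercept only HTTP (port 80) and map each
# destination address onto one of NUM_CACHES caches behind the proxy.
INTERCEPT_PORTS = {80}
NUM_CACHES = 4


def should_intercept(dst_port: int) -> bool:
    """Return True if a connection to dst_port should be intercepted."""
    return dst_port in INTERCEPT_PORTS


def cache_bucket(dst_addr: str, num_caches: int = NUM_CACHES) -> int:
    """Map a destination (web server) address to a cache index.

    A stable hash of the packed address is used, so every request for
    the same server lands on the same cache.
    """
    packed = ipaddress.ip_address(dst_addr).packed
    digest = hashlib.sha1(packed).digest()
    return int.from_bytes(digest[:4], "big") % num_caches
```

Hashing on the destination address, rather than on the client, keeps each server's objects in a single cache. This avoids storing duplicate copies across the caches.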
In the event of a cache miss, the cache re-issues the request to the web server, and pipes the response it receives
from the web server back to the client, keeping a copy for itself (assuming the response is cacheable). Note that, in
general, this mechanism may be repeated, where a subsequent proxy along the path may intercept an earlier cache
miss, and so on.
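The hit/miss behaviour can be illustrated with a minimal in-memory cache. This is a sketch under simplifying assumptions: the hypothetical `fetch_upstream` callable stands in for re-issuing the request. It returns the response body together with a flag saying whether the response is cacheable.

```python
from typing import Callable, Dict, Tuple

# Hypothetical upstream fetch: returns (body, cacheable_flag).
Fetch = Callable[[str], Tuple[bytes, bool]]


class CachingProxy:
    """Minimal cache: serve hits locally, forward misses upstream."""

    def __init__(self, fetch_upstream: Fetch) -> None:
        self.store: Dict[str, bytes] = {}
        self.fetch_upstream = fetch_upstream

    def handle(self, url: str) -> bytes:
        if url in self.store:  # cache hit: answer as if we were the server
            return self.store[url]
        body, cacheable = self.fetch_upstream(url)  # miss: re-issue the request
        if cacheable:  # keep a copy only if the response is cacheable
            self.store[url] = body
        return body  # pipe the response back to the client
```

Because `fetch_upstream` could itself wrap another proxy's `handle`, the same class can model the repeated interception of a cache miss by a later proxy on the path.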
The proxy described above is often termed a Layer-4 switch, or simply an L-4 switch, since TCP is a Transport
Layer protocol, which maps to Layer 4 in the OSI networking stack. In a variant of the above, the proxy/switch parses
the HTTP request and extracts the URL and possibly other fields of the HTTP request before deciding what to do
with it. Since such a switch inspects the HTTP request, which is an Application Layer (Layer 7) function,
it is called an L-7 switch [2].
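An L-4 switch decides using only addresses and ports. An L-7 switch must first parse the request head. The sketch below shows the extra parsing step for HTTP/1.x, reconstructing the URL from the request line and the `Host` header. It is a simplified illustration that ignores folded headers, HTTPS and malformed input. The helper names `parse_http_request` and `l7_url` are hypothetical.

```python
from typing import Dict, Tuple


def parse_http_request(raw: bytes) -> Tuple[str, str, Dict[str, str]]:
    """Split an HTTP/1.x request head into (method, target, headers)."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("latin-1")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split on the first colon only
        headers[name.strip().lower()] = value.strip()
    return method, target, headers


def l7_url(raw: bytes) -> str:
    """Reconstruct the absolute URL an L-7 switch would act on."""
    _method, target, headers = parse_http_request(raw)
    if target.startswith("http://"):  # request already carries an absolute URI
        return target
    return "http://" + headers.get("host", "") + target
```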
An acute problem that limits the use of transparent L-4 and L-7 Web proxies is the need to place the proxy at
a location that is guaranteed to see all the packets of the request [7]. Since routing in an IP network can lead to
situations where multiple paths from client to server have the lowest cost, packets of a connection may sometimes
follow multiple paths. In such a situation a transparent proxy may see only a fraction of the packets of the connection.
Occasionally, routes may also change midway through a TCP connection due to routing updates in the
underlying IP network. For these reasons, transparent proxies are deployed exclusively at the edges or focal points
within the network, such as gateways to/from single-homed clients or servers. This is not always the best place to
deploy a cache. In general, one would expect higher hit rates for objects cached deeper inside the network [8].
* He contributed to this work during an internship at AT&T.
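The "fraction of packets" problem can be seen in a toy model of two equal-cost paths. The sketch assumes per-packet round-robin load balancing, which is only one possibility. Many routers hash per flow instead, and that keeps a connection on one path until routes change. The function name `split_per_packet` is hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def split_per_packet(num_packets: int, paths: List[str]) -> Dict[str, List[int]]:
    """Assign the packets of one connection to equal-cost paths round-robin.

    Returns, for each path, the packet sequence numbers that traverse it.
    A proxy sitting on only one of these paths sees only its own list.
    """
    seen: Dict[str, List[int]] = {p: [] for p in paths}
    for seq, path in zip(range(num_packets), cycle(paths)):
        seen[path].append(seq)
    return seen
```

With two paths, a transparent proxy on path `"A"` would see only the even-numbered packets. That is not enough to terminate the TCP connection on the client's behalf.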
TPOT solves this problem by making innovative use of TCP-OPTIONs and IP tunnels. A source initiating a
TCP connection signals to potential proxies that it is TPOT-enabled by setting a TCP-OPTION within the SYN packet.
A TPOT proxy, on seeing such a SYN packet, intercepts it. The SYN-ACK packet that it returns to the source carries the
proxy's IP address stuffed within a TCP-OPTION. On receiving this SYN-ACK, the source sends the rest of the packets via
the intercepting proxy over an IP tunnel. The protocol is discussed in detail in Section 3.
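The two TCP-OPTIONs can be sketched as ordinary kind/length/value entries in the TCP options field. The paper does not give an option number at this point, so the sketch uses the experimental option kind 253 from RFC 4727. This is an assumption, and so is the layout: a bare two-byte flag in the SYN, and a six-byte option carrying an IPv4 address in the SYN-ACK. The function names are hypothetical.

```python
import ipaddress
import struct
from typing import Optional

# ASSUMPTION: illustrative option kind (RFC 4727 experimental value), not
# the number used by TPOT itself.
TPOT_KIND = 253


def tpot_syn_option() -> bytes:
    """Bare TPOT flag (kind, length=2) that a TPOT-enabled client puts in its SYN."""
    return struct.pack("!BB", TPOT_KIND, 2)


def tpot_synack_option(proxy_ip: str) -> bytes:
    """TPOT option carrying the intercepting proxy's IPv4 address."""
    addr = ipaddress.IPv4Address(proxy_ip).packed
    return struct.pack("!BB", TPOT_KIND, 2 + len(addr)) + addr


def find_proxy_address(options: bytes) -> Optional[str]:
    """Scan a TCP options field and return the proxy address, if present."""
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:  # End of Option List
            break
        if kind == 1:  # No-Operation (padding)
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated: no room for a length byte
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed or truncated option
        if kind == TPOT_KIND and length == 6:
            return str(ipaddress.IPv4Address(options[i + 2 : i + 6]))
        i += length
    return None
```

Once `find_proxy_address` returns an address, the client would send the remaining packets of the connection through an IP tunnel to that proxy (for example, IP-in-IP encapsulation). That step is not shown here.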
The above mechanism works if the client is TPOT-enabled. In a situation where the client is not TPOT-enabled,
we may still be able to use TPOT. As long as the client is single-homed and has a proxy at a focal point, we can
TPOT-enable the connection by having the proxy behave like a regular transparent proxy on the side facing the client,
but as a TPOT (translucent) proxy on the side facing the server.
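The decision a proxy faces for a new connection can be summarised as a small function. This is a sketch, not the paper's algorithm. For a TPOT-enabled client, it assumes the proxy also stays translucent toward the server, which the text above states only for the legacy-client case. The names `ProxyPlan` and `plan_interception` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProxyPlan:
    client_side: str  # how the proxy terminates the client's connection
    server_side: str  # how it opens its own connection toward the server


def plan_interception(syn_has_tpot: bool, at_focal_point: bool) -> Optional[ProxyPlan]:
    """Decide how, or whether, a proxy should intercept a new connection.

    syn_has_tpot: the client's SYN carried the TPOT option.
    at_focal_point: the proxy sees every packet of the client (e.g. it is
    the gateway of a single-homed client).
    """
    if syn_has_tpot:
        # TPOT client: return our address and let the client tunnel to us.
        # ASSUMPTION: stay TPOT-enabled toward the server as well.
        return ProxyPlan(client_side="translucent", server_side="translucent")
    if at_focal_point:
        # Legacy client: act transparently toward it, but TPOT-enable the
        # connection toward the server.
        return ProxyPlan(client_side="transparent", server_side="translucent")
    # Legacy client and no guarantee of seeing all packets: do not intercept.
    return None
```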
The general idea of using TCP-OPTIONs as a signaling scheme for proxies is not new [16]. However, combining
this idea with IP tunneling to pin down the path of a TCP connection has not been proposed before to the best of our
knowledge.
One alternative to TPOT is the use of Active Network techniques [26]. We believe that TPOT is a relatively
lightweight solution that does not require an overhaul of existing IP networks. In addition, TPOT can be deployed
incrementally in the current IP network, without disrupting other Internet traffic.
The authors of [21] also use the term translucent to distinguish their web caching proposal from transparent
caching. However, their work is distinct from, and largely complementary to, TPOT. In [21], the routers along the
path from the client need to be enhanced so that they can provide the next-hop cache information to the previous-hop
cache. This requires routers to know the next-hop caches in advance. TPOT, on the other hand, does not
require any such information, and is therefore easier to administer and manage.
The proposal in [21] has a maximum number of requests option that can be exploited in TPOT as well, to limit the
number of TPOT proxies that can intercept a TCP connection. We could insert this as part of the TCP-OPTION, and
decrement it every time the connection is intercepted by a TPOT proxy.
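The decrement-on-interception idea can be sketched with a one-byte counter inside the TPOT option. The layout is an assumption: kind, length 3, and the remaining-interceptions count, with the illustrative option kind 253 from RFC 4727. The simulation also assumes that every proxy on the path is willing to intercept. All names are hypothetical.

```python
import struct
from typing import List, Optional

TPOT_KIND = 253  # ASSUMPTION: illustrative experimental option kind (RFC 4727)


def make_budget_option(max_interceptions: int) -> bytes:
    """Hypothetical TPOT option: kind, length=3, remaining-interceptions byte."""
    return struct.pack("!BBB", TPOT_KIND, 3, max_interceptions)


def intercept(option: bytes) -> Optional[bytes]:
    """Return the option for this proxy's upstream SYN, or None to pass through."""
    kind, length, remaining = struct.unpack("!BBB", option)
    if kind != TPOT_KIND or length != 3 or remaining == 0:
        return None  # not a TPOT option, or the interception budget is exhausted
    return struct.pack("!BBB", kind, length, remaining - 1)


def proxies_that_intercept(path: List[str], max_interceptions: int) -> List[str]:
    """Walk a client-to-server path on which every proxy would like to intercept."""
    option = make_budget_option(max_interceptions)
    chosen: List[str] = []
    for proxy in path:
        upstream = intercept(option)
        if upstream is None:
            continue  # this proxy forwards the SYN untouched
        chosen.append(proxy)
        option = upstream  # the new upstream connection carries the smaller count
    return chosen
```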
1.1 Paper Overview
Section 2 highlights two classes of applications of TPOT. Section 3 describes the TPOT protocol. In addition to
the basic version, a pipelined version of the protocol is also discussed. Pathological cases, extensions, and limitations
are also studied. Section 4 discusses deployment issues. Section 5 discusses our approach to solving this problem
using a technique that we call TPARTY, which employs a farm of servers that sit behind a front-end machine. The
front-end machine only farms out requests to the army of TPOT machines that sit behind it. We address the TCP level
performance of TPOT in Section 6. Due to space limitations, we were unable to cover the details here. A discussion of
the performance of TPOT using both analysis and experiments may be found in an extended version of this paper [23].
The technical report also covers a prototype implementation that was used in these experiments. Finally, Section 7
highlights our major contributions and discusses future work and possible extensions to TPOT.
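One way a TPARTY-style front end could farm out connections is sketched below. The policy is an assumption: the front end hashes the connection 4-tuple, so that every packet of a given connection reaches the same back-end TPOT machine. The actual TPARTY design is the subject of Section 5. The function `pick_backend` is hypothetical.

```python
import hashlib
from typing import List, Tuple

FourTuple = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def pick_backend(conn: FourTuple, backends: List[str]) -> str:
    """Choose a back-end TPOT machine for a connection.

    ASSUMPTION: a stable hash of the 4-tuple keeps all packets of one
    connection on the same back end.
    """
    key = "|".join(map(str, conn)).encode()
    idx = int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % len(backends)
    return backends[idx]
```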
2 Applications of TPOT
As mentioned in Section 1, TPOT allows the deployment of TCP proxies anywhere in the network. While several
applications exist, in this section we describe two that show how TPOT may be applied to important real world
problems.
2.1 Hierarchical Caching and Content Distribution Trees
In addition to allowing the placement of transparent Web proxy caches anywhere in the network, TPOT also
enables newer architectures that employ Web proxy networks. In such architectures a proxy located along the path
from the client to the server simply picks up the request and satisfies it from its own cache, or lets it pass through.
The request, in turn, may be picked up by another proxy further down the path. These incremental actions lead to the dynamic
construction of spontaneous hierarchies rooted at the server. Such architectures require the placement of multiple
proxies within the network, not just at its edges and gateways. Existing proposals [13, 17, 28] either need extra
signaling, or they simply assume that all packets of the connection will pass through an intercepting proxy. Since
TPOT explicitly provides this guarantee, implementing such architectures with TPOT is elegant and easy. With TPOT
no extra signaling support or prior knowledge of neighboring proxies is required.
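The spontaneous hierarchy can be illustrated by walking a request from the client toward the server through a chain of caches. The sketch assumes that every proxy the request passed through keeps a copy of the response on the way back. The caches are plain dictionaries, and the name `fetch_along_path` is hypothetical.

```python
from typing import Dict, List, Tuple


def fetch_along_path(
    url: str, caches: List[Dict[str, bytes]], origin: Dict[str, bytes]
) -> Tuple[bytes, int]:
    """Send a request toward the server through a chain of caching proxies.

    caches[0] is nearest the client; the origin server sits past caches[-1].
    Returns (body, hop), where hop is the index of the cache that answered,
    or len(caches) if the origin server did.
    """
    for hop, cache in enumerate(caches):
        if url in cache:  # this proxy picks up the request and satisfies it
            body = cache[url]
            break
    else:  # every proxy let the request pass through
        hop, body = len(caches), origin[url]
    # ASSUMPTION: each proxy the request passed through keeps a copy.
    for cache in caches[:hop]:
        cache[url] = body
    return body, hop
```

No proxy in the sketch needs to know its neighbours. The hierarchy emerges purely from each proxy's position on the path toward the server.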
2.2 Transcoding
Transcoding refers to a broad class of problems that involve some sort of adaptation of content (e.g., [11, 19]),
where content is transformed so as to increase transfer efficiency, or is distilled to suit the capabilities of the client.
Another similar use is the notion of enabling a transformer tunnel [25] over a segment of the path, within which data
transfer is accomplished through some alternate technique that may be better suited to the specific properties of the
link(s) traversed. The proposals in this space that we know of require one end-point to explicitly know of the existence
of the other end-point, which requires either manual configuration or some external signaling/discovery protocol. TPOT
can accomplish such functionality without either. In TPOT an end-point non-invasively flags a connection,
signifying that it can transform content, without actually performing any transformation. Only if and when a second
TPOT proxy (capable of handling this transformation) sees this flag and notifies the first proxy of its existence does
the first proxy begin to transform the connection. Note that no additional handshake is required for this to
operate correctly, since the TPOT mechanism plays out in concert with TCP's existing 3-way handshake.
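The flag-then-notify sequence can be sketched as a small state machine. The sketch makes two simplifications. Options are modelled as plain dictionaries with hypothetical keys `can_transform` and `transform_peer` instead of real TCP-OPTION bytes. zlib compression stands in for whatever transformation the two proxies agree on. All class and function names are hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscodingEnd:
    """First end of a hypothetical transformer tunnel between two TPOT proxies."""

    peer: Optional[str] = None  # set once a capable peer announces itself

    def syn_options(self) -> dict:
        # Only flag the capability in the SYN; transform nothing yet.
        return {"can_transform": True}

    def on_synack(self, options: dict) -> None:
        # A capable downstream proxy announces itself in the SYN-ACK.
        self.peer = options.get("transform_peer")

    def send(self, data: bytes) -> bytes:
        # Transform (here: compress) only after a peer is known.
        return zlib.compress(data) if self.peer else data


def second_proxy_synack(syn_options: dict, my_addr: str) -> dict:
    """Downstream proxy: if the SYN carries the flag, announce itself."""
    if syn_options.get("can_transform"):
        return {"transform_peer": my_addr}
    return {}


def second_proxy_receive(data: bytes) -> bytes:
    """Downstream proxy: undo the transformation before forwarding."""
    return zlib.decompress(data)
```

If no capable proxy answers, `peer` stays `None` and data passes through untouched. This matches the "only if and when" condition above.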
3 The TPOT Protocol
This section describes the operation of the basic and pipelined versions of the TPOT protocol. Pathological cases,
extensions, and limitations are also studied. Before describing the operation of the TPOT protocol, we provide a brief
background on IP and TCP that will help in understanding TPOT.