COMMAND
HTTP cache-control
SYSTEMS AFFECTED
Most systems
PROBLEM
Martin Pool found the following. HTTP cache-control headers such as
If-Modified-Since allow servers to track individual users in a
manner similar to cookies, but with fewer constraints. This is a
problem for user privacy against which browsers currently provide
little protection.
Let's say Alice is browsing the web; Bob runs a number of
otherwise-unrelated web servers. Alice makes several requests to
Bob's servers over time. Bob would like to tie together as many
as possible of the requests made by Alice to learn more about
Alice's usage patterns and identity: we call this identifying the
request chain. Alice would like to access Bob's servers but not
give away this information.
The standard approach for associating user requests across several
responses is the HTTP `Cookie' state-management extension. The
Set-Cookie response header allows a server to ask the client to store
arbitrary short opaque data, which should be returned for future
requests of that server matching particular criteria. Cookies are
commonly used to store per-user form defaults, to manage web
application sessions, and to associate requests between executions
of the user agent.
The user agent always has the option to just ignore the Set-Cookie
response header, but most implementations default to obeying it to
preserve functionality. Cookies can optionally specify an expiry
time after which they should no longer be used, that they should
persist on disk between client sessions, or that they should only
be passed over transmission-level-secure connections.
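For comparison with the cache-control approach described later, the
following Python sketch builds a Set-Cookie header with the standard
library's http.cookies module, using the persistence and
secure-transport attributes mentioned above. The cookie name
"visitor" and its value are hypothetical, and Max-Age is used here as
the expiry mechanism (the Expires attribute serves the same purpose).

```python
from http.cookies import SimpleCookie

# Hypothetical cookie name and value.
cookie = SimpleCookie()
cookie["visitor"] = "a1b2c3"
# Keep the cookie for a year, so it persists on disk between sessions.
cookie["visitor"]["max-age"] = 365 * 24 * 60 * 60
# Return it for every path on the site.
cookie["visitor"]["path"] = "/"
# Only send it back over secure (HTTPS) connections.
cookie["visitor"]["secure"] = True

header = cookie.output()  # a single "Set-Cookie: visitor=a1b2c3; ..." line
print(header)
```

This should print something like
Set-Cookie: visitor=a1b2c3; Max-Age=31536000; Path=/; Secure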
The privacy implications of cookies have been extensively
discussed, and several problems have been found and rectified in
the past. One example of privacy compromise through cookies is
the use of cookies attached to banner images downloaded from a
central banner server: the same cookie is used within images
linked from several servers, and so the user can be tracked as
they move around.
An obvious means to associate requests is by source IP address.
Over the short term this will generally work quite well, as a
client is likely to use a single IP address during a browsing
session. Even then it is complicated by proxies acting for
multiple clients, network address translation, or multiuser
machines. Over a longer term, the information is confounded by
dynamically-assigned IPs, mobile computers moving between
networks, dialup pools and the like. Indeed, cookies were
proposed in large part to allow legitimate stateful applications
to cope with the impossibility of uniquely identifying users by
IP address.
The essence of the meantime exploit is that the server wishes to
`tag' the client with some information that will later be reported
back, allowing the server to identify a chain. Cookies are a good
approach to this, but their privacy implications are well known
and so Bob requires a more surreptitious approach.
The HTTP cache-control headers are perfect for this: the data is
provided by the server, stored but not verified by the client, and
then provided verbatim back to the server on the next matching
request.
Two headers in particular are useful: Last-Modified and ETag.
Both are designed to help the client and server negotiate whether
to use a cached copy or fetch the resource again.
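As a point of reference, here is a minimal Python sketch of the
intended use of Last-Modified: the server compares the client's
If-Modified-Since value with the resource's real modification time
and answers 304 Not Modified when the cached copy is still current.
RESOURCE_MTIME and conditional_get are illustrative assumptions, not
part of any particular server.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Tuple

# Hypothetical last-modification time of one resource on the server.
RESOURCE_MTIME = datetime(2000, 1, 15, 12, 0, 0, tzinfo=timezone.utc)


def conditional_get(request_headers: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, response headers) for a GET of the resource."""
    response = {"Last-Modified": format_datetime(RESOURCE_MTIME, usegmt=True)}
    ims = request_headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # an unparseable date is ignored, as if absent
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            if RESOURCE_MTIME <= since:
                return 304, response  # the client's cached copy is still current
    return 200, response  # send the full resource (body omitted here)
```

Here the date genuinely describes the resource; the exploit below
keeps the same header exchange but changes what the date means.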
The general approach of meantime is that rather than using the
headers for their intended purpose, Bob's servers will instead
send down a unique tag for the client.
Last-Modified is constrained to be a date, and therefore is
somewhat inflexible. Nevertheless, the server can reasonably
choose any second since the Unix epoch, which allows it to tag on
the order of one billion distinct clients.
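The date-as-tag idea can be sketched in Python by treating each
client number N as the instant N seconds after the epoch and
formatting it as an HTTP-date. The one-to-one mapping and the helper
names are assumptions for illustration; the text above only says
that any second since the epoch may be chosen. The last lines work
out the capacity up to the start of 2000.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def id_to_last_modified(client_id: int) -> str:
    """Tag client number N with the HTTP-date N seconds after the Unix epoch."""
    when = datetime.fromtimestamp(client_id, tz=timezone.utc)
    return format_datetime(when, usegmt=True)


def last_modified_to_id(value: str) -> int:
    """Recover the client number from a returned If-Modified-Since value."""
    when = parsedate_to_datetime(value)
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    return int(when.timestamp())


# Number of distinct one-second tags between the epoch and 1 January 2000.
capacity = int(datetime(2000, 1, 1, tzinfo=timezone.utc).timestamp())
print(capacity)
```

For example, id_to_last_modified(1234567890) yields
"Fri, 13 Feb 2009 23:31:30 GMT", and last_modified_to_id turns that
string back into 1234567890 when it returns in If-Modified-Since.
The capacity printed is 946684800, which is indeed on the order of
one billion.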
ETag allows an arbitrary short string to be stored and passed. It
is less commonly implemented in user agents at the moment, and so
is a less reliable choice.
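An ETag-based tag is simpler to mint, since the value is an arbitrary
quoted string. The Python sketch below is illustrative only:
new_etag and first_etag are hypothetical helpers, and the
If-None-Match parser is deliberately simplified. It splits on
commas, so it only handles tags without embedded commas, such as the
hex tags minted here.

```python
import secrets
from typing import Optional


def new_etag() -> str:
    # 8 random bytes give 16 hex characters; HTTP requires the quotes.
    return '"' + secrets.token_hex(8) + '"'


def first_etag(if_none_match: str) -> Optional[str]:
    # Return the first entity tag in an If-None-Match header, or None
    # (for example for "*"). Weak tags ("W/...") are accepted as-is.
    for part in if_none_match.split(","):
        tag = part.strip()
        if tag.startswith("W/"):
            tag = tag[2:]
        if len(tag) >= 2 and tag[0] == tag[-1] == '"':
            return tag
    return None
```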
In both cases the tag will be lost if the client discards the
resource from its cache, or if it does not request the exact same
resource in the future, or if the request is unconditional. (For
example, Netscape sends an unconditional request when the user
presses Shift+Reload.) Bob has less control over this than he has
with cookies, which can be instructed to persist for an
arbitrarily long period.
The tag is only sent back for the exact same URL, including any
query parameters. By contrast, cookies can be returned for all
resources in a site or section of a site. This makes Bob's job a
little harder.
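The limitations above can be modelled with a toy client cache in
Python. ValidatorCache is a hypothetical class and not how any real
browser is implemented: it keys stored Last-Modified values by the
exact URL string, sends no validator on a forced (Shift+Reload style)
reload, and forgets the tag when an entry is evicted.

```python
from typing import Dict


class ValidatorCache:
    """Toy browser cache: remembers Last-Modified per exact URL string."""

    def __init__(self) -> None:
        self._last_modified: Dict[str, str] = {}

    def store(self, url: str, last_modified: str) -> None:
        # Called when a response carrying Last-Modified is cached.
        self._last_modified[url] = last_modified

    def evict(self, url: str) -> None:
        # Discarding the cached resource also discards the tag.
        self._last_modified.pop(url, None)

    def request_headers(self, url: str, force_reload: bool = False) -> Dict[str, str]:
        # A forced reload is sent unconditionally, so no validator goes out.
        if force_reload:
            return {}
        # Exact string match: a different path or query string finds nothing.
        value = self._last_modified.get(url)
        return {"If-Modified-Since": value} if value is not None else {}


cache = ValidatorCache()
cache.store("http://bob.example/pixel.gif", "Fri, 13 Feb 2009 23:31:30 GMT")
```

In this model a request for pixel.gif?page=2 carries no
If-Modified-Since even after pixel.gif has been cached, which is why
Bob wants every page to reference one identical URL.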
Bob should therefore make sure that all his pages link to a small
common resource, perhaps a one-pixel image. This image is generated
by a script that issues each new client a unique timestamp and
records it, and also records any timestamp the client already
presents.
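The one-pixel script might look roughly like the following Python
sketch, built on the standard library's http.server. Everything in it
is hypothetical: PixelTagger, the sequential client numbering, the
log format and the Cache-Control: no-cache header (an assumption
intended to make the browser revalidate, and so send the tag back, on
every page view) are illustrations, not details of the actual
meantime demonstration.

```python
import base64
import itertools
import threading
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# A tiny 1x1 GIF; any small image would do.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")


def id_to_date(client_id: int) -> str:
    # Client N is tagged with the HTTP-date N seconds after the Unix epoch.
    return format_datetime(datetime.fromtimestamp(client_id, tz=timezone.utc), usegmt=True)


class PixelTagger:
    """Hypothetical tracking-pixel logic: tag new clients, log returning ones."""

    def __init__(self, first_id: int = 1) -> None:
        self._ids = itertools.count(first_id)
        self._lock = threading.Lock()
        self.log = []  # (event, client_id, client_address) tuples

    def handle(self, headers, client_address: str = "?"):
        """Return (status, response_headers, body) for one pixel request."""
        ims = headers.get("If-Modified-Since")
        if ims is not None:
            try:
                dt = parsedate_to_datetime(ims)
            except (TypeError, ValueError):
                dt = None  # unparseable: fall through and treat as a new client
            if dt is not None:
                if dt.tzinfo is None:
                    dt = dt.replace(tzinfo=timezone.utc)
                client_id = int(dt.timestamp())
                with self._lock:
                    self.log.append(("seen", client_id, client_address))
                # "Not modified" keeps the cached copy, and with it the tag.
                return 304, {"Last-Modified": id_to_date(client_id)}, b""
        with self._lock:
            client_id = next(self._ids)
            self.log.append(("new", client_id, client_address))
        return 200, {
            "Content-Type": "image/gif",
            "Last-Modified": id_to_date(client_id),
            # Assumption: asks the browser to revalidate on every use.
            "Cache-Control": "no-cache",
        }, PIXEL


class PixelHandler(BaseHTTPRequestHandler):
    tagger = PixelTagger()

    def do_GET(self) -> None:
        status, headers, body = self.tagger.handle(self.headers, self.client_address[0])
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        if status == 200:
            self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # Blocks forever; intended only for a local demonstration.
    HTTPServer(("127.0.0.1", port), PixelHandler).serve_forever()
```

Calling serve() runs the sketch on localhost port 8000 until it is
interrupted; PixelTagger.handle can also be exercised directly,
without a network, by passing a dict of request headers.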
For a demonstration, more explanation and details, please see
http://www.linuxcare.com.au/mbp/meantime/
SOLUTION
Nothing yet.