COMMAND

    HTTP cache-control

SYSTEMS AFFECTED

    Most systems

PROBLEM

    Martin Pool found the following.  HTTP cache-control headers  such
    as If-Modified-Since allow servers to track individual users in  a
    manner similar to cookies, but with fewer constraints.  This is  a
    problem for user privacy against which browsers currently  provide
    little protection.

    Let's  say  Alice  is  browsing  the  web;  Bob  runs  a number of
    otherwise-unrelated web servers.  Alice makes several requests  to
    Bob's servers over time.  Bob  would like to tie together as  many
    as possible  of the  requests made  by Alice  to learn  more about
    Alice's usage patterns and identity: we call this identifying  the
    request chain.  Alice would  like to access Bob's servers  but not
    give away this information.

    The standard approach for associating user requests across several
    responses is  the HTTP  `Cookie' state-management  extension.  The
    Set-Cookie response header allows a server to ask the client to store
    arbitrary short opaque data,  which should be returned  for future
    requests of that server matching particular criteria.  Cookies are
    commonly  used  to  store  per-user  form  defaults, to manage web
    application sessions, and to associate requests between executions
    of the user agent.

    The user agent always has the option to just ignore the Set-Cookie
    response header, but most implementations default to obeying it to
    preserve functionality.  Cookies can optionally specify an  expiry
    time after which they should  no longer be used, that  they should
    persist on disk between client sessions, or that they should  only
    be passed over transmission-level-secure connections.

    The  privacy  implications  of   cookies  have  been   extensively
    discussed, and several problems have been found and rectified   in
    the past.   One example of  privacy compromise through  cookies is
    the use  of cookies  attached to  banner images  downloaded from a
    central  banner  server:  the  same  cookie  is used within images
    linked from  several servers,  and so  the user  can be tracked as
    they move around.

    An obvious means  to associate requests  is by source  IP address.
    Over the  short term  this will  generally work  quite well,  as a
    client is  likely to  use a  single IP  address during  a browsing
    session.   Even  then  it  is  complicated  by  proxies acting for
    multiple  clients,  network  address  translation,  or   multiuser
    machines.   Over a  longer term,  the information  is confounded by
    dynamically-assigned   IPs,   mobile   computers   moving  between
    networks,  dialup  pools  and  the  like.   Indeed,  cookies  were
    proposed in large part  to allow legitimate stateful  applications
    to cope with  the impossibility of  uniquely identifying users  by
    IP address.

    The basis of the  meantime exploit is that the server  wishes  to
    `tag' the client with some information that will later be reported
    back, allowing the server to identify a chain. Cookies are a  good
    approach to this,  but their privacy  implications are well  known
    and so Bob requires a more surreptitious approach.

    The HTTP cache-control headers are  perfect for this: the data  is
    provided by the server, stored but not verified by the client, and
    then provided  verbatim back  to the  server on  the next matching
    request.

    Two  headers  in  particular  are  useful: Last-Modified and ETag.
    Both are designed to help the client and server negotiate  whether
    to use a cached copy or fetch the resource again.

    The general  approach of  meantime is  that rather  than using the
    headers for  their intended  purpose, Bob's  servers will  instead
    send down a unique tag for the client.

    Last-Modified  is  constrained  to  be  a  date,  and therefore is
    somewhat  inflexible.   Nevertheless,  the  server  can reasonably
    choose any second since the Unix epoch, which allows it to tag  on
    the order of one billion distinct clients.
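
    As a minimal sketch of this encoding, assuming Bob numbers his
    clients with plain integers and treats each number as a count of
    seconds since the Unix epoch, the conversion to and from an HTTP
    date could look like the following.  The function names are
    hypothetical and are not taken from the meantime demonstration.

```python
from email.utils import formatdate, parsedate_to_datetime


def tag_to_last_modified(client_id: int) -> str:
    """Encode a hypothetical integer client id as an HTTP date.

    The id is interpreted as seconds since the Unix epoch.
    """
    return formatdate(client_id, usegmt=True)


def last_modified_to_tag(header_value: str) -> int:
    """Recover the client id from an echoed If-Modified-Since value."""
    return int(parsedate_to_datetime(header_value).timestamp())


if __name__ == "__main__":
    header = tag_to_last_modified(123456789)
    print(header)
    print(last_modified_to_tag(header))
```

    A real server would probably avoid dates in the future, since some
    clients or caches may treat them specially; that is an assumption
    and not something this sketch checks.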

    ETag allows an arbitrary short string to be stored and passed.  It
    is not so commonly implemented  in user agents at the  moment, and
    so not such a good choice.
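
    For comparison, a rough sketch of the ETag variant follows.  It
    assumes the tag is a random hex string and that the client echoes
    it back in an If-None-Match request header; the helper names are
    hypothetical.

```python
import secrets


def new_etag() -> str:
    """Create a fresh, quoted entity tag carrying a random client tag."""
    return '"' + secrets.token_hex(8) + '"'


def tag_from_if_none_match(header_value: str) -> str:
    """Extract the client tag from an echoed If-None-Match value.

    Handles an optional weak-validator prefix (W/) and the surrounding
    quotes; only a single entity tag is expected in this sketch.
    """
    value = header_value.strip()
    if value.startswith("W/"):
        value = value[2:]
    return value.strip('"')


if __name__ == "__main__":
    etag = new_etag()
    print(etag, tag_from_if_none_match(etag))
```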

    In both  cases the  tag will  be lost  if the  client discards the
    resource from its cache, or if it does not request the exact  same
    resource in the future, or  if the request is unconditional.  (For
    example, Netscape  sends an  unconditional request   when the user
    presses Shift+Reload.) Bob has less control over this than he  has
    with  cookies,  which  can  be   instructed  to  persist  for   an
    arbitrarily long period.

    The date is only sent back  for the exact same URL, including  any
    query parameters.   By contrast, cookies  can be returned  for all
    resources in a site or section of a site.  This makes Bob's job  a
    little harder.

    Bob therefore  should make  sure that  all pages  link to  a small
    common  resource:  perhaps  a  one-pixel  image.   This  image  is
    generated by a script that supplies a unique timestamp to each new
    client and records it, along with any timestamp that a  returning
    client sends back.
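
    The logic of such a script might be sketched as below.  This is an
    illustration under stated assumptions and not the code behind the
    demonstration: the PixelTagger class, its sequential numbering of
    clients and its in-memory log are all hypothetical, and the image
    body itself is omitted.

```python
from email.utils import formatdate, parsedate_to_datetime


class PixelTagger:
    """Hypothetical handler for the shared one-pixel image."""

    def __init__(self, first_tag=1):
        self.next_tag = first_tag
        self.sightings = []  # list of (tag, is_returning_client)

    def handle(self, request_headers):
        """Return (status, response_headers) for one image request."""
        echoed = request_headers.get("If-Modified-Since")
        if echoed is not None:
            try:
                tag = int(parsedate_to_datetime(echoed).timestamp())
            except (TypeError, ValueError):
                tag = None  # unparseable date: treat as a new client
            if tag is not None:
                # Returning client: record the tag it presented and
                # answer 304 so it keeps the cached copy and the tag.
                self.sightings.append((tag, True))
                return 304, {}
        # New client, or an unconditional request: issue a fresh tag
        # disguised as the modification time of the image.
        tag = self.next_tag
        self.next_tag += 1
        self.sightings.append((tag, False))
        headers = {
            "Content-Type": "image/gif",
            "Last-Modified": formatdate(tag, usegmt=True),
        }
        return 200, headers


if __name__ == "__main__":
    tagger = PixelTagger()
    status, headers = tagger.handle({})
    print(status, headers["Last-Modified"])
    print(tagger.handle({"If-Modified-Since": headers["Last-Modified"]}))
    print(tagger.sightings)
```

    Answering a conditional request with 304 leaves the cached copy,
    and therefore the tag, in place for the next visit.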

    For a demonstration, more explanation and details, please see

        http://www.linuxcare.com.au/mbp/meantime/

SOLUTION

    Nothing yet.