COMMAND

    CGI.pm

SYSTEMS AFFECTED

    Most of them

PROBLEM

    Kragen Sitaker found the following.  CGI.pm contains a method self_url
    which returns the URL with which the script was called,  including
    all of the data fields submitted --- except for the .submit= field
    added by CGI.pm.  Normally, this is used something like this:

        my $self = self_url;
        print qq(<a href="$self#Section2">Section 2</a>\n);

    If CGI.pm is running on  Apache 1.3.6, probably other versions  of
    Apache,  and  possibly  other  Web  servers,  it is possible for a
    client  to  cause  self_url  to  include  arbitrary  sequences  of
    characters at its beginning, such as

        "><script language="JavaScript">evil_code()</script><a href="

    which, if used in the manner described above, leads to the problem
    described  in  CERT  Advisory  CA-2000-02,  "Malicious  HTML  Tags
    Embedded  in  Client  Web  Requests"  (see CSS advisory in mUNIXes
    section).   However, it is important to note that this  does  not
    exploit a  bug in  Apache.   Apache is  choosing to  deal with  an
    illegal request in a perfectly legitimate manner.  At least,  that
    is our understanding of what the spec says.

    Apparently, anything following an unencoded space in the URL  used
    to  invoke  the  script  ends  up  being  inserted,  unencoded but
    converted to  lower case,  at the  beginning of  self_url's return
    value.  Unencoded  spaces are, of  course, illegal in  URLs.  Most
    web  browsers  accept  them  anyway  in HREF attributes, and don't
    bother to %-encode them when they send them in a GET request.
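
    To illustrate what %-encoding would look like here, the following
    is a minimal sketch.  The helper name encode_unsafe is
    hypothetical, and it deliberately encodes only the space, the
    double-quote, and the angle brackets; a real client would have to
    encode every character that is not legal in a URL.

```perl
use strict;
use warnings;

# Hypothetical helper: %-encode the few characters that matter for
# this problem (space, double-quote, "<" and ">").  Not a complete
# URL encoder.
sub encode_unsafe {
    my ($url) = @_;
    $url =~ s/([ "<>])/sprintf('%%%02X', ord($1))/ge;
    return $url;
}

print encode_unsafe('/cgi-bin/demo.cgi "><b>'), "\n";
```

    Once encoded this way, the request URL no longer contains a space
    that a server could take for the end of the URL.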

    Netscape 4.6, MSIE 3.0, Mozilla M12, and Lynx 2.8.1rel.2 at least,
    allow HREF  attribute values  to be  delimited by  ' single-quotes
    instead of " double-quotes, which allows insertion of unencoded  "
    double-quotes into the URL --- which is crucial to exploiting this
    problem.   Lynx 2.8.1rel.2,  however, strips  the spaces  from the
    URL  found  in  HTML,  preventing  it  from  being  exploited  via
    <A HREF=''>.

    It  appears  that  this  happens  because  the  unencoded space is
    interpreted  by  the  HTTP  server  (Apache  1.3.6  in  tests)  as
    separating the  URL from  the protocol  name.   So the environment
    variable  SERVER_PROTOCOL  gets  set  to  everything following the
    space,  followed  by  a  space  and  the  actual protocol, such as
    "HTTP/1.0".  Three of the four tested browsers (Netscape 4.6, MSIE
    3.0, and Mozilla M12) send the unencoded space in the request URL,
    which generates an illegal HTTP Request-Line.
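
    The behaviour described above can be modelled with a short
    sketch.  This is not Apache's code; parse_request_line is a
    hypothetical lenient parser that assumes the server splits the
    Request-Line at the first two spaces only, and the script path
    and the injected text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical lenient parser: split at the first two spaces only, so
# any further spaces stay inside the third (protocol) field.
sub parse_request_line {
    my ($line) = @_;
    my ($method, $uri, $protocol) = split / /, $line, 3;
    return ($method, $uri, $protocol);
}

# Request-Line as sent by a browser that does not %-encode the space.
my $line = 'GET /cgi-bin/demo.cgi "><b>injected HTTP/1.0';
my ($method, $uri, $protocol) = parse_request_line($line);

print "URI:             $uri\n";
print "SERVER_PROTOCOL: $protocol\n";
```

    Under this assumption the URI stops at the first unencoded space,
    and the rest of the line, including the real "HTTP/1.0", becomes
    the protocol field.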

    CGI.pm  simply   takes  that   environment  variable,   chops  off
    everything from the slash onwards, lowercases it, and returns  the
    result as the URL scheme.
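
    As a stand-alone illustration, the sketch below reproduces the
    unpatched logic from the diff under SOLUTION.  The function name
    unpatched_scheme is hypothetical, it takes the protocol string as
    an argument rather than calling $self->server_protocol, and it
    leaves out the port 443 check.

```perl
use strict;
use warnings;

# Model of the unpatched scheme derivation: keep what precedes the
# first slash and lowercase it, with no further validation.
sub unpatched_scheme {
    my ($server_protocol) = @_;
    my ($protocol, $version) = split('/', $server_protocol);
    return "\L$protocol\E";
}

print unpatched_scheme('HTTP/1.0'), "\n";                # well-formed
print unpatched_scheme('"><B>Injected HTTP/1.0'), "\n";  # malformed
```

    With the malformed value, the quote and angle brackets survive
    into the returned "scheme", which self_url then places at the
    start of the URL.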

SOLUTION

    RFC 1738 and RFC  2068 say that only  a-z, 0-9, "+", ".",  and "-"
    are allowed in scheme names.  Accordingly, Kragen Sitaker suggests
    the following change to CGI.pm:

    *** /usr/local/lib/perl5/5.00503/CGI.pm Tue May 18 00:04:20 1999
    --- /home/kragen/lib/perl5/site_perl/5.005//CGI.pm      Mon Feb 14 12:07:37 2000
    ***************
    *** 2594,2600 ****
          return 'https' if $self->server_port == 443;
          my $prot = $self->server_protocol;
          my($protocol,$version) = split('/',$prot);
    !     return "\L$protocol\E";
      }
      END_OF_FUNC

    --- 2594,2602 ----
          return 'https' if $self->server_port == 443;
          my $prot = $self->server_protocol;
          my($protocol,$version) = split('/',$prot);
    !     $protocol = lc $protocol;
    !     $protocol =~ tr/-+.a-z0-9//cd;
    !     return $protocol;
      }
      END_OF_FUNC

    (Sorry ---  using Solaris  diff, which  doesn't have  unified diff
    capability.)   This  prevents  the  exploit,  but  of  course  the
    resulting  URL  is  incorrect.   It  won't  affect  responses   to
    well-formed HTTP requests, which should never have anything  other
    than HTTP for the  $protocol to begin with.   It might be  smarter
    to always return 'http' when not returning 'https'; we are not
    presently aware of any protocols other than HTTP and SSL HTTP used
    with CGI.  The current draft CGI spec says:

        Note that the scheme and  the protocol are not identical;  for
        instance, a resource accessed via an SSL mechanism may have  a
        Client-URI  with  a  scheme  of  "https"  rather  than "http".
        CGI/1.1 provides no means for the script to reconstruct  this,
        and therefore the Script-URI includes the base protocol used.

    ...  in  other  words,  implementing  self_url  in  a  way that is
    guaranteed to be correct  for future non-HTTP CGI  implementations
    is not possible.
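
    The two options discussed above can be compared side by side.
    The following is a sketch only: filtered_scheme and fixed_scheme
    are hypothetical stand-alone functions that take the protocol
    string and port as arguments, whereas the real CGI.pm method
    obtains them from $self.

```perl
use strict;
use warnings;

# Option 1: the filtering approach from the patch.  Lowercase the
# protocol name, then delete every character not allowed in a scheme.
sub filtered_scheme {
    my ($server_protocol, $server_port) = @_;
    return 'https' if $server_port == 443;
    my ($protocol) = split('/', $server_protocol);
    $protocol = lc $protocol;
    $protocol =~ tr/-+.a-z0-9//cd;
    return $protocol;
}

# Option 2: ignore SERVER_PROTOCOL and return 'http' whenever the
# port check does not select 'https'.
sub fixed_scheme {
    my ($server_port) = @_;
    return $server_port == 443 ? 'https' : 'http';
}

print filtered_scheme('"><B>Injected HTTP/1.0', 80), "\n";
print fixed_scheme(80), "\n";
```

    Option 1 yields a harmless but meaningless scheme for a malformed
    request; option 2 always yields a usable one, at the cost of
    assuming that only HTTP and SSL HTTP are ever used with CGI.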

    The  successful  exploit  requires  a  remarkable chain of extreme
    forgiveness:

        1- The web browser must  accept an illegal URL from  (possibly
           valid, although very unusual) HTML.
        2- The web browser must send an illegal HTTP request with  the
           illegal URL, without %-encoding the URL to make it legal.
        3- The HTTP server must accept the illegal HTTP request.
        4- The  HTTP  server  must  invoke  the  CGI  script  with   a
           nonsensical SERVER_PROTOCOL.
        5- The CGI script must accept the nonsensical  SERVER_PROTOCOL
           and use it  to produce an  illegal URL, which  it must then
           embed in HTML it outputs.
        6- The  web  browser  must  then  trust the output of the  CGI
           script in some fashion inappropriate to the supplier of the
           original URL.

    Netscape 4.6, MSIE 3.0, and Mozilla M12 (and, we guess, most Web
    browsers) will happily perform steps 1 and 2; Apache 1.3.6 (and,
    we guess, most Web servers) will happily perform steps 3 and 4;
    any program using CGI.pm and embedding self_url's return value in
    its output will perform step 5; and as CERT advisory CA-2000-02
    documents, there is a wide variety of situations that can cause
    step 6 to happen.  Part of Apache's functionality is to pass
    unknown methods and protocols on to CGIs.  It is arguable that
    Apache should explicitly reject any request with more than two
    unencoded spaces in it.

    The patch above breaks the chain at step 5.  It would be nice to
    break it at other steps as well.  The HTTP requests used in this
    exploit are broken: their Request-Line has a protocol name that
    not only fails to be "HTTP", but actually fails to be a valid
    protocol name at all.  Perhaps Apache and other web servers
    should respond to such egregious protocol violations with error
    messages, rather than passing the bogus data on to CGI scripts.
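
    A server-side check of the kind suggested here might look like
    the sketch below.  request_line_ok is a hypothetical function,
    not part of Apache or any other server.  It assumes that only
    requests of the form "METHOD URI HTTP/x.y" are acceptable, so it
    is stricter than the pass-through of unknown protocols mentioned
    above, and it also rejects requests sent without a protocol
    version.

```perl
use strict;
use warnings;

# Hypothetical strict check: exactly three space-separated fields,
# with a protocol field of the form HTTP/<digits>.<digits>.
sub request_line_ok {
    my ($line) = @_;
    return $line =~ m{\A[^ ]+ [^ ]+ HTTP/[0-9]+\.[0-9]+\z} ? 1 : 0;
}

for my $line ('GET /cgi-bin/demo.cgi HTTP/1.0',
              'GET /cgi-bin/demo.cgi "><b>injected HTTP/1.0') {
    printf "%-8s %s\n", (request_line_ok($line) ? 'accept' : 'reject'), $line;
}
```

    A server applying such a check would break the chain at step 3
    instead of leaving the problem to the CGI script.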

    Squid, when used as a proxy, does not accept these incorrect URLs.
    If it is installed as a "transparent proxy", you may get error
    messages from Squid about this from time to time.  Usually this
    is due to sloppy HREFs, not anything malicious.