COMMAND
CGI.pm
SYSTEMS AFFECTED
Most of them
PROBLEM
Kragen Sitaker found the following. CGI.pm contains a method self_url
which returns the URL with which the script was called, including
all of the data fields submitted --- except for the .submit= field
added by CGI.pm. It is typically used like this:
    my $self = self_url;
    print qq(<a href="$self#Section2">Section 2</a>\n);
If CGI.pm is running on Apache 1.3.6, probably other versions of
Apache, and possibly other Web servers, it is possible for a
client to cause self_url to include arbitrary sequences of
characters at its beginning, such as
"><script language="JavaScript">evil_code()</script><a href="
which, if used in the manner described above, leads to the problem
described in CERT Advisory CA-2000-02, "Malicious HTML Tags
Embedded in Client Web Requests" (see CSS advisory in mUNIXes
section). However it is important to note that this does not
exploit a bug in Apache. Apache is choosing to deal with an
illegal request in a perfectly legitimate manner. At least, that
is Sitaker's understanding of what the spec says.
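To make the effect concrete, here is a minimal sketch of the interpolation step. It does not call CGI.pm; $self is a hypothetical stand-in for a self_url return value that begins with client-supplied characters, and the host name and path are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a poisoned self_url return value.
# No real CGI.pm call is made; the prefix, host and path are assumptions.
my $self = q("><b>injected http://www.example.com/cgi-bin/demo.cgi);

# Same interpolation pattern as in the usage example above.
my $html = qq(<a href="$self#Section2">Section 2</a>\n);
print $html;
# The leading "> closes the href attribute and the <a> tag early, so
# whatever follows is parsed by the browser as markup, not as a URL.
```

The printed line starts with `<a href=""><b>injected`, which shows that the client-supplied markup has escaped the attribute value.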
Apparently, anything following an unencoded space in the URL used
to invoke the script ends up being inserted, unencoded but
converted to lower case, at the beginning of self_url's return
value. Unencoded spaces are, of course, illegal in URLs. Most
web browsers accept them anyway in HREF attributes, and don't
bother to %-encode them when they send them in a GET request.
At least Netscape 4.6, MSIE 3.0, Mozilla M12, and Lynx 2.8.1rel.2
allow HREF attribute values to be delimited by ' single-quotes
instead of " double-quotes, which allows insertion of unencoded "
double-quotes into the URL --- which is crucial to exploiting this
problem. Lynx 2.8.1rel.2, however, strips the spaces from the
URL found in HTML, preventing it from being exploited via
<A HREF=''>.
It appears that this happens because the unencoded space is
interpreted by the HTTP server (Apache 1.3.6 in tests) as
separating the URL from the protocol name. So the environment
variable SERVER_PROTOCOL gets set to everything following the
space, followed by a space and the actual protocol, such as
"HTTP/1.0". Three of the four tested browsers (Netscape 4.6, MSIE
3.0, and Mozilla M12) send the unencoded space in the request URL,
which generates an illegal HTTP Request-Line.
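The parsing behaviour described above can be modelled roughly as follows. This is a simplified sketch, not Apache's actual code: the function name parse_request_line and the sample request are assumptions made for illustration.

```perl
use strict;
use warnings;

# Simplified model of the behaviour described above: the first word is the
# method, the second is the URL, and everything left over is taken as the
# protocol. This is an illustration, not Apache source code.
sub parse_request_line {
    my ($line) = @_;
    # split ' ' with a limit of 3 leaves the remainder in the third field.
    my ($method, $uri, $protocol) = split ' ', $line, 3;
    return ($method, $uri, $protocol);
}

# Hypothetical request whose URL contains an unencoded space.
my $request_line = q{GET /cgi-bin/demo.cgi "><b>injected HTTP/1.0};
my ($method, $uri, $server_protocol) = parse_request_line($request_line);

print "URL:             $uri\n";
print "SERVER_PROTOCOL: $server_protocol\n";
```

Under this model the URL stops at the first unencoded space, and SERVER_PROTOCOL carries the rest of the line, including the real "HTTP/1.0" at the end.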
CGI.pm simply takes that environment variable, chops off
everything from the slash onwards, lowercases it, and returns the
result as the URL scheme.
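A standalone sketch of that derivation, using the same split and lowercase steps as the unpatched code quoted in the diff under SOLUTION. The function name unpatched_scheme and the sample input are assumptions; the real code is a CGI.pm method that reads the value through server_protocol.

```perl
use strict;
use warnings;

# Mirrors the unpatched logic: keep what precedes the first slash,
# lowercase it, and return it as the scheme.
sub unpatched_scheme {
    my ($server_protocol) = @_;
    my ($protocol, $version) = split('/', $server_protocol);
    return "\L$protocol\E";
}

print unpatched_scheme('HTTP/1.0'), "\n";    # prints "http"

# Hypothetical SERVER_PROTOCOL produced by a request with a stray space.
my $bad = unpatched_scheme(q{"><B>injected HTTP/1.0});
print $bad, "\n";    # prints: "><b>injected http
```

Note that the split keeps only what precedes the first slash, so any slash inside the injected text truncates it at that point.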
SOLUTION
RFC 1738 and RFC 2068 say that only a-z, 0-9, "+", ".", and "-"
are allowed in scheme names. Accordingly, Kragen Sitaker suggests
the following change to CGI.pm:
*** /usr/local/lib/perl5/5.00503/CGI.pm Tue May 18 00:04:20 1999
--- /home/kragen/lib/perl5/site_perl/5.005//CGI.pm Mon Feb 14 12:07:37 2000
***************
*** 2594,2600 ****
      return 'https' if $self->server_port == 443;
      my $prot = $self->server_protocol;
      my($protocol,$version) = split('/',$prot);
!     return "\L$protocol\E";
  }
  END_OF_FUNC
--- 2594,2602 ----
      return 'https' if $self->server_port == 443;
      my $prot = $self->server_protocol;
      my($protocol,$version) = split('/',$prot);
!     $protocol = lc $protocol;
!     $protocol =~ tr/-+.a-z0-9//cd;
!     return $protocol;
  }
  END_OF_FUNC
(Sorry --- using Solaris diff, which doesn't have unified diff
capability.) This prevents the exploit, but of course the
resulting URL is incorrect. It won't affect responses to
well-formed HTTP requests, which should never have anything other
than HTTP for the $protocol to begin with. It might be smarter
to always return 'http' when not returning 'https'; Sitaker is not
presently aware of any protocols other than HTTP and SSL HTTP used
with CGI. The current draft CGI spec says:
Note that the scheme and the protocol are not identical; for
instance, a resource accessed via an SSL mechanism may have a
Client-URI with a scheme of "https" rather than "http".
CGI/1.1 provides no means for the script to reconstruct this,
and therefore the Script-URI includes the base protocol used.
... in other words, implementing self_url in a way that is
guaranteed to be correct for future non-HTTP CGI implementations
is not possible.
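The two options discussed above can be compared side by side in a small standalone sketch. The names patched_scheme and fixed_scheme are assumptions; the first repeats the filter from the patch, and the second is one possible reading of "always return 'http' when not returning 'https'", keyed on the port as the existing code already does.

```perl
use strict;
use warnings;

# Same filter as the patch: lowercase, then delete every character that is
# not allowed in a scheme name (a-z, 0-9, "+", ".", "-").
sub patched_scheme {
    my ($server_protocol) = @_;
    my ($protocol) = split('/', $server_protocol);
    $protocol = lc $protocol;
    $protocol =~ tr/-+.a-z0-9//cd;
    return $protocol;
}

# Hypothetical alternative: ignore SERVER_PROTOCOL entirely.
sub fixed_scheme {
    my ($server_port) = @_;
    return $server_port == 443 ? 'https' : 'http';
}

print patched_scheme('HTTP/1.0'), "\n";                  # prints "http"
print patched_scheme(q{"><B>injected HTTP/1.0}), "\n";   # prints "binjectedhttp"
print fixed_scheme(80), "\n";                            # prints "http"
```

The second line of output shows both halves of the trade-off: the markup characters are gone, but the scheme that remains is still not a correct one.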
The successful exploit requires a remarkable chain of extreme
forgiveness:
1- The web browser must accept an illegal URL from (possibly
valid, although very unusual) HTML.
2- The web browser must send an illegal HTTP request with the
illegal URL, without %-encoding the URL to make it legal.
3- The HTTP server must accept the illegal HTTP request.
4- The HTTP server must invoke the CGI script with a
nonsensical SERVER_PROTOCOL.
5- The CGI script must accept the nonsensical SERVER_PROTOCOL
and use it to produce an illegal URL, which it must then
embed in HTML it outputs.
6- The web browser must then trust the output of the CGI
script in some fashion inappropriate to the supplier of the
original URL.
Netscape 4.6, MSIE 3.0, and Mozilla M12 (and, presumably, most Web
browsers) will happily perform steps 1 and 2; Apache 1.3.6 (and,
presumably, most Web servers) will happily perform steps 3 and 4;
any program using CGI.pm and embedding self_url's return value in
its output will perform step 5; and as CERT advisory CA-2000-02
documents, there are a wide variety of situations that can cause
step 6 to happen. Part of Apache's functionality is to pass
unknown methods and protocols on to CGIs. It is arguable that
Apache should explicitly reject any request with more than two
unencoded spaces in it.
The patch above breaks the chain at step 5. It would be nice to
break it at other steps as well. The HTTP requests used in this
exploit are broken: their Request-Line has a protocol name that
not only fails to be "HTTP", but fails to be a valid protocol name
at all. Perhaps Apache and other web servers
should respond to such egregious protocol violations with error
messages, rather than passing the bogus data on to CGI scripts.
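As a rough illustration of what such a server-side check could look like, the sketch below rejects a Request-Line that has more than two unencoded spaces or whose last field is not of the form HTTP/x.y. The function name request_line_ok is an assumption, this is not code from Apache or any other server, and it deliberately ignores HTTP/0.9 simple requests, which carry no protocol field.

```perl
use strict;
use warnings;

# Returns 1 if the line looks like "METHOD URL HTTP/x.y", 0 otherwise.
# Assumption: HTTP/0.9 simple requests (no protocol field) are out of scope.
sub request_line_ok {
    my ($line) = @_;
    my $spaces = ($line =~ tr/ //);    # count spaces without changing $line
    return 0 if $spaces > 2;
    my ($method, $uri, $protocol) = split / /, $line;
    return 0 unless defined $protocol;
    return $protocol =~ m{\AHTTP/\d+\.\d+\z} ? 1 : 0;
}

print request_line_ok('GET /cgi-bin/demo.cgi HTTP/1.0'), "\n";                 # prints 1
print request_line_ok(q{GET /cgi-bin/demo.cgi "><b>injected HTTP/1.0}), "\n";  # prints 0
```

A server applying a check of this kind would answer the second request with an error instead of handing the bogus protocol string to the CGI script.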
Squid, when used as a proxy, does not accept these incorrect URLs.
If it is installed as a "transparent proxy", users may see error
messages from Squid about this from time to time.
Usually this is due to sloppy HREFs, not anything malicious.