COMMAND
mmap()
SYSTEMS AFFECTED
4.4BSD [OpenBSD 2.2 and below, FreeBSD 2.2.5 and below, BSDI 3.0,
NetBSD-current (without UVM) and below]
PROBLEM
The following is based on the OpenBSD Security Advisory. The 4.4BSD
VM system allows files to be "memory mapped", which causes the
specified contents of a file to be made available to a process via
its address space. Manipulations of that file can then be
performed simply by manipulating memory, rather than using
filesystem I/O calls. This technique is used to simplify code,
speed up access to files, and provide interprocess communication.
Memory mappings can be "private" or "shared". In a private memory
mapping, changes to the mapped memory are not committed back to
the original file. Multiple processes with private mappings of
the same file will not see each other's changes. In a shared
mapping, changes to the mapped memory are reflected in the
original file, and all processes mapping the same file see each
other's changes.
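The difference between the two mapping types can be observed from
userland. The following is a minimal sketch, assuming a POSIX system
with a writable /tmp; the function names and the scratch file name
are made up for illustration and do not come from the advisory.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Overwrite byte 0 of an open file through a mapping created with
 * share_flag (MAP_PRIVATE or MAP_SHARED), then return the byte that
 * an ordinary pread() of the file sees afterwards, or -1 on error.
 */
int
write_via_mapping(int fd, int share_flag, char value)
{
    char *p, c;

    assert(share_flag == MAP_PRIVATE || share_flag == MAP_SHARED);
    p = mmap(NULL, 1, PROT_READ | PROT_WRITE, share_flag, fd, 0);
    if (p == MAP_FAILED)
        return -1;
    p[0] = value;               /* modify the mapped memory only */
    if (munmap(p, 1) < 0)
        return -1;
    if (pread(fd, &c, 1, 0) != 1)
        return -1;
    return (unsigned char)c;
}

/*
 * Create a scratch file containing "A" and return the file byte
 * observed after writing 'B' through a mapping of the given kind.
 */
int
demo_mapping(int share_flag)
{
    char path[] = "/tmp/mmap-demo-XXXXXX";  /* hypothetical name */
    int fd, result;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* file vanishes once fd is closed */
    if (write(fd, "A", 1) != 1) {
        close(fd);
        return -1;
    }
    result = write_via_mapping(fd, share_flag, 'B');
    close(fd);
    return result;
}
```

Calling demo_mapping(MAP_PRIVATE) should leave the file byte as 'A',
while demo_mapping(MAP_SHARED) should change it to 'B'.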
In order to create a writeable mapping for a file descriptor, that
file descriptor must be open in read-write mode. This prevents
users from using read-only access to system files to change the
system configuration (by taking the read-only descriptors and
mapping them read-write). The 4.4BSD VM system verifies that an
open file descriptor is read-write before allowing a shared
read-write mapping.
4.4BSD does not perform this access check when the mapping is not
shared; a process with a private mapping cannot modify the
original file, so the potential for danger is minimized.
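The intended behaviour of this access check is visible from userland
on a regular file. The sketch below assumes a POSIX system with a
writable /tmp; try_writable_mapping() and the scratch file name are
hypothetical. It opens a file read-only and requests a writable
mapping of the given kind.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Request a PROT_WRITE mapping on a descriptor opened O_RDONLY.
 * Returns 0 if the mapping was granted, the errno value if mmap()
 * refused it, or -1 if the scratch file could not be set up.
 */
int
try_writable_mapping(int share_flag)
{
    char path[] = "/tmp/mmap-check-XXXXXX"; /* hypothetical name */
    int fd, rofd, result = 0;
    void *p;

    assert(share_flag == MAP_PRIVATE || share_flag == MAP_SHARED);
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (write(fd, "A", 1) != 1) {
        close(fd);
        unlink(path);
        return -1;
    }
    rofd = open(path, O_RDONLY);    /* the read-only descriptor */
    unlink(path);
    close(fd);
    if (rofd < 0)
        return -1;

    p = mmap(NULL, 1, PROT_READ | PROT_WRITE, share_flag, rofd, 0);
    if (p == MAP_FAILED)
        result = errno;             /* refused by the access check */
    else
        munmap(p, 1);
    close(rofd);
    return result;
}
```

On a system that implements the check as described, the MAP_SHARED
request fails with EACCES and the MAP_PRIVATE request succeeds,
because writes to a private mapping never reach the file.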
Unfortunately, the 4.4BSD VM system automatically changes any
private mapping of a character device to "shared", regardless of
the flags passed to mmap(), after the access check is performed.
This allows a user with read-only access to a character device to
create a read-write mapping to that device, and thus write to the
device. This can be used against the raw memory device
("/dev/mem") to write arbitrary bytes directly to physical memory;
if a process has read-only access to "/dev/mem" (processes in
group "kmem" have this access), it can become "root" by altering
kernel data structures.
Furthermore, a process with a read-write mapping on "/dev/mem" can
rewrite the system securelevel back to zero after it has been
raised. This allows an attacker to bypass the "immutable" and
"append-only" filesystem flags, along with any other securelevel
protections.
The code exhibiting this problem is located in "sys/vm/vm_mmap.c",
in the functions "mmap()" (the mmap system call handler), and
"vm_mmap()", the VM function that actually performs memory
mapping. The problem is due to a faulty access check in mmap(),
combined with a side-effect of character device mapping in
vm_mmap(). The mmap() system call handler performs a read-write
access check by examining the file descriptor passed in as an
argument to the system call. Before allowing a shared read-write
mapping, the system verifies that the file being mapped is open in
write mode:
if (flags & MAP_SHARED) {
    if (fp->f_flag & FWRITE)
        maxprot |= VM_PROT_WRITE;
    else if (prot & PROT_WRITE)
        return (EACCES);
}
If the requested mapping is not shared, the access check against
the file (the check for FWRITE in fp->f_flag, where fp is the file
structure for the descriptor passed to mmap) is not performed. For
regular files, this is sufficient; a non-shared mapping will not
allow a process to write to the actual file, only to a private
copy in memory.
The vm_mmap() kernel VM function handles memory mapping for all of
the kernel facilities that require this capability, including
execve(), System V shared memory, and the mmap() system call.
vm_mmap() checks whether a requested mapping is associated with a
character device and, if so, automatically creates a shared
mapping (comments from original source code):
if (vp->v_type == VCHR) {
    type = OBJT_DEVICE;
    handle = (caddr_t) vp->v_rdev;
}
...
/*
 * Force device mappings to be shared.
 */
if (type == OBJT_DEVICE) {
    flags &= ~(MAP_PRIVATE|MAP_COPY);
    flags |= MAP_SHARED;
}
As a result of this code, it is possible to request a non-shared
mapping of a character device (which will appear innocuous to the
mmap() access checking code), and receive a shared, writeable
mapping. This can be used to obtain write access to any readable
character device.
This problem is particularly serious when a hostile process has
read access to kernel memory devices. The system status utilities
"ps", "netstat", "systat", and others run setgid "kmem", allowing
them to use the KVM library to directly access kernel memory. A
bug in any of these programs can allow an attacker to trivially
obtain root access, by mmap()'ing a read-only descriptor to
"/dev/mem" and altering process credential structures.
This issue also directly subverts the system securelevel. 4.4BSD
has a facility called "securelevels" which adds restrictions to
the kernel that take effect only when a flag in the kernel (the
"securelevel") is set. These restrictions include "immutable"
files, which cannot be altered (even by root), and "append-only"
files, to which data can only be appended. The former is useful
for system binaries (to prevent attackers from backdooring
libraries and executables), and the latter is useful for logs (to
prevent attackers from covering their tracks by deleting log
data). The 4.4BSD securelevel features are active when the
securelevel is nonzero. The securelevel is set using the "sysctl"
facility. The system does not allow the securelevel to be lowered
once it is nonzero; if an attacker can lower the securelevel, she
can evade securelevel protections by turning them off.
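The interaction between the access check in mmap() and the forced
conversion in vm_mmap() can be modelled outside the kernel. The
following is a simplified sketch, not kernel source: the M_*
constants, struct map_request and model_mmap() are hypothetical
stand-ins, and the "check after forcing" ordering is only one
plausible way to close the hole, not necessarily what any vendor
patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel flags; values are illustrative. */
enum { M_SHARED = 1, M_PRIVATE = 2 };
enum { M_PROT_READ = 1, M_PROT_WRITE = 2 };

struct map_request {
    int  flags;         /* M_SHARED or M_PRIVATE, from the caller */
    int  prot;          /* requested protection bits */
    bool fd_writable;   /* models fp->f_flag & FWRITE */
    bool is_chardev;    /* models vp->v_type == VCHR */
};

/*
 * Returns 0 if the mapping is granted, EACCES if it is refused.
 * On success, *writes_reach_object says whether stores through the
 * mapping would modify the underlying file or device.
 */
int
model_mmap(const struct map_request *req, bool check_after_forcing,
    bool *writes_reach_object)
{
    int flags;

    assert(req != NULL && writes_reach_object != NULL);
    flags = req->flags;

    /* Assumed fix: treat devices as shared before the check runs. */
    if (check_after_forcing && req->is_chardev)
        flags = (flags & ~M_PRIVATE) | M_SHARED;

    /* Access check, as in the mmap() system call handler. */
    if ((flags & M_SHARED) && !req->fd_writable &&
        (req->prot & M_PROT_WRITE))
        return EACCES;

    /* Side effect in vm_mmap(): device mappings become shared. */
    if (req->is_chardev)
        flags = (flags & ~M_PRIVATE) | M_SHARED;

    *writes_reach_object =
        (flags & M_SHARED) && (req->prot & M_PROT_WRITE);
    return 0;
}
```

With check_after_forcing set to false, a private writable request on
a read-only character device descriptor is granted and its writes
reach the device, which is the flaw. With it set to true, the same
request is refused with EACCES.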
The 4.4BSD kernel does not allow processes to write directly to
kernel memory when the securelevel is nonzero; this prevents
"root" from bypassing the securelevel simply by writing to
"/dev/kmem". This is controlled by an access check in
"sys/miscfs/specfs/spec_vnops.c", which provides vnode operations
(open, read, write, etc) for special files (like character
devices). The access check is performed in the "spec_open()"
function, which handles the "open" system call for special files.
When the securelevel is nonzero, the system explicitly checks for
attempts to open devices in read-write mode, and prevents
read-write opens for disk and kernel memory devices.
Unfortunately, the mmap() bug allows a process to write to a
descriptor even if it is open read-only; the assumption made in
spec_open() thus fails to catch attempts to reset the securelevel
using mmap().
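The open-time check can be modelled in the same way. This is a
simplified sketch of the rule as described above, not the contents
of spec_vnops.c: the dev_kind enumeration, the M_F* mode bits and
model_spec_open() are hypothetical, and the EPERM return value is an
assumption.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device classes and open-mode bits for the model. */
enum dev_kind { DEV_OTHER, DEV_DISK, DEV_KMEM };
enum { M_FREAD = 1, M_FWRITE = 2 };

/*
 * Model of the open-time rule: with a nonzero securelevel, opens
 * that request write access to disk or kernel memory devices are
 * refused. Returns 0 if the open is allowed, EPERM otherwise.
 */
int
model_spec_open(int securelevel, enum dev_kind kind, int mode)
{
    assert((mode & ~(M_FREAD | M_FWRITE)) == 0);
    if (securelevel != 0 && (mode & M_FWRITE) &&
        (kind == DEV_DISK || kind == DEV_KMEM))
        return EPERM;
    return 0;
}
```

In this model, model_spec_open(1, DEV_KMEM, M_FREAD) returns 0. The
rule only inspects the mode requested at open time, so a read-only
open is always allowed, and nothing at this point stops that
descriptor from later being turned into a writeable mapping.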
Documentation and testing of this problem were conducted by Theo de
Raadt and Chuck Cranor. Matthew Green posted testing code:
/*
 * mmap-bug.c: test for the presence of mmap bug with append-only
 * files. if it fails (and the bug is not present), it will probably
 * exit with an error from a system call. this program will only
 * compile on systems with 4.4BSD-compatible `file flags'.
 *
 * Copyright (c) 1998 Matthew Green. All Rights Reserved.
 */
#include <sys/types.h>
#include <sys/cdefs.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>	/* exit() */
#include <string.h>
#include <unistd.h>
char filedata[] = "you do NOT have the bug.\n";
char data[] = "you do have the bug.\n";
void child __P((const char *));
int
main(argc, argv)
    int argc;
    char *argv[];
{
    caddr_t f;
    pid_t pid;
    int fd;
    if (argc < 2)
        errx(1, "usage: mmap-bug <file>");
    /* first create the file, and set APPEND */
    fd = open(argv[1], O_CREAT|O_TRUNC|O_WRONLY, 0644);
    if (fd < 0)
        err(1, "open");
    if (write(fd, filedata, sizeof filedata) < 0)
        err(1, "write");
    if (fchflags(fd, SF_APPEND|UF_APPEND) < 0)
        err(1, "fchflags");
    if (close(fd) < 0)
        err(1, "close");
    /* now fork the child */
    pid = fork();
    if (pid < 0)
        err(1, "fork");
    if (pid == 0)
        child(argv[1]);
    /* ok, in parent: open file append/read/write, and map it in */
    fd = open(argv[1], O_APPEND|O_RDWR, 0);
    if (fd < 0)
        err(1, "parent open");
    f = mmap(0, 4096, PROT_WRITE|PROT_READ, MAP_SHARED, fd, 0);
    if (f == (caddr_t)-1)
        err(1, "parent mmap");
    /* modify the file, and write it out */
    strcpy(f, data);
    /* wait for the child, and clean up */
    wait(NULL);
    if (fchflags(fd, 0) < 0)
        err(1, "fchflags 2");
    if (unlink(argv[1]) < 0)
        err(1, "unlink");
    exit(0);
}
void
child(path)
    const char *path;
{
    caddr_t f;
    int fd;
    sleep(3);
    /* ok, in child: open file read, and map it in */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        err(1, "child open");
    f = mmap(0, 4096, PROT_READ, MAP_SHARED, fd, 0);
    if (f == (caddr_t)-1)
        err(1, "child mmap");
    /* write it out */
    write(1, f, strlen(f));
    exit(0);
}
SOLUTION
This is a kernel problem that can only be fixed by patching or
upgrading the problematic system code. Patches for the OpenBSD
operating system are provided in their advisory (February 20th).
The problem is fixed in OpenBSD-current and must be patched in
versions 2.2 and below. More information about the OpenBSD
resolution to the problem is available at:
http://www.openbsd.org/errata.html
This was corrected in FreeBSD-current as of 1998/03/11 and
FreeBSD-stable as of 1998/03/11. Patches can be obtained from:
ftp://ftp.freebsd.org/pub/CERT/patches/SA-98:04/
NOTE: Users of FreeBSD 2.2.5 or FreeBSD-current or FreeBSD-stable
dated before 1998/03/12 will need to apply the patch mentioned in
FreeBSD advisory SA-98:02:
ftp://ftp.freebsd.org/pub/CERT/patches/SA-98:02/
NetBSD has changed the mmap(2) system call to fail when creating a
shared, writable file mapping if the file is marked immutable or
append-only. A patch has been made available for NetBSD 1.3 and
1.3.1, and can be found on the NetBSD FTP server:
ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19980509-mmap