COMMAND

    mmap()

SYSTEMS AFFECTED

    4.4BSD [OpenBSD 2.2 and below, FreeBSD 2.2.5 and below, BSDI 3.0,
           NetBSD-current (without UVM) and below]

PROBLEM

    The following is based on the OpenBSD Security Advisory.  The 4.4BSD
    VM system  allows files  to be  "memory mapped",  which causes the
    specified contents of a file to be made available to a process via
    its  address  space.   Manipulations  of  that  file  can  then be
    performed  simply  by  manipulating  memory,  rather  than   using
    filesystem I/O calls.   This technique is  used to simplify  code,
    speed up access to files, and provide interprocess communication.

    Memory mappings can be "private" or "shared". In a private  memory
    mapping, changes to  the mapped memory  are not committed  back to
    the original file.   Multiple processes with  private mappings  of
    the  same  file  will  not  see  each other's changes. In a shared
    mapping,  changes  to  the  mapped  memory  are  reflected  in the
    original file, and  all processes mapping  the same file  see each
    other's changes.

    In order to create a writeable mapping for a file descriptor, that
    file descriptor  must be  open in  read-write mode.  This prevents
    users from using  read-only access to  system files to  change the
    system  configuration  (by  taking  the  read-only descriptors and
    mapping them read-write).  The  4.4BSD VM system verifies that  an
    open  file  descriptor  is  read-write  before  allowing  a shared
    read-write mapping.

    4.4BSD does not perform this access check when the mapping is  not
    shared;  a  process  with  a  private  mapping  cannot  modify the
    original  file,  so  the   potential  for  danger  is   minimized.
    Unfortunately,  the  4.4BSD  VM  system  automatically changes any
    private mapping of a  character device to "shared",  regardless of
    the flags passed to mmap(), after the access check is performed.

    This allows a user with read-only access to a character device  to
    create a read-write mapping to that device, and thus write to  the
    device.   This  can  be   used  against  the  raw  memory   device
    ("/dev/mem") to write arbitrary bytes directly to physical memory;
    if  a  process  has  read-only  access to "/dev/mem" (processes in
    group "kmem" have this access),  it can become "root" by  altering
    kernel data structures.

    Furthermore, a process with a read-write mapping on "/dev/mem" can
    rewrite the  system securelevel  back to  zero after  it has  been
    raised.   This allows  an attacker  to bypass  the "immutable" and
    "append-only" filesystem flags,  along with any  other securelevel
    protections.

    The code exhibiting this problem is located in "sys/vm/vm_mmap.c",
    in  the  functions  "mmap()"  (the  mmap system call handler), and
    "vm_mmap()",  the  VM  function  that  actually  performs   memory
    mapping.  The problem is due  to a faulty access check in  mmap(),
    combined  with  a  side-effect  of  character  device  mapping  in
    vm_mmap().  The mmap()  system call handler performs  a read-write
    access check  by examining  the file  descriptor passed  in as  an
    argument to the system call.  Before allowing a shared  read-write
    mapping, the system verifies that the file being mapped is open in
    write mode:

        if (flags & MAP_SHARED) {
                if (fp->f_flag & FWRITE)
                        maxprot |= VM_PROT_WRITE;
                else if (prot & PROT_WRITE)
                        return (EACCES);
        }

    If the requested mapping is not shared, the access check against
    the file (the check for FWRITE in fp->f_flag, where fp is the file
    structure for the descriptor passed to mmap) is not performed. For
    regular files, this check is sufficient; a non-shared mapping will
    not allow a process to write to the actual file, only to a private
    copy in memory.

    The vm_mmap() kernel VM function handles memory mapping for all of
    the kernel facilities that require this capability, including
    execve(), System V shared memory, and the mmap() system call.
    vm_mmap() checks whether the requested mapping is associated with
    a character device and, if so, automatically creates a shared
    mapping (comments from original source code):

        if (vp->v_type == VCHR) {
                type = OBJT_DEVICE;
                handle = (caddr_t) vp->v_rdev;
        }

        ...

        /*
         * Force device mappings to be shared.
         */
        if (type == OBJT_DEVICE) {
                flags &= ~(MAP_PRIVATE|MAP_COPY);
                flags |= MAP_SHARED;
        }

    As a result of this code, it is possible to request a non-shared
    mapping of a character device (which will appear innocuous to the
    mmap() access checking code), and receive a shared, writeable
    mapping. This can be used to obtain write access to any readable
    character device.

    This problem is particularly serious when a hostile process has
    read access to kernel memory devices. The system status utilities
    "ps", "netstat", "systat", and others operate setgid "kmem",
    allowing them to use the KVM library to directly access kernel
    memory. A bug in any of these programs can allow an attacker to
    trivially obtain root access, by mmap()'ing a read-only descriptor
    to "/dev/mem" and altering process credential structures.

    This issue also directly subverts the system securelevel. 4.4BSD
    has a facility called "securelevels" which adds restrictions to
    the kernel that take effect only when a flag in the kernel (the
    "securelevel") is set. These restrictions include "immutable"
    files, which cannot be altered (even by root), and "append-only"
    files, which can only have data appended to them. The former is
    useful for system binaries (to prevent attackers from backdooring
    libraries and executables), and the latter is useful for logs (to
    prevent attackers from covering their tracks by deleting log
    data).

    The 4.4BSD securelevel features are active when the securelevel is
    nonzero. The securelevel is set using the "sysctl" facility. The
    system does not allow the securelevel to be lowered once it is
    nonzero; an attacker who can lower the securelevel can evade its
    protections by turning them off.

    The 4.4BSD kernel  does not allow  processes to write  directly to
    kernel  memory  when  the  securelevel  is  nonzero; this prevents
    "root"  from  bypassing  the  securelevel  simply  by  writing  to
    "/dev/kmem".  This   is  controlled   by  an   access  check    in
    "sys/miscfs/specfs/spec_vnops.c", which provides vnode  operations
    (open,  read,  write,  etc)  for  special  files  (like  character
    devices).   The  access  check  is  performed in the "spec_open()"
    function, which handles the "open" system call for special  files.
    When the securelevel is nonzero, the system explicitly checks  for
    attempts  to  open  devices  in  read-write  mode,  and   prevents
    read-write   opens   for   disk   and   kernel   memory   devices.
    Unfortunately,  the  mmap()  bug  allows  a  process to write to a
    descriptor even if  it is open  read-only; the assumption  made in
    spec_open() thus fails to catch attempts to reset the  securelevel
    using mmap().

    Documentation and testing of this problem were conducted by Theo de
    Raadt and Chuck Cranor.  Matthew Green posted testing code:

    /*
     * mmap-bug.c: test for the presence of mmap bug with append-only
     * files.  if it fails (and the bug is not present), it will probably
     * exit with an error from a system call.  this program will only
     * compile on systems with 4.4BSD-compatible `file flags'.
     *
     * Copyright (c) 1998 Matthew Green.  All Rights Reserved.
     */

    #include <sys/types.h>
    #include <sys/cdefs.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <sys/wait.h>

    #include <err.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    char filedata[] = "you do NOT have the bug.\n";
    char data[] = "you do have the bug.\n";

    void child __P((const char *));

    int
    main(argc, argv)
            int argc;
            char *argv[];
    {
            caddr_t f;
            pid_t pid;
            int fd;

            if (argc < 2)
                    errx(1, "usage: mmap-bug <file>");

            /* first create the file, and set APPEND */
            fd = open(argv[1], O_CREAT|O_TRUNC|O_WRONLY, 0644);
            if (fd < 0)
                    err(1, "open");
            if (write(fd, filedata, sizeof filedata) < 0)
                    err(1, "write");
            if (fchflags(fd, SF_APPEND|UF_APPEND) < 0)
                    err(1, "fchflags");
            if (close(fd) < 0)
                    err(1, "close");

            /* now fork the child */
            pid = fork();
            if (pid < 0)
                    err(1, "fork");
            if (pid == 0)
                    child(argv[1]);

            /* ok, in parent: open file append/read/write, and map it in */
            fd = open(argv[1], O_APPEND|O_RDWR, 0);
            if (fd < 0)
                    err(1, "parent open");
            f = mmap(0, 4096, PROT_WRITE|PROT_READ, MAP_SHARED, fd, 0);
            if (f == (caddr_t)-1)
                    err(1, "parent mmap");

            /* modify the file, and write it out */
            strcpy(f, data);

            /* wait for the child, and clean up */
            wait(NULL);
            if (fchflags(fd, 0) < 0)
                    err(1, "fchflags 2");
            if (unlink(argv[1]) < 0)
                    err(1, "unlink");

            exit(0);
    }

    void
    child(path)
            const char *path;
    {
            caddr_t f;
            int fd;

            sleep(3);

            /* ok, in child: open file read, and map it in */
            fd = open(path, O_RDONLY);
            if (fd < 0)
                    err(1, "child open");
            f = mmap(0, 4096, PROT_READ, MAP_SHARED, fd, 0);
            if (f == (caddr_t)-1)
                    err(1, "child mmap");

            /* write it out */
            write(1, f, strlen(f));

            exit(0);
    }

SOLUTION

    This is a  kernel problem that  can only be  fixed by patching  or
    upgrading the problematic  system code.   Patches for the  OpenBSD
    operating system are provided  in their advisory (February  20th).
    The problem  is fixed  in OpenBSD-current  and must  be patched in
    versions  2.2  and  below.   More  information  about  the OpenBSD
    resolution to the problem is available at:

        http://www.openbsd.org/errata.html

    This  was  corrected  in  FreeBSD-current  as  of  1998/03/11  and
    FreeBSD-stable as of 1998/03/11.  Patches can be obtained from:

        ftp://ftp.freebsd.org/pub/CERT/patches/SA-98:04/

    NOTE: Users of FreeBSD 2.2.5 or FreeBSD-current or  FreeBSD-stable
    dated before 1998/03/12 will need to apply the patch mentioned  in
    FreeBSD advisory SA-98:02:

        ftp://ftp.freebsd.org/pub/CERT/patches/SA-98:02/

    NetBSD has changed the mmap(2) system call to fail when creating a
    shared, writable file mapping if  the file is marked immutable  or
    append-only.  A patch has  been made available for NetBSD  1.3 and
    1.3.1, and can be found on the NetBSD FTP server:

        ftp://ftp.NetBSD.ORG/pub/NetBSD/misc/security/patches/19980509-mmap