COMMAND
kernel
SYSTEMS AFFECTED
Linux
PROBLEM
Patrick Reynolds found the following. Linux capability bounding sets
are not as secure as originally intended, particularly for
disabling the loading of kernel modules, as suggested in the
documentation for the 'lcap' package and in two back issues of
Linux Weekly News.
Recent Linux kernels include a system setting in /proc called the
"capability bounding set" that allows administrators to set which
POSIX-ish capabilities should be denied to all processes on the
system. That is, if you disable a capability in
/proc/sys/kernel/cap-bound, no process on the system can possess
this capability, and no process except init may re-enable the
capability in /proc/sys/kernel/cap-bound. (No existing init
supports this feature AFAIK, so the capability bounding set is
effectively irreversible).
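
To make the bitmask nature of the bounding set concrete, here is a
minimal Python sketch that decodes a cap-bound value and lists
which of a few named capabilities it denies. The bit numbers are
those from <linux/capability.h>; the helper names and the sample
value -257 are illustrative only, and the sketch assumes the file
holds a single integer that may be printed as signed decimal.

```python
# Capability bit numbers as defined in <linux/capability.h>.
# Only a handful are listed here; the real header defines more.
NAMES = {
    8: "CAP_SETPCAP",
    16: "CAP_SYS_MODULE",
    17: "CAP_SYS_RAWIO",
    21: "CAP_SYS_ADMIN",
}


def parse_cap_bound(text):
    """Turn the text read from cap-bound into an unsigned 32-bit mask.

    Assumption: the value may appear as a signed decimal (e.g. "-257")
    or as hex ("0xfffffeff"); both are accepted.
    """
    return int(text.strip(), 0) & 0xFFFFFFFF


def denied(mask):
    """Return the names (from NAMES only) whose bit is cleared in mask."""
    return [name for bit, name in sorted(NAMES.items())
            if not mask & (1 << bit)]


if __name__ == "__main__":
    mask = parse_cap_bound("-257\n")      # illustrative sample value
    print(hex(mask), denied(mask))
```

A cleared bit means no process may hold that capability; a set bit
only means the kernel does not deny it globally.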
However, the capability bounding set is useless unless you disable
/dev/mem, because /proc/sys/kernel/cap-bound maps directly to the
cap_bset variable in kernel memory. With a quick poke (remember
peek and poke from the days of BASIC on C64s and IBM PCs?) into
/dev/mem, you can reset the cap_bset variable, re-enabling any or
all capabilities, despite the intended one-way-ness of the
capability bounding set. To get the address for cap_bset, just:
    $ grep cap_bset System.map
    c01d46b0 D cap_bset
Strip off the leading 'c' (that is, subtract 0xc0000000, where
the kernel segment is mapped on x86s) and you get the raw physical
memory address (i.e., offset into /dev/mem) to write to: here,
0x1d46b0. On an x86, the variable is a
32-bit, little-endian integer. Write 0xffffffff to it to
re-enable all capabilities. (This does not give processes these
capabilities; it just prevents the kernel from universally
denying them as intended).
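
The address arithmetic described above can be sketched in Python.
This only parses a System.map line, subtracts the x86 kernel base,
and shows how a 32-bit mask is laid out in little-endian order; it
does not touch /dev/mem. The function names are hypothetical, and
the 0xc0000000 base is the one given in the text for x86.

```python
import struct

KERNEL_BASE = 0xC0000000  # x86 kernel virtual base, as stated in the text


def symbol_address(system_map_text, symbol):
    """Return the virtual address of `symbol` from System.map contents."""
    for line in system_map_text.splitlines():
        fields = line.split()
        # Each line looks like: "<hex address> <type> <name>"
        if len(fields) == 3 and fields[2] == symbol:
            return int(fields[0], 16)
    raise KeyError(symbol)


def physical_offset(virtual_address):
    """Convert a kernel virtual address to an offset into physical memory."""
    if virtual_address < KERNEL_BASE:
        raise ValueError("not a kernel-segment address")
    return virtual_address - KERNEL_BASE


def encode_mask(mask):
    """Encode a 32-bit mask as 4 little-endian bytes."""
    return struct.pack("<I", mask & 0xFFFFFFFF)


if __name__ == "__main__":
    sample = "c01d46b0 D cap_bset\n"   # the line shown in the text
    addr = symbol_address(sample, "cap_bset")
    print(hex(physical_offset(addr)))        # 0x1d46b0
    print(encode_mask(0xFFFFFEFF).hex())     # fffeffff
```

Note how the least significant byte comes first in the encoded
value, which is what "little-endian" means here.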
To make capability bounding sets at all useful, you have to
disable CAP_SYS_RAWIO, which governs access to /dev/mem. Be
advised that doing so will break X and any other user-space
program that needs raw access to memory or I/O ports.
Matthew Kirkwood disputes this. To make them at all useful, you
have to disable _or closely (ideally provably) protect_
CAP_SYS_RAWIO. Obviously a setuid-root X server doesn't help
here, but some small necessary evils[0] which aren't setuid (or
{fs,elf}cap'ped) don't increase practical risk.
As it happens, before 2.2.7 or thereabouts, CAP_SYS_RAWIO was not
required to open /dev/mem, /dev/kmem, /dev/port or /proc/kcore.
There are a lot of privileged ioctls which allow setting
hardware options (including I/O ports) which haven't been fixed.
As an aside, more fun with module security... Even if you compile
a kernel with module loading completely disabled, a clever
attacker could still load custom, module-like code into the kernel
using /dev/mem. It's trickier than changing cap-bound, but it's
still feasible, because page tables and syscall tables are
similarly exposed through /dev/mem.
Exploit: read open(2) and mmap(2) and write it yourself.
SOLUTION
If you disable anything in the capability bounding set, you must
also disable CAP_SYS_RAWIO and CAP_SYS_MODULE. Matthew disagrees:
in his view, CAP_SYS_RAWIO and CAP_SYS_MODULE (and quite possibly
others) must be protected at least as much as
/proc/sys/kernel/cap-bound, and the fix is to document this
adequately.
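
The stricter rule (if anything is removed from the bounding set,
CAP_SYS_RAWIO and CAP_SYS_MODULE must be removed too) can be
expressed as a small check on the mask. This is a sketch of that
one rule only, not of Matthew's "protect rather than disable"
position; the function name and the `baseline` parameter are
hypothetical, and the bit numbers come from <linux/capability.h>.

```python
CAP_SYS_MODULE = 16   # bit number from <linux/capability.h>
CAP_SYS_RAWIO = 17    # bit number from <linux/capability.h>


def bounding_set_is_defensible(mask, baseline=0xFFFFFFFF):
    """Check the 'disable RAWIO and MODULE too' rule for a bounding set.

    `baseline` is the mask regarded as "nothing disabled"; it is a
    parameter because a system's default may already clear some bits.
    """
    mask &= 0xFFFFFFFF
    baseline &= 0xFFFFFFFF
    if mask == baseline:
        return True   # nothing was restricted, so the rule is vacuous
    required_off = (1 << CAP_SYS_MODULE) | (1 << CAP_SYS_RAWIO)
    # Both bits must be cleared for the restriction to be meaningful.
    return mask & required_off == 0


if __name__ == "__main__":
    only_module_off = 0xFFFFFFFF & ~(1 << CAP_SYS_MODULE)
    both_off = only_module_off & ~(1 << CAP_SYS_RAWIO)
    print(bounding_set_is_defensible(only_module_off))  # False
    print(bounding_set_is_defensible(both_off))         # True
```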
Matthew did, at one stage, have a patch which changed a lot of
these, but never got around to submitting it for an official
kernel. Thinking about it once more, it should probably demand
both CAP_SYS_ADMIN and CAP_SYS_RAWIO for most of these things.
Anyway, it's at
http://ferret.lmh.ox.ac.uk/~weejock/cap-rawio-fixes.diff