COMMAND

    kernel

SYSTEMS AFFECTED

    Linux

PROBLEM

    Patrick Reynolds found the following.  Linux capability bounding
    sets are not as secure as originally intended, particularly for
    disabling the loading of kernel modules, as suggested in the
    documentation for the 'lcap' package and in two back issues of
    Linux Weekly News.

    Recent Linux kernels include a system setting in /proc called the
    "capability bounding set" that allows administrators to choose
    which POSIX-ish capabilities should be denied to all processes on
    the system.  That is, if you disable a capability in
    /proc/sys/kernel/cap-bound, no process on the system can possess
    this capability, and no process except init may re-enable the
    capability in /proc/sys/kernel/cap-bound.  (No existing init
    supports this feature as far as I know, so the capability bounding
    set is effectively irreversible.)
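
    As a rough illustration of how the bounding set is interpreted,
    the sketch below treats the value as a 32-bit mask with one bit
    per capability.  The bit numbers are those from
    <linux/capability.h>; the helper names are made up for this
    example, and the assumption that the file holds a single integer
    (possibly printed as a signed decimal) should be checked against
    your kernel.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAP_SETPCAP = 8
CAP_SYS_MODULE = 16
CAP_SYS_RAWIO = 17


def parse_cap_bound(text):
    """Turn the text of cap-bound into an unsigned 32-bit mask.

    Assumption: the value is one integer, possibly printed as a
    signed decimal (e.g. "-257") or as hex (e.g. "0xfffffeff").
    """
    return int(text.strip(), 0) & 0xFFFFFFFF


def cap_allowed(mask, cap):
    """Return True if the bounding set still permits capability `cap`."""
    return bool(mask & (1 << cap))


if __name__ == "__main__":
    mask = parse_cap_bound("-257")  # sample value, not read from a live system
    for name, cap in [("CAP_SETPCAP", CAP_SETPCAP),
                      ("CAP_SYS_MODULE", CAP_SYS_MODULE),
                      ("CAP_SYS_RAWIO", CAP_SYS_RAWIO)]:
        print(name, "allowed" if cap_allowed(mask, cap) else "denied")
```

    A cleared bit means the capability is denied system-wide; a set
    bit only means the kernel does not forbid it outright.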

    However, the capability bounding set is useless unless you disable
    /dev/mem, because /proc/sys/kernel/cap-bound maps directly to  the
    cap_bset variable in kernel memory.   With a quick poke  (remember
    peek and poke from  the days of BASIC  on C64s and IBM  PCs?) into
    /dev/mem, you can reset  the cap_bset variable, reenabling  any or
    all  capabilities,  despite  the  intended  one-way-ness  of   the
    capability bounding set.  To get the address of cap_bset, look it
    up in System.map:

        $ grep cap_bset System.map
        c01d46b0 D cap_bset

    Strip off the leading 'c', i.e. subtract 0xc0000000 (the kernel
    segment maps to 0xc0000000 on x86s), and you get the raw physical
    memory address (i.e., the offset into /dev/mem) to write to: here,
    0x001d46b0.  On an x86, cap_bset is a 32-bit, little-endian
    integer.  Write 0xffffffff to it to re-enable all capabilities.
    (This does not give processes these capabilities; it just prevents
    the kernel from universally denying them as intended.)
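
    The address arithmetic described above can be sketched as
    follows.  This only does the calculation; it does not touch
    /dev/mem.  The function names are hypothetical, and the fixed
    0xc0000000 offset is an assumption that holds for the classic
    32-bit x86 layout discussed here, not for kernels in general.

```python
import struct

# Assumption: classic 32-bit x86 layout, where kernel virtual addresses
# in the direct mapping are physical addresses plus 0xc0000000.
PAGE_OFFSET = 0xC0000000


def symbol_phys_offset(system_map_line, page_offset=PAGE_OFFSET):
    """Convert a System.map line into an offset into physical memory."""
    address_text, _kind, _name = system_map_line.split()
    virtual = int(address_text, 16)
    if virtual < page_offset:
        raise ValueError("address is below the kernel mapping")
    return virtual - page_offset


def le32(value):
    """Encode a 32-bit value the way a little-endian x86 stores it."""
    return struct.pack("<I", value)


if __name__ == "__main__":
    offset = symbol_phys_offset("c01d46b0 D cap_bset")
    print(hex(offset))
    print(le32(0xFFFFFFFF).hex())
```

    Administrators can use the same calculation to check which
    kernel variables are reachable by anyone who can still open
    /dev/mem for writing.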

    To  make  capability  bounding  sets  at  all  useful, you have to
    disable  CAP_SYS_RAWIO,  which  governs  access  to  /dev/mem.  Be
    advised  that  doing  so  will  break  X  and any other user-space
    program that needs raw access to memory or I/O ports.

    Matthew Kirkwood disputes this.  To make them at all useful, you
    have to disable _or closely (ideally provably) protect_
    CAP_SYS_RAWIO.  Obviously a setuid-root X server doesn't help
    here, but some small necessary evils[0] which aren't setuid (or
    {fs,elf}cap'ped) don't increase practical risk.

    As it happens, before 2.2.7 or thereabouts, CAP_SYS_RAWIO was not
    required to open /dev/mem, /dev/kmem, /dev/port or /proc/kcore.
    There are also a lot of privileged ioctls which allow setting
    hardware options (including I/O ports) which haven't been fixed.

    As an aside, more fun with module security...  Even if you compile
    a  kernel  with  module  loading  completely  disabled,  a  clever
    attacker could still load custom, module-like code into the kernel
    using /dev/mem.  It's  trickier than changing cap-bound,  but it's
    still  feasible,  because  page  tables  and  syscall  tables  are
    similarly exposed through /dev/mem.

    Exploit: read open(2) and mmap(2) and write it yourself.

SOLUTION

    If you disable anything in the capability bounding set, you must
    also disable CAP_SYS_RAWIO and CAP_SYS_MODULE.  Matthew does not
    agree: in his view, CAP_SYS_RAWIO and CAP_SYS_MODULE (and quite
    possibly others) must be protected at least as much as
    /proc/sys/kernel/cap-bound, and the fix is to document this
    adequately.
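
    For the first recommendation, a minimal sketch of computing a
    bounding-set value with both capabilities cleared is shown
    below.  It only computes the mask.  The helper name is
    hypothetical, the bit numbers come from <linux/capability.h>,
    and the numeric format that cap-bound accepts on a given kernel
    is an assumption to verify before writing anything.

```python
# Capability bit numbers as defined in <linux/capability.h>.
CAP_SYS_MODULE = 16
CAP_SYS_RAWIO = 17


def drop_caps(mask, *caps):
    """Return `mask` with the bit for each listed capability cleared."""
    for cap in caps:
        mask &= ~(1 << cap)
    return mask & 0xFFFFFFFF  # keep the result within 32 bits


if __name__ == "__main__":
    # Start from a fully permissive set and remove the two capabilities.
    new_mask = drop_caps(0xFFFFFFFF, CAP_SYS_MODULE, CAP_SYS_RAWIO)
    print(hex(new_mask))
```

    Clearing both bits together matters because leaving either one
    set gives a privileged process a way to undo the other.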

    Matthew did, at one stage, have a patch which changed a lot of
    these, but never got around to submitting it for an official
    kernel.  On reflection, he thinks it should probably demand both
    CAP_SYS_ADMIN and CAP_SYS_RAWIO for most of these things.
    Anyway, it's at

        http://ferret.lmh.ox.ac.uk/~weejock/cap-rawio-fixes.diff