[PATCH v5 bpf-next 0/3] Introduce CAP_BPF

Fri May 8 22:45:36 UTC 2020

On 5/8/2020 2:53 PM, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast at kernel.org>
>
> v4->v5:
>
> Split BPF operations that are allowed under CAP_SYS_ADMIN into a combination of
> CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN and keep some of them under CAP_SYS_ADMIN.
>
> The user process has to have
> - CAP_BPF and CAP_PERFMON to load tracing programs.
> - CAP_BPF and CAP_NET_ADMIN to load networking programs.
> (or CAP_SYS_ADMIN for backward compatibility).

Is there a case where CAP_BPF is useful in the absence of other capabilities?
I generally object to new capabilities in cases where existing capabilities
are already required.
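
For reference, the load-time rules quoted above can be modeled in a few lines. This is only a sketch of the cover letter's wording, not the code in the patches: the bitmask flags (CAP_F_*), the prog_class enum, and the has_cap() and may_load() helpers are all hypothetical names, and it assumes CAP_SYS_ADMIN satisfies every individual check.

```c
#include <stdbool.h>

/* Hypothetical bitmask model of a task's effective capabilities. */
enum {
	CAP_F_SYS_ADMIN = 1u << 0,
	CAP_F_NET_ADMIN = 1u << 1,
	CAP_F_PERFMON   = 1u << 2,
	CAP_F_BPF       = 1u << 3,
};

/* Only the two program classes named in the cover letter. */
enum prog_class { PROG_TRACING, PROG_NETWORKING };

/* Assumption: CAP_SYS_ADMIN stands in for any of the narrower capabilities. */
static bool has_cap(unsigned int caps, unsigned int cap)
{
	return (caps & cap) || (caps & CAP_F_SYS_ADMIN);
}

/* Load-time check: CAP_BPF plus one class-specific capability. */
static bool may_load(unsigned int caps, enum prog_class cls)
{
	if (!has_cap(caps, CAP_F_BPF))
		return false;

	switch (cls) {
	case PROG_TRACING:
		return has_cap(caps, CAP_F_PERFMON);
	case PROG_NETWORKING:
		return has_cap(caps, CAP_F_NET_ADMIN);
	}
	return false;
}
```

In this model may_load(CAP_F_BPF, PROG_TRACING) and may_load(CAP_F_BPF, PROG_NETWORKING) both return false, which is the case the question above is about.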

>
> CAP_BPF solves three main goals:
> 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
>    More on this below. This is the major difference vs v4 set back from Sep 2019.
> 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
>    prevents pointer leaks and arbitrary kernel memory access.
> 3. enables fuzzers to exercise all of the verifier logic. Eventually finding bugs
>    and making BPF infra more secure. Currently fuzzers run in unpriv.
>    They will be able to run with CAP_BPF.
>
> The patchset is a long overdue follow-up from the last plumbers conference.
> Compared to what was discussed at LPC, the CAP* checks at attach time are gone.
> For tracing progs the CAP_SYS_ADMIN check was done at load time only. There was
> no check at attach time. For networking and cgroup progs CAP_SYS_ADMIN was
> required at load time and CAP_NET_ADMIN at attach time, but there are several
> ways to bypass CAP_NET_ADMIN:
> - if networking prog is using tail_call writing FD into prog_array will
>   effectively attach it, but bpf_map_update_elem is an unprivileged operation.
> - freplace prog with CAP_SYS_ADMIN can replace networking prog
>
> Consolidating all CAP checks at load time makes the security model similar to
> the open() syscall. Once the user has an FD, it can do everything with it.
> read/write/poll don't check permissions. The same way when bpf_prog_load
> command returns an FD the user can do everything (including attaching,
> detaching, and bpf_test_run).
>
> The important design decision is to allow ID->FD transition for
> CAP_SYS_ADMIN only. This means that user processes can run
> with CAP_BPF and CAP_NET_ADMIN and they will not be able to affect each
> other unless they pass FDs via scm_rights or via pinning in bpffs.
> ID->FD is a mechanism for human override and introspection.
> An admin can do 'sudo bpftool prog ...'. It's possible to enforce via LSM that
> only bpftool binary does bpf syscall with CAP_SYS_ADMIN and the rest of user
> space processes do bpf syscall with CAP_BPF isolating bpf objects (progs, maps,
> links) that are owned by such processes from each other.
>
> Another significant change from LPC is that the verifier checks are split into
> allow_ptr_leaks and bpf_capable flags. The allow_ptr_leaks disables spectre
> defense and allows pointer manipulations while bpf_capable enables all modern
> verifier features like bpf-to-bpf calls, BTF, bounded loops, indirect stack
> access, dead code elimination, etc. All the goodness.
> These flags are initialized as:
>   env->allow_ptr_leaks = perfmon_capable();
>   env->bpf_capable = bpf_capable();
> That allows networking progs with CAP_BPF + CAP_NET_ADMIN to enjoy
> verifier features while being more secure.
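
A small user-space model of the two flags, to make the split concrete. It is a sketch under assumptions, not the verifier code: struct verifier_flags, init_verifier_flags() and the HAVE_* bits are hypothetical names, and it assumes that perfmon_capable() and bpf_capable() each also accept CAP_SYS_ADMIN, following the backward-compatibility note earlier in the cover letter.

```c
#include <stdbool.h>

/* Hypothetical bitmask of the capabilities a loading task holds. */
enum {
	HAVE_SYS_ADMIN = 1u << 0,
	HAVE_PERFMON   = 1u << 1,
	HAVE_BPF       = 1u << 2,
};

/* Mirrors the two verifier env fields named in the cover letter. */
struct verifier_flags {
	bool allow_ptr_leaks;	/* spectre defense off, pointer manipulation allowed */
	bool bpf_capable;	/* modern verifier features enabled */
};

/* Assumption: CAP_SYS_ADMIN implies both flags. */
static struct verifier_flags init_verifier_flags(unsigned int caps)
{
	bool admin = caps & HAVE_SYS_ADMIN;
	struct verifier_flags f = {
		.allow_ptr_leaks = admin || (caps & HAVE_PERFMON),
		.bpf_capable     = admin || (caps & HAVE_BPF),
	};
	return f;
}
```

With only HAVE_BPF set, the model yields bpf_capable = true and allow_ptr_leaks = false, which is the "modern features, still no pointer leaks" combination described for networking progs.
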
>
> Some networking progs may need CAP_BPF + CAP_NET_ADMIN + CAP_PERFMON,
> since subtracting pointers (like skb->data_end - skb->data) is a pointer leak,
> but the verifier may get smarter in the future.
>
> Please see patches for more details.
>
> Alexei Starovoitov (3):
>   bpf, capability: Introduce CAP_BPF
>   bpf: implement CAP_BPF
>   selftests/bpf: use CAP_BPF and CAP_PERFMON in tests
>
>  drivers/media/rc/bpf-lirc.c                   |  2 +-
>  include/linux/bpf_verifier.h                  |  1 +
>  include/linux/capability.h                    |  5 ++
>  include/uapi/linux/capability.h               | 34 +++++++-
>  kernel/bpf/arraymap.c                         |  2 +-
>  kernel/bpf/bpf_struct_ops.c                   |  2 +-
>  kernel/bpf/core.c                             |  4 +-
>  kernel/bpf/cpumap.c                           |  2 +-
>  kernel/bpf/hashtab.c                          |  4 +-
>  kernel/bpf/helpers.c                          |  4 +-
>  kernel/bpf/lpm_trie.c                         |  2 +-
>  kernel/bpf/queue_stack_maps.c                 |  2 +-
>  kernel/bpf/reuseport_array.c                  |  2 +-
>  kernel/bpf/stackmap.c                         |  2 +-
>  kernel/bpf/syscall.c                          | 87 ++++++++++++++-----
>  kernel/bpf/verifier.c                         | 24 ++---
>  kernel/trace/bpf_trace.c                      |  3 +
>  net/core/bpf_sk_storage.c                     |  4 +-
>  net/core/filter.c                             |  4 +-
>  security/selinux/include/classmap.h           |  4 +-
>  tools/testing/selftests/bpf/test_verifier.c   | 44 ++++++++--
>  tools/testing/selftests/bpf/verifier/calls.c  | 16 ++--
>  .../selftests/bpf/verifier/dead_code.c        | 10 +--
>  23 files changed, 191 insertions(+), 73 deletions(-)
>