[PATCH v8 bpf-next 00/18] BPF token and BPF FS-based delegation
Andrii Nakryiko
andrii.nakryiko at gmail.com
Tue Oct 24 17:52:09 UTC 2023
On Mon, Oct 16, 2023 at 11:04 AM Andrii Nakryiko <andrii at kernel.org> wrote:
>
> This patch set introduces an ability to delegate a subset of BPF subsystem
> functionality from privileged system-wide daemon (e.g., systemd or any other
> container manager) through special mount options for userns-bound BPF FS to
> a *trusted* unprivileged application. Trust is the key here. This
> functionality is not about allowing unconditional unprivileged BPF usage.
> Establishing trust, though, is completely up to the discretion of respective
> privileged application that would create and mount a BPF FS instance with
> delegation enabled, as different production setups can and do achieve it
> through a combination of different means (signing, LSM, code reviews, etc),
> and it's undesirable and infeasible for kernel to enforce any particular way
> of validating trustworthiness of particular process.
>
> The main motivation for this work is a desire to enable containerized BPF
> applications to be used together with user namespaces. This is currently
> impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced
> or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF
> helpers like bpf_probe_read_kernel() and bpf_probe_read_user(), can safely read
> arbitrary memory, and it's impossible to ensure that they only read memory of
> processes belonging to any given namespace. This means that it's impossible to
> have a mechanically verifiable namespace-aware CAP_BPF capability, and as such
> another mechanism to allow safe usage of BPF functionality is necessary. BPF FS
> delegation mount options and BPF token derived from such BPF FS instance are
> such a mechanism. Kernel makes no assumption about what "trusted" constitutes
> in any particular case, and it's up to specific privileged applications and
> their surrounding infrastructure to decide that. What kernel provides is a set
> of APIs to set up and mount special BPF FS instances and derive BPF tokens from
> them. BPF FS and BPF token are both bound to their owning userns and in this way
> are constrained inside the intended container. Users can then pass BPF token FD
> to privileged bpf() syscall commands, like BPF map creation and BPF program
> loading, to perform such operations without having init userns privileges.
>
> This version incorporates feedback and suggestions ([3]) received on v3 of
> this patch set, and instead of allowing to create BPF tokens directly assuming
> capable(CAP_SYS_ADMIN), we enhance BPF FS to accept a few new
> delegation mount options. If these options are used and BPF FS itself is
> properly created, set up, and mounted inside the user namespaced container,
> user application is able to derive a BPF token object from BPF FS instance,
> and pass that token to bpf() syscall. As explained in patch #2, BPF token
> itself doesn't grant access to BPF functionality, but instead allows kernel to
> do namespaced capabilities checks (ns_capable() vs capable()) for CAP_BPF,
> CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, as applicable. So it forms one
> half of a puzzle and allows container managers and sys admins to have safe and
> flexible configuration options: determining which containers get delegation of
> BPF functionality through BPF FS, and then which applications within such
> containers are allowed to perform bpf() commands, based on namespaced
> capabilities.
>
> Previous attempt at addressing this very same problem ([0]) attempted to
> utilize authoritative LSM approach, but was conclusively rejected by upstream
> LSM maintainers. BPF token concept is not changing anything about LSM
> approach, but can be combined with LSM hooks for very fine-grained security
> policy. Some ideas about making BPF token more convenient to use with LSM (in
> particular custom BPF LSM programs) were briefly described in recent LSF/MM/BPF
> 2023 presentation ([1]). E.g., an ability to specify user-provided data
> (context), which in combination with BPF LSM would allow implementing very
> dynamic and fine-grained custom security policies on top of BPF token. In the
> interest of minimizing API surface area and discussions this was relegated to
> follow up patches, as it's not essential to the fundamental concept of
> delegatable BPF token.
>
> It should be noted that BPF token is conceptually quite similar to the idea of
> /dev/bpf device file, proposed by Song a while ago ([2]). The biggest
> difference is the idea of using virtual anon_inode file to hold BPF token and
> allowing multiple independent instances of them, each (potentially) with its
> own set of restrictions. And also, crucially, BPF token approach is not using
> any special stateful task-scoped flags. Instead, bpf() syscall accepts
> token_fd parameters explicitly for each relevant BPF command. This addresses
> main concerns brought up during the /dev/bpf discussion, and fits better with
> overall BPF subsystem design.
>
> This patch set adds a basic minimum of functionality to make BPF token idea
> useful and to discuss API and functionality. Currently only low-level libbpf
> APIs support creating and passing BPF token around, which allows testing kernel
> functionality but is for the most part not sufficient for real-world
> applications, which typically use high-level libbpf APIs based on `struct
> bpf_object` type. This was done with the intent to limit the size of patch set
> and concentrate on mostly kernel-side changes. All the necessary plumbing for
> libbpf will be sent as a separate follow up patch set once kernel support makes
> it upstream.
>
> Another part that should happen once kernel-side BPF token is established, is
> a set of conventions between applications (e.g., systemd), tools (e.g.,
> bpftool), and libraries (e.g., libbpf) on exposing delegatable BPF FS
> instance(s) at well-defined locations to allow applications to take advantage of
> this in automatic fashion without explicit code changes on BPF application's
> side. But I'd like to postpone this discussion to after BPF token concept
> lands.
>
> [0] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/
> [1] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf
> [2] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/
> [3] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/
>
> v7->v8:
> - add bpf_token_allow_cmd and bpf_token_capable hooks (Paul);
> - inline bpf_token_alloc() into bpf_token_create() to prevent accidental
> divergence with security_bpf_token_create() hook (Paul);

Hi Paul,

I believe I addressed all the concerns you had in this revision. Can
you please take a look and confirm that all things look good to you
from LSM perspective? Thanks!

> v6->v7:
> - separate patches to refactor bpf_prog_alloc/bpf_map_alloc LSM hooks, as
> discussed with Paul, and now they also accept struct bpf_token;
> - added bpf_token_create/bpf_token_free to allow LSMs (SELinux,
> specifically) to set up security LSM blob (Paul);
> - last patch also wires bpf_security_struct setup by SELinux, similar to how
> it's done for BPF map/prog, though I'm not sure if that's enough, so worst
> case it's easy to drop this patch if more full fledged SELinux
> implementation will be done separately;
> - small fixes for issues caught by code reviews (Jiri, Hou);
> - fix for test_maps test that doesn't use LIBBPF_OPTS() macro (CI);
> v5->v6:
> - fix possible use of uninitialized variable in selftests (CI);
> - don't use anon_inode, instead create one from BPF FS instance (Christian);
> - don't store bpf_token inside struct bpf_map, instead pass it explicitly to
> map_check_btf(). We do store bpf_token inside prog->aux, because it's used
> during verification and even can be checked during attach time for some
> program types;
> - LSM hooks are left intact pending the conclusion of discussion with Paul
> Moore; I'd prefer to do LSM-related changes as a follow up patch set
> anyways;
> v4->v5:
> - add pre-patch unifying CAP_NET_ADMIN handling inside kernel/bpf/syscall.c
> (Paul Moore);
> - fix build warnings and errors in selftests and kernel, detected by CI and
> kernel test robot;
> v3->v4:
> - add delegation mount options to BPF FS;
> - BPF token is derived from the instance of BPF FS and associates itself
> with BPF FS' owning userns;
> - BPF token doesn't grant BPF functionality directly, it just turns
> capable() checks into ns_capable() checks within BPF FS' owning userns;
> - BPF token cannot be pinned;
> v2->v3:
> - make BPF_TOKEN_CREATE pin created BPF token in BPF FS, and disallow
> BPF_OBJ_PIN for BPF token;
> v1->v2:
> - fix build failures on Kconfig with CONFIG_BPF_SYSCALL unset;
> - drop BPF_F_TOKEN_UNKNOWN_* flags and simplify UAPI (Stanislav).
>
> Andrii Nakryiko (18):
> bpf: align CAP_NET_ADMIN checks with bpf_capable() approach
> bpf: add BPF token delegation mount options to BPF FS
> bpf: introduce BPF token object
> bpf: add BPF token support to BPF_MAP_CREATE command
> bpf: add BPF token support to BPF_BTF_LOAD command
> bpf: add BPF token support to BPF_PROG_LOAD command
> bpf: take into account BPF token when fetching helper protos
> bpf: consistenly use BPF token throughout BPF verifier logic
> bpf,lsm: refactor bpf_prog_alloc/bpf_prog_free LSM hooks
> bpf,lsm: refactor bpf_map_alloc/bpf_map_free LSM hooks
> bpf,lsm: add BPF token LSM hooks
> libbpf: add bpf_token_create() API
> selftests/bpf: fix test_maps' use of bpf_map_create_opts
> libbpf: add BPF token support to bpf_map_create() API
> libbpf: add BPF token support to bpf_btf_load() API
> libbpf: add BPF token support to bpf_prog_load() API
> selftests/bpf: add BPF token-enabled tests
> bpf,selinux: allocate bpf_security_struct per BPF token
>
> drivers/media/rc/bpf-lirc.c | 2 +-
> include/linux/bpf.h | 83 ++-
> include/linux/filter.h | 2 +-
> include/linux/lsm_hook_defs.h | 15 +-
> include/linux/security.h | 43 +-
> include/uapi/linux/bpf.h | 44 ++
> kernel/bpf/Makefile | 2 +-
> kernel/bpf/arraymap.c | 2 +-
> kernel/bpf/bpf_lsm.c | 15 +-
> kernel/bpf/cgroup.c | 6 +-
> kernel/bpf/core.c | 3 +-
> kernel/bpf/helpers.c | 6 +-
> kernel/bpf/inode.c | 98 ++-
> kernel/bpf/syscall.c | 215 ++++--
> kernel/bpf/token.c | 247 +++++++
> kernel/bpf/verifier.c | 13 +-
> kernel/trace/bpf_trace.c | 2 +-
> net/core/filter.c | 36 +-
> net/ipv4/bpf_tcp_ca.c | 2 +-
> net/netfilter/nf_bpf_link.c | 2 +-
> security/security.c | 101 ++-
> security/selinux/hooks.c | 47 +-
> tools/include/uapi/linux/bpf.h | 44 ++
> tools/lib/bpf/bpf.c | 30 +-
> tools/lib/bpf/bpf.h | 39 +-
> tools/lib/bpf/libbpf.map | 1 +
> .../bpf/map_tests/map_percpu_stats.c | 20 +-
> .../selftests/bpf/prog_tests/libbpf_probes.c | 4 +
> .../selftests/bpf/prog_tests/libbpf_str.c | 6 +
> .../testing/selftests/bpf/prog_tests/token.c | 629 ++++++++++++++++++
> 30 files changed, 1577 insertions(+), 182 deletions(-)
> create mode 100644 kernel/bpf/token.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/token.c
>
> --
> 2.34.1
>
>
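For anyone skimming the thread, the capable() vs ns_capable() switch described
in the cover letter can be illustrated with a toy user-space model. This is
only a sketch, not the kernel code from this series: every toy_* name is
hypothetical, the capability and userns handling is heavily simplified, and
the rule that a token which does not delegate a command is simply ignored is
an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy user-space model; every toy_* name is hypothetical, not a kernel symbol. */
enum toy_cap { TOY_CAP_BPF, TOY_CAP_PERFMON, TOY_CAP_NET_ADMIN, TOY_CAP_SYS_ADMIN };
enum toy_cmd { TOY_CMD_MAP_CREATE, TOY_CMD_PROG_LOAD };

struct toy_userns { int id; };

struct toy_task {
	const struct toy_userns *userns; /* userns the task runs in */
	uint32_t ns_caps;                /* caps held inside that userns */
	uint32_t init_caps;              /* caps held in the init userns */
};

struct toy_token {
	const struct toy_userns *userns; /* owning userns of the BPF FS instance */
	uint64_t allowed_cmds;           /* commands delegated via mount options */
};

/* Models capable(): only init-userns capabilities count. */
static bool toy_capable(const struct toy_task *t, enum toy_cap cap)
{
	return (t->init_caps & (1u << cap)) != 0;
}

/* Models ns_capable() in a simplified way: the task must live in exactly
 * the given userns and hold the capability there.
 */
static bool toy_ns_capable(const struct toy_task *t,
			   const struct toy_userns *ns, enum toy_cap cap)
{
	return t->userns == ns && (t->ns_caps & (1u << cap)) != 0;
}

/* True only if a token is present and delegates the given command. */
static bool toy_token_allows_cmd(const struct toy_token *tok, enum toy_cmd cmd)
{
	return tok && (tok->allowed_cmds & (1ULL << cmd)) != 0;
}

static bool toy_cmd_permitted(const struct toy_task *t,
			      const struct toy_token *tok,
			      enum toy_cmd cmd, enum toy_cap cap)
{
	/* Assumption of this model: a token that does not delegate the
	 * command is ignored, as if no token had been passed at all.
	 */
	if (!toy_token_allows_cmd(tok, cmd))
		tok = NULL;
	/* With a usable token, a namespaced capability check is enough. */
	if (tok && toy_ns_capable(t, tok->userns, cap))
		return true;
	/* Otherwise fall back to the init-userns check. */
	return toy_capable(t, cap);
}

/* Small walk-through: a containerized app holding CAP_BPF only in its userns. */
static void toy_demo(void)
{
	struct toy_userns container = { .id = 1 };
	struct toy_task app = {
		.userns = &container,
		.ns_caps = 1u << TOY_CAP_BPF,
		.init_caps = 0,
	};
	struct toy_token tok = {
		.userns = &container,
		.allowed_cmds = 1ULL << TOY_CMD_MAP_CREATE,
	};

	/* No token: only the init-userns check applies, so this is denied. */
	assert(!toy_cmd_permitted(&app, NULL, TOY_CMD_MAP_CREATE, TOY_CAP_BPF));
	/* Token delegating map creation: the namespaced check succeeds. */
	assert(toy_cmd_permitted(&app, &tok, TOY_CMD_MAP_CREATE, TOY_CAP_BPF));
	/* Program loading was not delegated, so the token does not help. */
	assert(!toy_cmd_permitted(&app, &tok, TOY_CMD_PROG_LOAD, TOY_CAP_BPF));
}
```

The point of the model is that the token by itself grants nothing: the task
still needs the relevant capability inside the userns that owns the BPF FS
instance, and the token has to delegate the command in question.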