[PATCH v2 bpf-next 00/18] BPF token
Andrii Nakryiko
andrii.nakryiko at gmail.com
Thu Jun 15 22:48:36 UTC 2023
On Wed, Jun 14, 2023 at 2:39 AM Christian Brauner <brauner at kernel.org> wrote:
>
> On Wed, Jun 14, 2023 at 02:23:02AM +0200, Djalal Harouni wrote:
> > On Tue, Jun 13, 2023 at 12:27 AM Andrii Nakryiko
> > <andrii.nakryiko at gmail.com> wrote:
> > >
> > > On Mon, Jun 12, 2023 at 5:02 AM Djalal Harouni <tixxdz at gmail.com> wrote:
> > > >
> > > > On Sat, Jun 10, 2023 at 12:57 AM Andrii Nakryiko
> > > > <andrii.nakryiko at gmail.com> wrote:
> > > > >
> > > > > On Fri, Jun 9, 2023 at 3:30 PM Djalal Harouni <tixxdz at gmail.com> wrote:
> > > > > >
> > > > > > Hi Andrii,
> > > > > >
> > > > > > On Thu, Jun 8, 2023 at 1:54 AM Andrii Nakryiko <andrii at kernel.org> wrote:
> > > > > > >
> > > > > > > ...
> > > > > > > creating new BPF objects like BPF programs, BPF maps, etc.
> > > > > >
> > > > > > Is there a reason for coupling this only with the userns?
> > > > >
> > > > > There is no coupling. Without userns it is at least possible to grant
> > > > > CAP_BPF and other capabilities from init ns. With user namespace that
> > > > > becomes impossible.
> > > >
> > > > But these are not the same: delegate full cap vs delegate an fd mask?
> > >
> > > What FD mask are we talking about here? I don't recall us talking
> > > about any FD masks, so this one is a bit confusing without more
> > > context.
> >
> > Ah err, sorry yes referring to fd token (which I assumed is a mask of
> > allowed operations or something like that).
> >
> > So I want the possibility to delegate the fd token in the init userns.
> >
> > > >
> > > > One can argue unprivileged in init userns is the same privileged in
> > > > nested userns
> > > > Getting to delegate fd in init userns, then in nested ones seems logical...
> > >
> > > Again, sorry, I'm not following. Can you please elaborate what you mean?
> >
> > I mean can we use the fd token in the init user namespace too? not
> > only in the nested user namespaces but in the first one? Sorry I
> > didn't check the code.
> >
[...]
> >
> > > >
> > > > Having the fd or "token" that gives access rights pinned in two
> > > > separate bpffs mounts seems too much, it crosses namespaces (mount,
> > > > userns etc), environments setup by privileged...
> > >
> > > See above, there is nothing namespaceable about BPF itself, and BPF
> > > token as well. If some production setup benefits from pinning one BPF
> > > token in multiple places, I don't see the problem with that.
> > >
> > > >
> > > > I would just make it per bpffs mount and that's it, nothing more. If a
> > > > program wants to bind mount it somewhere else then it's not a bpf
> > > > problem.
> > >
> > > And if some application wants to pin BPF token, why would that be BPF
> > > subsystem's problem as well?
> >
> > The credentials, capabilities, keyring, different namespaces, etc are
> > all attached to the owning user namespace, if the BPF subsystem goes
> > its own way and creates a token to split up CAP_BPF without following
> > that model, then it's definitely a BPF subsystem problem... I don't
> > recommend that.
> >
> > Feels it's going more of a system-wide approach opening BPF
> > functionality where ultimately it clashes with the argument: delegate
> > a subset of BPF functionality to a *trusted* unprivileged application.
> > My reading of delegation is within a container/service hierarchy
> > nothing more.
>
> You're making the exact arguments that Lennart, Aleksa, and I have been
> making in the LSFMM presentation about this topic. It's even recorded:
Alright, so (I think) I get a pretty good feel now for what the main
concerns are, and why people are trying to push this to be an FS. And
it's not so much that BPF token grants bpf() syscall usage to unpriv
(but trusted) workloads or that BPF itself is not namespaceable. The
main worry is that BPF token, once issued, could be
illegally/uncontrollably passed outside of the container, intentionally
or not. And by having this association with mount namespace (through BPF
FS) we automatically limit the sharing to only the container that has
access to that BPF FS.
So I agree that it makes sense to have this mount namespace
association, but I also would like to keep BPF token as a separate
entity from BPF FS itself, and have the ability to have multiple
different BPF tokens exposed in a single BPF FS instance. I think the
latter is important.
So how about this slight modification: when a BPF token is created
using BPF_TOKEN_CREATE command, the user has to provide an FD for
"associated" BPF FS instance (superblock). What that does is allow
BPF token to be created with BPF FS and/or mount namespace association
set in stone. After that BPF token can only be pinned in that BPF FS
instance and cannot leave the boundaries of that mount namespace
(specific details to be worked out, this is a new area for me, so I'm
sorry if I'm missing nuances).
What this slight tweak gives us is that we can still have multiple BPF
token instances within a single BPF FS. They are still pinnable/gettable
through common bpf() syscall's BPF_OBJ_PIN/BPF_OBJ_GET commands. You
can still have more nuanced file permissions, and getting BPF token can
be controlled further through LSM. Also we still get to use an
extensible and familiar (to BPF users) bpf_attr binary approach.
Basically, it is very much native to BPF subsystem, but it is mount
namespace-bound, as was requested by proponents of merging BPF token
and BPF FS together.
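
To make the proposed semantics concrete, here is a minimal userspace
model of the idea, not kernel code and not the actual UAPI. All names
(struct bpffs_instance, struct bpf_token, token_create(), token_pin())
and the -EINVAL/-EPERM return values are hypothetical, chosen only for
illustration. It shows a token whose owning BPF FS instance is fixed at
creation time and which can afterwards be pinned only inside that same
instance, while several tokens can share one instance.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one BPF FS instance (superblock). */
struct bpffs_instance {
	int id;
};

/* Hypothetical token: the owning BPF FS is set once, at creation. */
struct bpf_token {
	const struct bpffs_instance *owner;
	unsigned int allowed_cmds; /* illustrative mask of delegated commands */
};

/*
 * Models BPF_TOKEN_CREATE taking an "associated" BPF FS.
 * Returns 0 on success, -EINVAL if no BPF FS was provided
 * (error code is an assumption).
 */
int token_create(struct bpf_token *tok, const struct bpffs_instance *fs,
		 unsigned int allowed_cmds)
{
	assert(tok != NULL);
	if (fs == NULL)
		return -EINVAL;
	tok->owner = fs;
	tok->allowed_cmds = allowed_cmds;
	return 0;
}

/*
 * Models BPF_OBJ_PIN for a token: pinning succeeds only inside the
 * BPF FS instance the token was created against (-EPERM otherwise;
 * error code is an assumption).
 */
int token_pin(const struct bpf_token *tok, const struct bpffs_instance *target)
{
	assert(tok != NULL && target != NULL);
	return tok->owner == target ? 0 : -EPERM;
}
```

In this model, two tokens created against the same bpffs_instance can
both be pinned there, and an attempt to pin either of them into a
different instance is rejected.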
I assume that this BPF FS fd can be fetched using fsopen() or fspick()
syscalls, is that right?
WDYT? Does that sound like it would address all the above concerns?
Please point to any important details I might be missing (as I
mentioned, very unfamiliar territory).
>
> https://youtu.be/4CCRTWEZLpw?t=1546
>
> So we fully agree with you here.
I actually just rewatched that entire discussion. :) And after talking
about BPF token at length in the halls of the conference and email
discussions on this patch set, it was very useful to relisten (again)
to all the finer points that were made back then. Thanks for the
reminder and the link.