[PATCH v2 bpf-next 00/18] BPF token

Fri Jun 23 22:18:18 UTC 2023

On 6/16/23 12:48 AM, Andrii Nakryiko wrote:
> On Wed, Jun 14, 2023 at 2:39 AM Christian Brauner <brauner at kernel.org> wrote:
>> On Wed, Jun 14, 2023 at 02:23:02AM +0200, Djalal Harouni wrote:
>>> On Tue, Jun 13, 2023 at 12:27 AM Andrii Nakryiko
>>> <andrii.nakryiko at gmail.com> wrote:
>>>> On Mon, Jun 12, 2023 at 5:02 AM Djalal Harouni <tixxdz at gmail.com> wrote:
>>>>> On Sat, Jun 10, 2023 at 12:57 AM Andrii Nakryiko
>>>>> <andrii.nakryiko at gmail.com> wrote:
>>>>>> On Fri, Jun 9, 2023 at 3:30 PM Djalal Harouni <tixxdz at gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Andrii,
>>>>>>>
>>>>>>> On Thu, Jun 8, 2023 at 1:54 AM Andrii Nakryiko <andrii at kernel.org> wrote:
>>>>>>>>
>>>>>>>> ...
>>>>>>>> creating new BPF objects like BPF programs, BPF maps, etc.
>>>>>>>
>>>>>>> Is there a reason for coupling this only with the userns?
>>>>>>
>>>>>> There is no coupling. Without userns it is at least possible to grant
>>>>>> CAP_BPF and other capabilities from init ns. With user namespace that
>>>>>> becomes impossible.
>>>>>
>>>>> But these are not the same: delegate full cap vs delegate an fd mask?
>>>>
>>>> What FD mask are we talking about here? I don't recall us talking
>>>> about any FD masks, so this one is a bit confusing without more
>>>> context.
>>>
>>> Ah err, sorry yes referring to fd token (which I assumed is a mask of
>>> allowed operations or something like that).
>>>
>>> So I want the possibility to delegate the fd token in the init userns.
>>>
>>>>>
>>>>> One can argue that unprivileged in init userns is the same as
>>>>> privileged in nested userns.
>>>>> Getting to delegate the fd in init userns, then in nested ones seems logical...
>>>>
>>>> Again, sorry, I'm not following. Can you please elaborate what you mean?
>>>
>>> I mean can we use the fd token in the init user namespace too? Not
>>> only in the nested user namespaces but in the first one? Sorry I
>>> didn't check the code.
>>>
> 
> [...]
> 
>>>
>>>>> Having the fd or "token" that gives access rights pinned in two
>>>>> separate bpffs mounts seems too much, it crosses namespaces (mount,
>>>>> userns etc), environments setup by privileged...
>>>>
>>>> See above, there is nothing namespaceable about BPF itself, and BPF
>>>> token as well. If some production setup benefits from pinning one BPF
>>>> token in multiple places, I don't see the problem with that.
>>>>
>>>>>
>>>>> I would just make it per bpffs mount and that's it, nothing more. If a
>>>>> program wants to bind mount it somewhere else then it's not a bpf
>>>>> problem.
>>>>
>>>> And if some application wants to pin BPF token, why would that be BPF
>>>> subsystem's problem as well?
>>>
>>> The credentials, capabilities, keyring, different namespaces, etc are
>>> all attached to the owning user namespace, if the BPF subsystem goes
>>> its own way and creates a token to split up CAP_BPF without following
>>> that model, then it's definitely a BPF subsystem problem...  I don't
>>> recommend that.
>>>
>>> It feels like this is going more toward a system-wide approach to
>>> opening BPF functionality, where ultimately it clashes with the
>>> argument: delegate a subset of BPF functionality to a *trusted*
>>> unprivileged application. My reading of delegation is within a
>>> container/service hierarchy, nothing more.
>>
>> You're making the exact arguments that Lennart, Aleksa, and I have been
>> making in the LSFMM presentation about this topic. It's even recorded:
> 
> Alright, so (I think) I get a pretty good feel now for what the main
> concerns are, and why people are trying to push this to be an FS. And
> it's not so much that BPF token grants bpf() syscall usage to unpriv
> (but trusted) workloads or that BPF itself is not namespaceable. The
> main worry is that BPF token, once issued, could be
> illegally/uncontrollably passed outside of the container, intentionally
> or not. And by having this association with mount namespace (through
> BPF FS) we automatically limit the sharing to only the container that
> has access to that BPF FS.

+1

> So I agree that it makes sense to have this mount namespace
> association, but I also would like to keep BPF token to be a separate
> entity from BPF FS itself, and have the ability to have multiple
> different BPF tokens exposed in a single BPF FS instance. I think the
> latter is important.
> 
> So how about this slight modification: when a BPF token is created
> using BPF_TOKEN_CREATE command, the user has to provide an FD for
> "associated" BPF FS instance (superblock). What that does is allow
> BPF token to be created with BPF FS and/or mount namespace association
> set in stone. After that BPF token can only be pinned in that BPF FS
> instance and cannot leave the boundaries of that mount namespace
> (specific details to be worked out, this is a new area for me, so I'm
> sorry if I'm missing nuances).

Given bpffs is not a singleton and there can be multiple bpffs instances
in a container, couldn't we make the token a special bpffs mount/mode?
Something like a single .token file in that mount (for example) which can
be opened and the fd then passed along for prog/map creation? And given
the multiple mounts, this also allows potentially for multiple tokens?
In other words, this is already set up by the container manager when it
sets up mounts rather than later, and the regular bpffs instance is
something separate from all that. Meaning, in your container you get the
usual bpffs instance and then one or more special bpffs instances as
tokens at different paths (and in future they could unlock different
subsets of bpf functionality, for example).
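
To make the idea a bit more concrete, here is a rough userspace sketch of
the "open the token file and pass the fd along" step. Everything in it is
an assumption for illustration only: the ".token" file name is just the
example from above, open_bpf_token() is a made-up helper, and none of this
is an existing kernel or libbpf interface.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the ".token" file that a special bpffs
 * mount would expose and return an fd for it, or -1 on error.
 * The file name and the helper are illustrative assumptions, not an
 * existing interface.
 */
static int open_bpf_token(const char *bpffs_mount)
{
	char path[PATH_MAX];
	int n;

	/* Build "<mount>/.token" and reject paths that do not fit. */
	n = snprintf(path, sizeof(path), "%s/.token", bpffs_mount);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	/* Read-only open; O_CLOEXEC so the fd is not leaked across exec. */
	return open(path, O_RDONLY | O_CLOEXEC);
}
```

The returned fd is what would then be handed to prog/map creation; how
exactly it gets passed to bpf() is left open here. A container with two
such mounts at different paths would simply call the helper once per path
and end up with two independent tokens.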

Thanks,
Daniel