[PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf

Wed Jul 31 19:09:27 UTC 2019

On Wed, Jul 31, 2019 at 1:10 AM Song Liu <songliubraving at fb.com> wrote:
>
>
>
> > On Jul 30, 2019, at 1:24 PM, Andy Lutomirski <luto at kernel.org> wrote:
> >
> > On Mon, Jul 29, 2019 at 10:07 PM Song Liu <songliubraving at fb.com> wrote:
> >>
> >> Hi Andy,
> >>
> >>> On Jul 27, 2019, at 11:20 AM, Song Liu <songliubraving at fb.com> wrote:
> >>>
> >>> Hi Andy,
> >>>
> >>>
>
> [...]
>
> >>>
> >>
> >> I would like more comments on this.
> >>
> >> Currently, bpf permission is more or less "root or nothing", which we
> >> would like to change.
> >>
> >> The short term goal is to separate bpf from root, in other words, it is
> >> "all or nothing". Special user space utilities, such as systemd, would
> >> benefit from this. Once this is implemented, systemd can call sys_bpf()
> >> when it is not running as root.
> >
> > As generally nasty as Linux capabilities are, this sounds like a good
> > use for CAP_BPF_ADMIN.
>
> I actually agree CAP_BPF_ADMIN makes sense. The hard part is to make
> existing tools (setcap, getcap, etc.) and libraries aware of the new CAP.

It's been done before -- it's not that hard.  IMO the main tricky bit
would be try be somewhat careful about defining exactly what
CAP_BPF_ADMIN does.

> > I don't see why you need to invent a whole new mechanism for this.
> > The entire cgroup ecosystem outside bpf() does just fine using the
> > write permission on files in cgroupfs to control access.  Why can't
> > bpf() do the same thing?
>
> It is easier to use write permission for BPF_PROG_ATTACH. But it is
> not easy to do the same for other bpf commands: BPF_PROG_LOAD and
> BPF_MAP_*. A lot of these commands don't have target concept. Maybe
> we should have target concept for all these commands. But that is a
> much bigger project. OTOH, "all or nothing" model allows all these
> commands at once.

For BPF_PROG_LOAD, I admit I've never understood why permission is
required at all.  I think that CAP_SYS_ADMIN or similar should be
needed to get is_priv in the verifier, but I think that should mainly
be useful for tracing, and that requires lots of privilege anyway.
BPF_MAP_* is probably the trickiest part.  One solution would be some
kind of bpffs, but I'm sure other solutions are possible.