[PATCH v6 bpf-next 0/3] Introduce CAP_BPF

Wed May 13 18:54:52 UTC 2020

On Wed, May 13, 2020 at 07:30:05PM +0100, Marek Majkowski wrote:
> On Wed, May 13, 2020 at 6:53 PM Alexei Starovoitov
> <alexei.starovoitov at gmail.com> wrote:
> > On Wed, May 13, 2020 at 11:50:42AM +0100, Marek Majkowski wrote:
> > > On Wed, May 13, 2020 at 4:19 AM Alexei Starovoitov
> > > <alexei.starovoitov at gmail.com> wrote:
> > > >
> > > > CAP_BPF solves three main goals:
> > > > 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
> > > >    More on this below. This is the major difference vs v4 set back from Sep 2019.
> > > > 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
> > > >    prevents pointer leaks and arbitrary kernel memory access.
> > > > 3. enables fuzzers to exercise all of the verifier logic, eventually finding bugs
> > > >    and making BPF infra more secure. Currently fuzzers run in unpriv.
> > > >    They will be able to run with CAP_BPF.
> > > >
> > >
> > > Alexei, looking at this from a user point of view, this looks fine.
> > >
> > > I'm slightly worried about REUSEPORT_EBPF. Currently without your
> > > patch, as far as I understand it:
> > >
> > > - You can load SOCKET_FILTER and SO_ATTACH_REUSEPORT_EBPF without any
> > > permissions
> >
> > correct.
> >
> > > - For loading BPF_PROG_TYPE_SK_REUSEPORT program and for SOCKARRAY map
> > > creation CAP_SYS_ADMIN is needed. But again, no permissions check for
> > > SO_ATTACH_REUSEPORT_EBPF later.
> >
> > correct. With the clarification that the attaching process needs to own
> > the FD of the prog and the FD of the socket.
> >
> > > If I read the patchset correctly, the former SOCKET_FILTER case
> > > remains as it is and is not affected in any way by presence or absence
> > > of CAP_BPF.
> >
> > correct. As commit log says:
> > "Existing unprivileged BPF operations are not affected."
> >
> > > The latter case is different. Presence of CAP_BPF is sufficient for
> > > map creation, but not sufficient for loading SK_REUSEPORT program. It
> > > still requires CAP_SYS_ADMIN.
> >
> > Not quite.
> > The patch will allow BPF_PROG_TYPE_SK_REUSEPORT progs to be loaded
> > with CAP_BPF + CAP_NET_ADMIN.
> > Since this prog type is clearly a networking type, I figured it's
> > better to be consistent with the rest of the networking types.
> > The two unpriv types, SOCKET_FILTER and CGROUP_SKB, are the only exceptions.
> 
> Ok, this is the controversy. It made sense to restrict SK_REUSEPORT
> programs in the past, because programs needed CAP_NET_ADMIN to create
> SOCKARRAY anyway. 

Not quite. Currently sockarray needs CAP_SYS_ADMIN to create
which makes little sense from security pov.
CAP_BPF relaxes it to CAP_BPF or CAP_SYS_ADMIN.

> Now we change this and CAP_BPF is sufficient for
> maps - I don't see why CAP_BPF is not sufficient for SK_REUSEPORT
> programs. From a user point of view I don't get why this additional
> CAP_NET_ADMIN is needed.

That actually brings up another point. I'm not changing the sock_map,
sock_hash and dev_map requirements yet. All three still require CAP_NET_ADMIN.
We can relax them to CAP_BPF _or_ CAP_NET_ADMIN in the future,
but I'd like to do that in a follow-up.
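The map-creation rules stated above can be summarized as a small C model. This is a hypothetical sketch, not kernel code: the capability bits, `enum map_kind` and `may_create_map()` are illustrative names, and only the requirements stated in this thread are encoded (sockarray: CAP_BPF or CAP_SYS_ADMIN; sock_map, sock_hash, dev_map: CAP_NET_ADMIN, unchanged for now).

```c
#include <stdbool.h>

/* Hypothetical capability bits, illustrative only (not the kernel's
 * capability API or numbering). */
enum cap {
	CAP_BPF_BIT       = 1u << 0,
	CAP_NET_ADMIN_BIT = 1u << 1,
	CAP_SYS_ADMIN_BIT = 1u << 2,
};

/* Hypothetical subset of map types discussed in this thread. */
enum map_kind {
	MAP_REUSEPORT_SOCKARRAY,
	MAP_SOCKMAP,
	MAP_SOCKHASH,
	MAP_DEVMAP,
};

/* Models the map-creation requirements described for this series. */
static bool may_create_map(enum map_kind kind, unsigned int caps)
{
	switch (kind) {
	case MAP_REUSEPORT_SOCKARRAY:
		/* Previously CAP_SYS_ADMIN only; relaxed to CAP_BPF or CAP_SYS_ADMIN. */
		return (caps & (CAP_BPF_BIT | CAP_SYS_ADMIN_BIT)) != 0;
	case MAP_SOCKMAP:
	case MAP_SOCKHASH:
	case MAP_DEVMAP:
		/* Unchanged for now: still CAP_NET_ADMIN. */
		return (caps & CAP_NET_ADMIN_BIT) != 0;
	}
	return false;
}
```

Under this model a process holding only CAP_BPF can create the sockarray but not a sock_hash; that asymmetry is what a follow-up relaxation to CAP_BPF or CAP_NET_ADMIN would change.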

> 
> > > I think it's a good opportunity to relax
> > > this CAP_SYS_ADMIN requirement. I think the presence of CAP_BPF should
> > > be sufficient for loading BPF_PROG_TYPE_SK_REUSEPORT.
> > >
> > > Our specific use case is simple - we want an application program -
> > > like nginx - to control REUSEPORT programs. We will grant it CAP_BPF,
> > > but we don't want to grant it CAP_SYS_ADMIN.
> >
> > You'll be able to grant nginx CAP_BPF + CAP_NET_ADMIN to load SK_REUSEPORT
> > and an unpriv child process will be able to attach just like before if
> > it has the right FDs.
> > I suspect your load balancer needs CAP_NET_ADMIN already anyway due to
> > use of XDP and TC progs.
> > So granting CAP_BPF + CAP_NET_ADMIN should cover all bpf prog needs.
> > Does it address your concern?
> 
> Load balancer (XDP+TC) is another layer and permissions there are not
> a problem. The specific issue is nginx (port 443) and QUIC. QUIC is
> UDP and due to the nginx design we must use REUSEPORT groups to
> balance the load across workers. This is fine and could be done with a
> simple SOCKET_FILTER - we don't need to grant nginx any permissions,
> apart from CAP_NET_BIND_SERVICE.
> 
> We would like to make the REUSEPORT program more complex to take
> advantage of REUSEPORT_EBPF for stickiness (restarting the server without
> interfering with existing flows). We are happy to grant nginx CAP_BPF,
> but we are not happy to grant it CAP_NET_ADMIN. Requiring this CAP for
> REUSEPORT severely restricts the API's usability for us.
> 
> In my head REUSEPORT_EBPF is much closer to SOCKET_FILTER. I
> understand why it needed capabilities before (map creation) and I
> argue these reasons go away in CAP_BPF world. I assume that any
> service (with CAP_BPF) should be able to use reuseport to distribute
> packets within its own sockets.  Let me know if I'm missing something.

Fair enough. We can include SK_REUSEPORT prog type as part of CAP_BPF alone.
But will it truly achieve what you want?
You still need CAP_NET_ADMIN for sock_hash which you're using.
Are you saying that part lives in a different process that has CAP_NET_ADMIN,
and nginx will be fine with CAP_BPF + CAP_NET_BIND_SERVICE?
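To make the nginx question concrete, here is a hypothetical C model of the prog-load rules discussed in this thread; it is not the kernel's actual check. `may_load_prog()`, `may_create_sockhash()` and the capability bits are illustrative names, CAP_SYS_ADMIN is deliberately left out, and the `relax_reuseport` flag switches between the v6 behavior (SK_REUSEPORT needs CAP_BPF + CAP_NET_ADMIN) and the suggested CAP_BPF-only rule.

```c
#include <stdbool.h>

/* Hypothetical capability bits, illustrative only. */
enum cap {
	CAP_BPF_BIT              = 1u << 0,
	CAP_NET_ADMIN_BIT        = 1u << 1,
	CAP_NET_BIND_SERVICE_BIT = 1u << 2,
};

/* Hypothetical subset of prog types mentioned in this thread. */
enum prog_kind {
	PROG_SOCKET_FILTER,
	PROG_CGROUP_SKB,
	PROG_SK_REUSEPORT,
};

static bool may_load_prog(enum prog_kind kind, unsigned int caps,
			  bool relax_reuseport)
{
	bool has_bpf = (caps & CAP_BPF_BIT) != 0;
	bool has_net_admin = (caps & CAP_NET_ADMIN_BIT) != 0;

	switch (kind) {
	case PROG_SOCKET_FILTER:
	case PROG_CGROUP_SKB:
		/* The two unpriv types: no capability needed to load. */
		return true;
	case PROG_SK_REUSEPORT:
		/* v6: treated like other networking types (CAP_BPF + CAP_NET_ADMIN).
		 * Suggested relaxation: CAP_BPF alone. */
		return relax_reuseport ? has_bpf : (has_bpf && has_net_admin);
	}
	return false;
}

/* sock_hash creation still requires CAP_NET_ADMIN in this series. */
static bool may_create_sockhash(unsigned int caps)
{
	return (caps & CAP_NET_ADMIN_BIT) != 0;
}
```

With caps = CAP_BPF | CAP_NET_BIND_SERVICE, the relaxed rule lets nginx load the SK_REUSEPORT prog, but `may_create_sockhash()` still fails, which is the gap the last question points at.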