[PATCH v6 bpf-next 0/3] Introduce CAP_BPF

Marek Majkowski marek at cloudflare.com
Wed May 13 21:14:37 UTC 2020


On Wed, May 13, 2020 at 7:54 PM Alexei Starovoitov
<alexei.starovoitov at gmail.com> wrote:
>
> On Wed, May 13, 2020 at 07:30:05PM +0100, Marek Majkowski wrote:
> > On Wed, May 13, 2020 at 6:53 PM Alexei Starovoitov
> > <alexei.starovoitov at gmail.com> wrote:
> > > On Wed, May 13, 2020 at 11:50:42AM +0100, Marek Majkowski wrote:
> > > > On Wed, May 13, 2020 at 4:19 AM Alexei Starovoitov
> > > > <alexei.starovoitov at gmail.com> wrote:
> > > > >
> > > > > CAP_BPF solves three main goals:
> > > > > 1. provides isolation to user space processes that drop CAP_SYS_ADMIN and switch to CAP_BPF.
> > > > >    More on this below. This is the major difference vs v4 set back from Sep 2019.
> > > > > 2. makes networking BPF progs more secure, since CAP_BPF + CAP_NET_ADMIN
> > > > >    prevents pointer leaks and arbitrary kernel memory access.
> > > > > 3. enables fuzzers to exercise all of the verifier logic. Eventually finding bugs
> > > > >    and making BPF infra more secure. Currently fuzzers run in unpriv.
> > > > >    They will be able to run with CAP_BPF.
> > > > >
> > > >
> > > > Alexei, looking at this from a user point of view, this looks fine.
> > > >
> > > > I'm slightly worried about REUSEPORT_EBPF. Currently without your
> > > > patch, as far as I understand it:
> > > >
> > > > - You can load SOCKET_FILTER and SO_ATTACH_REUSEPORT_EBPF without any
> > > > permissions
> > >
> > > correct.
> > >
> > > > - For loading BPF_PROG_TYPE_SK_REUSEPORT program and for SOCKARRAY map
> > > > creation CAP_SYS_ADMIN is needed. But again, no permissions check for
> > > > SO_ATTACH_REUSEPORT_EBPF later.
> > >
> > > correct. With clarification that attaching process needs to own
> > > FD of prog and FD of socket.
> > >
> > > > If I read the patchset correctly, the former SOCKET_FILTER case
> > > > remains as it is and is not affected in any way by presence or absence
> > > > of CAP_BPF.
> > >
> > > correct. As commit log says:
> > > "Existing unprivileged BPF operations are not affected."
> > >
> > > > The latter case is different. Presence of CAP_BPF is sufficient for
> > > > map creation, but not sufficient for loading SK_REUSEPORT program. It
> > > > still requires CAP_SYS_ADMIN.
> > >
> > > Not quite.
> > > The patch will allow BPF_PROG_TYPE_SK_REUSEPORT progs to be loaded
> > > with CAP_BPF + CAP_NET_ADMIN.
> > > Since this type of progs is clearly networking type I figured it's
> > > better to be consistent with the rest of networking types.
> > > The two unpriv types, SOCKET_FILTER and CGROUP_SKB, are the only exceptions.
> >
> > Ok, this is the controversy. It made sense to restrict SK_REUSEPORT
> > programs in the past, because programs needed CAP_NET_ADMIN to create
> > SOCKARRAY anyway.
>
> Not quite. Currently sockarray needs CAP_SYS_ADMIN to create
> which makes little sense from security pov.
> CAP_BPF relaxes it to CAP_BPF or CAP_SYS_ADMIN.
>
> > Now we change this and CAP_BPF is sufficient for
> > maps - I don't see why CAP_BPF is not sufficient for SK_REUSEPORT
> > programs. From a user point of view I don't get why this additional
> > CAP_NET_ADMIN is needed.
>
> That actually brings up another point. I'm not changing sock_map,
> sock_hash, dev_map requirements yet. All three still require CAP_NET_ADMIN.
> We can relax them to CAP_BPF _or_ CAP_NET_ADMIN in the future,
> but I'd like to do that in the follow up.

Agreed, we can discuss relaxation of SOCKMAP in the future.

> > > > I think it's a good opportunity to relax
> > > > this CAP_SYS_ADMIN requirement. I think the presence of CAP_BPF should
> > > > be sufficient for loading BPF_PROG_TYPE_SK_REUSEPORT.
> > > >
> > > > Our specific use case is simple - we want an application program -
> > > > like nginx - to control REUSEPORT programs. We will grant it CAP_BPF,
> > > > but we don't want to grant it CAP_SYS_ADMIN.
> > >
> > > You'll be able to grant nginx CAP_BPF + CAP_NET_ADMIN to load SK_REUSEPORT
> > > and an unpriv child process will be able to attach just like before if
> > > it has the right FDs.
> > > I suspect your load balancer needs CAP_NET_ADMIN already anyway due to
> > > use of XDP and TC progs.
> > > So granting CAP_BPF + CAP_NET_ADMIN should cover all bpf prog needs.
> > > Does it address your concern?
> >
> > Load balancer (XDP+TC) is another layer and permissions there are not
> > a problem. The specific issue is nginx (port 443) and QUIC. QUIC is
> > UDP and due to the nginx design we must use REUSEPORT groups to
> > balance the load across workers. This is fine and could be done with a
> > simple SOCKET_FILTER - we don't need to grant nginx any permissions,
> > apart from CAP_NET_BIND_SERVICE.
> >
> > We would like to make the REUSEPORT program more complex to take
> > advantage of REUSEPORT_EBPF for stickiness (restarting server without
> > interfering with existing flows). We are happy to grant nginx CAP_BPF,
> > but we are not happy to grant it CAP_NET_ADMIN. Requiring this CAP for
> > REUSEPORT severely restricts the API usability for us.
> >
> > In my head REUSEPORT_EBPF is much closer to SOCKET_FILTER. I
> > understand why it needed capabilities before (map creation) and I
> > argue these reasons go away in CAP_BPF world. I assume that any
> > service (with CAP_BPF) should be able to use reuseport to distribute
> > packets within its own sockets.  Let me know if I'm missing something.
>
> Fair enough. We can include SK_REUSEPORT prog type as part of CAP_BPF alone.
> But will it truly achieve what you want?

It will make the security model much more useful and sane for me and
other users of stuff that depends on SK_REUSEPORT (like nginx + UDP).
So yes, long-term it will help. Thanks.
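To make the target privilege set concrete, here is a minimal sketch of how one could check that a service holds CAP_BPF and CAP_NET_BIND_SERVICE but not CAP_NET_ADMIN or CAP_SYS_ADMIN. The capability numbers are the ones in the mainline include/uapi/linux/capability.h; the helper names (has_cap, read_cap_eff) are made up for illustration and are not part of any patch in this series.

```python
# Capability bit numbers as defined in include/uapi/linux/capability.h.
CAP_NET_BIND_SERVICE = 10
CAP_NET_ADMIN = 12
CAP_SYS_ADMIN = 21
CAP_BPF = 39


def has_cap(cap_eff_hex, cap):
    """Return True if bit `cap` is set in a CapEff-style hex mask."""
    return bool((int(cap_eff_hex, 16) >> cap) & 1)


def read_cap_eff(status_path="/proc/self/status"):
    """Read the effective capability mask of the current process (Linux only)."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff line not found in " + status_path)


def matches_desired_nginx_set(cap_eff_hex):
    """Hypothetical policy check: CAP_BPF + CAP_NET_BIND_SERVICE, no admin caps."""
    return (
        has_cap(cap_eff_hex, CAP_BPF)
        and has_cap(cap_eff_hex, CAP_NET_BIND_SERVICE)
        and not has_cap(cap_eff_hex, CAP_NET_ADMIN)
        and not has_cap(cap_eff_hex, CAP_SYS_ADMIN)
    )
```

On a live system one would call matches_desired_nginx_set(read_cap_eff()) from inside the service. The mask format assumed here is the hex string the kernel prints in the CapEff field.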

> You still need CAP_NET_ADMIN for sock_hash which you're using.
> Are you saying it's part of the different process that has that cap_net_admin
> and nginx will be fine with cap_bpf + cap_net_bind_service ?

At this moment good old SOCKARRAY is sufficient. Having both SOCKARRAY
and SK_REUSEPORT_EBPF depend only on CAP_BPF is a good start. Thanks
for considering that. We can discuss relaxation of SOCKMAP in the
future.
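For reference, the requirements as I understand them from this thread can be summarised in a small model. This is only a sketch of the discussion, not kernel code: the operation names and the REQUIREMENTS table are made up for illustration, the SK_REUSEPORT row reflects the relaxation agreed above rather than v6 as posted, and I am assuming CAP_SYS_ADMIN keeps working for the operations that needed it before.

```python
# Each operation maps to a list of alternative capability sets;
# holding every capability in any one alternative is enough.
REQUIREMENTS = {
    # Unprivileged today and unaffected by the patchset.
    "load SOCKET_FILTER": [set()],
    # Proposed in this thread: CAP_BPF alone (v6 as posted also asks
    # for CAP_NET_ADMIN). CAP_SYS_ADMIN alternative is an assumption.
    "load SK_REUSEPORT": [{"CAP_BPF"}, {"CAP_SYS_ADMIN"}],
    # Relaxed by the patchset to CAP_BPF or CAP_SYS_ADMIN.
    "create REUSEPORT_SOCKARRAY": [{"CAP_BPF"}, {"CAP_SYS_ADMIN"}],
    # Not changed yet: sock_map, sock_hash and dev_map still need CAP_NET_ADMIN.
    "create sock_hash": [{"CAP_NET_ADMIN"}],
}


def allowed(operation, held):
    """Return True if the capability set `held` satisfies any alternative."""
    return any(alt <= set(held) for alt in REQUIREMENTS[operation])


# The nginx case from this thread: CAP_BPF + CAP_NET_BIND_SERVICE only.
nginx_caps = {"CAP_BPF", "CAP_NET_BIND_SERVICE"}
```

With nginx_caps, the SOCKARRAY and SK_REUSEPORT operations are allowed in this model while sock_hash creation is not, which matches what we need at the moment.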

Marek


