[PATCH v2 bpf-next 1/4] bpf: unprivileged BPF access via /dev/bpf

Andy Lutomirski luto at kernel.org
Tue Aug 13 23:24:21 UTC 2019


On Tue, Aug 13, 2019 at 3:27 PM Daniel Colascione <dancol at google.com> wrote:
>
> On Tue, Aug 13, 2019 at 2:58 PM Alexei Starovoitov
> <alexei.starovoitov at gmail.com> wrote:
> >
> > On Tue, Aug 06, 2019 at 10:24:25PM -0700, Andy Lutomirski wrote:
> > > >
> > > > Inside containers and inside nested containers we need to start processes
> > > > that will use bpf. All of the processes are trusted.
> > >
> > > Trusted by whom?  In a non-nested container, the container manager
> > > *might* be trusted by the outside world.  In a *nested* container,
> > > unless the inner container management is controlled from outside the
> > > outer container, it's not trusted.  I don't know much about how
> > > Facebook's containers work, but the LXC/LXD/Podman world is moving
> > > very strongly toward user namespaces and maximally-untrusted
> > > containers, and I think bpf() should work in that context.
> >
> > agree that containers (namespaces) reduce amount of trust necessary
> > for apps to run, but the end goal is not security though.
> > Linux has become a single user system.
> > If user can ssh into the host they can become root.
> > If arbitrary code can run on the host it will be break out of any sandbox.
> > Containers are not providing the level of security that is enough
> > to run arbitrary code. VMs can do it better, but cpu bugs don't make it easy.
> > Containers are used to make production systems safer.
> > Some people call it more 'secure', but it's clearly not secure for
> > arbitrary code and that is what kernel.unprivileged_bpf_disabled allows.
> > When we say 'unprivileged bpf' we really mean arbitrary malicious bpf program.
> > It's been a constant source of pain. The constant blinding, randomization,
> > verifier speculative analysis, all spectre v1, v2, v4 mitigations
> > are simply not worth it. It's a lot of complex kernel code without users.
> > There is not a single use case to allow arbitrary malicious bpf
> > program to be loaded and executed.
> > As soon as we have /dev/bpf to allow all of bpf to be used without root
> > we will set sysctl kernel.unprivileged_bpf_disabled=1
> > Hence I prefer this /dev/bpf mechanism to be as simple a possible.
> > The applications that will use it are going to be just as trusted as systemd.
> >
> > > > To solve your concern of bypassing all capable checks...
> > > > How about we do /dev/bpf/full_verifier first?
> > > > It will replace capable() checks in the verifier only.
> > >
> > > I'm not convinced that "in the verifier" is the right distinction.
> > > Telling administrators that some setting lets certain users bypass
> > > bpf() verifier checks doesn't have a clear enough meaning.
> >
> > linux is a single user system. there are no administrators any more.
> > No doubt, folks will disagree, but that game is over.
> > At least on bpf side it's done.
> >
> > > I propose,
> > > instead, that the current capable() checks be divided into three
> > > categories:
> >
> > I don't see a use case for these categories.
> > All bpf programs extend the kernel in some way.
> > The kernel vs user is one category.
> > Conceptually CAP_BPF is enough. It would be similar to CAP_NET_ADMIN.
> > When application has CAP_NET_ADMIN it covers all of networking knobs.
> > There is no use case that would warrant fine grain CAP_ROUTE_ADMIN,
> > CAP_ETHTOOL_ADMIN, CAP_ETH0_ADMIN, etc.
> > Similarly CAP_BPF as the only knob is enough.
> > The only disadvantage of CAP_BPF is that it's not possible to
> > pass it from one systemd-like daemon to another systemd-like daemon.
> > Hence /dev/bpf idea and passing file descriptor.
> >
> > > This type of thing actually fits quite nicely into an idea I've been
> > > thinking about for a while called "implicit rights". In very brief
> > > summary, there would be objects called /dev/rights/xyz, where xyz is
> > > the same of a "right".  If there is a readable object of the right
> > > type at the literal path "/dev/rights/xyz", then you have right xyz.
> > > There's a bit more flexibility on top of this.  BPF could use
> > > /dev/rights/bpf/maptypes/lpm and
> > > /dev/rights/bpf/verifier/bounded_loops, for example.  Other non-BPF
> > > use cases include a biggie:
> > > /dev/rights/namespace/create_unprivileged_userns.
> > > /dev/rights/bind_port/80 would be nice, too.
> >
> > The concept of "implicit rights" is very nice and I'm sure it will
> > be a good fit somewhere, but I don't see why use it in bpf space.
> > There is no use case for fine grain partition of bpf features.
>
> Isn't this "implicit rights" model just another kind of ambient
> authority --- one that constrains the otherwise-free filesystem
> namespace to boot?

Yes.

> IMHO, the kernel should be moving toward explicit
> authorization tokens modeled by file descriptors and away from
> contextual authorization decisions.

And yes, I agree there too. Here's how I think about it: there are
really two layers here:

Rights: these are objects like /dev/rights/bpf/some_bpf_privilege or
/dev/rights/namespace/unpriv_userns, and you would, ideally, use them
like genuine capabilities.  You'd pass an fd with appropriate access
(FMODE_READ, presumably, since exec is awkward to work with for fds)
into bpf() or similar, and the kernel would say "yep, caller has the
capability" and do something.  There's nothing really restricting them
to /dev/rights, but they more or less have to live on a memory-backed
file system (a real backing store has all kinds of issues), and
putting them in /dev gets a lot of nifty properties for free.  For
example, existing container systems that don't know about them will
automatically deny them to containers, since nothing with an ounce of
sense passes all of /dev through to a container.  But container
systems that are aware of them can bind-mount them into the container.
And /dev is already known to be magical due to things like
/dev/urandom.

The implicit part on top is less than ideal, but it solves two problems:

1. It keeps compatibility with existing code.  There are programs that
expect unshare(CLONE_NEWUSER) to work -- with *implicit* rights, it
will work exactly when it's supposed to.  Also, for cases like
CLONE_NEWUSER, it does have more or less the right semantics -- if
they were explicit, most programs would just try to open
/dev/rights/namespace/unpriv_userns and pass the fd to unshare2, so
we're not losing much.

2. For things like eBPF where the set of rights could be a moving
target, implicit rights lets the model evolve without breaking
userspace.  So if LPM maps eventually become bulletproof and a right
is no longer needed, it still works.  Or if some feature in the
verifier that is currently unrestricted were subsequently deemed to
need restrictions, they could be added without retrofitting all the
users.

There are cases where implicit rights would be totally inappropriate.
For example, a CAP_DAC_READ_SEARCH right could not be safely made
implicit.  In general, I think the implicit model works for system
calls where it's unambiguous what the caller wants to have happen and
there, depending on privilege level, it either works or it doesn't.
So, for accessing a filesystem, it's not at all obvious whether a
program is accessing it on its own behalf or on a client's behalf, and
privilege usage should be explicit.  For something like "don't
Spectre-mitigate this eBPF program", the semantics change and the
request should IMO be explicit.  For for "create an LPM map", I don't
see how a confused deputy is likely, and an implicit right seems
reasonable.  Similarly, for creating a namespace or binding a network
port, confused deputies seem unlikely.  (For connecting to a network
address, if such a thing were ever restricted, confused deputies are
definitely possible and happen all the time, e.g. under a DNS
rebinding attack.)



More information about the Linux-security-module-archive mailing list