[PATCH bpf-next] bpf, capabilities: introduce CAP_BPF
Andy Lutomirski
luto at kernel.org
Wed Aug 28 06:12:29 UTC 2019
On Tue, Aug 27, 2019 at 9:43 PM Alexei Starovoitov
<alexei.starovoitov at gmail.com> wrote:
>
> On Tue, Aug 27, 2019 at 05:55:41PM -0700, Andy Lutomirski wrote:
> >
> > I was hoping for something in Documentation/admin-guide, not in a
> > changelog that's hard to find.
>
> eventually yes.
>
> > >
> > > > Changing the capability that some existing operation requires could
> > > > break existing programs. The old capability may need to be accepted
> > > > as well.
> > >
> > > As far as I can see there is no ABI breakage. Please point out
> > > which line of the patch may break it.
> >
> > As a more or less arbitrary selection:
> >
> >  void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> >  {
> >         if (!bpf_prog_kallsyms_candidate(fp) ||
> > -           !capable(CAP_SYS_ADMIN))
> > +           !capable(CAP_BPF))
> >                 return;
> >
> > Before your patch, a task with CAP_SYS_ADMIN could do this. Now it
> > can't. Per the usual Linux definition of "ABI break", this is an ABI
> > break if and only if someone actually did this in a context where they
> > have CAP_SYS_ADMIN but not all capabilities. How confident are you
> > that no one does things like this?
>
> Yes. I'm confident that apps don't drop everything and
> leave cap_sys_admin only before doing bpf() syscall, since it would
> break their own use of networking.
> Hence I'm not going to do the cap_syslog-like "deprecated" message mess
> because of this unfounded concern.
> If I turn out to be wrong we will add this "deprecated mess" later.
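
For illustration, accepting the old capability alongside the new one
might look roughly like the userspace sketch below. The demo_* names and
the bit positions are assumptions made up for this example, not kernel
API; an in-kernel version would call capable() directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this sketch. */
enum { DEMO_CAP_SYS_ADMIN = 21, DEMO_CAP_BPF = 39 };

/* A capability set modelled as a 64-bit mask (an assumption of the sketch). */
typedef uint64_t demo_cap_set;

static inline bool demo_has_cap(demo_cap_set caps, int cap)
{
    return ((caps >> cap) & 1u) != 0;
}

/* Strict check, shaped like the quoted hunk: only the new capability passes. */
static inline bool demo_strict_check(demo_cap_set caps)
{
    return demo_has_cap(caps, DEMO_CAP_BPF);
}

/* Compatible check: the new capability or the old one is accepted. */
static inline bool demo_compat_check(demo_cap_set caps)
{
    return demo_has_cap(caps, DEMO_CAP_BPF) ||
           demo_has_cap(caps, DEMO_CAP_SYS_ADMIN);
}
```

With the strict check, a task holding only the old capability is turned
away; with the compatible check it keeps working, which is the case the
ABI question is about.
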
>
> >
> > From the previous discussion, you want to make progress toward solving
> > a lot of problems with CAP_BPF. One of them was making BPF
> > firewalling more generally useful. By making CAP_BPF grant the ability
> > to read kernel memory, you will make administrators much more nervous
> > to grant CAP_BPF.
>
> Andy, were your email hacked?
> I explained several times that in this proposal
> CAP_BPF _and_ CAP_TRACING _both_ are necessary to read kernel memory.
> CAP_BPF alone is _not enough_.

You have indeed said this many times. You've stated it as a matter of
fact, as though it cannot possibly be discussed. I'm asking you to
justify it.

> > Similarly, and correct me if I'm wrong, most of
> > these capabilities are primarily or only useful for tracing, so I
> > don't see why users without CAP_TRACING should get them.
> > bpf_trace_printk(), in particular, even has "trace" in its name :)
> >
> > Also, if a task has CAP_TRACING, it's expected to be able to trace the
> > system -- that's the whole point. Why shouldn't it be able to use BPF
> > to trace the system better?
>
> CAP_TRACING shouldn't be able to do BPF because BPF is not tracing only.

What does "do BPF" even mean? seccomp() does BPF. SO_ATTACH_FILTER
does BPF. Saying that using BPF should require a specific capability
seems kind of like saying that using the network should require a
specific capability. Linux (and Unixy systems in general) distinguishes
between binding low-numbered ports, binding high-numbered ports, using
raw sockets, and changing the system's IP address. These have different
implications and require different capabilities.
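
To make the networking comparison concrete, here is a small sketch of
that split as a lookup from operation to capability. The enum and the
function name are hypothetical; the capability names are the ones
documented in capabilities(7), and binding a high-numbered port needs no
capability at all, hence the NULL.

```c
#include <stddef.h>

/* Hypothetical operation labels, used only for this sketch. */
enum demo_net_op {
    DEMO_BIND_LOW_PORT,   /* bind() to a port below 1024 */
    DEMO_BIND_HIGH_PORT,  /* bind() to a port at or above 1024 */
    DEMO_RAW_SOCKET,      /* open a raw socket */
    DEMO_SET_IP_ADDR      /* change an interface's IP address */
};

/* Returns the capability name an operation needs, or NULL if it needs none. */
static inline const char *demo_required_cap(enum demo_net_op op)
{
    switch (op) {
    case DEMO_BIND_LOW_PORT:  return "CAP_NET_BIND_SERVICE";
    case DEMO_BIND_HIGH_PORT: return NULL;
    case DEMO_RAW_SOCKET:     return "CAP_NET_RAW";
    case DEMO_SET_IP_ADDR:    return "CAP_NET_ADMIN";
    }
    return NULL;
}
```
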

It seems like you are specifically trying to add a new switch to turn
as much of BPF as possible on and off. Why?

> >
> > test_run allows fully controlled inputs, in a context where a program
> > can trivially flush caches, mistrain branch predictors, etc first. It
> > seems to me that, if a JITted bpf program contains an exploitable
> > speculation gadget (MDS, Spectre v1, RSB, or anything else),
>
> speaking of MDS... I already asked you to help investigate its
> applicability with existing bpf exposure. Are you going to do that?

I am blissfully uninvolved in MDS, and I don't know all that much more
about the overall mechanism than a random reader of tech news :) ISTM
there are two meaningful ways that BPF could be involved: a BPF
program could leak info into the state exposed by MDS, or a BPF
program could try to read that state. From what little I understand,
it's essentially inevitable that BPF leaks information into MDS state,
and this is probably even controllable by an attacker who understands
MDS in enough detail. So the interesting questions are: can BPF be
used to read MDS state, and can BPF be used to leak information to an
attacker in a more useful way than the rest of the kernel?

Keeping in mind that the kernel will flush MDS state on every exit to
usermode, I think the most likely attack is to try to read MDS state
with BPF. This could happen, I suppose -- BPF programs can easily
contain the usual speculation gadgets of "do something and read an
address that depends on the outcome". Fortunately, outside of
bpf_probe_read(), AFAIK BPF programs can't directly touch user memory,
and an attacker that is allowed to use bpf_probe_read() doesn't need
MDS to read things.
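
For readers who haven't seen one, the gadget shape described above is
roughly the following, written as plain C rather than BPF. Everything
here (the demo_* names, the table sizes, the stride) is an assumption for
illustration only. Architecturally the function is harmless; the concern
is that a mispredicted bounds check lets the second load run
speculatively with an out-of-range index.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_LEN 16   /* illustrative size */
#define DEMO_STRIDE    64   /* illustrative stride, one cache line apart */

static uint8_t demo_table[DEMO_TABLE_LEN];
static uint8_t demo_probe[256 * DEMO_STRIDE];

/* "Do something and read an address that depends on the outcome." */
static inline uint8_t demo_gadget(size_t idx)
{
    if (idx < DEMO_TABLE_LEN) {                   /* bounds check */
        uint8_t value = demo_table[idx];          /* first load */
        return demo_probe[value * DEMO_STRIDE];   /* dependent load */
    }
    return 0;                                     /* out of range: no access */
}
```
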

So it's not entirely obvious to me how an attack would be mounted.
test_run would make it a lot easier, I think.

>
> > it will
> > be *much* easier to exploit it using test_run than using normal
> > network traffic. Similarly, normal network traffic will have network
> > headers that are valid enough to have caused the BPF program to be
> > invoked in the first place. test_run can inject arbitrary garbage.
>
> Please take a look at Jann's var1 exploit. Was it hard to run bpf prog
> in controlled environment without test_run command ?
>

Can you send me a link?