[PATCH bpf-next] bpf, capabilities: introduce CAP_BPF

Thu Aug 29 13:34:34 UTC 2019

On Wed, 28 Aug 2019 15:08:28 -0700
Alexei Starovoitov <alexei.starovoitov at gmail.com> wrote:

> On Wed, Aug 28, 2019 at 09:14:21AM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 27, 2019 at 04:01:08PM -0700, Andy Lutomirski wrote:
> >   
> > > > Tracing:
> > > >
> > > > CAP_BPF and perf_paranoid_tracepoint_raw() (which is kernel.perf_event_paranoid == -1)
> > > > are necessary to:  
> > 
> > That's not tracing, that's perf.
> >   

> re: your first comment above.
> I'm not sure what difference you see in words 'tracing' and 'perf'.
> I really hope we don't partition the overall tracing category
> into CAP_PERF and CAP_FTRACE only because these pieces are maintained
> by different people.

I think Peter meant: It's not tracing, it's profiling.

And there is a bit of separation between the two, although there is an
overlap.

Yes, perf can do tracing but it's designed more for profiling.

> On one side perf_event_open() isn't really doing tracing (as step by
> step ftracing of function sequences), but perf_event_open() opens
> an event and the sequence of events (may include IP) becomes a trace.
> imo CAP_TRACING is the best name to descibe the privileged space
> of operations possible via perf_event_open, ftrace, kprobe, stack traces, etc.

I have no issue with what you suggest. I guess it comes down to how
fine grain people want to go. Do we want it to be all or nothing?
Should CAP_TRACING allow for write access to tracefs? Or should we go
with needing both CAP_TRACING and permissions in that directory
(like changing the group ownership of the files at every boot). 

Perhaps we should have a CAP_TRACING_RO, that gives read access to
tracefs (and write if the users have permissions). And have CAP_TRACING
to allow full write access as well (allowing for users to add kprobe
events and enabling tracers like the function tracer).

> 
> Another reason are kuprobes. They can be crated via perf_event_open
> and via tracefs. Are they in CAP_PERF or in CAP_FTRACE ? In both, right?
> Should then CAP_KPROBE be used ? that would be an overkill.
> It would partition the space even further without obvious need.
> 
> Looking from BPF angle... BPF doesn't have integration with ftrace yet.
> bpf_trace_printk is using ftrace mechanism, but that's 1% of ftrace.
> In the long run I really like to see bpf using all of ftrace.
> Whereas bpf is using a lot of 'perf'.
> And extending some perf things in bpf specific way.
> Take a look at how BPF_F_STACK_BUILD_ID. It's clearly perf/stack_tracing
> feature that generic perf can use one day.
> Currently it sits in bpf land and accessible via bpf only.
> Though its bpf only today I categorize it under CAP_TRACING.
> 
> I think CAP_TRACING privilege should allow task to do all of perf_event_open,
> kuprobe, stack trace, ftrace, and kallsyms.
> We can think of some exceptions that should stay under CAP_SYS_ADMIN,
> but most of the functionality available by 'perf' binary should be
> usable with CAP_TRACING. 'perf' can do bpf too.
> With CAP_BPF it would be all set.

As the above seems to favor the idea of CAP_TRACING allowing write
access to tracefs, should we have a CAP_TRACING_RO for just read access
and limited perf abilities?

-- Steve