[PATCH bpf-next] bpf, capabilities: introduce CAP_BPF

Wed Oct 2 23:00:27 UTC 2019

On Wed, 2 Oct 2019 17:18:21 +0000
Alexei Starovoitov <ast at fb.com> wrote:

> >> It's an interesting idea, but I don't think it can work.
> >> Please see bpf_trace_printk implementation in kernel/trace/bpf_trace.c
> >> It's a lot more than string printing.  
> > 
> > Well, trace_printk() is just string printing. I was thinking that the
> > bpf_trace_printk() could just use a vsnprintf() into a temporary buffer
> > (like trace_printk() does), and then call the trace event to write it
> > out.  
> 
> are you proposing to replicate get_trace_buf() functionality
> into bpf_trace_printk?

No, do you need bpf_trace_printk() to run in all contexts?
trace_printk() does the get_trace_buf() dance so that it can be called
without locks and from any context including NMIs.

> So print into temp string buffer is done twice?
> I'm not excited about such hack.
> And what's the goal? so that trace_bpf_print(string_msg);
> can go through _run-time_ check whether that particular trace event
> was allowed in tracefs ?

No, just to use a standard event instead of hacking into
trace_printk().

> That's not how file system acls are typically designed.
> The permission check is at open(). Not at write().
> If I understood you correctly you're proposing to check permissions
> at bpf program run-time which is no good.
> 
> bpf_trace_printk() already has one small buffer for
> probe_kernel_read-ing an unknown string to pass into %s.
> That's not ftrace. That's core tracing. That aspect is covered by 
> CAP_TRACING as well.

Then use that buffer.

> 
> 
> >>  
> >>> The user could then just enable the trace event from the file system. I
> >>> could also work on making instances work like /tmp does (with the
> >>> sticky bit) in creation. That way people with write access to the
> >>> instances directory, can make their own buffers that they can use (and
> >>> others can't access).  
> >>
> >> We tried instances in bcc in the past and eventually removed all the
> >> support. The overhead of instances is too high to be usable.  
> > 
> > What overhead? An ftrace instance should not have any more overhead than
> > the root one does (it's the same code). Or are you talking about memory
> > overhead?  
> 
> Yes. Memory overhead. Human users doing cat/echo into tracefs won't be
> creating many instances, so that's the only practical usage of them.

If it's a real event, it can go into any of the ftrace buffers (top
level or instance), but it gives you the choice.

> 
> >   
> >>  
> >>>
> >>>      
> >>>>
> >>>> Both 'trace' and 'trace_pipe' have quirky side effects.
> >>>> Like opening 'trace' file will make all parallel trace_printk() to be ignored.
> >>>> While reading 'trace_pipe' file will clear it.
> >>>> The point that traditional 'read' and 'write' ACLs don't map as-is
> >>>> to tracefs, so I would be careful categorizing things into
> >>>> confidentiality vs integrity only based on access type.  
> >>>
> >>> What exactly is the bpf_trace_printk() used for? I may have other ideas
> >>> that can help.  
> >>
> >> It's debugging of bpf programs. Same is what printk() is used for
> >> by kernel developers.
> >>  
> > 
> > How is it extracted? Just read from the trace or trace_pipe file?  
> 
> yep. Just like kernel devs look at dmesg when they sprinkle printk.
> btw, if you can fix 'trace' file issue that stops all trace_printk
> while 'trace' file is open that would be great.
> Some users have been bitten by this behavior. We even documented it.

The behavior is documented as well in the ftrace documentation. That's
why we suggest the trace_pipe redirected into a file so that you don't
lose data (unless the writer goes too fast). If you prefer a producer
consumer where you lose newer events (like perf does), you can turn off
overwrite mode, and it will drop events when the buffer is full (see
options/overwrite).

-- Steve