[RFC PATCH bpf-next seccomp 00/12] eBPF seccomp filters

Thu May 20 08:56:13 UTC 2021

On Thu, May 20, 2021 at 03:16:10AM -0500, Tianyin Xu wrote:
> On Mon, May 17, 2021 at 10:40 AM Tycho Andersen <tycho at tycho.pizza> wrote:
> >
> > On Sun, May 16, 2021 at 03:38:00AM -0500, Tianyin Xu wrote:
> > > On Sat, May 15, 2021 at 10:49 AM Andy Lutomirski <luto at kernel.org> wrote:
> > > >
> > > > On 5/10/21 10:21 PM, YiFei Zhu wrote:
> > > > > On Mon, May 10, 2021 at 12:47 PM Andy Lutomirski <luto at kernel.org> wrote:
> > > > >> On Mon, May 10, 2021 at 10:22 AM YiFei Zhu <zhuyifei1999 at gmail.com> wrote:
> > > > >>>
> > > > >>> From: YiFei Zhu <yifeifz2 at illinois.edu>
> > > > >>>
> > > > >>> Based on: https://lists.linux-foundation.org/pipermail/containers/2018-February/038571.html
> > > > >>>
> > > > >>> This patchset enables seccomp filters to be written in eBPF.
> > > > >>> Supporting eBPF filters has been proposed a few times in the past.
> > > > >>> The main concerns were (1) use cases and (2) security. We have
> > > > >>> identified many use cases that can benefit from advanced eBPF
> > > > >>> filters, such as:
> > > > >>
> > > > >> I haven't reviewed this carefully, but I think we need to distinguish
> > > > >> a few things:
> > > > >>
> > > > >> 1. Using the eBPF *language*.
> > > > >>
> > > > >> 2. Allowing the use of stateful / non-pure eBPF features.
> > > > >>
> > > > >> 3. Allowing the eBPF programs to read the target process' memory.
> > > > >>
> > > > >> I'm generally in favor of (1).  I'm not at all sure about (2), and I'm
> > > > >> even less convinced by (3).
> > > > >>
> > > > >>>
> > > > >>>   * exec-only-once filter / apply filter after exec
> > > > >>
> > > > >> This is (2).  I'm not sure it's a good idea.
> > > > >
> > > > > The basic idea is that a container runtime may want to execute
> > > > > a program in a container without that program being able to execve
> > > > > another program, stopping any attack that involves loading another
> > > > > binary. Using only cBPF, the container runtime can block any syscall
> > > > > except execve in the exec-ed process.
> > > > >
> > > > > The use case is suggested by Andrea Arcangeli and Giuseppe Scrivano.
> > > > > @Andrea and @Giuseppe, could you clarify more in case I missed
> > > > > something?
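A minimal sketch of what stateless cBPF can already express, assuming Linux on x86_64 or aarch64 with glibc (the helpers `install_no_exec_filter` and `demo_execve_blocked` are hypothetical, for illustration only): the filter fails every execve() with EPERM and allows everything else. Because a cBPF filter keeps no state between calls, it cannot let the runtime's own first execve of the container binary through and then block later ones; that gap is what the exec-only-once use case is about.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Assumption: this sketch only handles x86_64 and aarch64. */
#if defined(__x86_64__)
#define FILTER_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define FILTER_ARCH AUDIT_ARCH_AARCH64
#else
#error "sketch only handles x86_64 and aarch64"
#endif

/* Hypothetical helper: install a stateless cBPF filter that makes execve()
 * fail with EPERM and allows every other syscall. It does not cover
 * execveat() or x32 syscall numbers, which a real policy would handle. */
static int install_no_exec_filter(void)
{
    struct sock_filter insns[] = {
        /* Kill the process if the syscall ABI is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FILTER_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* execve -> EPERM; anything else -> allow. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_execve, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(insns) / sizeof(insns[0])),
        .filter = insns,
    };

    /* Unprivileged callers must set no_new_privs before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return (int)syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER, 0, &prog);
}

/* Hypothetical demo: returns 0 if, inside a forked child, execve() fails
 * with EPERM once the filter is installed. */
int demo_execve_blocked(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* /bin/false: if the exec unexpectedly succeeds, the child exits 1. */
        char *argv[] = { "/bin/false", NULL };
        char *envp[] = { NULL };
        if (install_no_exec_filter() != 0)
            _exit(2);
        execve("/bin/false", argv, envp); /* returns only on failure */
        _exit(errno == EPERM ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Installed by the runtime before it execs the container binary, this filter would also block that first execve, which is why the thread discusses either stateful filters or applying a filter after exec.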
> > > >
> > > > We've discussed having a notifier-using filter be able to replace its
> > > > filter.  This would allow this and other use cases without any
> > > > additional eBPF or cBPF code.
> > > >
> > >
> > > A notifier is not always a solution (even ignoring its perf overhead).
> > >
> > > One problem, pointed out by Andrea Arcangeli, is that notifiers need
> > > userspace daemons. So, it can hardly be used by daemonless container
> > > engines like Podman.
> >
> > I'm not sure I buy this argument. Podman already has a conmon instance
> > for each container; the supervisor could be a child of that conmon
> > process, or live inside conmon itself.
> >
> > Tycho
> 
> I checked with Andrea Arcangeli and Giuseppe Scrivano, who are working on Podman.
> 
> You are right that Podman is not completely daemonless. However, “the
> fact it's not entirely daemonless doesn't imply it's a good idea to
> make it worse and to add complexity to the background conmon daemon or
> to add more daemons.”
> 
> TL;DR: user notifiers are surely more flexible, but they are also more
> expensive and more complex to implement than eBPF filters. /*
> I’ll reply to Sargun’s performance argument in a separate email */
> 
> I'm sure you know Podman well, but let me still relay some background
> from Andrea and Giuseppe (all credit for the Podman/crun details is
> theirs) to elaborate on the point for folks CC'd on the list who are
> not very familiar with Podman.
> 
> Basically, the current launch sequence is:
> 
>          podman -> conmon -> crun -> container_binary
>                                \
>                                 - seccomp done at crun level, not conmon
> 
> At runtime, what's left is:
> 
>          conmon -> container_binary  /* podman disappears; crun disappears */
> 
> So, to use seccomp notify to block `exec`, we can either start the
> container_binary with a seccomp agent wrapper, or bloat the conmon
> binary (as pointed out by Tycho).
> 
> If we go with the first approach, we will have:
> 
>          podman -> conmon -> crun -> seccomp_agent -> container_binary
> 
> So, at runtime we'd be left with one more daemon:
> 
>         conmon -> seccomp_agent -> container_binary

That seems like a strawman. I don't see why this has to be out of
process or a separate daemon. Conmon uses a regular event loop, and
adding support for processing seccomp user notifications to it is
straightforward. Moving it into a plugin, as you mention below, is a
design decision, not a necessity.

> 
> Apparently, nobody likes one more daemon. So, the proposal from

I'm not sure such blanket statements about an indeterminate group of
people's alleged preferences constitute a technical argument for why we
need eBPF in seccomp.

> Giuseppe was/is to use user notifiers as plugins (.so) loaded by
> conmon:
> https://github.com/containers/conmon/pull/190
> https://github.com/containers/crun/pull/438
> 
> Now, with eBPF filter support, one can implement the same thing
> using an embarrassingly simple eBPF filter and, thanks to Giuseppe,
> this is well supported by crun.

So I think this is jumping the gun by saying "Look, the result
might be simpler." That may even be the case (though I'm not yet
convinced), but Andy's point stands: this brings a slew of issues to
the table that need clear answers. Bringing stateful eBPF features into
seccomp is a pretty big step, and the privilege/security model in
particular looks pretty handwavy right now.

Christian