[PATCH v7 3/6] seccomp: add a way to get a listener fd from ptrace

Wed Oct 10 18:28:22 UTC 2018

On Wed, Oct 10, 2018 at 10:26:22AM -0700, Tycho Andersen wrote:
> On Wed, Oct 10, 2018 at 07:15:02PM +0200, Christian Brauner wrote:
> > On Wed, Oct 10, 2018 at 09:54:58AM -0700, Tycho Andersen wrote:
> > > On Wed, Oct 10, 2018 at 05:39:57PM +0200, Christian Brauner wrote:
> > > > On Wed, Oct 10, 2018 at 05:33:43PM +0200, Jann Horn wrote:
> > > > > On Wed, Oct 10, 2018 at 5:32 PM Paul Moore <paul at paul-moore.com> wrote:
> > > > > > On Tue, Oct 9, 2018 at 9:36 AM Jann Horn <jannh at google.com> wrote:
> > > > > > > +cc selinux people explicitly, since they probably have opinions on this
> > > > > >
> > > > > > I just spent about twenty minutes working my way through this thread,
> > > > > > and digging through the containers archive trying to get a good
> > > > > > understanding of what you guys are trying to do, and I'm not quite
> > > > > > sure I understand it all.  However, from what I have seen, this
> > > > > > approach looks very ptrace-y to me (I imagine to others as well based
> > > > > > on the comments) and because of this I think ensuring the usual ptrace
> > > > > > access controls are evaluated, including the ptrace LSM hooks, is the
> > > > > > right thing to do.
> > > > > 
> > > > > Basically the problem is that this new ptrace() API does something
> > > > > that doesn't just influence the target task, but also every other task
> > > > > that has the same seccomp filter. So the classic ptrace check doesn't
> > > > > work here.
> > > > 
> > > > Just to throw this into the mix: then maybe ptrace() isn't the right
> > > > interface and we should just go with the native seccomp() approach for
> > > > now.
> > > 
> > > Please no :).
> > > 
> > > I don't buy your arguments that 3-syscalls vs. one is better. If I'm
> > > doing this setup with a new container, I have to do
> > > clone(CLONE_FILES), do this seccomp thing, so that my parent can pick
> > > it up again, then do another clone without CLONE_FILES, because in the
> > > general case I don't want to share my fd table with the container,
> > > wait on the middle task for errors, etc. So we're still doing a bunch
> > > of setup, and it feels more awkward than ptrace, with at least as many
> > > syscalls, and it only works for your children.
> > 
> > You're talking about the case where you already have shot yourself in
> > the foot by blocking basically all other sensible ways of getting the fd
> > out.
> 
> Ok, but these other ways involve syscalls too (sendmsg() or whatever).
> And if you're going to allow arbitrary policy from your users, you
> have to be maximally flexible.

So, I totally like the idea of being able to get an fd before the filter
is active. If this could be done with seccomp() alone, that would be A+
(see Andy's mail in the other thread).
But I really don't want to keep you working on this forever. :)

> 
> > Also, this was meant to show that parts of your initial justification
> > for implementing the ptrace() way of getting an fd don't really stand.
> > And it doesn't really. Even with ptrace() you can get into situations
> > where you're not able to get an fd. (see prior threads)
> 
> Of course. I guess my point was that we shouldn't design an API that's
> impossible to use. I'll drop the notes about sendmsg() from the commit
> message.
> 
> Tycho
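
For reference, the sendmsg() route mentioned above usually means passing
the fd as SCM_RIGHTS ancillary data over an AF_UNIX socket. A minimal
sketch follows, assuming a connected unix socket between the two tasks.
The helper names send_fd() and recv_fd() are made up for illustration and
are not part of this patch series, and nothing here is specific to
seccomp listener fds.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical helper: send one fd over a connected AF_UNIX socket.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
	char byte = 0;	/* at least one byte of real data must accompany the cmsg */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment of buf */
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd sent by send_fd().
 * Returns the new fd in the receiving task, or -1 on failure. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Note that this path needs socket(), sendmsg() and recvmsg() (or
equivalents) to be permitted by whatever filter is installed at the time,
which is the constraint being argued about in the thread.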