[RFC PATCH 02/27] containers: Implement containers as kernel objects

Ian Kent raven at themaw.net
Thu Feb 21 10:39:28 UTC 2019


On Wed, 2019-02-20 at 14:26 +0100, Christian Brauner wrote:
> On Wed, Feb 20, 2019 at 10:46:24AM +0800, Ian Kent wrote:
> > On Fri, 2019-02-15 at 16:07 +0000, David Howells wrote:
> > > Implement a kernel container object such that it contains the following
> > > things:
> > > 
> > >  (1) Namespaces.
> > > 
> > >  (2) A root directory.
> > > 
> > >  (3) A set of processes, including one designated as the 'init' process.
> > 
> > Yeah, I think a name other than init needs to be used for this
> > process.
> > 
> > The problem being that there is no requirement for container
> > process 1 to behave in any way like an "init" process is
> > expected to behave and that leads to confusion (at least
> > it certainly did for me).
> 
> If you look at the documentation for pid namespaces(7) you can see that
> the pid 1 inside a pid namespace is expected to behave like an init
> process:
> -  "The  first  process created in a new namespace [...] has  the PID 1,
>    and is the "init" process for the namespace (see init(1))."
> - "[...] child process that is orphaned within the namespace will be
>   reparented to this process rather than init(1) [...]"
> - "If the "init" process of a PID namespace terminates, the kernel
>   terminates all of the processes in the  namespace  via a SIGKILL
>   signal. This behavior reflects the fact that the "init" process is
>   essential for the cor‐ rect operation of a PID namespace."
> - "Only signals for which the "init" process has established a signal
>   handler can be sent to the  "init" process by other members of the
>   PID namespace."
> - "[...] the reboot(2) system call causes a signal to be sent to the
>   namespace "init" process."
> 
> This is one of the reasons why all major current container runtimes
> finally after years of failing to realize this run a stub init process
> that mimicks a dumb init. Sure, you get away with not having an init
> that behaves like an init but this is inherently broken or at least
> against the way pid namespaces were designed.

TBH I wasn't sure why the signal I sent didn't arrive, AFAICS
it should have regardless of what signals the container init
process was accepting. But it could have been due to a
different problem in my kernel code (that's very likely).

In any case it wasn't worth perusing because even if I did work
it out I had already found that the request_key sub-system wasn't
playing well with others when trying to run something within a
container's namespaces, so no point in going further ...

Ian



More information about the Linux-security-module-archive mailing list