[RFC v2 19/19] ima: Setup securityfs for IMA namespace

Mon Dec 6 14:51:43 UTC 2021

On Mon, 2021-12-06 at 15:42 +0100, Christian Brauner wrote:
> On Mon, Dec 06, 2021 at 09:21:15AM -0500, James Bottomley wrote:
> > On Mon, 2021-12-06 at 15:11 +0100, Christian Brauner wrote:
> > > On Fri, Dec 03, 2021 at 01:06:13PM -0500, Stefan Berger wrote:
> > > > On 12/3/21 12:03, James Bottomley wrote:
> > [...]
> > > > > > +int ima_fs_ns_init(struct ima_namespace *ns)
> > > > > > +{
> > > > > > +	ns->mount = securityfs_ns_create_mount(ns->user_ns);
> > > > >  
> > > > > This actually triggers on the call to
> > > > > securityfs_init_fs_context,
> > > > > but nothing happens because the callback is null.  Every
> > > > > subsequent use of fscontext will trigger this.  The point of
> > > > > a
> > > > > keyed supeblock is that fill_super is only called once per
> > > > > key,
> > > > > that's the place we should be doing this.   It should also
> > > > > probably be a blocking notifier so any consumer of securityfs
> > > > > can
> > > > > be namespaced by registering for this notifier.
> > > > 
> > > > What I don't like about the fill_super is that it gets called
> > > > too
> > > > early:
> > > > 
> > > > [   67.058611] securityfs_ns_create_mount @ 102 target user_ns:
> > > > ffff95c010698c80; nr_extents: 0
> > > > [   67.059836] securityfs_fill_super @ 47  user_ns:
> > > > ffff95c010698c80;
> > > > nr_extents: 0
> > > > 
> > > > We are switching to the target user namespace in
> > > > securityfs_ns_create_mount.  The expected nr_extents at this
> > > > point
> > > > is 0, since user_ns hasn't been configured, yet. But then
> > > > security_fill_super is also called with nr_extents 0. We cannot
> > > > use
> > > > that, it's too early!
> > > 
> > > So the problem is that someone could mount securityfs before any
> > > idmappings are setup or what?
> > 
> > Yes, not exactly: we put a call to initialize IMA in
> > create_user_ns()
> > but it's too early to have the mappings, so we can't create the
> > securityfs entries in that call.  We need the inode to pick up the
> > root
> > owner from the s_user_ns mappings, so we can't create the dentries
> > for
> > the IMA securityfs entries until those mappings exist.
> > 
> > I'm assuming that by the time someone tries to mount securityfs
> > inside
> > the namespace, the mappings are set up, which is why triggering the
> > notifier to add the files on first mount seems like the best place
> > to
> > put it.
> > 
> > >  How does moving the setup to a later stage help at all? I'm
> > > struggling to make sense of this.
> > 
> > It's not moving all the setup, just the creation of the securityfs
> > entries.
> > 
> > >  When or even if idmappings are written isn't under imas control.
> > > Someone could mount securityfs without any idmappings setup. In
> > > that
> > > case they should get what they deserve, everything owner by
> > > overflowuid/overflowgid, no?
> > 
> > Right, in the current scheme of doing things, if they still haven't
> > written the mappings by the time they do the mount, they're just
> > going
> > to get nobody/nogroup as uid/gid, but that's their own fault.
> > 
> > > Or you can require in fill_super that kuid 0 and kgid 0 are
> > > mapped
> > > and fail if they aren't.
> > 
> > We can't create the securityfs entries in fill_super ... I already
> > tried and the locking just won't allow it.  And if we create them
> > ahead
> 
> What is the locking issue there exactly?

The main problem is we have no vfsmount and we can't create one in
there because the fill super is triggered by the vfsmount creation for
the actual mount.  It's all done under the sb->s_umount semaphore.

> I'm looking at ima_fs_ns_late_init() and there's nothing there that
> would cause obvious issues. You might not be able to use
> securityfs_create_*() in there for some reason but that just means
> you need to add a simple helper. Nearly every filesystem that needs
> to pre-create files does it in fill_super. So I really fail to see
> what the issue is currently. I mist just miss something obvious.

I think we might get it to work if we keep the root dentry in the
securityfs namespace entries instead of the vfsmount; I'll investigate.

James