[RFC v2 19/19] ima: Setup securityfs for IMA namespace

Fri Dec 3 18:06:13 UTC 2021

On 12/3/21 12:03, James Bottomley wrote:
> On Thu, 2021-12-02 at 21:31 -0500, Stefan Berger wrote:
> [...]
>>   static int securityfs_init_fs_context(struct fs_context *fc)
>>   {
>> +	int rc;
>> +
>> +	if (fc->user_ns->ima_ns->late_fs_init) {
>> +		rc = fc->user_ns->ima_ns->late_fs_init(fc->user_ns);
>> +		if (rc)
>> +			return rc;
>> +	}
>>   	fc->ops = &securityfs_context_ops;
>>   	return 0;
>>   }
> I know I suggested this, but to get this to work in general, it's going
> to have to not be specific to IMA, so it's going to have to become
> something generic like a notifier chain.  The other problem is it's
> only working still by accident:

I had thought about this also but the rationale was:

securityfs is compiled due to CONFIG_IMA_NS and the user namespace 
exists there and that has a pointer now to ima_namespace, which can have 
that callback. I assumed that other namespaced subsystems could also be 
reached then via such a callback, but I don't know.

I suppose any late filesystem init callchain would have to be connected 
to the user_namespace somehow?


>
>> +int ima_fs_ns_init(struct ima_namespace *ns)
>> +{
>> +	ns->mount = securityfs_ns_create_mount(ns->user_ns);
> This actually triggers on the call to securityfs_init_fs_context, but
> nothing happens because the callback is null.  Every subsequent use of
> fscontext will trigger this.  The point of a keyed supeblock is that
> fill_super is only called once per key, that's the place we should be
> doing this.   It should also probably be a blocking notifier so any
> consumer of securityfs can be namespaced by registering for this
> notifier.


What I don't like about the fill_super is that it gets called too early:

[   67.058611] securityfs_ns_create_mount @ 102 target user_ns: 
ffff95c010698c80; nr_extents: 0
[   67.059836] securityfs_fill_super @ 47  user_ns: ffff95c010698c80; 
nr_extents: 0

We are switching to the target user namespace in 
securityfs_ns_create_mount. The expected nr_extents at this point is 0, 
since user_ns hasn't been configured, yet. But then security_fill_super 
is also called with nr_extents 0. We cannot use that, it's too early!


>
>> +	if (IS_ERR(ns->mount)) {
>> +		ns->mount = NULL;
>> +		return -1;
>> +	}
>> +	ns->mount_count = 1;
> This is a bit nasty, too: we're spilling the guts of mount count
> tracking into IMA instead of encapsulating it inside securityfs.


Ok, I can make this disappear.


>
>> +
>> +	/* Adjust the trigger for user namespace's early teardown of
>> dependent
>> +	 * namespaces. Due to the filesystem there's an additional
>> reference
>> +	 * to the user namespace.
>> +	 */
>> +	ns->user_ns->refcount_teardown += 1;
>> +
>> +	ns->late_fs_init = ima_fs_ns_late_init;
>> +
>> +	return 0;
>> +}
> I think what should be happening is that we shouldn't so the
> simple_pin_fs, which creates the inodes, ahead of time; we should do it
> inside fill_super using a notifier, meaning it gets called once per

fill_super would only work for the init_user_ns from what I can see.


> key, creates the root dentry then triggers the notifier which
> instantiates all the namespaced entries.  We can still use
> simple_pin_fs for this because there's no locking across fill_super.
> This would mean fill_super would be called the first time the
> securityfs is mounted inside the namespace.


I guess I would need to know how fill_super would work or how it could 
be called late/delayed as well.


>
> If we do it this way, we can now make securityfs have its own mount and
> mount_count inside the user namespace, which it uses internally to the
> securityfs code, thus avoiding exposing them to ima or any other
> namespaced consumer.
>
> I also think we now don't need the securityfs_ns_ duplicated functions
> because the callback via the notifier chain now ensures we can use the
> namespace they were created in to distinguish between non namespaced
> and namespaced entries.

Is there then no need to pass a separate vfsmount * in anymore? Where 
would the vfsmount pointer reside? For now it's in ima_namespace, but it 
sounds like it should be in a more centralized place? Should it also be 
connected to the user_namespace so we can pick it up using get_user_ns()?


>
> So non-namespaced consumers of securityfs would do what they do now
> (calling the securityfs_create on initialization) and namespaced
> consumers would register a callback on the notifier which would get
> called once for every namespace the securityfs gets mounted in.
>
> I also theorize if we do it with notifiers, we could have a notifier on
> kill_sb to tear down all the entires.  If we do this, I think we don't
> have to pin any more.
>
> James
>
>

diff --git a/security/inode.c b/security/inode.c
index ed5f1c533776..49c9839642ed 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -44,6 +44,8 @@ static int securityfs_fill_super(struct super_block 
*sb, struct fs_context *fc)
         static const struct tree_descr files[] = {{""}};
         int error;

+       printk(KERN_INFO "%s @ %u  user_ns: %px; nr_extents: %d\n", 
__func__, __LINE__, fc->user_ns, fc->user_ns->uid_map.nr_extents);
+
         error = simple_fill_super(sb, SECURITYFS_MAGIC, files);
         if (error)
                 return error;
@@ -97,6 +99,8 @@ struct vfsmount *securityfs_ns_create_mount(struct 
user_namespace *user_ns)
         put_user_ns(fc->user_ns);
         fc->user_ns = get_user_ns(user_ns);

+       printk(KERN_INFO "%s @ %u target user_ns: %px; nr_extents: 
%d\n", __func__, __LINE__, fc->user_ns, fc->user_ns->uid_map.nr_extents);
+
         mnt = fc_mount(fc);
         put_fs_context(fc);
         return mnt;