[PATCH v12 02/26] securityfs: Extend securityfs with namespacing support

Stefan Berger stefanb at linux.ibm.com
Thu Jul 7 14:34:09 UTC 2022



On 5/20/22 22:23, Serge E. Hallyn wrote:
> On Wed, Apr 20, 2022 at 10:06:09AM -0400, Stefan Berger wrote:
>> Enable multiple instances of securityfs by keying each instance with a
>> pointer to the user namespace it belongs to.
>>
>> Since we do not need the pinning of the filesystem for the virtualization
>> case, limit the usage of simple_pin_fs() and simple_release_fs() to the
>> case when the init_user_ns is active. This simplifies the cleanup for the
>> virtualization case where usage of securityfs_remove() to free dentries
>> is therefore not needed anymore.
>>
>> For the initial securityfs, i.e. the one mounted in the host userns mount,
>> nothing changes. The rules for securityfs_remove() are as before and it is
>> still paired with securityfs_create(). Specifically, a file created via
>> securityfs_create_dentry() in the initial securityfs mount still needs to
>> be removed by a call to securityfs_remove(). Creating a new dentry in the
>> initial securityfs mount still pins the filesystem like it always did.
>> Consequently, the initial securityfs mount is not destroyed on
>> umount/shutdown as long as at least one user of it still has dentries that
>> it hasn't removed with a call to securityfs_remove().
>>
>> Prevent mounting of an instance of securityfs in another user namespace
>> than it belongs to. Also, prevent accesses to files and directories by
>> a user namespace that is neither the user namespace it belongs to
>> nor an ancestor of the user namespace that the instance of securityfs
>> belongs to. Do not prevent access if securityfs was bind-mounted and
>> therefore the init_user_ns is the owning user namespace.
>>
>> Suggested-by: Christian Brauner <brauner at kernel.org>
>> Signed-off-by: Stefan Berger <stefanb at linux.ibm.com>
>> Signed-off-by: James Bottomley <James.Bottomley at HansenPartnership.com>
>>
>> ---
>> v11:
>>   - Formatted comment's first line to be '/*'
>> ---
>>   security/inode.c | 73 ++++++++++++++++++++++++++++++++++++++++--------
>>   1 file changed, 62 insertions(+), 11 deletions(-)
>>
>> diff --git a/security/inode.c b/security/inode.c
>> index 13e6780c4444..84c9396792a9 100644
>> --- a/security/inode.c
>> +++ b/security/inode.c
>> @@ -21,9 +21,38 @@
>>   #include <linux/security.h>
>>   #include <linux/lsm_hooks.h>
>>   #include <linux/magic.h>
>> +#include <linux/user_namespace.h>
>>   
>> -static struct vfsmount *mount;
>> -static int mount_count;
>> +static struct vfsmount *init_securityfs_mount;
>> +static int init_securityfs_mount_count;
>> +
>> +static int securityfs_permission(struct user_namespace *mnt_userns,
>> +				 struct inode *inode, int mask)
>> +{
>> +	int err;
>> +
>> +	err = generic_permission(&init_user_ns, inode, mask);
>> +	if (!err) {
>> +		/*
>> +		 * Unless bind-mounted, deny access if current_user_ns() is not
>> +		 * ancestor.
> 
> This comment has confused me the last few times I looked at this.  I see
> now you're using "bind-mounted" as a shortcut for saying "bind mounted from
> the init_user_ns into a child_user_ns container".  I do think that needs
> to be made clearer in this comment.


I rephrased the comment now.
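
For readers following the thread, the rule the comment is trying to express
can be modeled in plain userspace C. This is only a sketch of the namespace
check described in the commit message, not the kernel code: struct userns,
is_same_or_ancestor() and access_allowed() are hypothetical names made up for
illustration, and the generic_permission() step that precedes the check in the
patch is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user namespace: only the parent link matters. */
struct userns {
	struct userns *parent;	/* NULL for the initial namespace */
};

/*
 * Return true if 'candidate' is 'ns' itself or one of its ancestors,
 * found by walking the parent chain upwards from 'ns'.
 */
static bool is_same_or_ancestor(const struct userns *candidate,
				const struct userns *ns)
{
	for (; ns != NULL; ns = ns->parent) {
		if (ns == candidate)
			return true;
	}
	return false;
}

/*
 * Model of the access rule from the commit message:
 *  - an instance owned by the initial namespace (for example one that was
 *    bind-mounted from the host into a container) is not restricted here;
 *  - otherwise the caller's namespace must be the owning namespace or an
 *    ancestor of it.
 */
static bool access_allowed(const struct userns *current_ns,
			   const struct userns *owner_ns,
			   const struct userns *init_ns)
{
	if (owner_ns == init_ns)
		return true;
	return is_same_or_ancestor(current_ns, owner_ns);
}
```

With namespaces init -> child -> grandchild, an instance owned by child is
accessible from child and from init, but not from grandchild or from a sibling
of child. An instance owned by init stays accessible from child, which is the
"bind mounted from the init_user_ns into a child user namespace" case.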

    Stefan


