[PATCH v4 09/40] mount: attach mappings to mounts

Christian Brauner christian.brauner at ubuntu.com
Thu Dec 3 23:57:05 UTC 2020

In order to support per-mount idmappings, vfsmounts will be marked with
user namespaces. The idmapping associated with that user namespace will
be used to map the ids of vfs objects when they are accessed through
that mount. By default all vfsmounts are marked with the initial user
namespace. The initial user namespace is used to indicate that a mount
is not idmapped, in which case all operations behave as before.
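
As a rough userspace illustration of the paragraph above (not kernel
code), the sketch below models a mount that carries a pointer to a user
namespace and maps an id through that namespace's idmapping. All toy_*
names, the single-extent layout, and the mapping direction are
assumptions made for the example; the real lookup helpers only appear
in later patches of the series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Toy stand-in for one idmapping extent; not the kernel's struct. */
struct toy_extent {
	uint32_t first;       /* first id of the source range */
	uint32_t lower_first; /* first id of the target range */
	uint32_t count;       /* number of ids covered */
};

/* Toy stand-in for struct user_namespace, reduced to a single extent. */
struct toy_userns {
	struct toy_extent uid_map;
};

/* Identity mapping, playing the role of the initial user namespace. */
static struct toy_userns toy_init_user_ns = {
	.uid_map = { .first = 0, .lower_first = 0, .count = UINT32_MAX },
};

/* Toy stand-in for struct vfsmount: only the new field is modelled. */
struct toy_vfsmount {
	struct toy_userns *mnt_userns;
};

#define TOY_INVALID_ID ((uint32_t)-1)

/* A mount counts as idmapped only if it is not marked with the initial ns. */
static bool toy_mnt_is_idmapped(const struct toy_vfsmount *mnt)
{
	return mnt->mnt_userns != &toy_init_user_ns;
}

/* Map an id through the mount's idmapping (direction is illustrative). */
static uint32_t toy_map_id_through_mnt(const struct toy_vfsmount *mnt,
				       uint32_t id)
{
	const struct toy_extent *e;

	/* Every mount is marked with some namespace, so never NULL. */
	assert(mnt->mnt_userns != NULL);
	e = &mnt->mnt_userns->uid_map;

	if (id < e->first || id - e->first >= e->count)
		return TOY_INVALID_ID; /* id has no mapping */
	return e->lower_first + (id - e->first);
}
```

With the identity namespace the id comes back unchanged, which is the
"behaves as before" case; with an extent such as { 0, 100000, 65536 }
the id 1000 would come back as 101000.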

Based on prior discussions we want to attach the whole user namespace
and not just a dedicated idmapping struct. This allows us to reuse all
the helpers that already exist for dealing with idmappings instead of
introducing a whole new range of helpers. In addition, if we decide in
the future that we are confident enough to enable unprivileged users to
set up idmapped mounts, we can allow the user namespace that an already
idmapped mount has been marked with to be replaced with another one.
The permission checking would then take into account whether the caller
is privileged in the user namespace the mount is currently marked with
and that is about to be replaced with another one. For now, we will
enforce in later patches that once a mount has been idmapped it can't be
remapped. This keeps permission checking and life-cycle management
simple, especially since users can always create a new mount with a
different idmapping anyway.
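
The "idmap once, never remap" rule described above could look roughly
like the following userspace sketch. The function name
toy_mnt_set_userns() and the error codes it returns are hypothetical
and chosen only for illustration; the actual enforcement is done in
later patches of the series and may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins; the id field only makes instances distinguishable. */
struct toy_userns { int id; };
static struct toy_userns toy_init_user_ns;

struct toy_vfsmount { struct toy_userns *mnt_userns; };

/*
 * Hypothetical helper: mark a mount with a user namespace exactly once.
 * A mount still marked with the initial namespace is not idmapped and
 * may be idmapped; anything else is refused.
 */
static int toy_mnt_set_userns(struct toy_vfsmount *mnt, struct toy_userns *ns)
{
	assert(mnt->mnt_userns != NULL);

	/* Marking with the initial namespace would not idmap anything. */
	if (ns == NULL || ns == &toy_init_user_ns)
		return -EINVAL; /* assumed error code */

	/* Already idmapped: remapping is not allowed. */
	if (mnt->mnt_userns != &toy_init_user_ns)
		return -EPERM; /* assumed error code */

	mnt->mnt_userns = ns;
	return 0;
}
```

A caller that wants a different idmapping would create a new mount and
mark that one instead, which is why the restriction costs little.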

The idea to attach user namespaces to vfsmounts was floated in various
forms at Linux Plumbers in ~2018, with the original idea tracing back
to a discussion during a conference in St. Petersburg between
Christoph, Tycho, and myself.

Cc: Christoph Hellwig <hch at lst.de>
Cc: David Howells <dhowells at redhat.com>
Cc: Al Viro <viro at zeniv.linux.org.uk>
Cc: linux-fsdevel at vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner at ubuntu.com>
---
/* v2 */
patch introduced
- Christoph Hellwig <hch at lst.de>:
  - Split internal implementation into separate patch and move syscall
    implementation later.

/* v3 */
- David Howells <dhowells at redhat.com>:
  - Remove MNT_IDMAPPED flag. We can simply check the pointer and use
    smp_load_acquire() in later patches.

- Tycho Andersen <tycho at tycho.pizza>:
  - Use READ_ONCE() in mnt_user_ns().

/* v4 */
- Serge Hallyn <serge at hallyn.com>:
  - Use "mnt_userns" to refer to a vfsmount's userns everywhere to make
    terminology consistent.

- Christoph Hellwig <hch at lst.de>:
  - Drop the READ_ONCE() from this patch. At this point in the series we
    don't allow changing the vfsmount's userns. The infra to do that
    is only introduced as almost the last patch in the series and there
    we immediately use smp_load_acquire() and smp_store_release().
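
For readers unfamiliar with the acquire/release pairing mentioned in
the changelog, here is a minimal userspace sketch using C11 atomics as
a stand-in for the kernel's smp_store_release() and
smp_load_acquire(). The toy_* names are assumptions, and C11 atomics
are only an analogy for the kernel primitives, not a replacement.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_userns { int level; };
static struct toy_userns toy_init_user_ns = { .level = 0 };

struct toy_vfsmount {
	_Atomic(struct toy_userns *) mnt_userns;
};

/*
 * Writer side: the namespace must be fully initialised before this
 * call. The release store orders those earlier writes before the
 * pointer becomes visible to readers.
 */
static void toy_mnt_publish_userns(struct toy_vfsmount *mnt,
				   struct toy_userns *ns)
{
	atomic_store_explicit(&mnt->mnt_userns, ns, memory_order_release);
}

/*
 * Reader side: the acquire load pairs with the release store, so a
 * reader that sees the new pointer also sees the initialised contents.
 */
static struct toy_userns *toy_mnt_user_ns(struct toy_vfsmount *mnt)
{
	struct toy_userns *ns;

	ns = atomic_load_explicit(&mnt->mnt_userns, memory_order_acquire);
	assert(ns != NULL);
	return ns;
}
```

As long as the pointer is set once at allocation and never changed,
which is the situation at this point in the series, a plain load is
sufficient and no such ordering is needed.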
 fs/namespace.c        | 9 +++++++++
 include/linux/fs.h    | 1 +
 include/linux/mount.h | 6 ++++++
 3 files changed, 16 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index 51cfacfb3f82..ac55f5c8b2de 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -228,6 +228,7 @@ static struct mount *alloc_vfsmnt(const char *name)
+		mnt->mnt.mnt_userns = &init_user_ns;
 	return mnt;
@@ -567,6 +568,11 @@ int sb_prepare_remount_readonly(struct super_block *sb)
 static void free_vfsmnt(struct mount *mnt)
+	struct user_namespace *mnt_userns;
+	mnt_userns = mnt_user_ns(&mnt->mnt);
+	if (mnt_userns != &init_user_ns)
+		put_user_ns(mnt_userns);
 #ifdef CONFIG_SMP
@@ -1075,6 +1081,9 @@ static struct mount *clone_mnt(struct mount *old, struct dentry *root,
 	mnt->mnt.mnt_flags &= ~(MNT_WRITE_HOLD|MNT_MARKED|MNT_INTERNAL);
+	mnt->mnt.mnt_userns = mnt_user_ns(&old->mnt);
+	if (mnt->mnt.mnt_userns != &init_user_ns)
+		mnt->mnt.mnt_userns = get_user_ns(mnt->mnt.mnt_userns);
 	mnt->mnt.mnt_sb = sb;
 	mnt->mnt.mnt_root = dget(root);
 	mnt->mnt_mountpoint = mnt->mnt.mnt_root;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0c820837d240..b4935793605f 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2280,6 +2280,7 @@ struct file_system_type {
 #define FS_HAS_SUBTYPE		4
 #define FS_USERNS_MOUNT		8	/* Can be mounted by userns root */
 #define FS_DISALLOW_NOTIFY_PERM	16	/* Disable fanotify permission events */
+#define FS_ALLOW_IDMAP         32      /* FS has been updated to handle vfs idmappings. */
 #define FS_THP_SUPPORT		8192	/* Remove once all fs converted */
 #define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move() during rename() internally. */
 	int (*init_fs_context)(struct fs_context *);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index aaf343b38671..52de25e08319 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -72,8 +72,14 @@ struct vfsmount {
 	struct dentry *mnt_root;	/* root of the mounted tree */
 	struct super_block *mnt_sb;	/* pointer to superblock */
 	int mnt_flags;
+	struct user_namespace *mnt_userns;
 } __randomize_layout;
+static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt)
+{
+	return mnt->mnt_userns;
+}
 struct file; /* forward dec */
 struct path;
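
The reference counting in the fs/namespace.c hunks above can be
modelled in userspace as follows. This is a simplified sketch: the
toy_* types and the plain integer refcount are assumptions, and only
the rule visible in the diff is reproduced, namely that the initial
user namespace is never pinned or released by a mount while any other
namespace is pinned on clone and released on free.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for struct user_namespace with a plain refcount. */
struct toy_userns { int refcount; };
static struct toy_userns toy_init_user_ns = { .refcount = 1 };

struct toy_vfsmount { struct toy_userns *mnt_userns; };

static struct toy_userns *toy_get_user_ns(struct toy_userns *ns)
{
	ns->refcount++;
	return ns;
}

static void toy_put_user_ns(struct toy_userns *ns)
{
	assert(ns->refcount > 0);
	ns->refcount--;
}

/* New mounts start out marked with the initial namespace. */
static struct toy_vfsmount *toy_alloc_vfsmnt(void)
{
	struct toy_vfsmount *mnt = calloc(1, sizeof(*mnt));

	if (mnt)
		mnt->mnt_userns = &toy_init_user_ns;
	return mnt;
}

/* A clone inherits the namespace and pins it unless it is the initial one. */
static struct toy_vfsmount *toy_clone_mnt(const struct toy_vfsmount *old)
{
	struct toy_vfsmount *mnt = toy_alloc_vfsmnt();

	if (!mnt)
		return NULL;
	mnt->mnt_userns = old->mnt_userns;
	if (mnt->mnt_userns != &toy_init_user_ns)
		mnt->mnt_userns = toy_get_user_ns(mnt->mnt_userns);
	return mnt;
}

/* Freeing a mount drops the reference taken for a non-initial namespace. */
static void toy_free_vfsmnt(struct toy_vfsmount *mnt)
{
	if (mnt->mnt_userns != &toy_init_user_ns)
		toy_put_user_ns(mnt->mnt_userns);
	free(mnt);
}
```

In this model every clone of an idmapped mount adds one reference and
every free removes one, so the namespace outlives all mounts marked
with it.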
