[RFC PATCH v1 1/2] landlock: Fix handling of disconnected directories
Mickaël Salaün
mic at digikod.net
Tue Jul 8 17:36:57 UTC 2025
On Tue, Jul 01, 2025 at 08:38:07PM +0200, Mickaël Salaün wrote:
> We can get disconnected files or directories when they are visible and
> opened from a bind mount, before being renamed/moved from the source of
> the bind mount in a way that makes them inaccessible from the mount
> point (i.e. out of scope).
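
To make the scenario above concrete, here is a minimal userspace sketch of what "disconnected" means. It is a toy model, not kernel code: `toy_dentry`, `toy_mount`, `toy_path_connected()` and `toy_rename()` are hypothetical names, and the check only mirrors the "is this dentry still below the mount's root?" idea, not the real path_connected() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry: only the parent link matters here. */
struct toy_dentry {
	const char *name;
	struct toy_dentry *parent; /* NULL for the filesystem root */
};

/* Toy stand-in for a bind mount: it exposes the subtree below @root. */
struct toy_mount {
	struct toy_dentry *root;
};

/* Returns true if @dentry is @mnt->root or one of its descendants. */
static bool toy_path_connected(const struct toy_mount *mnt,
			       const struct toy_dentry *dentry)
{
	assert(mnt && mnt->root);
	for (; dentry; dentry = dentry->parent) {
		if (dentry == mnt->root)
			return true;
	}
	return false;
}

/* Models the rename of a directory: only the parent link changes. */
static void toy_rename(struct toy_dentry *dentry,
		       struct toy_dentry *new_parent)
{
	dentry->parent = new_parent;
}
```

With a bind mount whose root is `src`, a directory `src/dir` is connected. After `toy_rename()` moves it under a sibling of `src`, the walk up from `dir` never meets the mount's root, so the directory is disconnected even though a file descriptor opened through the mount still refers to it.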
>
> Until now, access rights tied to files or directories opened through a
> disconnected directory were collected by walking the related hierarchy
> down to the root of this filesystem because the mount point couldn't be
> found. This could lead to inconsistent access results, and
> hard-to-debug renames, especially because such paths cannot be printed.
>
> For a sandboxed task to create a disconnected directory, it needs to
> have write access (i.e. FS_MAKE_REG, FS_REMOVE_FILE, and FS_REFER) to
> the underlying source of the bind mount, and read access to the related
> mount point. Because a sandboxed task cannot get more access rights than
> those defined by its Landlock domain, this could only lead to inconsistent
> access rights: those that should be inherited from the mount point
> hierarchy were missing, and those of the mounted filesystem's hierarchy
> were inherited instead.
>
> Landlock now handles files/directories opened from disconnected
> directories like the mount point these disconnected directories were
> opened from. This gives the guarantee that access rights on a
> file/directory cannot be more than those at open time. The rationale is
> that disconnected hierarchies might not be visible nor accessible to a
> sandboxed task, and relying on the collected access rights from them
> could introduce unexpected results, especially for rename actions
> because of the access right comparison between the source and the
> destination (see LANDLOCK_ACCESS_FS_REFER). This new behavior is much
> less surprising to users and safer from an access point of view.
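
The new behavior described above can be sketched in userspace as follows. This is a simplified, single-layer, single-mount model under stated assumptions: `toy_dentry`, `toy_access_t` and `toy_collect()` are hypothetical names, rights are simply OR-ed along the walk, and the continuation of the walk above the mount point (which the real code performs with follow_up()) is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int toy_access_t;

struct toy_dentry {
	struct toy_dentry *parent; /* NULL for the filesystem root */
	toy_access_t allowed;	   /* rights granted by a rule on this dentry */
};

/*
 * Collects the rights granted along the walk from @opened up to @mnt_root.
 * If the walk reaches the filesystem root without meeting @mnt_root, the
 * directory is disconnected: what was collected is dropped and the walk is
 * treated as if it had started at @mnt_root.
 */
static toy_access_t toy_collect(const struct toy_dentry *opened,
				const struct toy_dentry *mnt_root)
{
	toy_access_t collected = 0;
	const struct toy_dentry *walker;

	assert(opened && mnt_root);
	for (walker = opened; walker; walker = walker->parent) {
		collected |= walker->allowed;
		if (walker == mnt_root)
			return collected;
	}
	/* Disconnected: rewinds to the mount root. */
	return mnt_root->allowed;
}
```

In this model, rules attached to directories that are only reachable through the disconnected hierarchy never contribute to the result, which is the property the commit message relies on for rename checks.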
>
> Unlike follow_dotdot(), we don't need to check for each directory if it
> is part of the mount's root, but instead this is only checked when we
> reach a root dentry (not a mount point), or when the access
> request is about to be allowed. This limits the number of calls to
> is_subdir() which walks down the hierarchy (again). This also avoids
> checking path connection at the beginning of the walk for each mount
> point, which would be racy.
>
> Make path_connected() public to stay consistent with the VFS. This
> helper is used when we are about to allow an access.
>
> This change increases the stack size with two Landlock layer masks
> backups that are needed to reset the collected access rights to the
> latest mount point.
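
As a rough estimate of that stack cost, the sketch below computes the size of the two backup arrays. The constants are assumptions rather than values taken from this patch: it supposes 16 filesystem access rights and a 16-bit layer mask type, and `TOY_NUM_ACCESS_FS`, `toy_layer_mask_t` and `toy_backup_stack_bytes()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; check them against the kernel tree being built. */
#define TOY_NUM_ACCESS_FS 16
typedef uint16_t toy_layer_mask_t;

/* Returns the number of bytes used by the two backup arrays. */
static size_t toy_backup_stack_bytes(void)
{
	toy_layer_mask_t parent1_bkp[TOY_NUM_ACCESS_FS];
	toy_layer_mask_t parent2_bkp[TOY_NUM_ACCESS_FS];

	assert(sizeof(toy_layer_mask_t) == 2);
	return sizeof(parent1_bkp) + sizeof(parent2_bkp);
}
```

Under these assumptions, the two backups add 64 bytes to the stack frame of the function that holds them.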
>
> Because opened files have their access rights stored in the related file
> security properties, there is no impact for disconnected or unlinked
> files.
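
The point about opened files can be illustrated with a small userspace model. It is only a sketch: `toy_file`, `toy_open()`, `toy_file_allows()` and the two access bits are hypothetical names standing in for the rights that Landlock records in the file's security properties at open time.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int toy_access_t;

#define TOY_ACCESS_TRUNCATE (1U << 0)
#define TOY_ACCESS_IOCTL (1U << 1)

/* Toy stand-in for the per-file security properties. */
struct toy_file {
	toy_access_t allowed_access;
};

/* Records the rights computed from the path walk at open time. */
static struct toy_file toy_open(toy_access_t rights_at_open)
{
	struct toy_file file = { .allowed_access = rights_at_open };

	return file;
}

/* Later checks only look at the stored rights, never at the path. */
static bool toy_file_allows(const struct toy_file *file, toy_access_t request)
{
	assert(file);
	return (file->allowed_access & request) == request;
}
```

Because the later check never walks the hierarchy again, renaming or unlinking the file after it was opened does not change the answer in this model.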
>
> A following commit will document handling of disconnected files and
> directories.
>
> Cc: Günther Noack <gnoack at google.com>
> Cc: Song Liu <song at kernel.org>
> Reported-by: Tingmao Wang <m at maowtm.org>
> Closes: https://lore.kernel.org/r/027d5190-b37a-40a8-84e9-4ccbc352bcdf@maowtm.org
> Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
> Fixes: cb2c7d1a1776 ("landlock: Support filesystem access-control")
> Signed-off-by: Mickaël Salaün <mic at digikod.net>
> ---
>
> This replaces this patch:
> landlock: Remove warning in collect_domain_accesses()
> https://lore.kernel.org/r/20250618134734.1673254-1-mic@digikod.net
>
> I'll probably split this commit into two to ease backport (same for
> tests).
>
> This patch series applies on top of my next branch:
> https://git.kernel.org/pub/scm/linux/kernel/git/mic/linux.git/log/?h=next
>
> TODO: Add documentation
>
> TODO: Add Landlock erratum
> ---
> fs/namei.c | 2 +-
> include/linux/fs.h | 1 +
> security/landlock/fs.c | 121 +++++++++++++++++++++++++++++++++++------
> 3 files changed, 105 insertions(+), 19 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 4bb889fc980b..7853a876fc1c 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -716,7 +716,7 @@ static bool nd_alloc_stack(struct nameidata *nd)
> * Rename can sometimes move a file or directory outside of a bind
> * mount, path_connected allows those cases to be detected.
> */
> -static bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
> +bool path_connected(struct vfsmount *mnt, struct dentry *dentry)
> {
> struct super_block *sb = mnt->mnt_sb;
>
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 4ec77da65f14..3c0e324a9272 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3252,6 +3252,7 @@ extern struct file * open_exec(const char *);
> /* fs/dcache.c -- generic fs support functions */
> extern bool is_subdir(struct dentry *, struct dentry *);
> extern bool path_is_under(const struct path *, const struct path *);
> +extern bool path_connected(struct vfsmount *mnt, struct dentry *dentry);
>
> extern char *file_path(struct file *, char *, int);
>
> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 1d6c4e728f92..51f03eb82069 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -768,7 +768,9 @@ static bool is_access_to_paths_allowed(
> struct path walker_path;
> access_mask_t access_masked_parent1, access_masked_parent2;
> layer_mask_t _layer_masks_child1[LANDLOCK_NUM_ACCESS_FS],
> - _layer_masks_child2[LANDLOCK_NUM_ACCESS_FS];
> + _layer_masks_child2[LANDLOCK_NUM_ACCESS_FS],
> + _layer_masks_parent1_bkp[LANDLOCK_NUM_ACCESS_FS],
> + _layer_masks_parent2_bkp[LANDLOCK_NUM_ACCESS_FS];
> layer_mask_t(*layer_masks_child1)[LANDLOCK_NUM_ACCESS_FS] = NULL,
> (*layer_masks_child2)[LANDLOCK_NUM_ACCESS_FS] = NULL;
>
> @@ -800,6 +802,8 @@ static bool is_access_to_paths_allowed(
> access_masked_parent1 = access_masked_parent2 =
> landlock_union_access_masks(domain).fs;
> is_dom_check = true;
> + memcpy(&_layer_masks_parent2_bkp, layer_masks_parent2,
> + sizeof(_layer_masks_parent2_bkp));
> } else {
> if (WARN_ON_ONCE(dentry_child1 || dentry_child2))
> return false;
> @@ -807,6 +811,8 @@ static bool is_access_to_paths_allowed(
> access_masked_parent1 = access_request_parent1;
> access_masked_parent2 = access_request_parent2;
> is_dom_check = false;
> + memcpy(&_layer_masks_parent1_bkp, layer_masks_parent1,
> + sizeof(_layer_masks_parent1_bkp));
> }
>
> if (unlikely(dentry_child1)) {
> @@ -858,6 +864,14 @@ static bool is_access_to_paths_allowed(
> child1_is_directory, layer_masks_parent2,
> layer_masks_child2,
> child2_is_directory))) {
> + /*
> + * Rewinds walk for disconnected directories before any other state
> + * change.
> + */
> + if (unlikely(!path_connected(walker_path.mnt,
> + walker_path.dentry)))
> + goto reset_to_mount_root;
> +
> /*
> * Now, downgrades the remaining checks from domain
> * handled accesses to requested accesses.
> @@ -893,14 +907,42 @@ static bool is_access_to_paths_allowed(
> ARRAY_SIZE(*layer_masks_parent2));
>
> /* Stops when a rule from each layer grants access. */
> - if (allowed_parent1 && allowed_parent2)
> + if (allowed_parent1 && allowed_parent2) {
> + /*
> + * Rewinds walk for disconnected directories before any other state
> + * change.
> + */
> + if (unlikely(!path_connected(walker_path.mnt,
> + walker_path.dentry)))
> + goto reset_to_mount_root;
> +
> break;
> + }
> +
> jump_up:
> if (walker_path.dentry == walker_path.mnt->mnt_root) {
> if (follow_up(&walker_path)) {
> + /* Saves known good values. */
> + memcpy(&_layer_masks_parent1_bkp,
> + layer_masks_parent1,
> + sizeof(_layer_masks_parent1_bkp));
> + if (layer_masks_parent2)
> + memcpy(&_layer_masks_parent2_bkp,
> + layer_masks_parent2,
> + sizeof(_layer_masks_parent2_bkp));
> +
> /* Ignores hidden mount points. */
> goto jump_up;
> } else {
> + /*
> + * Rewinds walk for disconnected directories before any other
> + * state change.
> + */
> + if (unlikely(!path_connected(
> + walker_path.mnt,
> + walker_path.dentry)))
> + goto reset_to_mount_root;
> +

This hunk is useless; I'll remove it.

> /*
> * Stops at the real root. Denies access
> * because not all layers have granted access.
> @@ -909,20 +951,51 @@ static bool is_access_to_paths_allowed(
> }
> }
> if (unlikely(IS_ROOT(walker_path.dentry))) {
> - /*
> - * Stops at disconnected root directories. Only allows
> - * access to internal filesystems (e.g. nsfs, which is
> - * reachable through /proc/<pid>/ns/<namespace>).
> - */
> if (walker_path.mnt->mnt_flags & MNT_INTERNAL) {
> + /*
> + * Stops and allows access when reaching disconnected root
> + * directories that are part of internal filesystems (e.g. nsfs,
> + * which is reachable through /proc/<pid>/ns/<namespace>).
> + */
> allowed_parent1 = true;
> allowed_parent2 = true;
> + break;
> + } else {
> + /*
> + * Ignores current walk in walker_path.mnt when reaching
> + * disconnected root directories from bind mounts. Reset the
> + * collected access rights to the latest mount point (or @path)
> + * we walked through, and start again from the current root of
> + * the mount point. The newly collected access rights will be
> + * less than or equal to those at open time.
> + */
> + goto reset_to_mount_root;
> }
> - break;
> }
> parent_dentry = dget_parent(walker_path.dentry);
> dput(walker_path.dentry);
> walker_path.dentry = parent_dentry;
> + continue;
> +
> +reset_to_mount_root:
> + /* Restores latest known good values. */
> + memcpy(layer_masks_parent1, &_layer_masks_parent1_bkp,
> + sizeof(_layer_masks_parent1_bkp));
> + if (layer_masks_parent2)
> + memcpy(layer_masks_parent2, &_layer_masks_parent2_bkp,
> + sizeof(_layer_masks_parent2_bkp));
> +
> + /*
> + * Ignores previous results. They will be computed again with the next
> + * iteration.
> + */
> + allowed_parent1 = false;
> + allowed_parent2 = false;
> +
> + /* Restarts with the current mount point. */
> + dput(walker_path.dentry);
> + walker_path.dentry = walker_path.mnt->mnt_root;
> + dget(walker_path.dentry);
> }
> path_put(&walker_path);
>
> @@ -1030,13 +1103,13 @@ static access_mask_t maybe_remove(const struct dentry *const dentry)
> */
> static bool collect_domain_accesses(
> const struct landlock_ruleset *const domain,
> - const struct dentry *const mnt_root, struct dentry *dir,
> + const struct path *const mnt_dir, struct dentry *dir,
> layer_mask_t (*const layer_masks_dom)[LANDLOCK_NUM_ACCESS_FS])
> {
> - unsigned long access_dom;
> + access_mask_t access_dom;
> bool ret = false;
>
> - if (WARN_ON_ONCE(!domain || !mnt_root || !dir || !layer_masks_dom))
> + if (WARN_ON_ONCE(!domain || !mnt_dir || !dir || !layer_masks_dom))
> return true;
> if (is_nouser_or_private(dir))
> return true;
> @@ -1053,6 +1126,10 @@ static bool collect_domain_accesses(
> if (landlock_unmask_layers(find_rule(domain, dir), access_dom,
> layer_masks_dom,
> ARRAY_SIZE(*layer_masks_dom))) {
> + /* Ignores this walk if we end up in a disconnected directory. */
> + if (unlikely(!path_connected(mnt_dir->mnt, dir)))
> + goto cancel_walk;
> +
> /*
> * Stops when all handled accesses are allowed by at
> * least one rule in each layer.
> @@ -1061,13 +1138,23 @@ static bool collect_domain_accesses(
> break;
> }
>
> - /* Stops at the mount point or disconnected root directories. */
> - if (dir == mnt_root || IS_ROOT(dir))
> + /* Stops at the mount point. */
> + if (dir == mnt_dir->dentry)
> break;
>
> + /* Ignores this walk if we end up in a disconnected root directory. */
> + if (unlikely(IS_ROOT(dir)))
> + goto cancel_walk;
> +
> parent_dentry = dget_parent(dir);
> dput(dir);
> dir = parent_dentry;
> + continue;
> +
> +cancel_walk:
> + landlock_init_layer_masks(domain, LANDLOCK_MASK_ACCESS_FS,
> + layer_masks_dom, LANDLOCK_KEY_INODE);
> + break;
> }
> dput(dir);
> return ret;
> @@ -1198,13 +1285,11 @@ static int current_check_refer_path(struct dentry *const old_dentry,
> old_dentry->d_parent;
>
> /* new_dir->dentry is equal to new_dentry->d_parent */
> - allow_parent1 = collect_domain_accesses(subject->domain, mnt_dir.dentry,
> - old_parent,
> - &layer_masks_parent1);
> - allow_parent2 = collect_domain_accesses(subject->domain, mnt_dir.dentry,
> + allow_parent1 = collect_domain_accesses(
> + subject->domain, &mnt_dir, old_parent, &layer_masks_parent1);
> + allow_parent2 = collect_domain_accesses(subject->domain, &mnt_dir,
> new_dir->dentry,
> &layer_masks_parent2);
> -
> if (allow_parent1 && allow_parent2)
> return 0;
>
> --
> 2.50.0
>
>