[PATCH v1 06/11] landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER

Paul Moore paul at paul-moore.com
Thu Mar 17 01:26:48 UTC 2022


On Mon, Feb 21, 2022 at 4:15 PM Mickaël Salaün <mic at digikod.net> wrote:
>
> From: Mickaël Salaün <mic at linux.microsoft.com>
>
> Add a new LANDLOCK_ACCESS_FS_REFER access right to enable policy writers
> to allow sandboxed processes to link and rename files from and to a
> specific set of file hierarchies.  This access right should be composed
> with LANDLOCK_ACCESS_FS_MAKE_* for the destination of a link or rename,
> and with LANDLOCK_ACCESS_FS_REMOVE_* for a source of a rename.  This
> lift a Landlock limitation that always denied changing the parent of an
> inode.
>
> Renaming or linking to the same directory is still always allowed,
> whatever LANDLOCK_ACCESS_FS_REFER is used or not, because it is not
> considered a threat to user data.
>
> However, creating multiple links or renaming to a different parent
> directory may lead to privilege escalations if not handled properly.
> Indeed, we must be sure that the source doesn't gain more privileges by
> being accessible from the destination.  This is handled by making sure
> that the source hierarchy (including the referenced file or directory
> itself) restricts at least as much the destination hierarchy.  If it is
> not the case, an EXDEV error is returned, making it potentially possible
> for user space to copy the file hierarchy instead of moving or linking
> it.
>
> Instead of creating different access rights for the source and the
> destination, we choose to make it simple and consistent for users.
> Indeed, considering the previous constraint, it would be weird to
> require such destination access right to be also granted to the source
> (to make it a superset).
>
> See the provided documentation for additional details.
>
> New tests are provided with a following commit.
>
> Signed-off-by: Mickaël Salaün <mic at linux.microsoft.com>
> Link: https://lore.kernel.org/r/20220221212522.320243-7-mic@digikod.net
> ---
>  include/uapi/linux/landlock.h                |  27 +-
>  security/landlock/fs.c                       | 550 ++++++++++++++++---
>  security/landlock/limits.h                   |   2 +-
>  security/landlock/syscalls.c                 |   2 +-
>  tools/testing/selftests/landlock/base_test.c |   2 +-
>  tools/testing/selftests/landlock/fs_test.c   |   3 +-
>  6 files changed, 516 insertions(+), 70 deletions(-)

...

> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
> index 3886f9ad1a60..c7c7ce4e7cd5 100644
> --- a/security/landlock/fs.c
> +++ b/security/landlock/fs.c
> @@ -4,6 +4,7 @@
>   *
>   * Copyright © 2016-2020 Mickaël Salaün <mic at digikod.net>
>   * Copyright © 2018-2020 ANSSI
> + * Copyright © 2021-2022 Microsoft Corporation
>   */
>
>  #include <linux/atomic.h>
> @@ -269,16 +270,188 @@ static inline bool is_nouser_or_private(const struct dentry *dentry)
>                          unlikely(IS_PRIVATE(d_backing_inode(dentry))));
>  }
>
> -static int check_access_path(const struct landlock_ruleset *const domain,
> -               const struct path *const path,
> +static inline access_mask_t get_handled_accesses(
> +               const struct landlock_ruleset *const domain)
> +{
> +       access_mask_t access_dom = 0;
> +       unsigned long access_bit;

Would it be better to declare @access_bit as an access_mask_t type?
You're not using any macros like for_each_set_bit() in this function
so I believe it should be safe.

> +       for (access_bit = 0; access_bit < LANDLOCK_NUM_ACCESS_FS;
> +                       access_bit++) {
> +               size_t layer_level;

Considering the number of layers has dropped down to 16, it seems like
a normal unsigned int might be big enough for @layer_level :)

> +               for (layer_level = 0; layer_level < domain->num_layers;
> +                               layer_level++) {
> +                       if (domain->fs_access_masks[layer_level] &
> +                                       BIT_ULL(access_bit)) {
> +                               access_dom |= BIT_ULL(access_bit);
> +                               break;
> +                       }
> +               }
> +       }
> +       return access_dom;
> +}
> +
> +static inline access_mask_t init_layer_masks(
> +               const struct landlock_ruleset *const domain,
> +               const access_mask_t access_request,
> +               layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       access_mask_t handled_accesses = 0;
> +       size_t layer_level;
> +
> +       memset(layer_masks, 0, sizeof(*layer_masks));
> +       if (WARN_ON_ONCE(!access_request))
> +               return 0;
> +
> +       /* Saves all handled accesses per layer. */
> +       for (layer_level = 0; layer_level < domain->num_layers;
> +                       layer_level++) {
> +               const unsigned long access_req = access_request;
> +               unsigned long access_bit;
> +
> +               for_each_set_bit(access_bit, &access_req,
> +                               ARRAY_SIZE(*layer_masks)) {
> +                       if (domain->fs_access_masks[layer_level] &
> +                                       BIT_ULL(access_bit)) {
> +                               (*layer_masks)[access_bit] |=
> +                                       BIT_ULL(layer_level);
> +                               handled_accesses |= BIT_ULL(access_bit);
> +                       }
> +               }
> +       }
> +       return handled_accesses;
> +}
> +
> +/*
> + * Check that a destination file hierarchy has more restrictions than a source
> + * file hierarchy.  This is only used for link and rename actions.
> + */
> +static inline bool is_superset(bool child_is_directory,
> +               const layer_mask_t (*const
> +                       layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               const layer_mask_t (*const
> +                       layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               const layer_mask_t (*const
> +                       layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       unsigned long access_bit;
> +
> +       for (access_bit = 0; access_bit < ARRAY_SIZE(*layer_masks_dst_parent);
> +                       access_bit++) {
> +               /* Ignores accesses that only make sense for directories. */
> +               if (!child_is_directory && !(BIT_ULL(access_bit) & ACCESS_FILE))
> +                       continue;
> +
> +               /*
> +                * Checks if the destination restrictions are a superset of the
> +                * source ones (i.e. inherited access rights without child
> +                * exceptions).
> +                */
> +               if ((((*layer_masks_src_parent)[access_bit] & (*layer_masks_child)[access_bit]) |
> +                                       (*layer_masks_dst_parent)[access_bit]) !=
> +                               (*layer_masks_dst_parent)[access_bit])
> +                       return false;
> +       }
> +       return true;
> +}
> +
> +/*
> + * Removes @layer_masks accesses that are not requested.
> + *
> + * Returns true if the request is allowed, false otherwise.
> + */
> +static inline bool scope_to_request(const access_mask_t access_request,
> +               layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       const unsigned long access_req = access_request;
> +       unsigned long access_bit;
> +
> +       if (WARN_ON_ONCE(!layer_masks))
> +               return true;
> +
> +       for_each_clear_bit(access_bit, &access_req, ARRAY_SIZE(*layer_masks))
> +               (*layer_masks)[access_bit] = 0;
> +       return !memchr_inv(layer_masks, 0, sizeof(*layer_masks));
> +}
> +
> +/*
> + * Returns true if there is at least one access right different than
> + * LANDLOCK_ACCESS_FS_REFER.
> + */
> +static inline bool is_eacces(
> +               const layer_mask_t (*const
> +                       layer_masks)[LANDLOCK_NUM_ACCESS_FS],
>                 const access_mask_t access_request)
>  {

Granted, I don't have as deep of an understanding of Landlock as you
do, but the function name "is_eacces" seems a little odd given the
nature of the function.  Perhaps "is_fsrefer"?

> -       layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
> -       bool allowed = false, has_access = false;
> +       unsigned long access_bit;
> +       /* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
> +       const unsigned long access_check = access_request &
> +               ~LANDLOCK_ACCESS_FS_REFER;
> +
> +       if (!layer_masks)
> +               return false;
> +
> +       for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) {
> +               if ((*layer_masks)[access_bit])
> +                       return true;
> +       }

Is calling for_each_set_bit() overkill here?  @access_check should
only ever have at most one bit set (LANDLOCK_ACCESS_FS_REFER), yes?

> +       return false;
> +}
> +
> +/**
> + * check_access_path_dual - Check a source and a destination accesses
> + *
> + * @domain: Domain to check against.
> + * @path: File hierarchy to walk through.
> + * @child_is_directory: Must be set to true if the (original) leaf is a
> + *     directory, false otherwise.
> + * @access_request_dst_parent: Accesses to check, once @layer_masks_dst_parent
> + *     is equal to @layer_masks_src_parent (if any).
> + * @layer_masks_dst_parent: Pointer to a matrix of layer masks per access
> + *     masks, identifying the layers that forbid a specific access.  Bits from
> + *     this matrix can be unset according to the @path walk.  An empty matrix
> + *     means that @domain allows all possible Landlock accesses (i.e. not only
> + *     those identified by @access_request_dst_parent).  This matrix can
> + *     initially refer to domain layer masks and, when the accesses for the
> + *     destination and source are the same, to request layer masks.
> + * @access_request_src_parent: Similar to @access_request_dst_parent but for an
> + *     initial source path request.  Only taken into account if
> + *     @layer_masks_src_parent is not NULL.
> + * @layer_masks_src_parent: Similar to @layer_masks_dst_parent but for an
> + *     initial source path walk.  This can be NULL if only dealing with a
> + *     destination access request (i.e. not a rename nor a link action).
> + * @layer_masks_child: Similar to @layer_masks_src_parent but only for the
> + *     linked or renamed inode (without hierarchy).  This is only used if
> + *     @layer_masks_src_parent is not NULL.
> + *
> + * This helper first checks that the destination has a superset of restrictions
> + * compared to the source (if any) for a common path.  It then checks that the
> + * collected accesses and the remaining ones are enough to allow the request.
> + *
> + * Returns:
> + * - 0 if the access request is granted;
> + * - -EACCES if it is denied because of access right other than
> + *   LANDLOCK_ACCESS_FS_REFER;
> + * - -EXDEV if the renaming or linking would be a privileged escalation
> + *   (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
> + *   not allowed by the source or the destination.
> + */
> +static int check_access_path_dual(const struct landlock_ruleset *const domain,
> +               const struct path *const path,
> +               bool child_is_directory,
> +               const access_mask_t access_request_dst_parent,
> +               layer_mask_t (*const
> +                       layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               const access_mask_t access_request_src_parent,
> +               layer_mask_t (*layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
> +               layer_mask_t (*layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
> +{
> +       bool allowed_dst_parent = false, allowed_src_parent = false, is_dom_check;
>         struct path walker_path;
> -       size_t i;
> +       access_mask_t access_masked_dst_parent, access_masked_src_parent;
>
> -       if (!access_request)
> +       if (!access_request_dst_parent && !access_request_src_parent)
>                 return 0;
>         if (WARN_ON_ONCE(!domain || !path))
>                 return 0;
> @@ -287,22 +460,20 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>         if (WARN_ON_ONCE(domain->num_layers < 1))
>                 return -EACCES;
>
> -       /* Saves all layers handling a subset of requested accesses. */
> -       for (i = 0; i < domain->num_layers; i++) {
> -               const unsigned long access_req = access_request;
> -               unsigned long access_bit;
> -
> -               for_each_set_bit(access_bit, &access_req,
> -                               ARRAY_SIZE(layer_masks)) {
> -                       if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
> -                               layer_masks[access_bit] |= BIT_ULL(i);
> -                               has_access = true;
> -                       }
> -               }
> +       BUILD_BUG_ON(!layer_masks_dst_parent);

I know the kbuild robot already flagged this, but checking function
parameters with BUILD_BUG_ON() does seem a bit ... unusual :)

> +       if (layer_masks_src_parent) {
> +               if (WARN_ON_ONCE(!layer_masks_child))
> +                       return -EACCES;
> +               access_masked_dst_parent = access_masked_src_parent =
> +                       get_handled_accesses(domain);
> +               is_dom_check = true;
> +       } else {
> +               if (WARN_ON_ONCE(layer_masks_child))
> +                       return -EACCES;
> +               access_masked_dst_parent = access_request_dst_parent;
> +               access_masked_src_parent = access_request_src_parent;
> +               is_dom_check = false;
>         }
> -       /* An access request not handled by the domain is allowed. */
> -       if (!has_access)
> -               return 0;
>
>         walker_path = *path;
>         path_get(&walker_path);
> @@ -312,11 +483,50 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>          */
>         while (true) {
>                 struct dentry *parent_dentry;
> +               const struct landlock_rule *rule;
> +
> +               /*
> +                * If at least all accesses allowed on the destination are
> +                * already allowed on the source, respectively if there is at
> +                * least as much as restrictions on the destination than on the
> +                * source, then we can safely refer files from the source to
> +                * the destination without risking a privilege escalation.
> +                * This is crucial for standalone multilayered security
> +                * policies.  Furthermore, this helps avoid policy writers to
> +                * shoot themselves in the foot.
> +                */
> +               if (is_dom_check && is_superset(child_is_directory,
> +                                       layer_masks_dst_parent,
> +                                       layer_masks_src_parent,
> +                                       layer_masks_child)) {
> +                       allowed_dst_parent =
> +                               scope_to_request(access_request_dst_parent,
> +                                               layer_masks_dst_parent);
> +                       allowed_src_parent =
> +                               scope_to_request(access_request_src_parent,
> +                                               layer_masks_src_parent);
> +
> +                       /* Stops when all accesses are granted. */
> +                       if (allowed_dst_parent && allowed_src_parent)
> +                               break;
> +
> +                       /*
> +                        * Downgrades checks from domain handled accesses to
> +                        * requested accesses.
> +                        */
> +                       is_dom_check = false;
> +                       access_masked_dst_parent = access_request_dst_parent;
> +                       access_masked_src_parent = access_request_src_parent;
> +               }
> +
> +               rule = find_rule(domain, walker_path.dentry);
> +               allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
> +                               layer_masks_dst_parent);
> +               allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
> +                               layer_masks_src_parent);
>
> -               allowed = unmask_layers(find_rule(domain, walker_path.dentry),
> -                               access_request, &layer_masks);
> -               if (allowed)
> -                       /* Stops when a rule from each layer grants access. */
> +               /* Stops when a rule from each layer grants access. */
> +               if (allowed_dst_parent && allowed_src_parent)
>                         break;

If "(allowed_dst_parent && allowed_src_parent)" is true, you break out
of the while loop only to do a path_put(), check the two booleans once
more, and then return zero, yes?  Why not just do the path_put() and
return zero here?


--
paul-moore.com



More information about the Linux-security-module-archive mailing list