[PATCH v1 06/11] landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER
Mickaël Salaün
mic at digikod.net
Thu Mar 17 12:04:29 UTC 2022
On 17/03/2022 02:26, Paul Moore wrote:
> On Mon, Feb 21, 2022 at 4:15 PM Mickaël Salaün <mic at digikod.net> wrote:
>>
>> From: Mickaël Salaün <mic at linux.microsoft.com>
>>
>> Add a new LANDLOCK_ACCESS_FS_REFER access right to enable policy writers
>> to allow sandboxed processes to link and rename files from and to a
>> specific set of file hierarchies. This access right should be composed
>> with LANDLOCK_ACCESS_FS_MAKE_* for the destination of a link or rename,
>> and with LANDLOCK_ACCESS_FS_REMOVE_* for a source of a rename. This
>> lift a Landlock limitation that always denied changing the parent of an
>> inode.
>>
>> Renaming or linking to the same directory is still always allowed,
>> whatever LANDLOCK_ACCESS_FS_REFER is used or not, because it is not
>> considered a threat to user data.
>>
>> However, creating multiple links or renaming to a different parent
>> directory may lead to privilege escalations if not handled properly.
>> Indeed, we must be sure that the source doesn't gain more privileges by
>> being accessible from the destination. This is handled by making sure
>> that the source hierarchy (including the referenced file or directory
>> itself) restricts at least as much the destination hierarchy. If it is
>> not the case, an EXDEV error is returned, making it potentially possible
>> for user space to copy the file hierarchy instead of moving or linking
>> it.
>>
>> Instead of creating different access rights for the source and the
>> destination, we choose to make it simple and consistent for users.
>> Indeed, considering the previous constraint, it would be weird to
>> require such destination access right to be also granted to the source
>> (to make it a superset).
>>
>> See the provided documentation for additional details.
>>
>> New tests are provided with a following commit.
>>
>> Signed-off-by: Mickaël Salaün <mic at linux.microsoft.com>
>> Link: https://lore.kernel.org/r/20220221212522.320243-7-mic@digikod.net
>> ---
>> include/uapi/linux/landlock.h | 27 +-
>> security/landlock/fs.c | 550 ++++++++++++++++---
>> security/landlock/limits.h | 2 +-
>> security/landlock/syscalls.c | 2 +-
>> tools/testing/selftests/landlock/base_test.c | 2 +-
>> tools/testing/selftests/landlock/fs_test.c | 3 +-
>> 6 files changed, 516 insertions(+), 70 deletions(-)
>
> ...
>
>> diff --git a/security/landlock/fs.c b/security/landlock/fs.c
>> index 3886f9ad1a60..c7c7ce4e7cd5 100644
>> --- a/security/landlock/fs.c
>> +++ b/security/landlock/fs.c
>> @@ -4,6 +4,7 @@
>> *
>> * Copyright © 2016-2020 Mickaël Salaün <mic at digikod.net>
>> * Copyright © 2018-2020 ANSSI
>> + * Copyright © 2021-2022 Microsoft Corporation
>> */
>>
>> #include <linux/atomic.h>
>> @@ -269,16 +270,188 @@ static inline bool is_nouser_or_private(const struct dentry *dentry)
>> unlikely(IS_PRIVATE(d_backing_inode(dentry))));
>> }
>>
>> -static int check_access_path(const struct landlock_ruleset *const domain,
>> - const struct path *const path,
>> +static inline access_mask_t get_handled_accesses(
>> + const struct landlock_ruleset *const domain)
>> +{
>> + access_mask_t access_dom = 0;
>> + unsigned long access_bit;
>
> Would it be better to declare @access_bit as an access_mask_t type?
> You're not using any macros like for_each_set_bit() in this function
> so I believe it should be safe.
Right, I'll change that.
>
>> + for (access_bit = 0; access_bit < LANDLOCK_NUM_ACCESS_FS;
>> + access_bit++) {
>> + size_t layer_level;
>
> Considering the number of layers has dropped down to 16, it seems like
> a normal unsigned int might be big enough for @layer_level :)
We could switch to u8, but I prefer to stick to size_t for array indexes,
which reduces the cognitive workload related to the size of such arrays.
;) I guess there is enough info for compilers to optimize such code
anyway.
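To illustrate both points (an access_mask_t bit counter and a size_t
layer index), here is a minimal user-space sketch. It is not the kernel
code: NUM_ACCESS_FS, MAX_LAYERS, struct ruleset and the u16 width of
access_mask_t are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define NUM_ACCESS_FS 14
#define MAX_LAYERS 16

typedef uint16_t access_mask_t;

/* Hypothetical stand-in for struct landlock_ruleset. */
struct ruleset {
	size_t num_layers;
	access_mask_t fs_access_masks[MAX_LAYERS];
};

/* Returns the union of the access rights handled by any layer. */
static access_mask_t get_handled_accesses(const struct ruleset *domain)
{
	access_mask_t access_dom = 0;
	access_mask_t access_bit; /* plain counter, no bitmap macro involved */

	for (access_bit = 0; access_bit < NUM_ACCESS_FS; access_bit++) {
		size_t layer_level; /* array index, so size_t */

		for (layer_level = 0; layer_level < domain->num_layers;
		     layer_level++) {
			if (domain->fs_access_masks[layer_level] &
			    (1U << access_bit)) {
				access_dom |= (access_mask_t)(1U << access_bit);
				break;
			}
		}
	}
	return access_dom;
}
```

With two layers handling 0x1 and 0x6 respectively, the helper returns
0x7.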
>
>> + for (layer_level = 0; layer_level < domain->num_layers;
>> + layer_level++) {
>> + if (domain->fs_access_masks[layer_level] &
>> + BIT_ULL(access_bit)) {
>> + access_dom |= BIT_ULL(access_bit);
>> + break;
>> + }
>> + }
>> + }
>> + return access_dom;
>> +}
>> +
>> +static inline access_mask_t init_layer_masks(
>> + const struct landlock_ruleset *const domain,
>> + const access_mask_t access_request,
>> + layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> + access_mask_t handled_accesses = 0;
>> + size_t layer_level;
>> +
>> + memset(layer_masks, 0, sizeof(*layer_masks));
>> + if (WARN_ON_ONCE(!access_request))
>> + return 0;
>> +
>> + /* Saves all handled accesses per layer. */
>> + for (layer_level = 0; layer_level < domain->num_layers;
>> + layer_level++) {
>> + const unsigned long access_req = access_request;
>> + unsigned long access_bit;
>> +
>> + for_each_set_bit(access_bit, &access_req,
>> + ARRAY_SIZE(*layer_masks)) {
>> + if (domain->fs_access_masks[layer_level] &
>> + BIT_ULL(access_bit)) {
>> + (*layer_masks)[access_bit] |=
>> + BIT_ULL(layer_level);
>> + handled_accesses |= BIT_ULL(access_bit);
>> + }
>> + }
>> + }
>> + return handled_accesses;
>> +}
>> +
>> +/*
>> + * Check that a destination file hierarchy has more restrictions than a source
>> + * file hierarchy. This is only used for link and rename actions.
>> + */
>> +static inline bool is_superset(bool child_is_directory,
>> + const layer_mask_t (*const
>> + layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
>> + const layer_mask_t (*const
>> + layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
>> + const layer_mask_t (*const
>> + layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> + unsigned long access_bit;
>> +
>> + for (access_bit = 0; access_bit < ARRAY_SIZE(*layer_masks_dst_parent);
>> + access_bit++) {
>> + /* Ignores accesses that only make sense for directories. */
>> + if (!child_is_directory && !(BIT_ULL(access_bit) & ACCESS_FILE))
>> + continue;
>> +
>> + /*
>> + * Checks if the destination restrictions are a superset of the
>> + * source ones (i.e. inherited access rights without child
>> + * exceptions).
>> + */
>> + if ((((*layer_masks_src_parent)[access_bit] & (*layer_masks_child)[access_bit]) |
>> + (*layer_masks_dst_parent)[access_bit]) !=
>> + (*layer_masks_dst_parent)[access_bit])
>> + return false;
>> + }
>> + return true;
>> +}
>> +
>> +/*
>> + * Removes @layer_masks accesses that are not requested.
>> + *
>> + * Returns true if the request is allowed, false otherwise.
>> + */
>> +static inline bool scope_to_request(const access_mask_t access_request,
>> + layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> + const unsigned long access_req = access_request;
>> + unsigned long access_bit;
>> +
>> + if (WARN_ON_ONCE(!layer_masks))
>> + return true;
>> +
>> + for_each_clear_bit(access_bit, &access_req, ARRAY_SIZE(*layer_masks))
>> + (*layer_masks)[access_bit] = 0;
>> + return !memchr_inv(layer_masks, 0, sizeof(*layer_masks));
>> +}
>> +
>> +/*
>> + * Returns true if there is at least one access right different than
>> + * LANDLOCK_ACCESS_FS_REFER.
>> + */
>> +static inline bool is_eacces(
>> + const layer_mask_t (*const
>> + layer_masks)[LANDLOCK_NUM_ACCESS_FS],
>> const access_mask_t access_request)
>> {
>
> Granted, I don't have as deep of an understanding of Landlock as you
> do, but the function name "is_eacces" seems a little odd given the
> nature of the function. Perhaps "is_fsrefer"?
Hmm, this helper does multiple things which are necessary to know if we
need to return -EACCES or -EXDEV. Renaming it to is_fsrefer() would
require inverting the logic and using boolean negations in the callers
(because of ordering). Renaming it to something like without_fs_refer()
would not be completely correct because we also check if there is no
layer_masks, which indicates that it doesn't contain an access right
that should return -EACCES. This helper is named as such because the
underlying semantic is to check for such an error code, which is tricky.
I can rename it to contains_eacces() or something, but a longer name
would require splitting the caller lines to fit 80 columns. :|
>
>> - layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
>> - bool allowed = false, has_access = false;
>> + unsigned long access_bit;
>> + /* LANDLOCK_ACCESS_FS_REFER alone must return -EXDEV. */
>> + const unsigned long access_check = access_request &
>> + ~LANDLOCK_ACCESS_FS_REFER;
>> +
>> + if (!layer_masks)
>> + return false;
>> +
>> + for_each_set_bit(access_bit, &access_check, ARRAY_SIZE(*layer_masks)) {
>> + if ((*layer_masks)[access_bit])
>> + return true;
>> + }
>
> Is calling for_each_set_bit() overkill here? @access_check should
> only ever have at most one bit set (LANDLOCK_ACCESS_FS_REFER), yes?
No, it is the contrary: the bitmask is inverted and this loop checks for
non-FS_REFER access rights that should then return -EACCES. For
instance, if a sandbox handles (and then restricts) MAKE_REG and REFER,
a request to link a regular file would contain both of these bits, and
the kernel should return -EACCES if MAKE_REG is not granted or -EXDEV if
the request is only denied because of REFER. The reparent_* tests check
the consistency of this behavior (with the exception of a
RENAME_EXCHANGE case, see [1]).
[1] https://lore.kernel.org/r/20220222175332.384545-1-mic@digikod.net
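The MAKE_REG/REFER example can be modeled in user space as follows.
This is only a sketch under simplifying assumptions: the two ACCESS_*
constants, NUM_ACCESS and denial_to_errno() are hypothetical, and the
real -EXDEV decision also depends on the superset check done during the
path walk.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical two-right universe for the example. */
#define NUM_ACCESS 2
#define ACCESS_MAKE_REG (1UL << 0)
#define ACCESS_REFER (1UL << 1)

typedef uint16_t layer_mask_t;

/* (*layer_masks)[bit] lists the layers that still deny access right bit. */
static bool is_eacces(const layer_mask_t (*layer_masks)[NUM_ACCESS],
		      unsigned long access_request)
{
	/* REFER is masked out: a denial of REFER alone is not -EACCES. */
	const unsigned long access_check = access_request & ~ACCESS_REFER;
	size_t bit;

	if (!layer_masks)
		return false;
	for (bit = 0; bit < NUM_ACCESS; bit++) {
		if ((access_check & (1UL << bit)) && (*layer_masks)[bit])
			return true;
	}
	return false;
}

/* Hypothetical helper: maps the remaining denials to an error code. */
static int denial_to_errno(const layer_mask_t (*layer_masks)[NUM_ACCESS],
			   unsigned long access_request)
{
	bool denied = false;
	size_t bit;

	for (bit = 0; bit < NUM_ACCESS; bit++) {
		if ((access_request & (1UL << bit)) && (*layer_masks)[bit])
			denied = true;
	}
	if (!denied)
		return 0;
	return is_eacces(layer_masks, access_request) ? -EACCES : -EXDEV;
}
```

For a request of ACCESS_MAKE_REG | ACCESS_REFER, a matrix where only
the REFER entry is non-zero yields -EXDEV, whereas a non-zero MAKE_REG
entry yields -EACCES whatever the REFER entry contains.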
>
>> + return false;
>> +}
>> +
>> +/**
>> + * check_access_path_dual - Check a source and a destination accesses
>> + *
>> + * @domain: Domain to check against.
>> + * @path: File hierarchy to walk through.
>> + * @child_is_directory: Must be set to true if the (original) leaf is a
>> + * directory, false otherwise.
>> + * @access_request_dst_parent: Accesses to check, once @layer_masks_dst_parent
>> + * is equal to @layer_masks_src_parent (if any).
>> + * @layer_masks_dst_parent: Pointer to a matrix of layer masks per access
>> + * masks, identifying the layers that forbid a specific access. Bits from
>> + * this matrix can be unset according to the @path walk. An empty matrix
>> + * means that @domain allows all possible Landlock accesses (i.e. not only
>> + * those identified by @access_request_dst_parent). This matrix can
>> + * initially refer to domain layer masks and, when the accesses for the
>> + * destination and source are the same, to request layer masks.
>> + * @access_request_src_parent: Similar to @access_request_dst_parent but for an
>> + * initial source path request. Only taken into account if
>> + * @layer_masks_src_parent is not NULL.
>> + * @layer_masks_src_parent: Similar to @layer_masks_dst_parent but for an
>> + * initial source path walk. This can be NULL if only dealing with a
>> + * destination access request (i.e. not a rename nor a link action).
>> + * @layer_masks_child: Similar to @layer_masks_src_parent but only for the
>> + * linked or renamed inode (without hierarchy). This is only used if
>> + * @layer_masks_src_parent is not NULL.
>> + *
>> + * This helper first checks that the destination has a superset of restrictions
>> + * compared to the source (if any) for a common path. It then checks that the
>> + * collected accesses and the remaining ones are enough to allow the request.
>> + *
>> + * Returns:
>> + * - 0 if the access request is granted;
>> + * - -EACCES if it is denied because of access right other than
>> + * LANDLOCK_ACCESS_FS_REFER;
>> + * - -EXDEV if the renaming or linking would be a privileged escalation
>> + * (according to each layered policies), or if LANDLOCK_ACCESS_FS_REFER is
>> + * not allowed by the source or the destination.
>> + */
>> +static int check_access_path_dual(const struct landlock_ruleset *const domain,
>> + const struct path *const path,
>> + bool child_is_directory,
>> + const access_mask_t access_request_dst_parent,
>> + layer_mask_t (*const
>> + layer_masks_dst_parent)[LANDLOCK_NUM_ACCESS_FS],
>> + const access_mask_t access_request_src_parent,
>> + layer_mask_t (*layer_masks_src_parent)[LANDLOCK_NUM_ACCESS_FS],
>> + layer_mask_t (*layer_masks_child)[LANDLOCK_NUM_ACCESS_FS])
>> +{
>> + bool allowed_dst_parent = false, allowed_src_parent = false, is_dom_check;
>> struct path walker_path;
>> - size_t i;
>> + access_mask_t access_masked_dst_parent, access_masked_src_parent;
>>
>> - if (!access_request)
>> + if (!access_request_dst_parent && !access_request_src_parent)
>> return 0;
>> if (WARN_ON_ONCE(!domain || !path))
>> return 0;
>> @@ -287,22 +460,20 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>> if (WARN_ON_ONCE(domain->num_layers < 1))
>> return -EACCES;
>>
>> - /* Saves all layers handling a subset of requested accesses. */
>> - for (i = 0; i < domain->num_layers; i++) {
>> - const unsigned long access_req = access_request;
>> - unsigned long access_bit;
>> -
>> - for_each_set_bit(access_bit, &access_req,
>> - ARRAY_SIZE(layer_masks)) {
>> - if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
>> - layer_masks[access_bit] |= BIT_ULL(i);
>> - has_access = true;
>> - }
>> - }
>> + BUILD_BUG_ON(!layer_masks_dst_parent);
>
> I know the kbuild robot already flagged this, but checking function
> parameters with BUILD_BUG_ON() does seem a bit ... unusual :)
Yeah, I like such a guarantee, but it may not work without __always_inline.
I moved this check into the previous WARN_ON_ONCE().
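As a rough user-space analogy (not the kernel macros): a build-time
assertion needs a constant expression, while a pointer argument is only
known at run time unless the compiler inlines the call, so it has to be
covered by a run-time guard. In this sketch, warn_on_once(),
args_are_valid() and NUM_ACCESS are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>

#define NUM_ACCESS 4 /* assumed value for illustration */

typedef unsigned short layer_mask_t;

/* Fine at build time: the operand is a constant expression. */
_Static_assert(NUM_ACCESS <= sizeof(unsigned long) * 8,
	       "access bits must fit in an unsigned long");

/* Imitation of a warn-once check: prints once, returns the condition. */
static bool warn_on_once(bool cond, const char *what)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		fprintf(stderr, "warning: %s\n", what);
	}
	return cond;
}

/* All pointer arguments are validated by a single run-time guard. */
static bool args_are_valid(const void *domain, const void *path,
		const layer_mask_t (*layer_masks_dst_parent)[NUM_ACCESS])
{
	return !warn_on_once(!domain || !path || !layer_masks_dst_parent,
			     "missing argument");
}
```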
>
>> + if (layer_masks_src_parent) {
>> + if (WARN_ON_ONCE(!layer_masks_child))
>> + return -EACCES;
>> + access_masked_dst_parent = access_masked_src_parent =
>> + get_handled_accesses(domain);
>> + is_dom_check = true;
>> + } else {
>> + if (WARN_ON_ONCE(layer_masks_child))
>> + return -EACCES;
>> + access_masked_dst_parent = access_request_dst_parent;
>> + access_masked_src_parent = access_request_src_parent;
>> + is_dom_check = false;
>> }
>> - /* An access request not handled by the domain is allowed. */
>> - if (!has_access)
>> - return 0;
>>
>> walker_path = *path;
>> path_get(&walker_path);
>> @@ -312,11 +483,50 @@ static int check_access_path(const struct landlock_ruleset *const domain,
>> */
>> while (true) {
>> struct dentry *parent_dentry;
>> + const struct landlock_rule *rule;
>> +
>> + /*
>> + * If at least all accesses allowed on the destination are
>> + * already allowed on the source, respectively if there is at
>> + * least as much as restrictions on the destination than on the
>> + * source, then we can safely refer files from the source to
>> + * the destination without risking a privilege escalation.
>> + * This is crucial for standalone multilayered security
>> + * policies. Furthermore, this helps avoid policy writers to
>> + * shoot themselves in the foot.
>> + */
>> + if (is_dom_check && is_superset(child_is_directory,
>> + layer_masks_dst_parent,
>> + layer_masks_src_parent,
>> + layer_masks_child)) {
>> + allowed_dst_parent =
>> + scope_to_request(access_request_dst_parent,
>> + layer_masks_dst_parent);
>> + allowed_src_parent =
>> + scope_to_request(access_request_src_parent,
>> + layer_masks_src_parent);
>> +
>> + /* Stops when all accesses are granted. */
>> + if (allowed_dst_parent && allowed_src_parent)
>> + break;
>> +
>> + /*
>> + * Downgrades checks from domain handled accesses to
>> + * requested accesses.
>> + */
>> + is_dom_check = false;
>> + access_masked_dst_parent = access_request_dst_parent;
>> + access_masked_src_parent = access_request_src_parent;
>> + }
>> +
>> + rule = find_rule(domain, walker_path.dentry);
>> + allowed_dst_parent = unmask_layers(rule, access_masked_dst_parent,
>> + layer_masks_dst_parent);
>> + allowed_src_parent = unmask_layers(rule, access_masked_src_parent,
>> + layer_masks_src_parent);
>>
>> - allowed = unmask_layers(find_rule(domain, walker_path.dentry),
>> - access_request, &layer_masks);
>> - if (allowed)
>> - /* Stops when a rule from each layer grants access. */
>> + /* Stops when a rule from each layer grants access. */
>> + if (allowed_dst_parent && allowed_src_parent)
>> break;
>
> If "(allowed_dst_parent && allowed_src_parent)" is true, you break out
> of the while loop only to do a path_put(), check the two booleans once
> more, and then return zero, yes? Why not just do the path_put() and
> return zero here?
Correct, that would work, but I prefer not to duplicate the logic of
granting access when keeping it in one place doesn't make the code more
complex, which I think is the case here, and I'm reluctant to duplicate
path_get/put() calls. This loop break is a small optimization to avoid
walking the path one more step, and writing it this way looks cleaner
and less error-prone from my point of view.
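The shape I have in mind can be sketched in user space like this. It is
a simplified model, not the kernel walk: struct node, its grants_*
fields and the refs counter (standing in for path_get()/path_put()) are
hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hierarchy node; each one may grant the dst or src request. */
struct node {
	struct node *parent;
	bool grants_dst;
	bool grants_src;
};

/* Outstanding references, standing in for path_get()/path_put(). */
static int refs;

static int walk(const struct node *leaf)
{
	bool allowed_dst = false, allowed_src = false;
	const struct node *walker = leaf;

	refs++; /* the only "get" */
	while (true) {
		allowed_dst = allowed_dst || walker->grants_dst;
		allowed_src = allowed_src || walker->grants_src;

		/* Early break: only saves walking one more step. */
		if (allowed_dst && allowed_src)
			break;
		if (!walker->parent)
			break; /* reached the root */
		walker = walker->parent;
	}
	refs--; /* the only "put" */

	/* The only place where access is granted or denied. */
	return (allowed_dst && allowed_src) ? 0 : -EACCES;
}
```

Whatever the reason for leaving the loop, the reference is released
once and the grant decision is taken in a single place.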