[PATCH v24 07/12] landlock: Support filesystem access-control
Jann Horn
jannh at google.com
Mon Nov 23 21:19:32 UTC 2020
On Mon, Nov 23, 2020 at 10:16 PM Mickaël Salaün <mic at digikod.net> wrote:
> On 23/11/2020 20:44, Jann Horn wrote:
> > On Sat, Nov 21, 2020 at 11:06 AM Mickaël Salaün <mic at digikod.net> wrote:
> >> On 21/11/2020 08:00, Jann Horn wrote:
> >>> On Thu, Nov 12, 2020 at 9:52 PM Mickaël Salaün <mic at digikod.net> wrote:
> >>>> Thanks to the Landlock objects and ruleset, it is possible to identify
> >>>> inodes according to a process's domain. To enable an unprivileged
> >>>> process to express a file hierarchy, it first needs to open a directory
> >>>> (or a file) and pass this file descriptor to the kernel through
> >>>> landlock_add_rule(2). When checking if a file access request is
> >>>> allowed, we walk from the requested dentry to the real root, following
> >>>> the different mount layers. The access to each "tagged" inodes are
> >>>> collected according to their rule layer level, and ANDed to create
> >>>> access to the requested file hierarchy. This makes possible to identify
> >>>> a lot of files without tagging every inodes nor modifying the
> >>>> filesystem, while still following the view and understanding the user
> >>>> has from the filesystem.
> >>>>
> >>>> Add a new ARCH_EPHEMERAL_INODES for UML because it currently does not
> >>>> keep the same struct inodes for the same inodes whereas these inodes are
> >>>> in use.
> >>>>
> >>>> This commit adds a minimal set of supported filesystem access-control
> >>>> which doesn't enable to restrict all file-related actions. This is the
> >>>> result of multiple discussions to minimize the code of Landlock to ease
> >>>> review. Thanks to the Landlock design, extending this access-control
> >>>> without breaking user space will not be a problem. Moreover, seccomp
> >>>> filters can be used to restrict the use of syscall families which may
> >>>> not be currently handled by Landlock.
> >>>>
> >>>> Cc: Al Viro <viro at zeniv.linux.org.uk>
> >>>> Cc: Anton Ivanov <anton.ivanov at cambridgegreys.com>
> >>>> Cc: James Morris <jmorris at namei.org>
> >>>> Cc: Jann Horn <jannh at google.com>
> >>>> Cc: Jeff Dike <jdike at addtoit.com>
> >>>> Cc: Kees Cook <keescook at chromium.org>
> >>>> Cc: Richard Weinberger <richard at nod.at>
> >>>> Cc: Serge E. Hallyn <serge at hallyn.com>
> >>>> Signed-off-by: Mickaël Salaün <mic at linux.microsoft.com>
> >>>> ---
> >>>>
> >>>> Changes since v23:
> >>>> * Enforce deterministic interleaved path rules. To have consistent
> >>>> layered rules, granting access to a path implies that all accesses
> >>>> tied to inodes, from the requested file to the real root, must be
> >>>> checked. Otherwise, stacked rules may result to overzealous
> >>>> restrictions. By excluding the ability to add exceptions in the same
> >>>> layer (e.g. /a allowed, /a/b denied, and /a/b/c allowed), we get
> >>>> deterministic interleaved path rules. This removes an optimization
> >>>
> >>> I don't understand the "deterministic interleaved path rules" part.
> >>
> >> I explain bellow.
> >>
> >>>
> >>>
> >>> What if I have a policy like this?
> >>>
> >>> /home/user READ
> >>> /home/user/Downloads READ+WRITE
> >>>
> >>> That's a reasonable policy, right?
> >>
> >> Definitely, I forgot this, thanks for the outside perspective!
> >>
> >>>
> >>> If I then try to open /home/user/Downloads/foo in WRITE mode, the loop
> >>> will first check against the READ+WRITE rule for /home/user, that
> >>> check will pass, and then it will check against the READ rule for /,
> >>> which will deny the access, right? That seems bad.
> >>
> >> Yes that was the intent.
> >>
> >>>
> >>>
> >>> The v22 code ensured that for each layer, the most specific rule (the
> >>> first we encounter on the walk) always wins, right? What's the problem
> >>> with that?
> >>
> >> This can be explained with the interleaved_masked_accesses test:
> >> https://github.com/landlock-lsm/linux/blob/landlock-v24/tools/testing/selftests/landlock/fs_test.c#L647
> >>
> >> In this case there is 4 stacked layers:
> >> layer 1: allows s1d1/s1d2/s1d3/file1
> >> layer 2: allows s1d1/s1d2/s1d3
> >> denies s1d1/s1d2
> >> layer 3: allows s1d1
> >> layer 4: allows s1d1/s1d2
> >>
> >> In the v23, access to file1 would be allowed until layer 3, but layer 4
> >> would merge a new rule for the s1d2 inode. Because we don't record where
> >> exactly the access come from, we can't tell that layer 2 allowed access
> >> thanks to s1d3 and that its s1d2 rule was ignored. I think this behavior
> >> doesn't make sense from the user point of view.
> >
> > Aah, I think I'm starting to understand the issue now. Basically, with
> > the current UAPI, the semantics have to be "an access is permitted if,
> > for each policy layer, at least one rule encountered on the pathwalk
> > permits the access; rules that deny the access are irrelevant". And if
> > it turns out that someone needs to be able to deny access to specific
> > inodes, we'll have to extend struct landlock_path_beneath_attr.
>
> Right, I'll add this to the documentation (aligned with the new
> implementation).
>
> >
> > That reminds me... if we do need to make such a change in the future,
> > it would be easier in terms of UAPI compatibility if
> > landlock_add_rule() used copy_struct_from_user(), which is designed to
> > create backwards and forwards compatibility with other version of UAPI
> > headers. So adding that now might save us some headaches later.
>
> I used copy_struct_from_user() before v21, but Arnd wasn't a fan of
> having type and size arguments, so we simplified the UAPI in the v21 by
> removing the size argument. The type argument is enough to extend the
> structure, but indeed, we lose the forward compatibility. Relying on one
> syscall per rule type seems too much, though.
You have a point there, I guess having a type argument is enough. (And
if userspace tries to load a ruleset with "deny" rules that isn't
supported by the current kernel, userspace will have to deal with that
in some way anyway.)
So thinking about it more, I guess the current version is probably
actually fine, too.
More information about the Linux-security-module-archive
mailing list