Commit 13c164b1a186 - regression for LSMs/SELinux?

Fri Sep 25 13:37:56 UTC 2020

On Thu, Sep 24, 2020 at 4:16 PM Stephen Smalley
<stephen.smalley.work at gmail.com> wrote:
> On Thu, Sep 24, 2020 at 4:36 AM Ondrej Mosnacek <omosnace at redhat.com> wrote:
> >
> > On Wed, Sep 23, 2020 at 3:55 AM Ian Kent <raven at themaw.net> wrote:
> > > On Tue, 2020-09-22 at 09:33 +0200, Ondrej Mosnacek wrote:
> > > > On Mon, Sep 21, 2020 at 6:30 PM Al Viro <viro at zeniv.linux.org.uk>
> > > > wrote:
> > > > > On Mon, Sep 21, 2020 at 06:09:22PM +0200, Christoph Hellwig wrote:
> > > > > > [adding Linus and Al]
> > > > > >
> > > > > > On Mon, Sep 21, 2020 at 04:51:35PM +0200, Ondrej Mosnacek wrote:
> > > > > > > Hi folks,
> > > > > > >
> > > > > > > It seems that after commit 13c164b1a186 ("autofs: switch to
> > > > > > > kernel_write") there is now an extra LSM permission required
> > > > > > > (for the
> > > > > > > current task to write to the automount pipe) for processes
> > > > > > > accessing
> > > > > > > some yet-to-to-be mounted directory on which an autofs mount is
> > > > > > > set
> > > > > > > up. The call chain is:
> > > > > > > [...]
> > > > > > > autofs_wait() ->
> > > > > > > autofs_notify_daemon() ->
> > > > > > > autofs_write() ->
> > > > > > > kernel_write() ->
> > > > > > > rw_verify_area() ->
> > > > > > > security_file_permission()
> > > > > > >
> > > > > > > The bug report that led me to this commit is at [1].
> > > > > > >
> > > > > > > Technically, this is a regression for LSM users, since this is
> > > > > > > a
> > > > > > > kernel-internal operation and an LSM permission for the current
> > > > > > > task
> > > > > > > shouldn't be required. Can this patch be reverted? Perhaps
> > > > > > > __kernel_{read|write}() could instead be renamed to
> > > > > > > kernel_*_nocheck()
> > > > > > > so that the name is more descriptive?
> > > > > >
> > > > > > So we obviously should not break existing user space and need to
> > > > > > fix
> > > > > > this ASAP.  The trivial "fix" would be to export __kernel_write
> > > > > > again
> > > > > > and switch autofs to use it.  The other option would be a FMODE
> > > > > > flag
> > > > > > to bypass security checks, only to be set if the callers ensures
> > > > > > they've been valided (i.e. in autofs_prepare_pipe).
> > > >
> > > > IMHO that sounds like an overkill in this scenario. I don't think it
> > > > makes sense to do the LSM check here (or at least not against the
> > > > current task's creds), because it is not the current task that wants
> > > > to communicate with the daemon, it just wants to to access some
> > > > directory on the system that just happens to be special to the
> > > > kernel,
> > > > which needs to do some communication on the side to service this
> > > > request. So if we do want to do any LSM check here, there should at
> > > > least be some "bool internal" flag passed to the LSM, signalizing
> > > > that
> > > > this is an internal read/write operation that wasn't directly
> > > > initiated/requested by the current process. SELinux could then either
> > > > use the kernel secid instead of the current task's secid or skip the
> > > > check completely in such case.
> > >
> > > Perhaps, but see below.
> > >
> > > >
> > > > I'd like Stephen to weigh in on this, but it looks he might be on
> > > > vacation right now...
> > > >
> > > > > > Any opinions?
> > > > >
> > > > > Reexport for now.  Incidentally, what is LSM doing rejecting writes
> > > > > into a pipe?
> > > >
> > > > With SELinux at least, what is allowed or denied is defined in the
> > > > policy. And the policy usually defaults to everything denied and then
> > > > you add rules to allow what needs (and makes sense) to be allowed.
> > > > Since until kernel 5.8 random processes didn't need to write to pipes
> > > > created by the automount daemon, it has never been explicitly allowed
> > > > and so the automounting now fails. It is in no way obvious that all
> > > > processes should have the permission to talk to the automount daemon
> > > > just to traverse the filesystem...
> > >
> > > I think you might have misunderstood what lead to this, just a bit.
> > >
> > > Previously the __kern_write() function was used for this communication
> > > and Christoph's patch changed that to use kern_write() instead.
> > >
> > > In theory that's a good idea because kern_write() adds some additional
> > > sanity checks, one being a call to rw_verify_area() which is where the
> > > security_file_permission() call fails.
> > >
> > > So previously any random process could avoid these checks by calling
> > > __kern_write() so the change to kern_write() is, in theory, that's a
> > > good thing and simply reverting that hunk in Christoph's patch
> > > probably isn't the best thing to do.
> >
> > I understand that and I'm not proposing the revert as a long-term fix.
> > For a long-term solution I propose using kernel_write() and extending
> > it to allow the caller to suppress (just) the
> > security_file_permission() call. Then each caller would have to decide
> > whether the LSM check makes sense in that situation or not. It seems
> > safer against future mistakes than leaving it up to the caller to call
> > security_file_permission() explicitly (I predict that no new user
> > would even realize that the call might be needed).
> >
> > >
> > > But any random process does need to be able to write to the automount
> > > daemon pipe for trailing path components and the root dentry of autofs
> > > mounts, depending on case.
> > >
> > > So it's true that any write to any autofs dentry probably doesn't
> > > need to be allowed but I question what that gets us in terms of
> > > security improvement over allowing pipe writes for automount_t
> > > labelled pipes in selinux policy since they must be within an autofs
> > > mounted file system.
> >
> > The difference is not in security, but in usability. The kernel
> > communicating with the autofs daemon is an internal detail that
> > shouldn't need special rules in policy. Even if we want to do any LSM
> > checking here, the subject should be kernel_t, not the current running
> > process. The process doesn't have any control on whether the kernel
> > does the communication and it doesn't control the content of the
> > communication, so the permission check as it is doesn't make any
> > sense. People writing the policy should be burdened by low-level
> > details about how the kernel works internally as little as possible.
> >
> > >
> > > But Stephen has a different recommendation (and that appears to
> > > consider the cause I outlined above) so I'll wait to see what others
> > > think about the recommendations.
> >
> > As I said above, I think Stephen's approach is less future-proof. And
> > it seems that rw_verify_area() has many other callers, most/all of
> > which probably service actual requests from userspace and we'd need to
> > retain the security_file_permission() call in those cases.
> >
> > Let me try to put my proposal into a patch, so we have something
> > concrete to talk about...
>
> Up-thread I thought Linus indicated he didn't really want a flag to
> disable pemission checking due to potential abuse (and I agree).

IIUC he was against adding an FMODE flag, while I was rather
suggesting a new function parameter (I realize it probably wasn't
clear from what I wrote). The latter is less prone to accidents and
any new user would have to set that parameter to *something* and so
would be forced to think about whether he needs an LSM check or not.
Whereas with a separate _nosec()/nolsmcheck()/_whatever() function
they would likely just use the kernel_write() function they already
know and may not even realize that there is some decision to be made.

But that's just my opinion, I would still be happier with a separate
function than reverting to __kernel_write().

> Historically we have taken one of two approaches for these situations:
> 1) Provide a separate interface like kernel_write() for use when we
> don't want permission checking and don't have it call the security
> hook at all.  If you prefer kernel_write_nosec() that's fine but I
> think that's somewhat implicit in the fact that it is a
> kernel-initiated write, not a userspace write (which would hopefully
> go through vfs_write() or similar and end up calling the hook).

Unfortunately not all of the kernel_write() calls seem to be of the
kind that we want to skip the permission checking. One in particular
that caught my attention is this btrfs one:
https://elixir.bootlin.com/linux/v5.9-rc6/source/fs/btrfs/send.c#L561

Basically a userspace process passes some fd via ioctl and kernel then
writes something in it. If we don't do any check here (and BTW we in
fact want the current's creds to apply here), then the process could
pass any open fd and bypass any LSM write permissions to the file
behind that fd. Possibly there are some checks that restrict what can
be passed there (I didn't check), but it would still be fragile to
rely on them.

So I guess we can't just skip the check whenever the kernel_*()
function is called.

> 2) Temporarily override creds to the init_cred or the result of
> prepare_kernel_creds() before calling any credential-checking
> functions and then revert creds afterward.

Most users of kernel_write() seem to also create/open the file
themselves, usually using the current's creds, so this would break
consistency when applied universally. Autofs (and also
bpfilter_send_req()) are special in that they first get the fd from a
userspace process and then write into it from other tasks' syscalls.
Then there are one or two users that create the file using
shmem_kernel_file_setup(), which does use prepare_kernel_creds(), but
the resulting file is marked S_PRIVATE, so all LSM checks are already
skipped for it anyway.

>
> The problem with #2 is that it still requires that the policy allow
> kernel_t (or its equivalent) to be able to write to these pipes, which
> wasn't previously necessary and thus might not be allowed in all
> policies (e.g. Android?).  #1 avoids any need for policy for these
> operations.

Yeah, I'm starting to think checking "kernel creds -> userspace
object" access vectors on purpose doesn't make much sense... you would
pretty much want to always allow that (no point in kernel blocking its
own operations) so just skipping the check in those cases where using
the current's creds doesn't make sense seems to be a better choice.

BTW, I don't have the candidate patch written yet, as I haven't had
time and also need to process new information after having reviewed
the existing callers.

-- 
Ondrej Mosnacek
Software Engineer, Platform Security - SELinux kernel
Red Hat, Inc.