[PATCH v1 bpf-next 0/5] af_unix: Allow BPF LSM to scrub SCM_RIGHTS at sendmsg().

Tue May 6 16:08:23 UTC 2025

On Tue, 6 May 2025 at 11:15, Christian Brauner <brauner at kernel.org> wrote:
>
> On Tue, May 06, 2025 at 12:49:11AM +0200, Kumar Kartikeya Dwivedi wrote:
> > On Mon, 5 May 2025 at 23:58, Kuniyuki Iwashima <kuniyu at amazon.com> wrote:
> > >
> > > As long as recvmsg() or recvmmsg() is used with cmsg, it is not
> > > possible to avoid receiving file descriptors via SCM_RIGHTS.
> > >
> > > This behaviour has occasionally been flagged as problematic.
> > >
> > > For instance, as noted on the uAPI Group page [0], an untrusted peer
> > > could send a file descriptor pointing to a hung NFS mount and then
> > > close it.  Once the receiver calls recvmsg() with msg_control, the
> > > descriptor is automatically installed, and then the responsibility
> > > for the final close() now falls on the receiver, which may result
> > > in blocking the process for a long time.
> > >
> > > systemd calls cmsg_close_all() [1] after each recvmsg() to close()
> > > unwanted file descriptors sent via SCM_RIGHTS.
> > >
> > > However, this cannot work around the issue because the last fput()
> > > could occur on the receiver side once sendmsg() with SCM_RIGHTS
> > > succeeds.  Also, even filtering by LSM at recvmsg() does not work
> > > for the same reason.
> > >
> > > Thus, we need a better way to filter SCM_RIGHTS on the sender side.
> > >
> > > This series allows BPF LSM to inspect skb at sendmsg() and scrub
> > > SCM_RIGHTS fds by kfunc.
> > >
> > > Link: https://uapi-group.org/kernel-features/#disabling-reception-of-scm_rights-for-af_unix-sockets #[0]
> > > Link: https://github.com/systemd/systemd/blob/v257.5/src/basic/fd-util.c#L612-L628 #[1]
> > >
> >
> > This sounds pretty useful!
> >
> > I think you should mention the cases of possible DoS on close() or
> > flooding, e.g. with FUSE controlled fd/NFS hangs in the commit log
> > itself.
> > I think it's been an open problem for a while now with no good solution.
> > Currently systemd's FDSTORE=1 for PID 1 is susceptible to the same
> > problem, even if the underlying service isn't root.
> >
> > I think it is also useful for restricting what individual file
> > descriptors can be passed around by a process.
> > Say restricting usage of an fd to a process and its children, but not
> > allowing it to be shared with others.
> > Send side hook is the right point to enforce it.
> >
> > Therefore exercising scm_fp_list would be a good idea.
>
> No, that's a terrible idea. If the receiver expects 10 file descriptors
> and suddenly some magically disappear or the order gets messed up that's
> terrible for security. It's either close all or nothing.

I was talking about exercising/reading it in the selftest, not
exposing anything new.

Yes, the policy should be close all or nothing, but it can still be
used to deny sendmsg when one of the descriptors being passed isn't in
the allowed set.
You just return 0 or an error. No need to scrub: making some fds
disappear while letting the message pass is the problematic part.
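
To make the all-or-nothing semantics concrete, here is a rough user
space model of the decision. It is not the proposed hook or a BPF
program, only a sketch of the policy shape: every descriptor is checked
first, and either the whole SCM_RIGHTS message is sent or sendmsg() is
never called. The names fd_is_allowed(), check_fds(), send_fds_checked()
and MAX_FDS are made up for illustration, and the "pipes only" rule
(checked with fstat()) is just an assumed example policy.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define MAX_FDS 16	/* hypothetical cap for this sketch */

/* Hypothetical example policy: only pipes (FIFOs) may be passed. */
static int fd_is_allowed(int fd)
{
	struct stat st;

	if (fstat(fd, &st) < 0)
		return 0;
	return S_ISFIFO(st.st_mode);
}

/* All or nothing: 0 if every fd is allowed, -EPERM otherwise. */
static int check_fds(const int *fds, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!fd_is_allowed(fds[i]))
			return -EPERM;
	return 0;
}

/*
 * Send all fds in one SCM_RIGHTS message, or send nothing at all.
 * Returns 0 on success or a negative errno value.
 */
static int send_fds_checked(int sock, const int *fds, size_t n)
{
	union {
		struct cmsghdr align;	/* forces correct alignment */
		char buf[CMSG_SPACE(sizeof(int) * MAX_FDS)];
	} u;
	char byte = 'x';	/* at least one byte of real data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int ret;

	if (n == 0 || n > MAX_FDS)
		return -EINVAL;

	/* Deny the whole message if any single fd is not allowed. */
	ret = check_fds(fds, n);
	if (ret)
		return ret;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = CMSG_SPACE(sizeof(int) * n);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int) * n);
	memcpy(CMSG_DATA(cmsg), fds, sizeof(int) * n);

	return sendmsg(sock, &msg, 0) < 0 ? -errno : 0;
}
```

With this shape the receiver either gets every descriptor in the order
the sender intended or gets no message at all. A hook at sendmsg()
could enforce the same rule from outside the sending process.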

>
> > We should provide some more examples of the filtering policy in the selftests.
> > Maybe a simple example, e.g. only memfd or a pipe fd can be passed,
> > and nothing else.
> > It would require checking file->f_ops.
>
> There's not going to be poking around in file->f_ops for this.

I don't think any poking is required. There's no need to expose anything extra.

Really, all that is needed is for an LSM hook to exist and the program
to say success or failure.
Even the scrub fds stuff can be dropped.

The program can simply inspect the scm_fp_list and if it doesn't look
ok, deny the sendmsg.
It's already there inside unix_skb_parms.

It just means the program can look at the file (no helper needs to be
exposed) and make a decision, just like in the rest of the BPF LSM
hooks.
I think a socket option makes sense too, but ideally we can have both
the hook and the socket option.

The socket option has the advantage that user space can set it itself
conveniently, without having to load a BPF program.
Meanwhile the hook can be more fine grained in decision making and be
imposed by some central entity.

Does this sound reasonable? I don't think it requires anything beyond
simply defining the hook and letting a program run there.
No poking into VFS internals, and no silently dropping file
descriptors while letting the send succeed.

So mostly patches 1-2, plus another patch to add a setsockopt flag.

>
> [...]