[PATCH v1 bpf-next 0/5] af_unix: Allow BPF LSM to scrub SCM_RIGHTS at sendmsg().

Tue May 6 18:14:07 UTC 2025

From: Kumar Kartikeya Dwivedi <memxor at gmail.com>
Date: Tue, 6 May 2025 18:08:23 +0200
> On Tue, 6 May 2025 at 11:15, Christian Brauner <brauner at kernel.org> wrote:
> >
> > On Tue, May 06, 2025 at 12:49:11AM +0200, Kumar Kartikeya Dwivedi wrote:
> > > On Mon, 5 May 2025 at 23:58, Kuniyuki Iwashima <kuniyu at amazon.com> wrote:
> > > >
> > > > As long as recvmsg() or recvmmsg() is used with cmsg, it is not
> > > > possible to avoid receiving file descriptors via SCM_RIGHTS.
> > > >
> > > > This behaviour has occasionally been flagged as problematic.
> > > >
> > > > For instance, as noted on the uAPI Group page [0], an untrusted peer
> > > > could send a file descriptor pointing to a hung NFS mount and then
> > > > close it.  Once the receiver calls recvmsg() with msg_control, the
> > > > descriptor is automatically installed, and the responsibility for
> > > > the final close() falls on the receiver, which may block the
> > > > process for a long time.
> > > >
> > > > systemd calls cmsg_close_all() [1] after each recvmsg() to close()
> > > > unwanted file descriptors sent via SCM_RIGHTS.
> > > >
> > > > However, this cannot work around the issue because the last fput()
> > > > could occur on the receiver side once sendmsg() with SCM_RIGHTS
> > > > succeeds.  Also, even filtering by LSM at recvmsg() does not work
> > > > for the same reason.
> > > >
> > > > Thus, we need a better way to filter SCM_RIGHTS on the sender side.
> > > >
> > > > This series allows BPF LSM to inspect skb at sendmsg() and scrub
> > > > SCM_RIGHTS fds by kfunc.
> > > >
> > > > Link: https://uapi-group.org/kernel-features/#disabling-reception-of-scm_rights-for-af_unix-sockets #[0]
> > > > Link: https://github.com/systemd/systemd/blob/v257.5/src/basic/fd-util.c#L612-L628 #[1]
> > > >
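
For readers unfamiliar with the receiver-side workaround mentioned in the
cover letter, here is a minimal userspace sketch of closing every fd that
arrived via SCM_RIGHTS. It is not systemd's cmsg_close_all() code; the names
close_all_scm_rights() and demo_send_and_drain() are made up for illustration.
As the cover letter points out, this only cleans up after the fact: the
close() here may itself be the final fput(), so it does not solve the problem.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Close every descriptor carried in SCM_RIGHTS control messages of mh.
 * Returns the number of descriptors closed. (Hypothetical helper.) */
int close_all_scm_rights(struct msghdr *mh)
{
	struct cmsghdr *cmsg;
	int closed = 0;

	for (cmsg = CMSG_FIRSTHDR(mh); cmsg; cmsg = CMSG_NXTHDR(mh, cmsg)) {
		if (cmsg->cmsg_level != SOL_SOCKET ||
		    cmsg->cmsg_type != SCM_RIGHTS)
			continue;

		assert(cmsg->cmsg_len >= CMSG_LEN(0));
		size_t n = (cmsg->cmsg_len - CMSG_LEN(0)) / sizeof(int);

		for (size_t i = 0; i < n; i++) {
			int fd;

			/* CMSG_DATA may be unaligned for int, so copy out. */
			memcpy(&fd, CMSG_DATA(cmsg) + i * sizeof(int), sizeof(fd));
			close(fd);
			closed++;
		}
	}
	return closed;
}

/* Demo: pass one pipe fd over a socketpair, then drain it on receipt.
 * Returns the number of fds closed, or -1 on error. Error paths leak
 * descriptors; acceptable for a sketch only. */
int demo_send_and_drain(void)
{
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} ctl;
	char data = 'x', rbuf;
	struct iovec iov = { .iov_base = &data, .iov_len = 1 };
	struct msghdr mh = {
		.msg_iov = &iov, .msg_iovlen = 1,
		.msg_control = ctl.buf, .msg_controllen = sizeof(ctl.buf),
	};
	struct cmsghdr *cmsg;
	int sv[2], pfd[2], closed;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0 || pipe(pfd) < 0)
		return -1;

	/* Sender side: attach the pipe's read end as SCM_RIGHTS. */
	memset(&ctl, 0, sizeof(ctl));
	cmsg = CMSG_FIRSTHDR(&mh);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &pfd[0], sizeof(int));
	if (sendmsg(sv[0], &mh, 0) < 0)
		return -1;

	/* Receiver side: recvmsg() with msg_control installs the fd. */
	iov.iov_base = &rbuf;
	memset(&ctl, 0, sizeof(ctl));
	mh.msg_controllen = sizeof(ctl.buf);
	if (recvmsg(sv[1], &mh, 0) < 0)
		return -1;

	closed = close_all_scm_rights(&mh);

	close(pfd[0]);
	close(pfd[1]);
	close(sv[0]);
	close(sv[1]);
	return closed;
}
```

Calling demo_send_and_drain() from a small main() exercises the whole
round trip on a single process.
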
> > >
> > > This sounds pretty useful!
> > >
> > > I think you should mention the cases of possible DoS on close() or
> > > flooding, e.g. with FUSE controlled fd/NFS hangs in the commit log
> > > itself.
> > > I think it's been an open problem for a while now with no good solution.
> > > Currently systemd's FDSTORE=1 for PID 1 is susceptible to the same
> > > problem, even if the underlying service isn't root.
> > >
> > > I think it is also useful for restricting what individual file
> > > descriptors can be passed around by a process.
> > > Say restricting usage of an fd to a process and its children, but not
> > > allowing it to be shared with others.
> > > Send side hook is the right point to enforce it.
> > >
> > > Therefore exercising scm_fp_list would be a good idea.
> >
> > No, that's a terrible idea. If the receiver expects 10 file descriptors
> > and suddenly some magically disappear or the order gets messed up that's
> > terrible for security. It's either close all or nothing.
> 
> I was talking about exercising/reading it in the selftest, not
> exposing anything new.
> 
> Yes, the policy should be close all or nothing, but it can still be
> used to deny sendmsg when one of the descriptors being passed isn't in
> the allowed set.
> You just return 0 or an error. No need to scrub, no need to disappear
> some fds and let the message pass, which can be problematic.
> 
> >
> > > We should provide some more examples of the filtering policy in the selftests.
> > > Maybe a simple example, e.g. only memfd or a pipe fd can be passed,
> > > and nothing else.
> > > It would require checking file->f_ops.
> >
> > There's not going to be poking around in file->f_ops for this.
> 
> I don't think any poking is required. There's no need to expose anything extra.
> 
> Really, all that is needed is for an LSM hook to exist and the program
> to say success or failure.
> Even the scrub fds stuff can be dropped.
> 
> The program can simply inspect the scm_fp_list and if it doesn't look
> ok, deny the sendmsg.
> It's already there inside unix_skb_parms.
> 
> It just means the program can look at the file (there's no helper
> needed to be exposed) and make a decision, just like in the rest of
> the BPF LSM hooks.
> I think a socket option makes sense too, but ideally we can have both
> the hook and the socket option.
> 
> The socket option has the advantage that user space can set it itself
> conveniently, without having to load a BPF program.
> Meanwhile the hook can be more fine grained in decision making and be
> imposed by some central entity.
> 
> Does this sound reasonable? I don't think it requires anything beyond
> simply defining the hook and letting a program run there.
> No poking into VFS internals etc. or silently dropping file
> descriptors and letting it succeed.
> 
> So mostly patch 1-2 and then another to add a setsockopt flag.
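
To make the "all or nothing" verdict concrete, here is a rough userspace
stand-in for the decision logic. It is not a BPF program and does not use the
hook from this series; a real program would walk the scm_fp_list and look at
each struct file. The names fd_is_allowed(), scm_rights_verdict() and
demo_policy() are hypothetical, and the "pipes only" rule is just an example
policy (the memfd case is left out).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical example policy: only pipe descriptors may be passed. */
int fd_is_allowed(int fd)
{
	struct stat st;

	return fstat(fd, &st) == 0 && S_ISFIFO(st.st_mode);
}

/* All-or-nothing verdict over the whole fd list: 0 allows the message,
 * -EPERM denies it. Individual entries are never dropped or reordered. */
int scm_rights_verdict(const int *fds, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (!fd_is_allowed(fds[i]))
			return -EPERM;
	return 0;
}

/* Demo: a list of two pipe ends passes; mixing in /dev/null denies
 * the entire list. Returns 0 on success, -1 on setup failure. */
int demo_policy(int *pipes_only, int *mixed)
{
	int p[2], devnull;

	if (pipe(p) < 0)
		return -1;

	devnull = open("/dev/null", O_RDONLY);
	if (devnull < 0) {
		close(p[0]);
		close(p[1]);
		return -1;
	}

	int ok[] = { p[0], p[1] };
	int bad[] = { p[0], devnull, p[1] };

	*pipes_only = scm_rights_verdict(ok, 2);
	*mixed = scm_rights_verdict(bad, 3);

	close(devnull);
	close(p[0]);
	close(p[1]);
	return 0;
}
```

The point of the sketch is only the shape of the verdict: one disallowed
descriptor fails the whole sendmsg(), and the receiver never sees a partial
or reordered fd array.
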

Right, patches 1-2 are enough to let BPF LSM filter each skb.

I'll add the socket option too.

Thanks!