[PATCH v1 bpf-next 0/5] af_unix: Allow BPF LSM to scrub SCM_RIGHTS at sendmsg().

Kumar Kartikeya Dwivedi memxor at gmail.com
Mon May 5 22:49:11 UTC 2025


On Mon, 5 May 2025 at 23:58, Kuniyuki Iwashima <kuniyu at amazon.com> wrote:
>
> As long as recvmsg() or recvmmsg() is used with cmsg, it is not
> possible to avoid receiving file descriptors via SCM_RIGHTS.
>
> This behaviour has occasionally been flagged as problematic.
>
> For instance, as noted on the uAPI Group page [0], an untrusted peer
> could send a file descriptor pointing to a hung NFS mount and then
> close it.  Once the receiver calls recvmsg() with msg_control, the
> descriptor is automatically installed, and then the responsibility
> for the final close() now falls on the receiver, which may result
> in blocking the process for a long time.
>
> systemd calls cmsg_close_all() [1] after each recvmsg() to close()
> unwanted file descriptors sent via SCM_RIGHTS.
>
> However, this cannot work around the issue because the last fput()
> could occur on the receiver side once sendmsg() with SCM_RIGHTS
> succeeds.  Also, even filtering by LSM at recvmsg() does not work
> for the same reason.
>
> Thus, we need a better way to filter SCM_RIGHTS on the sender side.
>
> This series allows BPF LSM to inspect skb at sendmsg() and scrub
> SCM_RIGHTS fds by kfunc.
>
> Link: https://uapi-group.org/kernel-features/#disabling-reception-of-scm_rights-for-af_unix-sockets #[0]
> Link: https://github.com/systemd/systemd/blob/v257.5/src/basic/fd-util.c#L612-L628 #[1]
>

This sounds pretty useful!

I think you should mention the possible DoS cases, on close() or by
flooding, e.g. with FUSE-controlled fds or NFS hangs, in the commit log
itself.
I think it's been an open problem for a while now with no good solution.
Currently systemd's FDSTORE=1 for PID 1 is susceptible to the same
problem, even if the underlying service isn't running as root.

I think it is also useful for restricting which individual file
descriptors a process can pass around.
For example, restricting use of an fd to a process and its children,
without allowing it to be shared with others.
A send-side hook is the right place to enforce that.

Therefore exercising scm_fp_list would be a good idea.
We should provide some more examples of the filtering policy in the selftests.
Maybe a simple example, e.g. only memfd or a pipe fd can be passed,
and nothing else.
It would require checking file->f_op.
I don't think "scrub all file descriptors" is the only possible usage scenario.
In the case of FDSTORE=1, it might be "everything except fuse or NFS fds" etc.
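
As a rough illustration of the allow-list idea (not code from this
series, and not a BPF program): the sketch below models the check in
plain userspace C. Every type and name in it (mock_file,
mock_file_operations, mock_fp_list, MOCK_MAX_FD, the mock_*_fops
tables) is hypothetical and only stands in for struct file, its f_op
pointer, and scm_fp_list. A real policy would run in the BPF LSM hook
and compare f_op against the kernel's actual file_operations tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file_operations. */
struct mock_file_operations {
	const char *name;
};

/* Hypothetical stand-in for struct file; only f_op matters here. */
struct mock_file {
	const struct mock_file_operations *f_op;
};

/* Arbitrary small bound chosen for this sketch. */
#define MOCK_MAX_FD 8

/* Hypothetical stand-in for scm_fp_list: a count plus file pointers. */
struct mock_fp_list {
	short count;
	struct mock_file *fp[MOCK_MAX_FD];
};

/* One ops table per file type, identified by address. */
const struct mock_file_operations mock_pipe_fops  = { "pipe" };
const struct mock_file_operations mock_memfd_fops = { "memfd" };
const struct mock_file_operations mock_fuse_fops  = { "fuse" };

/* Allow-list: only pipe and memfd files may be passed. */
bool mock_file_allowed(const struct mock_file *file)
{
	return file->f_op == &mock_pipe_fops ||
	       file->f_op == &mock_memfd_fops;
}

/* Return true only if every file in the list is on the allow-list. */
bool mock_fp_list_allowed(const struct mock_fp_list *fpl)
{
	assert(fpl->count >= 0 && fpl->count <= MOCK_MAX_FD);

	for (short i = 0; i < fpl->count; i++) {
		if (!mock_file_allowed(fpl->fp[i]))
			return false;
	}
	return true;
}
```

The FDSTORE=1 variant would invert the test into a deny-list, e.g.
rejecting the list as soon as one entry's f_op matches the FUSE table
and allowing everything else.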

Eventually, if file-local storage happens, more interesting policies
may be possible.

> [...]


