[PATCH v1 bpf-next 0/5] af_unix: Allow BPF LSM to scrub SCM_RIGHTS at sendmsg().
Kuniyuki Iwashima
kuniyu at amazon.com
Tue May 6 00:21:27 UTC 2025
From: Kumar Kartikeya Dwivedi <memxor at gmail.com>
Date: Tue, 6 May 2025 00:49:11 +0200
> On Mon, 5 May 2025 at 23:58, Kuniyuki Iwashima <kuniyu at amazon.com> wrote:
> >
> > As long as recvmsg() or recvmmsg() is used with cmsg, it is not
> > possible to avoid receiving file descriptors via SCM_RIGHTS.
> >
> > This behaviour has occasionally been flagged as problematic.
> >
> > For instance, as noted on the uAPI Group page [0], an untrusted peer
> > could send a file descriptor pointing to a hung NFS mount and then
> > close it. Once the receiver calls recvmsg() with msg_control, the
> > descriptor is automatically installed, and then the responsibility
> > for the final close() now falls on the receiver, which may result
> > in blocking the process for a long time.
> >
> > systemd calls cmsg_close_all() [1] after each recvmsg() to close()
> > unwanted file descriptors sent via SCM_RIGHTS.
> >
> > However, this cannot work around the issue because the last fput()
> > could occur on the receiver side once sendmsg() with SCM_RIGHTS
> > succeeds. Also, even filtering by LSM at recvmsg() does not work
> > for the same reason.
> >
> > Thus, we need a better way to filter SCM_RIGHTS on the sender side.
> >
> > This series allows BPF LSM to inspect skb at sendmsg() and scrub
> > SCM_RIGHTS fds by kfunc.
> >
> > Link: https://uapi-group.org/kernel-features/#disabling-reception-of-scm_rights-for-af_unix-sockets #[0]
> > Link: https://github.com/systemd/systemd/blob/v257.5/src/basic/fd-util.c#L612-L628 #[1]
> >
>
> This sounds pretty useful!
>
> I think you should mention the cases of possible DoS on close() or
> flooding, e.g. with FUSE controlled fd/NFS hangs in the commit log
> itself.
> I think it's been an open problem for a while now with no good solution.
> Currently systemd's FDSTORE=1 for PID 1 is susceptible to the same
> problem, even if the underlying service isn't root.
Good point, will add the description in v2.
>
> I think it is also useful for restricting what individual file
> descriptors can be passed around by a process.
> Say restricting usage of an fd to a process and its children, but not
> allowing it to be shared with others.
> Send side hook is the right point to enforce it.
Agreed.

Actually, I tried per-fd filtering first but it did not work out, so
I wanted some advice from BPF folks :)

For example, I implemented a kfunc like:

  __bpf_kfunc int bpf_unix_scrub_file(struct sk_buff *skb, struct file *filp)
  {
  	/* scrub fd matching file if exists */
  }

and tried to treat filp == NULL as "scrub all" so that I could gradually
extend the functionality, but the verifier didn't allow passing NULL.

Also, once an fd is scrubbed, I do not want to leave the array entry
empty, to avoid adding an unnecessary "if (fpl->fp[i] == -1)" test in
other places.

  struct scm_fp_list *fpl = UNIXCB(skb).fp;

  /* scrubbed fpl->fp[i] here. */
  fpl->fp[i] = fpl->fp[fpl->count - 1];
  fpl->count--;

But this could confuse a BPF prog that is iterating over fpl->fp[] in a
for loop, and I was wondering what the interface should look like.

  * Keep the empty index and ignore it in core code ?
  * Provide an fd iterator ?
  * Scrub based on index ? matching fd ? or struct file ?
    * -1 works as ALL_INDEX or ALL_FDS but NULL doesn't
  * Invoke BPF LSM per-fd ?
    * Maybe not, as the sender/receiver pair is always the same for
      the same skb

I guess keeping the empty index as is and index-based scrubbing
would be simpler and cleaner ?
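
To make the iteration concern concrete, here is a minimal userspace
sketch, not kernel code. struct demo_fp_list only mimics the count and
fp[] members of scm_fp_list, the entries are string tags instead of
struct file pointers, and is_denied() is a made-up policy. It compares
the swap-remove approach with keeping the empty index.

```c
#include <stddef.h>

#define DEMO_MAX_FD 8

/* Hypothetical stand-in for the count/fp[] part of struct scm_fp_list. */
struct demo_fp_list {
	int count;
	const char *fp[DEMO_MAX_FD];
};

/* Made-up policy: deny every entry whose tag starts with 'b'. */
static int is_denied(const char *tag)
{
	return tag[0] == 'b';
}

/*
 * Forward loop that scrubs by swap-remove.  The entry moved into
 * slot i is never re-checked, so a denied entry can survive.
 * Returns the number of denied entries still present afterwards.
 */
static int filter_swap(struct demo_fp_list *fpl)
{
	int i, missed = 0;

	for (i = 0; i < fpl->count; i++) {
		if (is_denied(fpl->fp[i])) {
			fpl->fp[i] = fpl->fp[fpl->count - 1];
			fpl->count--;
		}
	}

	for (i = 0; i < fpl->count; i++)
		missed += is_denied(fpl->fp[i]);

	return missed;
}

/*
 * Forward loop that scrubs by clearing the slot.  Indexes stay
 * stable, at the cost of a NULL check wherever fp[] is walked.
 * Returns the number of denied entries still present afterwards.
 */
static int filter_tombstone(struct demo_fp_list *fpl)
{
	int i, missed = 0;

	for (i = 0; i < fpl->count; i++) {
		if (fpl->fp[i] && is_denied(fpl->fp[i]))
			fpl->fp[i] = NULL;
	}

	for (i = 0; i < fpl->count; i++) {
		if (fpl->fp[i])
			missed += is_denied(fpl->fp[i]);
	}

	return missed;
}
```

With the entries { "bad", "ok", "bad" }, filter_swap() moves the last
"bad" into slot 0 and then steps past it, so one denied entry is left
behind, while filter_tombstone() clears both and keeps count at 3.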
>
> Therefore exercising scm_fp_list would be a good idea.
> We should provide some more examples of the filtering policy in the selftests.
> Maybe a simple example, e.g. only memfd or a pipe fd can be passed,
> and nothing else.
> It would require checking file->f_ops.
Yes, and I thought we would need an fd-to-file kfunc or BPF helper, but
I was not sure which would be better, as the functionality should be
stable in either case.

But given that the user needs to inspect the raw scm_fp_list, a kfunc is
better ?

  * bpf_fd_to_file()

or

  * bpf_unix_get_scm_rights() -> return struct file ?

plus

  * bpf_unix_scrub_scm_rights() -> scrub based on fd or file ?
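
As a rough illustration of the "only memfd or pipe" policy combined
with index-based scrubbing, here is a userspace mock, not a BPF
program. Every type and function in it (mock_file, mock_fp_list,
mock_scrub_index(), the mock_*_fops tables) is hypothetical and only
mirrors the shape of the idea: compare the file's operations pointer
(the member is named f_op in struct file) against an allowlist and
scrub the rest by index.

```c
#include <stddef.h>

/* Mock types: only the shape matters, not the kernel definitions. */
struct mock_file_operations {
	const char *name;
};

struct mock_file {
	const struct mock_file_operations *f_op;
};

/* Hypothetical per-type operation tables, identified by address. */
static const struct mock_file_operations mock_memfd_fops = { "memfd" };
static const struct mock_file_operations mock_pipe_fops = { "pipe" };
static const struct mock_file_operations mock_nfs_fops = { "nfs" };

#define MOCK_MAX_FD 8

struct mock_fp_list {
	int count;
	struct mock_file *fp[MOCK_MAX_FD];
};

/* Policy: only memfd and pipe files may be passed. */
static int policy_allows(const struct mock_file *file)
{
	return file->f_op == &mock_memfd_fops ||
	       file->f_op == &mock_pipe_fops;
}

/*
 * Hypothetical index-based scrub: clear the slot and keep the
 * index, so a caller iterating over fp[] is not disturbed.
 */
static int mock_scrub_index(struct mock_fp_list *fpl, int idx)
{
	if (idx < 0 || idx >= fpl->count)
		return -1;

	fpl->fp[idx] = NULL;
	return 0;
}

/* Walk the list once and scrub everything the policy rejects. */
static int apply_policy(struct mock_fp_list *fpl)
{
	int i, scrubbed = 0;

	for (i = 0; i < fpl->count; i++) {
		if (!fpl->fp[i] || policy_allows(fpl->fp[i]))
			continue;

		mock_scrub_index(fpl, i);
		scrubbed++;
	}

	return scrubbed;
}
```

In a real BPF LSM prog the same comparison would have to go through
whatever kfunc ends up exposing struct file, so this only shows the
control flow, not the final interface.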
>
> I don't think "scrub all file descriptors" is the only possible usage scenario.
> In the case of FDSTORE=1, it might be "everything except fuse or NFS fds" etc.
>
> Eventually if file local storage happens, more interesting policies
> may be possible.
>