[PATCH] nbd: override creds to kernel when calling sock_{send,recv}msg()
Ming Lei
ming.lei at redhat.com
Fri Oct 10 01:56:24 UTC 2025
On Thu, Oct 09, 2025 at 03:45:42PM +0200, Ondrej Mosnacek wrote:
> sock_{send,recv}msg() internally calls security_socket_{send,recv}msg(),
> which does security checks (e.g. SELinux) for socket access against the
> current task. However, __sock_xmit() in drivers/block/nbd.c may be called
> indirectly from a userspace syscall, where the NBD socket access would
> be incorrectly checked against the calling userspace task (which simply
> tries to read/write a file that happens to reside on an NBD device).
>
> To fix this, temporarily override creds to kernel ones before calling
> the sock_*() functions. This allows the security modules to recognize
> this as internal access by the kernel, which will normally be allowed.
>
> A way to trigger the issue is to do the following (on a system with
> SELinux set to enforcing):
>
> ### Create nbd device:
> truncate -s 256M /tmp/testfile
> nbd-server localhost:10809 /tmp/testfile
>
> ### Connect to the nbd server:
> nbd-client localhost
>
> ### Create mdraid array
> mdadm --create -l 1 -n 2 /dev/md/testarray /dev/nbd0 missing
-EACCES is triggered when the mdadm process reads data:
@security[mdadm, -13,
handshake_exit+221615650
handshake_exit+221615650
handshake_exit+221616465
security_socket_sendmsg+5
sock_sendmsg+106
handshake_exit+221616150
sock_sendmsg+5
__sock_xmit+162
nbd_send_cmd+597
nbd_handle_cmd+377
nbd_queue_rq+63
blk_mq_dispatch_rq_list+653
__blk_mq_do_dispatch_sched+184
__blk_mq_sched_dispatch_requests+333
blk_mq_sched_dispatch_requests+38
blk_mq_run_hw_queue+239
blk_mq_dispatch_plug_list+382
blk_mq_flush_plug_list.part.0+55
__blk_flush_plug+241
__submit_bio+353
submit_bio_noacct_nocheck+364
submit_bio_wait+84
__blkdev_direct_IO_simple+232
blkdev_read_iter+162
vfs_read+591
ksys_read+95
do_syscall_64+92
entry_SYSCALL_64_after_hwframe+120
]: 1
The issue has been exposed since commit f1daaaf0c1fa ("block: add plug while submitting IO").
>
> ### Stop the array
> mdadm --stop /dev/md/testarray
>
> ### Disconnect the nbd device
> nbd-client -d /dev/nbd0
>
> ### Reconnect to nbd devices:
> nbd-client localhost
The above steps don't actually matter for reproducing the issue.
>
> After these steps, assuming the SELinux policy doesn't allow the
> unexpected access pattern, errors will be visible on the kernel console:
>
> [ 93.997980] nbd2: detected capacity change from 0 to 524288
> [ 100.314271] md/raid1:md126: active with 1 out of 2 mirrors
> [ 100.314301] md126: detected capacity change from 0 to 522240
> [ 100.317288] block nbd2: Send control failed (result -13) <-----
> [ 100.317306] block nbd2: Request send failed, requeueing <-----
> [ 100.318765] block nbd2: Receive control failed (result -32) <-----
> [ 100.318783] block nbd2: Dead connection, failed to find a fallback
> [ 100.318794] block nbd2: shutting down sockets
> [ 100.318802] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.318817] Buffer I/O error on dev md126, logical block 0, async page read
> [ 100.322000] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.322016] Buffer I/O error on dev md126, logical block 0, async page read
> [ 100.323244] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.323253] Buffer I/O error on dev md126, logical block 0, async page read
> [ 100.324436] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.324444] Buffer I/O error on dev md126, logical block 0, async page read
> [ 100.325621] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.325630] Buffer I/O error on dev md126, logical block 0, async page read
> [ 100.326813] I/O error, dev nbd2, sector 2048 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.326822] Buffer I/O error on dev md126, logical block 0, async page read
> [ 100.326834] md126: unable to read partition table
> [ 100.329872] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.329889] Buffer I/O error on dev nbd2, logical block 0, async page read
> [ 100.331186] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.331195] Buffer I/O error on dev nbd2, logical block 0, async page read
> [ 100.332371] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.332379] Buffer I/O error on dev nbd2, logical block 0, async page read
> [ 100.333550] I/O error, dev nbd2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
> [ 100.333559] Buffer I/O error on dev nbd2, logical block 0, async page read
> [ 100.334721] nbd2: unable to read partition table
> [ 100.350993] nbd2: unable to read partition table
>
> The corresponding SELinux denial on Fedora/RHEL will look like this
> (assuming it's not silenced):
> type=AVC msg=audit(1758104872.510:116): avc: denied { write } for pid=1908 comm="mdadm" laddr=::1 lport=32772 faddr=::1 fport=10809 scontext=system_u:system_r:mdadm_t:s0-s0:c0.c1023 tcontext=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 tclass=tcp_socket permissive=0
>
> Cc: Ming Lei <ming.lei at redhat.com>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2348878
> Signed-off-by: Ondrej Mosnacek <omosnace at redhat.com>
> ---
> drivers/block/nbd.c | 16 ++++++++++++++++
> 1 file changed, 16 insertions(+)
>
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 6463d0e8d0cef..d50055c974a6b 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -52,6 +52,7 @@
> static DEFINE_IDR(nbd_index_idr);
> static DEFINE_MUTEX(nbd_index_mutex);
> static struct workqueue_struct *nbd_del_wq;
> +static struct cred *nbd_cred;
> static int nbd_total_devices = 0;
>
> struct nbd_sock {
> @@ -554,6 +555,7 @@ static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
> int result;
> struct msghdr msg = {} ;
> unsigned int noreclaim_flag;
> + const struct cred *old_cred;
>
> if (unlikely(!sock)) {
> dev_err_ratelimited(disk_to_dev(nbd->disk),
> @@ -562,6 +564,8 @@ static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
> return -EINVAL;
> }
>
> + old_cred = override_creds(nbd_cred);
> +
> msg.msg_iter = *iter;
>
> noreclaim_flag = memalloc_noreclaim_save();
> @@ -586,6 +590,8 @@ static int __sock_xmit(struct nbd_device *nbd, struct socket *sock, int send,
>
> memalloc_noreclaim_restore(noreclaim_flag);
>
> + revert_creds(old_cred);
> +
> return result;
> }
>
> @@ -2669,7 +2675,15 @@ static int __init nbd_init(void)
> return -ENOMEM;
> }
>
> + nbd_cred = prepare_kernel_cred(&init_task);
> + if (!nbd_cred) {
> + destroy_workqueue(nbd_del_wq);
> + unregister_blkdev(NBD_MAJOR, "nbd");
> + return -ENOMEM;
> + }
> +
> if (genl_register_family(&nbd_genl_family)) {
> + put_cred(nbd_cred);
> destroy_workqueue(nbd_del_wq);
> unregister_blkdev(NBD_MAJOR, "nbd");
> return -EINVAL;
> @@ -2706,6 +2720,8 @@ static void __exit nbd_cleanup(void)
>
> nbd_dbg_close();
>
> + put_cred(nbd_cred);
> +
> mutex_lock(&nbd_index_mutex);
> idr_for_each(&nbd_index_idr, &nbd_exit_cb, &del_list);
> mutex_unlock(&nbd_index_mutex);
Yeah, as commented by Stephen and Paul, put_cred() needs to be moved after
destroy_workqueue(nbd_del_wq), because the work function running on that wq
removes the nbd disk and destroys the recv wq.
Otherwise, this patch looks fine from the block layer viewpoint, and I verified
that it does fix the -EACCES failure when mdadm reads from nbd.
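
To illustrate the ordering concern with put_cred(), here is a minimal
userspace model, not kernel code. Every name in it (model_cred, model_wq,
model_cleanup() and so on) is hypothetical and made up for illustration. It
assumes that work still pending on the workqueue may do socket I/O under the
overridden cred, so the cred has to outlive the workqueue teardown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a refcounted struct cred. */
struct model_cred {
	int refcount;		/* 0 means the object has been released */
};

/* Hypothetical stand-in for a workqueue with one pending work item. */
struct model_wq {
	bool pending;			/* work is still queued */
	struct model_cred *cred;	/* cred the work would override to */
	bool used_freed_cred;		/* set if work ran after the release */
};

/* Models put_cred(): drop the single reference held by the module. */
static void model_put_cred(struct model_cred *c)
{
	assert(c->refcount > 0);
	c->refcount--;
}

/* Models queued work that does socket I/O under the overridden cred. */
static void model_work_fn(struct model_wq *wq)
{
	if (wq->cred->refcount == 0)
		wq->used_freed_cred = true;	/* would be a use-after-free */
}

/* Models destroy_workqueue(): pending work runs before it returns. */
static void model_destroy_wq(struct model_wq *wq)
{
	if (wq->pending) {
		model_work_fn(wq);
		wq->pending = false;
	}
}

/* Returns true if teardown finished without touching a released cred. */
static bool model_cleanup(bool put_cred_first)
{
	struct model_cred cred = { .refcount = 1 };
	struct model_wq wq = {
		.pending = true,
		.cred = &cred,
		.used_freed_cred = false,
	};

	if (put_cred_first) {
		/* Order in the posted patch: cred is dropped too early. */
		model_put_cred(&cred);
		model_destroy_wq(&wq);
	} else {
		/* Suggested order: drain the workqueue, then drop the cred. */
		model_destroy_wq(&wq);
		model_put_cred(&cred);
	}
	return !wq.used_freed_cred;
}
```

In this model, model_cleanup(false) returns true and model_cleanup(true)
returns false, which is the whole point of moving put_cred() to the end of
the cleanup path. The real driver has more moving parts (per-device recv
workqueues, refcounts on the nbd device), so treat this only as a sketch of
the lifetime rule.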
Thanks,
Ming