[PATCH] RDMA/uverbs: Consider capability of the process that opens the file
Parav Pandit
parav at nvidia.com
Fri Apr 4 14:53:30 UTC 2025
Hi Eric, Jason,
I would like to resume and conclude this discussion.
> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Wednesday, March 19, 2025 4:27 AM
> On Tue, Mar 18, 2025 at 03:00:15PM -0500, Eric W. Biederman wrote:
>
> > There are also a lot of places where infiniband uses raw read/write on
> > file descriptors. I think last time I looked infiniband wasn't even
> > using ioctl.
>
> Yeah, that's all deprecated now, and it had some major security issue with the
> 'setuid cat' attack. IIRC it was mitigated by disallowing read/write from a
> process with different credentials than the process that opened the FD. This
> caused regressions which were resolved by moving to ioctl.
>
> Today you can compile the read/write interface out of the kernel - for the last
> uh 6 years or so the userspace has exclusively used ioctl.
>
> > > You would not say that if process B creates a CAP_NET_RAW socket FD
> > > and passes it to process A without CAP_NET_RAW then A should not be
> > > able to use the FD.
> >
> > But that is exactly what the infiniband security check we are
> > talking about appears to be doing. It is using the credentials of
> > process A and failing after it was passed by process B.
>
> I'm not sure what you are referring to? The model should be that the process
> invoking the system call is the one that provides the capability set.
>
> It is entirely possible that the code is wrong, but the above was the intention.
>
> > Taking from your example above. If process B with CAP_NET_RAW creates
> > a FD for opening queue pairs and passes it to process A without
> > CAP_NET_RAW then A is not able to create queue pairs.
>
> Yes that's right, because the FD itself has no security properties at all, it is just
> a conduit for calling into the kernel.
>
> Process A cannot create raw queue pairs in the same way that Process A
> cannot create raw sockets, it doesn't matter where the FD came from.
>
> > That is what the code in
> > drivers/infiniband/core/uverbs_cmd.c:create_qp() currently says.
>
> I'm not sure what you are referring to here? That function is called on the
> system call path, and at least the intention was that this:
>
> 	case IB_QPT_RAW_PACKET:
> 		if (!capable(CAP_NET_RAW))
> 			return -EPERM;
> 		break;
>
> Would check the current task invoking the system call to see if that task has
> the required capability.
>
> Jason
To summarize,
1. A process can open an RDMA resource (such as a raw QP, a raw flow entry,
   or a similar 'raw' resource) through the fd using ioctl() if it has the
   appropriate capability, which in this case is CAP_NET_RAW.
   This is similar to a process that opens a raw socket.
2. Given that RDMA uses ioctl() for resource creation, there is no security
   concern surrounding the read()/write() system calls.
3. If process A, which does not have CAP_NET_RAW, passes the opened fd to a
   privileged process B, which has CAP_NET_RAW, process B can open the raw
   RDMA resource.
   This is still within the kernel-defined security boundary, similar to a
   raw socket.
4. If process A, which has CAP_NET_RAW, passes the fd to process B, which
   does not have CAP_NET_RAW, process B will not be able to open the raw
   RDMA resource.
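
For the raw socket comparison in point 1, here is a minimal userspace sketch
of how the creation-time check shows up to the calling process. The helper
name probe_raw_socket() is hypothetical; it only reports whether the caller
is allowed to create a raw ICMP socket and says nothing about RDMA itself.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to create a raw ICMP socket as the calling
 * process. Returns 0 on success, otherwise the errno value reported by
 * socket(). A caller lacking CAP_NET_RAW is expected to see EPERM.
 */
int probe_raw_socket(void)
{
	int fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

	if (fd < 0)
		return errno;

	close(fd);
	return 0;
}
```

The check is applied to the task making the socket() call, which is the same
model Jason describes for the ioctl() path in create_qp().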
Do we agree on this, Eric?
Assuming yes, to extend this further:
5. The process's capability check should be done in the right user namespace
   (instead of the initial, i.e. default, user namespace as is done today).
   The right user namespace is the one that created the net namespace,
   because RDMA networking resources are governed by the net namespace.
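
To illustrate the difference point 5 is about, here is a small userspace toy
model, not kernel code. All toy_* names are hypothetical, and the walk in
toy_ns_capable() is a simplified approximation of how the kernel resolves a
capability against a target user namespace. It is only meant to show why a
check against the initial user namespace and a check against the user
namespace owning the net namespace can give different answers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same bit number as CAP_NET_RAW in <linux/capability.h>. */
#define TOY_CAP_NET_RAW 13

/* Hypothetical stand-ins for the kernel structures. */
struct toy_user_ns {
	struct toy_user_ns *parent;	/* NULL for the initial user ns */
	int level;			/* 0 for the initial user ns */
	unsigned int owner_uid;		/* uid that created this user ns */
};

struct toy_net_ns {
	struct toy_user_ns *user_ns;	/* user ns that created this net ns */
};

struct toy_cred {
	struct toy_user_ns *user_ns;	/* user ns the process lives in */
	unsigned int euid;
	uint64_t cap_effective;		/* capability bits held in user_ns */
};

/*
 * Simplified model: does 'cred' hold 'cap' with respect to 'target'?
 * Walk from the target namespace towards the initial one.
 */
bool toy_ns_capable(const struct toy_cred *cred,
		    const struct toy_user_ns *target, int cap)
{
	const struct toy_user_ns *ns = target;

	assert(cred && cred->user_ns && target);

	for (;;) {
		/* Same namespace: the effective capability set decides. */
		if (ns == cred->user_ns)
			return ((cred->cap_effective >> cap) & 1) != 0;

		/* Target is not a descendant of the process's namespace. */
		if (ns->level <= cred->user_ns->level)
			return false;

		/* The creator of a child namespace holds all caps in it. */
		if (ns->parent == cred->user_ns &&
		    ns->owner_uid == cred->euid)
			return true;

		ns = ns->parent;
	}
}

/* Today's behaviour in this model: check against the initial user ns. */
bool toy_raw_qp_allowed_current(const struct toy_cred *cred,
				const struct toy_user_ns *init_ns)
{
	return toy_ns_capable(cred, init_ns, TOY_CAP_NET_RAW);
}

/* Point 5 in this model: check against the net ns's owning user ns. */
bool toy_raw_qp_allowed_proposed(const struct toy_cred *cred,
				 const struct toy_net_ns *net)
{
	return toy_ns_capable(cred, net->user_ns, TOY_CAP_NET_RAW);
}
```

In this model, a process that holds CAP_NET_RAW only inside a child user
namespace, and whose net namespace was created by that child user namespace,
is refused by toy_raw_qp_allowed_current() but accepted by
toy_raw_qp_allowed_proposed().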
Point 5 above aligns with the example in the man page excerpt below [1] and with a few existing kernel examples [2].
For example, suppose that a process attempts to change
the hostname (sethostname(2)), a resource governed by the UTS
namespace. In this case, the kernel will determine which user
namespace owns the process's UTS namespace, and check whether the
process has the required capability (CAP_SYS_ADMIN) in that user
namespace.
[1] https://man7.org/linux/man-pages/man7/user_namespaces.7.html
[2] Example snippets that follow the guidance of point 5 above:
File: drivers/infiniband/core/device.c
Function: ib_device_set_netns_put()
For net namespace:
	if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
		ret = -EPERM;
		goto ns_err;
	}
File: fs/namespace.c
For mount namespace:
	if (!ns_capable(from->mnt_ns->user_ns, CAP_SYS_ADMIN))
		goto out;
	if (!ns_capable(to->mnt_ns->user_ns, CAP_SYS_ADMIN))
		goto out;
For uts ns:
static int utsns_install(struct nsset *nsset, struct ns_common *new)
{
	struct nsproxy *nsproxy = nsset->nsproxy;
	struct uts_namespace *ns = to_uts_ns(new);

	if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
	    !ns_capable(nsset->cred->user_ns, CAP_SYS_ADMIN))
		return -EPERM;
For net ns:
File: net/core/dev_ioctl.c
	case SIOCSHWTSTAMP:
		if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
			return -EPERM;
		fallthrough;

static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
{
	int ret;

	if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
		return -EPERM;