[PATCH] RDMA/uverbs: Consider capability of the process that opens the file

Parav Pandit parav at nvidia.com
Mon Apr 21 11:04:57 UTC 2025


> From: Serge E. Hallyn <serge at hallyn.com>
> Sent: Monday, April 21, 2025 8:43 AM
> 
> On Fri, Apr 04, 2025 at 02:53:30PM +0000, Parav Pandit wrote:
> > Hi Eric, Jason,
> 
> Hi,
> 
> I'm jumping back up the thread as I think this email best details the things I'm
> confused about :)  Three questions below in two different stanzas.
> 
> > To summarize,
> >
> > 1. A process can open an RDMA resource (such as a raw QP, raw flow
> > entry, or similar 'raw' resource) through the fd using ioctl(), if it has the
> > appropriate capability, which in this case is CAP_NET_RAW.
> 
> Why does it need CAP_NET_RAW to create the resource, if the resource won't
> be usable by a process without CAP_NET_RAW later anyway?  
Once the resource is created and the fd is shared (like a raw socket fd), it will be usable by a process without CAP_NET_RAW.
Is that a concern? If so, how is it solved for a raw socket fd? As far as I can tell, it is not.

> Is that legacy
> for the read/write (vs ioctl) case?  
No.

> Or is it to limit the number of opened
> resources?  Or some other reason?
> 
The resource enables raw operations, hence the check that the creating process has CAP_NET_RAW.

> Is the resource which is created tied to the net namespace of the process
> which created it?
The resource is tied to the process. Therefore, if the RDMA device on which the resource was created changes its net namespace, all of the process's resources will be destroyed.

> 
> > This is similar to a process that opens a raw socket.
> >
> > 2. Given that RDMA uses ioctl() for resource creation, there isn't a
> > security concern surrounding the read()/write() system calls.
> >
> > 3. If process A, which does not have CAP_NET_RAW, passes the opened fd
> > to another privileged process B, which has CAP_NET_RAW, process B can
> > open the raw RDMA resource.
> > This is still within the kernel-defined security boundary, similar to a raw
> > socket.
> >
> > 4. If process A, which has the CAP_NET_RAW capability, passes the file
> > descriptor to Process B, which does not have CAP_NET_RAW, Process B will
> > not be able to open the raw RDMA resource.
> >
> > Do we agree on this Eric?
> >
> > Assuming yes, to extend this, further,
> >
> > 5. the process's capability check should be done in the right user
> > namespace.
> > (instead of current in default user ns).
> > The right user namespace is the one which created the net namespace.
> 
> "the one which created THE net namespace" - which net namespace?   The
> one in which the process which created the resource belonged, or the one in
> which the current process (calling ioctl) belongs?
When ioctl() is invoked for resource creation, the calling process is in some net namespace,
and that net ns has an owning user ns.

In my understanding, the capability check is about the process's capability for a specific resource _type_,
and has nothing to do with the individual resource itself.

A sane flow in my view is:
a. create a user namespace
b. create a net namespace
c. move the rdma device to the net namespace created in #b
d. launch a process in the user ns of #a and the net ns of #b and let it operate the device from there.

If the process, after step #d, creates a new net ns or a new user ns, any new ioctl() for resource creation will check caps against the latest net/user ns.
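
To make the check order concrete, here is a rough userspace sketch of it. This is only a toy model and not kernel code: every toy_* name is hypothetical, the walk is a simplified version of what ns_capable() ends up doing for a single capability, and LSM hooks are ignored. It assumes the check is made against the user ns that owns the caller's current net ns, as proposed in #5.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel objects; all names here are hypothetical. */
struct toy_user_ns {
	struct toy_user_ns *parent;	/* NULL for the initial user ns */
	int level;			/* 0 for the initial user ns */
	unsigned int owner_uid;		/* euid that created this user ns */
};

struct toy_net_ns {
	struct toy_user_ns *user_ns;	/* user ns that owns this net ns */
};

struct toy_task {
	struct toy_user_ns *user_ns;	/* user ns of the task's credentials */
	struct toy_net_ns *net_ns;	/* task's current net ns */
	unsigned int euid;
	bool cap_net_raw;		/* CAP_NET_RAW in the effective set */
};

/*
 * Simplified walk: does task t hold CAP_NET_RAW with respect to the
 * target user ns?
 */
static bool toy_ns_capable_net_raw(const struct toy_task *t,
				   const struct toy_user_ns *target)
{
	const struct toy_user_ns *ns = target;

	for (;;) {
		/* Same user ns: the effective capability set decides. */
		if (ns == t->user_ns)
			return t->cap_net_raw;
		/* Target is not a descendant of the task's user ns. */
		if (ns->level <= t->user_ns->level)
			return false;
		/* The creator of a child user ns holds all caps in it. */
		if (ns->parent == t->user_ns && ns->owner_uid == t->euid)
			return true;
		ns = ns->parent;
	}
}

/* Check against the user ns owning the caller's *current* net ns. */
static bool toy_may_create_raw_resource(const struct toy_task *t)
{
	return toy_ns_capable_net_raw(t, t->net_ns->user_ns);
}
```

In this model, a process launched as in step #d with CAP_NET_RAW in the user ns of #a passes the check, while the same process checked against the initial user ns does not.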

> 
> > This is because rdma networking resources are governed by the net
> > namespace.
> >
> > Above #5 aligns with the example from existing kernel doc snippet below [1]
> > and few kernel examples of [2].
> >
> > For example, suppose that a process attempts to change
> >        the hostname (sethostname(2)), a resource governed by the UTS
> >        namespace.  In this case, the kernel will determine which user
> >        namespace owns the process's UTS namespace, and check whether the
> >        process has the required capability (CAP_SYS_ADMIN) in that user
> >        namespace.
> >
> > [1] https://man7.org/linux/man-pages/man7/user_namespaces.7.html
> >
> > [2] examples snippet that follows above guidance of #5.
> >
> > File: drivers/infiniband/core/device.c
> > Function: ib_device_set_netns_put()
> > For net namespace:
> >
> >          if (!netlink_ns_capable(skb, net->user_ns, CAP_NET_ADMIN)) {
> >                  ret = -EPERM;
> >                  goto ns_err;
> >          }
> >
> > File: fs/namespace.c
> > For mount namespace:
> >         if (!ns_capable(from->mnt_ns->user_ns, CAP_SYS_ADMIN))
> >                 goto out;
> >         if (!ns_capable(to->mnt_ns->user_ns, CAP_SYS_ADMIN))
> >                 goto out;
> >
> > For uts ns:
> >  static int utsns_install(struct nsset *nsset, struct ns_common *new)
> > {
> >          struct nsproxy *nsproxy = nsset->nsproxy;
> >          struct uts_namespace *ns = to_uts_ns(new);
> >
> >          if (!ns_capable(ns->user_ns, CAP_SYS_ADMIN) ||
> >              !ns_capable(nsset->cred->user_ns, CAP_SYS_ADMIN))
> >                  return -EPERM;
> >
> > For net ns:
> > File: net/core/dev_ioctl.c
> >          case SIOCSHWTSTAMP:
> >                  if (!ns_capable(net->user_ns, CAP_NET_ADMIN))
> >                          return -EPERM;
> >                  fallthrough;
> >
> > static int do_arpt_get_ctl(struct sock *sk, int cmd, void __user
> > *user, int *len) {
> >          int ret;
> >
> >          if (!ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN))
> >                  return -EPERM;


