[PATCH] RDMA/uverbs: Consider capability of the process that opens the file

Fri Apr 25 18:35:29 UTC 2025

On Fri, Apr 25, 2025 at 12:34:21PM -0500, Eric W. Biederman wrote:
> > What about something like CAP_SYS_RAWIO? I don't think we would ever
> > make that a per-userns thing, but as a thought experiment, do we check
> > current->XXX->user_ns or still check ibdev->netns->XX->user_ns?
> >
> 
> Oh.  CAP_SYS_RAWIO is totally something you can have.  In fact
> the first process in a user namespace starts out with CAP_SYS_RAWIO.
> That said it is CAP_SYS_RAWIO with respect to the user namespace.
> 
> What would almost certainly be a bug is for any permission check
> to be relaxed to ns_capable(resource->user_ns, CAP_SYS_RAWIO).

So a process "has" it but the kernel never accepts it?

> I don't know what an infiniband character device refers to.  Is it an
> attachment of a physical cable to the box like a netdevice?  Is it an
> infiniband queue-pair?

It refers to a single struct ib_device in the kernel. It is kind of
like a namespace in that all the commands executed and uobjects
created on the FD are relative to the struct ib_device.

> The names (device major and minor) not living in a network namespace
> mean that there can be problems for CRIU to migrate an infiniband device,
> as its device major and minor number are not guaranteed to be
> available.  Perhaps that doesn't matter, as the name you open is on a
> filesystem.  *Shrug*

I don't see a path for CRIU and rdma, there is too much hardware
state. Presumably if anyone ever did it they'd have to ignore that
the major/minor changes.

> > static int ib_uverbs_open(struct inode *inode, struct file *filp)
> > {
> > 	if (!rdma_dev_access_netns(ib_dev, current->nsproxy->net_ns)) {
> > 		ret = -EPERM;
> >
> 
> > bool rdma_dev_access_netns(const struct ib_device *dev, const struct net *net)
> > {
> > 	return (ib_devices_shared_netns ||
> > 		net_eq(read_pnet(&dev->coredev.rdma_net), net));
> >
> > So you can say we 'captured' the net_ns into the FD as there is some
> > struct file->....->ib_dev->..->net_ns that does not change
> >
> > Thus ib_dev->...->user_ns is going to always be the user_ns of the
> > netns of the process that opened the FD.
> 
> Nope.
> 
> There is no check against current->cred->user_ns.  So the check has
> nothing to do with the credentials of the process that opened the
> character device.

I said "user_ns of the netns"?  Credentials of the process is something
else?

It sounds like we just totally ignore current->cred->user_ns from the
rdma subsystem perspective?

> > So.. hopefully final question.. When we are in a system call context
> > and want to check CAP_NET_XX should we also require that the current
> > process has the same net ns as the ib_dev?
> 
> I want to say in general only for opening the ib_device.
> 
> I don't know what to say for the case where ib_devices_shared_netns is
> true. In that case the ib_device doesn't have a network namespace at
> all, so at best it would appear to be a nonsense check.

In shared mode it has no namespace containment at all, so presumably any
capable checks should continue to be done on the init_net?

> I think you need to restrict the relaxation to the case where
> ib_devices_shared_netns is false.

I think we will never check anything other than init_net in shared
mode.
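
To make the rule above concrete, here is a minimal userspace sketch,
not kernel code. The struct and function names (user_ns_model,
model_cap_target, and so on) are made up for illustration; only the
shape of the netns comparison follows the rdma_dev_access_netns()
snippet quoted earlier. The choice of capability target is my
assumption about where this discussion is heading, not a merged
behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for kernel objects; NOT the kernel definitions. */
struct user_ns_model { const char *name; };
struct net_ns_model { struct user_ns_model *user_ns; };
struct ib_dev_model { struct net_ns_model *net; };

static struct user_ns_model init_user_ns_model = { "init_user_ns" };
static struct net_ns_model init_net_model = { &init_user_ns_model };

/* Same shape as the quoted check: shared mode, or the netns must match. */
static bool model_dev_access_netns(bool shared,
				   const struct ib_dev_model *dev,
				   const struct net_ns_model *net)
{
	return shared || dev->net == net;
}

/*
 * Hypothetical helper: the user namespace a capability check would be
 * made against. Shared mode always resolves to the init_net owner;
 * exclusive mode resolves to the owner of the device's netns.
 */
static const struct user_ns_model *
model_cap_target(bool shared, const struct ib_dev_model *dev)
{
	return shared ? init_net_model.user_ns : dev->net->user_ns;
}

/* Small self-check exercising both modes. */
static void model_selfcheck(void)
{
	struct user_ns_model cont_user = { "container" };
	struct net_ns_model cont_net = { &cont_user };
	struct ib_dev_model dev = { &cont_net };

	assert(model_dev_access_netns(false, &dev, &cont_net));
	assert(model_cap_target(true, &dev) == &init_user_ns_model);
	assert(model_cap_target(false, &dev) == &cont_user);
}
```

Note that nothing in the sketch consults the opener's credentials,
which matches the observation above that current->cred->user_ns is not
part of the picture.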

> The network stack in general uses netlink to talk to network devices
> (sockets are another matter), so this whole using character devices
> to talk to devices is very weird to me.

It isn't that different. In netlink you get the FD through socket(), in
char dev you get it through open().
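
A rough userspace illustration of the two paths, as a sketch only. The
helper names are made up, and the caller supplies the device path
(something like /dev/infiniband/uverbs0 on a typical system, which is
an assumption about the local setup). Either call can fail depending
on what is loaded and on permissions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <unistd.h>

/* Netlink path: the FD comes from socket(). Returns -1 on failure. */
static int open_rdma_netlink(void)
{
	return socket(AF_NETLINK, SOCK_RAW, NETLINK_RDMA);
}

/* Char dev path: the FD comes from open(). Returns -1 on failure. */
static int open_uverbs_chardev(const char *path)
{
	assert(path != NULL);
	return open(path, O_RDWR);
}
```

In both cases everything done afterwards happens relative to the FD,
which is the point of the comparison.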

In netdev you can't "open" eth1 but in most other subsystems you can
get a FD that encapsulates a physical device.

Jason