[PATCH] RDMA/uverbs: Consider capability of the process that opens the file

Fri Apr 25 13:14:35 UTC 2025

> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Thursday, April 24, 2025 7:44 PM
> To: Parav Pandit <parav at nvidia.com>
> Cc: Eric W. Biederman <ebiederm at xmission.com>; Serge E. Hallyn
> <serge at hallyn.com>; linux-rdma at vger.kernel.org;
> linux-security-module at vger.kernel.org; Leon Romanovsky <leonro at nvidia.com>
> Subject: Re: [PATCH] RDMA/uverbs: Consider capability of the process that
> opens the file
> 
> On Thu, Apr 24, 2025 at 09:08:17AM +0000, Parav Pandit wrote:
> > > Since ib_device has a namespace and ufile is tied to a ib_device,
> > > can we ever have a situation where the ib_device has a different
> > > namespace than the ufile's? This would mean we changed the namespace
> > > of the ib_device, and IIRC, that means we revoked/disassociated the ufile?
> So the answer is no?
> > > This means #4 and #5 are the same thing.
> > >
> > Right.
> >
> > > Can a uobject affiliated netdev have a different namespace than the
> > > ib_device?
> > When a uobject when created, it is not affiliated to netdev.
> 
> I'm asking about when it does have a netdev. When you create/modify a QP
> and give it a gid index, for instance.
We don't have a check.
Usually, a privileged user assigns the rdma device and its associated uverbs char device once, to the respective net and mount ns.
So, to protect against such a privileged user error, it would be good to add net ns enforcement for exclusive mode.

More below.
> 
> > > The netdevs arise from the gid table, and the gid table population
> > > should strictly follow the ib_device namespace, yes?
> 
> > I wish it this way, but unfortunately, rdma still have ancient shared
> > mode for example single rdma device + macvlan.  Until that is
> > deprecated, let the gid table entry's netdev drives the QP modify as
> > done today.
> 
> I have been ignoring shared mode in all of this analysis. I don't think you can
> make sane statements about container security in shared mode.
> 
True. One option is to add net ns checks in modify_qp() and its related callers,
comparing the netdev's net ns with current->nsproxy->net_ns
(and not with the ibdev's).

The second option (preferred) is to perform the checks only in exclusive mode,
against the net_ns of the ibdev.
I find this better, and you also suggest it further below.

> > > Can current have a different namespace than the ib_device? I guess
> > > yes, the FD can be passed around. However this would mean that the
> > > FD caller should not be able to get any gid table handles as none of
> > > its ifindexes will work. So
> > > #1 is != #3/#4/#5
> >
> > Well, it can pass the fd after the ifindex is resolved (i.e. after modify_qp).
> > If fd is passed before modify qp in different net ns, its can get access too
> > because rdma device got shared.
> 
> That's all fine. The uobject retains its affiliated netdev.
> 
> > But that is the case with raw socket too.  The difference is, every
> > send() call checks the ifindex, vs here its checked when raw qp is
> > created.
> 
> Also I think fine
> 
> > We can add the additional check in the sysfs and in modify qp, but
> > very long ago (2019), we envisioned that users should use only the
> > exclusive mode.  And hence, those checks were not added.
> 
> I think we should ignore shared mode, it doesn't work sanely with
> namespaces.
> 
> > > What other NS users are there?
> 
> > Incoming rx IB mad packets are looked up in the GID's attached netdev's net
> > ns.
> 
> Ultimately a GID index should not be delivered to a userspace that does not
> have that GID index in the objects affiliated net namespace.
> I wonder if we are missing some validation here
> 
I prefer to avoid holding a reference to the net namespace at uobject creation time.
I think it's wrong to hold references to the net and user ns in the syscall object creation flow, but I can't prove it at the moment.

It is better to enforce net ns checks when adding the GID,
since this is where the GID-to-netdev association occurs.
So if the net ns of the netdev and the ibdev do not match in exclusive mode, we won't add the GID at all.

And if the netdev later moves out of the net ns for any reason, we remove such GID entries through the usual del_gid callback.

> > In-kernel ulps (nfs, smc) do not seem to have the interest, but they
> > do not created uobjects nor they access any uverbs fd.
> 
> IIRC we have open issues with NFS/SRP/etc and namespaces, the kernel ULP
> doesn't have a way to use a namespace?
> 
The NFS and SRP initiators are net ns aware. All the others operate in the default net ns.

> Jason

I may have missed the user ns conclusion while discussing net ns enforcement.

So let me summarize my understanding:

1. In the uobject creation syscall, I will add a capability check against current->nsproxy->net_ns->user_ns using ns_capable().
We don't hold any reference on the user ns.
This will be done only for the selected objects that need capability enforcement.
Can we proceed with this for user ns capability enforcement?

2. For net ns protection in exclusive mode, a few checks need to be enforced in the
ib device modify_qp, sysfs and GID query paths. These will be separate patch(es), unrelated to the user ns change.

3. Do not enforce these checks in shared net ns mode.

For #1 and #2, I will send two separate patch sets.

Does this plan look OK?