[PATCH] RDMA/uverbs: Consider capability of the process that opens the file

Thu Apr 24 09:08:17 UTC 2025

> From: Jason Gunthorpe <jgg at nvidia.com>
> Sent: Wednesday, April 23, 2025 10:16 PM
> 
> On Wed, Apr 23, 2025 at 03:56:39PM +0000, Parav Pandit wrote:
> > > > And I wonder if using the uobjects affiliated netdev's namespace
> > > > is OK?
> > >
> > We don't refer to the netdev of the rdma. Because netdev is not there in
> > many cases.
> > Its just rdma device.
> 
> The ib_device itself also has a net namespace these days.
> 
> I really worry that a single uobject has too many choices for the namespace:
> 
>  1) The one provided by current during a system call
>  2) The one that was active in current when the uobject was created
>  3) The one that is linked to a netdev associated with the uobject when it was
>     created
>  4) The one that is linked to the ufile's underlying ib_device
>  5) The one that was active in current when the ufile was opened.
> 
> In all practical cases we expect that all of the above are the same thing, so this
> is looking at fringe cases where userspace is changing the namespaces during
> the lifecycle of the FD.
> 
> So.. Some basic questions.
> 
> Since ib_device has a namespace and ufile is tied to a ib_device, can we ever
> have a situation where the ib_device has a different namespace than the
> ufile's? This would mean we changed the namespace of the ib_device, and
> IIRC, that means we revoked/disassociated the ufile? So the answer is no?
> This means #4 and #5 are the same thing.
>
Right.

> Can a uobject affiliated netdev have a different namespace than the
> ib_device? 
When a uobject is created, it is not affiliated with a netdev.
In many cases, an rdma device does not even have a netdev at all.
Some examples are an IB device without ipoib, efa, and an SF rdma device.
So the uobject does not need to worry about the netdev at all.
At best it is the net ns of #1 or #2.

> The netdevs arise from the gid table, and the gid table population
> should strictly follow the ib_device namespace, yes? 
I wish it were this way, but unfortunately rdma still has the ancient shared mode, for example a single rdma device + macvlan.
Until that is deprecated, let the gid table entry's netdev drive the QP modify, as is done today.

> So, I think the answer is
> generally no, but there are going to be transient cases where a gid table entry
> is in progress to delete while a netdev is moving to another namespace? This
> means #3/#4/#5 are the same thing.
>
True. 

> Can current have a different namespace than the ib_device? I guess yes, the
> FD can be passed around. However this would mean that the FD caller should
> not be able to get any gid table handles as none of its ifindexes will work. So
> #1 is != #3/#4/#5
> 
Well, it can pass the fd after the ifindex is resolved (i.e. after modify_qp).
If the fd is passed to a different net ns before modify qp, it can get access too, because the rdma device is shared.
But that is the case with a raw socket too.
The difference is that every send() call checks the ifindex, whereas here it is checked when the raw qp is created.

We can add the additional check in sysfs and in modify qp, but a long time ago (2019) we envisioned that users should use only the exclusive mode.
Hence, those checks were not added.

> And finally the FD can be passed around after the uobject is created so #2 !=
> #1.
>
Right. So the optimal place to attach to the user_ns seems to be #2.

> So, I would say the correct namespace path to use depends entirely on what it
> is you are checking.
>
When the uobject is created, that is where the enforcement against the current process should happen.

> 1) During uobject creation CAP_NET_RAW is checked against current.
>    Perhaps we should further insist that current == ib_device's NS
>    as well?
To avoid breaking backward compatibility, this can be done only when we are in exclusive mode, which is not the default.
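
To make the check concrete, here is a minimal userspace sketch of it. It is only a model under stated assumptions, not kernel code and not what the patch implements: every struct and function name below is hypothetical, and the capability check is reduced to a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_net_ns {
	int id;
};

struct model_task {
	const struct model_net_ns *net_ns; /* net ns of the calling process */
	bool has_cap_net_raw;              /* stands in for the CAP_NET_RAW check */
};

struct model_ib_device {
	const struct model_net_ns *net_ns; /* net ns the rdma device belongs to */
	bool exclusive_mode;               /* false models the default shared mode */
};

/* Returns true if the caller may create a raw-capable uobject. */
static bool model_may_create_raw_uobject(const struct model_task *cur,
					  const struct model_ib_device *dev)
{
	assert(cur != NULL && dev != NULL);

	/* The capability is always checked against the creating process. */
	if (!cur->has_cap_net_raw)
		return false;

	/* Shared mode: no namespace match is enforced, as today. */
	if (!dev->exclusive_mode)
		return true;

	/* Exclusive mode: additionally insist on the device's net ns. */
	return cur->net_ns == dev->net_ns;
}
```

In this model a capable caller from another net ns is accepted in shared mode and rejected in exclusive mode, which is the backward-compatible split described above.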

> 2) During gid_table lookup for any reason. Use current to translate
>    the ifindex to a netdevice. Match the netdevice against the gid
>    table.
We always use the ifindex and net_ns of the netdev of the GID. So this is ok.

> Effectively fails if current != ib_device's NS.
Only possible when in exclusive mode.
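
As a rough illustration of the lookup in item 2, the sketch below matches a caller-supplied ifindex against the gid table using both the ifindex and the net ns of each entry's netdev. It is a simplified userspace model with hypothetical names (the real gid table code is structured differently), and namespaces are reduced to integer ids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_netdev {
	int ifindex;   /* only meaningful inside net_ns_id */
	int net_ns_id; /* net ns the netdev currently lives in */
};

struct model_gid_entry {
	const struct model_netdev *ndev; /* NULL when the entry has no netdev */
};

/*
 * Returns the index of the first gid entry whose netdev matches the
 * given ifindex within the given net ns, or -1 if there is none.
 */
static int model_find_gid_index(const struct model_gid_entry *tbl, size_t n,
				int ifindex, int net_ns_id)
{
	assert(tbl != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		const struct model_netdev *nd = tbl[i].ndev;

		/* An ifindex alone is ambiguous; it must be paired with its ns. */
		if (nd && nd->ifindex == ifindex && nd->net_ns_id == net_ns_id)
			return (int)i;
	}
	return -1;
}
```

A caller whose net ns holds none of the gid table's netdevs gets -1 for every ifindex it can name, which models the "effectively fails" case above.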

> 3) Routing lookups/etc should use the namespace of the netdevice of
>    the gid index being looked up.
> 
This is already done.

> What other NS users are there?
Incoming rx IB MAD packets are looked up in the net ns of the GID's attached netdev.
In-kernel ULPs (nfs, smc) do not seem to have an interest, but they neither create uobjects nor access any uverbs fd.
So they will be protected differently anyway.
I had patches around 2019 where the kernel caller passes the net_ns, as it holds the reference to it.
But there was no interest from users, so I skipped it.
Last time, someone from Western Digital reached out about nvme fabrics, but there was no follow-up after that.

Regardless, those users are unrelated to user_ns uobjects.
> 
> > > Going back to the original proposal I don't know how ready the code
> > > is to handle callers that are not root.  This is both a question of
> > > semantics (is it safe in theory) and a question of implementation
> > > (are there unfixed bugs that no one cares about because only root has
> > > been using the code).
> 
> We need to look at each change, but I think most of it is fine.
> 
> Jason