[PATCH] RDMA/uverbs: Consider capability of the process that opens the file

Eric W. Biederman ebiederm at xmission.com
Tue Apr 8 14:44:13 UTC 2025


Jason Gunthorpe <jgg at nvidia.com> writes:

> On Mon, Apr 07, 2025 at 11:16:35AM +0000, Parav Pandit wrote:
>> > > This all makes my head hurt. The right user namespace is the one that
>> > > is currently active for the invoking process, I couldn't understand
>> > > why we have net namespaces refer to user namespaces :\
>> > 
>> > A user at any time can create a new user namespace, without creating a new
>> > network namespace, and have privilege in that user namespace, over
>> > resources owned by the user namespace.
>>  
>> > So if a user can create a new user namespace, then say "hey I have
>> > CAP_NET_ADMIN over current_user_ns, so give me access to the RDMA
>> > resources belonging to my current_net_ns", that's a problem.
>
> But why is that possible? If the current user name space does not have
> CAP_NET_ADMIN then why can it create a new user name space that does?

Because it isn't CAP_NET_ADMIN.  The capabilities are per user
namespace.

AKA the pair (&init_user_ns, CAP_NET_ADMIN) is what you think of when
you think of CAP_NET_ADMIN.

The reason for this is that a lot of the things capabilities guard are
only semantically a problem because they would confuse preexisting suid
root binaries.  Binding to the low ports (for example) is no more special
than binding to any other port, except that assumptions can be made
about who has bound to the low ports.

So if you can restrict binding to the low ports to only the network
namespaces that you and your children control, there is no chance
of confusing a suid root application, and it is a legitimate operation
to perform.

In networking terms the user namespace and the subordinate namespaces
created with user namespace permissions are a bit like a tunnel.  The
users of a tunnel can do anything inside their tunnel (assign IP
addresses, etc.), and no one will care as long as it all stays inside the
tunnel.

So in essence the question is do you have capabilities within the tunnel
or do you have capabilities outside of a tunnel.


Due to historical silliness there is a practical concern about code that
only root could run.  People tend not to worry if there are bugs that
allow such code to do unintended things.  So even if semantically it is
safe to allow such code, generally the code needs a bit of an audit to
make certain there are not bugs or implementation assumptions that will
be violated when allowing additional functionality in a user namespace.

> And if userspace does have CAP_NET_ADMIN what is the issue with
> creating more user namespaces that also have it?
>
>> > So that's why the check should be ns_capable(device->net->user_ns,
>> > CAP_NET_ADMIN) and not ns_capable(current_user_ns, CAP_NET_ADMIN).
>> >
>> Given the check is of the process (and hence user and net ns) and not of the rdma device itself,
>> Shouldn't we just check,
>> 
>> ns_capable(current->nsproxy->user_ns, ...)
>> 
>> This ensures current network namespace's owning user ns is consulted.
>
> It sounds like the design does not store the capabilities inside the
> current user_ns, but it logically stores them in other NSs. Ie all the
> net related capabilities are in the netns.
>
> Presumably then we have a mapping of every capability to the proper
> namespace to store it?

Store is the wrong concept.  Namespaces remember which user namespace
they were created from.  This allows the capability checks to require
that you have the capability in the user namespace that created them,
or in a parent user namespace.
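
To make that walk concrete, here is a toy userspace model of the check.
It is a sketch only: the struct layouts, the ns_capable_model() name and
the uid values are assumptions made for illustration, not the kernel's
definitions, although the loop is meant to follow the same order of
tests the kernel applies when walking from the target user namespace
towards the initial one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: names echo the kernel's, but this is not kernel code. */
struct user_ns {
    struct user_ns *parent;   /* NULL for the initial user namespace */
    int level;                /* nesting depth, 0 for the initial one */
    unsigned owner;           /* uid that created this namespace */
};

struct net_ns {
    struct user_ns *user_ns;  /* user namespace the netns was created from */
};

struct cred {
    struct user_ns *user_ns;  /* namespace the capability bits are relative to */
    unsigned euid;
    unsigned long long caps;  /* effective capability bitmask */
};

#define CAP_NET_ADMIN 12      /* value from linux/capability.h */

/* Does cred hold cap with respect to the target user namespace? */
static bool ns_capable_model(const struct cred *cred,
                             const struct user_ns *target, int cap)
{
    assert(cred != NULL && target != NULL);

    for (const struct user_ns *ns = target; ns != NULL; ns = ns->parent) {
        /* Reached the caller's own namespace: the capability bit decides. */
        if (ns == cred->user_ns)
            return (cred->caps >> cap) & 1;

        /* The target is not below the caller's namespace: no privilege. */
        if (ns->level <= cred->user_ns->level)
            return false;

        /* The creator of a child namespace has every capability in it. */
        if (ns->parent == cred->user_ns && ns->owner == cred->euid)
            return true;
    }
    return false;
}
```

In this model a process that unshares only its user namespace holds
every bit in the child namespace, so a check against its own user
namespace succeeds.  Its network namespace still points at the parent
user namespace, so a check against net_ns.user_ns fails at the level
comparison.  That is the difference between the two ns_capable() calls
being debated above.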

There exists a full set of capabilities that can be present in
a user namespace.  The initial process in a user namespace is given
all of those capabilities in its struct cred, just as the init
process is given all capabilities at system start.  The difference
is that when all you have are capabilities that are limited to
a user namespace they don't allow anything to be done (other than
creating namespaces) unless some namespaces are created from that
user namespace.
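
One way to watch this from userspace is to compare the CapEff line of
/proc/self/status before and after unshare(CLONE_NEWUSER).  The sketch
below assumes a Linux system where unprivileged user namespace creation
is permitted (a sysctl or LSM policy may forbid it); the helper names
are made up for illustration.  The mask printed after the call is
relative to the new user namespace only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Parse a "CapEff:\t<hex>" line as found in /proc/self/status.
 * Returns 1 and stores the mask on success, 0 for any other line. */
static int parse_cap_eff(const char *line, unsigned long long *mask)
{
    assert(line != NULL && mask != NULL);
    return sscanf(line, "CapEff: %llx", mask) == 1;
}

/* Read the effective capability mask of the calling process. */
static int read_cap_eff(unsigned long long *mask)
{
    char line[256];
    int found = 0;
    FILE *f = fopen("/proc/self/status", "r");

    if (f == NULL)
        return 0;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_cap_eff(line, mask);
    fclose(f);
    return found;
}

/* Print CapEff before and after entering a new user namespace.
 * Returns 0 on success, -1 if anything along the way fails. */
static int show_caps_across_unshare(void)
{
    unsigned long long before, after;

    if (!read_cap_eff(&before))
        return -1;
    if (unshare(CLONE_NEWUSER) != 0) {
        perror("unshare(CLONE_NEWUSER)");
        return -1;
    }
    if (!read_cap_eff(&after))
        return -1;
    printf("CapEff before: %016llx\n", before);
    printf("CapEff after:  %016llx\n", after);
    return 0;
}
```

Calling show_caps_across_unshare() from main() as an ordinary user
should show the mask going from empty to full, yet none of those bits
apply to namespaces created from the old user namespace, such as the
network namespace the process is still in.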

> If the container has a user namespace and the net ns uses the same
> user namespace then you get the appearance of user namespace
> controlled capabilities...

Essentially yes.

That network namespace requires CAP_NET_ADMIN in the user namespace
it was created within (or in a parent user namespace) for its capability
checks.

Eric





