[PATCH 1/3] capabilities: user namespace capabilities

John Johansen john.johansen at canonical.com
Fri May 17 11:59:41 UTC 2024


On 5/17/24 03:51, Jonathan Calmels wrote:
> On Thu, May 16, 2024 at 03:07:28PM GMT, John Johansen wrote:
>> agreed, though it really is application dependent. Some applications handle
>> the denial at userns creation better than the capability after. Others
>> like anything based on QtWebEngine will crash on denial of userns creation
>> but handle denial of the capability within the userns just fine, and some
>> applications just crash regardless.
> 
> Yes this is application specific, but I would argue that the latter is
> much more preferable. For example, having one application crash in a
> container is probably ok, but not being able to start the container in
> the first place is probably not. Similarly, preventing the network
> namespace creation breaks services which rely on systemd’s
> PrivateNetwork, even though they most likely use it to prevent any
> networking from being done.
> 
Agreed, the solution has to be application/usage model specific. Some of
them are easy, and others not so much.
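
To make the two failure points concrete, here is a minimal sketch of an
application that treats both denials as recoverable instead of crashing.
Everything in it is an assumption for illustration: start_sandbox,
create_userns and privileged_setup are hypothetical names, and the errno
values treated as a policy denial are a guess, not something any LSM
guarantees.

```python
import errno

# Errno values treated here as "policy said no"; which one an LSM or sysctl
# actually returns is an assumption of this sketch.
DENIED = (errno.EPERM, errno.EACCES)


def start_sandbox(create_userns, privileged_setup):
    """Return 'no-userns', 'userns-without-caps', or 'sandboxed'.

    Both arguments are hypothetical callables supplied by the application:
    create_userns() would wrap something like unshare(CLONE_NEWUSER), and
    privileged_setup() the mount or network setup done inside the namespace.
    """
    try:
        create_userns()
    except OSError as exc:
        if exc.errno in DENIED:
            return "no-userns"  # denied at creation: fall back, do not crash
        raise
    try:
        privileged_setup()
    except OSError as exc:
        if exc.errno in DENIED:
            return "userns-without-caps"  # namespace exists, capability denied
        raise
    return "sandboxed"
```

Reporting the two outcomes separately lets the caller choose a fallback
for each case, and makes it visible which kind of denial actually
happened.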

>> The userns cred from the LSM hook can be modified, yes it is currently
>> specified as const but is still under construction so it can be safely
>> modified the LSM hook just needs a small update.
>>
>> The advantage of doing it under the LSM is an LSM can have a richer policy
>> around what can use them and tracking of what is allowed. That is to say the
>> LSM has the capability of being finer grained than doing it via capabilities.
> 
> Sure, we could modify the LSM hook to do all sorts of things, but
> leveraging it would be quite cumbersome, will take time to show up in
> userspace, or simply never be adopted.
> We’re already seeing it in Ubuntu, which started requiring AppArmor profiles.
> 

yes, I would argue that is a metric of adoption.

> This new capability set would be a universal thing that could be
> leveraged today without modification to userspace. Moreover, it’s a
> simple framework that can be extended.

I would argue that is a problem. Userspace has to change for this to be
secure. Is it an improvement over the current state? Yes.

> As you mentioned, LSMs are even finer grained, and that’s the idea,
> those could be used hand in hand eventually. You could envision LSM
> hooks controlling the userns capability set, and thus enforce policies
> on the creation of nested namespaces without limiting the other tasks’
> capabilities.
> 
>> I am not opposed to adding another mechanism to control user namespaces,
>> I am just not currently convinced that capabilities are the right
>> mechanism.
> 
> Well that’s the thing, from past conversations, there is a lot of
> disagreement about restricting namespaces. By restricting the
> capabilities granted by namespaces instead, we’re actually treating the
> root cause of most concerns.
> 
No disagreement there. This is actually Ubuntu's posture with user namespaces
atm: the user namespace is allowed but the capabilities within it
are denied.

It does, however, result in some very odd failures when not handled
correctly, and it would be easier to debug if the use of user namespaces
were just cleanly denied.

> Today user namespaces are "special" and always grant full caps. Adding a
> new capability set to limit this behavior is logical; same way it's done
> for usual process transitions.
> Essentially this set is to namespaces what the inheritable set is to
> root.
> 
It's not so much the capabilities set as the inheritable part that is
problematic. Yes, I am well aware of where that is required, but I question
whether capabilities provide the needed controls here.

>> this should be bounded by the creating task's bounding set, otherwise
>> the capability model's bounding invariant will be broken, but having the
>> capabilities that the userns wants to access in the task's bounding set is
>> a problem for all the unprivileged processes wanting access to user
>> namespaces.
> 
> This is possible with the security bit introduced in the second patch.
> The idea of having those separate is that a service which has dropped
> its capabilities can still create a fully privileged user namespace.

yes, which is the problem. Not that we don't do that with say setuid
applications, but the difference is that they were known to be doing
something dangerous and took measures around that.

We are starting from a different posture here, where applications have
assumed that user namespaces were safe and no measures were needed.
Tools like unshare and bwrap, if set to allow user namespaces in their
fcaps, will allow exploits a trivial bypass.

> For example, systemd’s machined drops capabilities from its bounding set,
> yet it should be able to create unprivileged containers.
> The invariant is sound because a child userns can never regain what it
> doesn’t have in its bounding set. If it helps you can view the userns
> set as a “namespace bounding set” since it defines the future bounding
> sets of namespaced tasks.
> 
sure I get it, some of the use cases work, some not so well
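
For reference, the separation being discussed can be modelled with plain
sets. This is only a sketch of how the thread describes the proposal, not
the patch itself: the Task class, its fields and the three-entry ALL_CAPS
stand-in are hypothetical, and the rule that a new namespace takes the
creator's userns set as its bounding set is my reading of the "namespace
bounding set" description above.

```python
# Tiny stand-in for the full capability set (illustrative only).
ALL_CAPS = frozenset({"CAP_SYS_ADMIN", "CAP_NET_ADMIN", "CAP_SETPCAP"})


class Task:
    """Hypothetical model tracking only the bounding and userns sets."""

    def __init__(self, bounding=ALL_CAPS, userns=ALL_CAPS):
        self.bounding = frozenset(bounding)
        self.userns = frozenset(userns)

    def create_userns(self):
        # Assumption: the creator's userns set becomes the bounding set of
        # the namespaced task and is inherited as its own userns set. The
        # creator's bounding set plays no part.
        return Task(bounding=self.userns, userns=self.userns)


# A service that dropped its whole bounding set but kept its userns set.
service = Task(bounding=set(), userns=ALL_CAPS)
container = service.create_userns()

# A task whose userns set lost CAP_NET_ADMIN: nested namespaces never regain it.
restricted = Task(userns=ALL_CAPS - {"CAP_NET_ADMIN"})
nested = restricted.create_userns().create_userns()
```

In this model container.bounding is the full set even though the service
dropped everything from its own bounding set, which is the case I am
worried about, while nested.bounding never contains CAP_NET_ADMIN, which
is the invariant described above.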

>> If I am reading this right for unprivileged processes the capabilities in
>> the userns are bounded by the processes permitted set before the userns is
>> created?
> 
> Yes, unprivileged processes that want to raise a capability in their
> userns set need it in their permitted set (as well as their bounding
> set). This is similar to inheritable capabilities.

Right.

> Recall that processes start with a full set of userns capabilities, so
> if you drop a userns capability (or something else did, e.g.
> init/pam/sysctl/parent) you will never be able to regain it, and
> namespaces you create won't have it included.

sure, that part of the behavior is fine

> Now, if you’re root (or cap privileged) you can always regain it.
> 
yes
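
Spelled out as a sketch, the lower/raise rules as I understand them from
this exchange look like the following. The function names and the set
representation are hypothetical; the only rules encoded are the ones
stated above: lowering is always allowed, and raising needs the capability
in both the permitted and the bounding set.

```python
def drop(userns, cap):
    """Lowering a userns capability is always allowed."""
    return frozenset(userns) - {cap}


def try_raise(userns, cap, permitted, bounding):
    """Raise cap in the userns set, or fail like a denied syscall would.

    Assumption from the thread: the capability must be present in both the
    permitted set and the bounding set of the calling task.
    """
    if cap in permitted and cap in bounding:
        return frozenset(userns) | {cap}
    raise PermissionError(f"cannot raise {cap} in the userns set")


full = frozenset({"CAP_SYS_ADMIN", "CAP_NET_ADMIN"})
lowered = drop(full, "CAP_NET_ADMIN")

# Root-like task: the capability is still permitted, so it can be regained.
regained = try_raise(lowered, "CAP_NET_ADMIN", permitted=full, bounding=full)
```

An unprivileged task has an empty permitted set, so the same try_raise
call raises PermissionError for it and the drop is permanent.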

What I was trying to get at is two points.
1. The written description wasn't clear enough, leaving room for
    ambiguity.
2. That I question whether the behavior should be allowed given the
    current set of tools that use user namespaces. It reduces exploit
    code's ability to directly use unprivileged user namespaces but
    makes it all too easy to bypass the restriction because of the
    behavior of the current tool set, i.e. user space has to change.

>> This is only being respected in PR_CTL, the user mode helper is straight
>> setting the caps.
> 
> Usermode helper requires CAP_SYS_MODULE and CAP_SETPCAP in the initns so
> the permitted set is irrelevant there. It starts with a full set but from
> there you can only lower caps, so the invariant holds.
> 
sure, I get what is happening. Again the description needs work. It was
ambiguous as to whether it was applying to the fcaps or only the pcaps.

But again, I believe the fcaps behavior is wrong, because of the state of
current software. If this had been a proposal where there was no existing
software infrastructure I would be starting from a different stance.


