LSM namespacing API

Mon Sep 1 17:31:43 UTC 2025

On 9/1/2025 9:01 AM, Dr. Greg wrote:
> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
>
> Good morning, I hope the week is starting well for everyone.
>
> Now that everyone is getting past the summer holiday season, it would
> seem useful to specifically clarify some of the LSM namespace
> implementation details.
>
>> On 8/21/25 07:26, Serge E. Hallyn wrote:
>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>>>> On 8/19/25 10:47, Stephen Smalley wrote:
>>>>> On Tue, Aug 19, 2025 at 10:56???AM Paul Moore <paul at paul-moore.com> 
>>>>> wrote:
>>>>>> Hello all,
>>>>>>
>>>>>> As most of you are likely aware, Stephen Smalley has been working on
>>>>>> adding namespace support to SELinux, and the work has now progressed
>>>>>> to the point where a serious discussion on the API is warranted.  For
>>>>>> those of you are unfamiliar with the details or Stephen's patchset, or
>>>>>> simply need a refresher, he has some excellent documentation in his
>>>>>> work-in-progress repo:
>>>>>>
>>>>>> * https://github.com/stephensmalley/selinuxns
>>>>>>
>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>>>>>> about SELinux namespacing, you can watch the presentation here:
>>>>>>
>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>>>>>
>>>>>> In the past you've heard me state, rather firmly at times, that I
>>>>>> believe namespacing at the LSM framework layer to be a mistake,
>>>>>> although if there is something that can be done to help facilitate the
>>>>>> namespacing of individual LSMs at the framework layer, I would be
>>>>>> supportive of that.  I think that a single LSM namespace API, similar
>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
>>>>>> us to have a discussion to see if we all agree on that, and if so,
>>>>>> what such an API might look like.
>>>>>>
>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
>>>>>> I'll let him expand on this idea.
>>>>>>
>>>>>> While I agree with John that a fs based API is problematic (see all of
>>>>>> our discussions around the LSM syscalls), I'm concerned that a single
>>>>>> clone*(2) flag will significantly limit our flexibility around how
>>>>>> individual LSMs are namespaced, something I don't want to see happen.
>>>>>> This makes me wonder about the potential for expanding
>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>>>>>> provide a single LSM framework API for an unshare operation while also
>>>>>> providing a mechanism to pass LSM specific via the lsm_ctx struct if
>>>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>>>>>> individual LSMs can opt-in to the API fairly easily by providing a
>>>>>> setselfattr() LSM callback.
>>>>>>
>>>>>> Thoughts?
>>>>> I think we want to be able to unshare a specific security module
>>>>> namespace without unsharing the others, i.e. just SELinux or just
>>>>> AppArmor.
>>>> yes which is part of the problem with the single flag. That choice
>>>> would be entirely at the policy level, without any input from userspace.
>>> AIUI Paul's suggestion is the user can pre-set the details of which
>>> lsms to unshare and how with the lsm_set_self_attr(), and then a
>>> single CLONE_LSM effects that.
>> yes, I was specifically addressing the conversation I had with Paul at
>> LSS that Paul brought up. That is
>>
>>   At LSS-NA this year, John Johansen and I had a brief discussion where
>>   he suggested a single LSM wide clone*(2) flag that individual LSM's
>>   could opt into via callbacks.
>>
>> the idea there isn't all that different than what Paul proposed. You
>> could have a single flag, if you can provide ancillary information. But
>> a single flag on its own isn't sufficient.
> If one thing has come out of this thread, it would seem to be the fact
> that there is going to be little commonality in the requirements that
> various LSM's will have for the creation of a namespace.
>
> Given that, the most infrastructure that the LSM should provide would
> be a common API for a resource orchestrator to request namespace
> separation and to provide a framework for configuring the namespace
> prior to when execution begins in the context of the namespace.
>
> The first issue to resolve would seem to be what namespace separation
> implies.
>
> John, if I interpret your comments in this discussion correctly, your
> contention is that when namespace separation is requested, all of the
> LSM's that implement namespaces will create a subordinate namespace,
> is that a correct assumption?
>
> It would seem, consistent with the 'stacking' concept, that any LSM
> with namespace capability that chooses not to separate, will result in
> denial of the separation request.  That in turn will imply the need to
> unwind or delete any namespace context that other LSM's may have
> allocated before the refusal occurred.

Were it true that 'stacking' rated the status of a 'concept'.

An LSM that is capable of namespacing (the definition of which is
elusive at this time) should be allowed to decline participation
in a namespace creation. That, or there needs to be a convention
for "null" namespaces, by which an LSM can pretend that it isn't
involved in the new namespace. I think the latter smells funny
and would invite "security people don't understand performance"
remarks. No LSM should be allowed to prevent another from using
namespaces.

>
> This model also implies that the orchestrator requesting the
> separation will need to pass a set of parameters describing the
> characteristics of each namespace, described by the LSM identifier
> that they pertain to.  Since there may be a need to configure multiple
> namespaces there would be a requirement to pass an array or list of
> these parameter sets.

Just like lsm_set_self_attr(2).

> There will also be a need to inject, possibly substantial amounts of
> policy or model information into the namespace, before execution in
> the context of the namespace begins.

Yup. A major downside of loadable policy.

> There will also be a need to decide whether namespace separation
> should occur at the request of the orchestrator or at the next fork,
> the latter model being what the other resource namespaces use.  We
> believe the argument for direct separation can be made by looking at
> the gymnastics that orchestrators need to jump through with the
> 'change-on-fork' model.
>
> Case in point, it would seem realistic that a process with sufficient
> privilege, may desire to place itself in a new LSM namespace context
> in a manner that does not require re-execution of itself.
>
> With respect to separation, the remaining issue is if a new security
> capability bit needs to be implemented to gate namespace separation.
> John, based on your comments, I believe you would support this need?

I don't like the notion of a new capability for this. But then,
I object to almost every new capability proposed. Existing namespaces
don't need their own capabilities. I don't see this case as special.

>
>> You can do a subset with a single flag and only policy directing things,
>> but that would cut container managers out of the decision. Without a
>> universal container identifier that really limits what you can do. In
>> another email I likend it to the MCS label approach to the container
>> where you have a single security policy for the container and each
>> container gets to be a unique instance of that policy. Its not a perfect
>> analogy as with namespace policy can be loaded into the namespace making
>> it unique. I don't think the approach is right because not all namespaces
>> implement a loadable policy, and even when they do I think we can do a
>> better job if the container manager is allowed to provide additional
>> context with the namespacing request.
> In order to be relevant, the configuration of LSM namespaces need to
> be under control of a resource orchestrator or container manager.

I do not approve of kernel features that are pointless without specific
user space support. If it can't be used in ways other than those
defined by a particular user space component they really don't belong
in the kernel. 

>
> What we hear from people doing Kubernetes, at scale, is a desire to be
> able to request that a container be run somewhere in the hardware
> resource pool and for that container to implement a security model
> specific to the needs of the workload running in that container.  In a
> manner that is orthogonal from other security policies that may be in
> effect for other workloads, on the host or in other containers.

That sounds to me like they want per-container security policy. That
would require that the kernel have the 'concept' of a container. That's
not something I expect to see in my lifetime.

>
> Hopefully the above will be of assistance in furthering discussion.
>
> Have a good week.
>
> As always,
> Dr. Greg
>
> The Quixote Project - Flailing at the Travails of Cybersecurity
>               https://github.com/Quixote-Project
>