LSM namespacing API

Tue Sep 2 10:55:39 UTC 2025

On 9/1/25 09:01, Dr. Greg wrote:
> On Thu, Aug 21, 2025 at 07:57:11AM -0700, John Johansen wrote:
> 
> Good morning, I hope the week is starting well for everyone.
> 
> Now that everyone is getting past the summer holiday season, it would
> seem useful to specifically clarify some of the LSM namespace
> implementation details.
> 
>> On 8/21/25 07:26, Serge E. Hallyn wrote:
>>> On Thu, Aug 21, 2025 at 12:46:10AM -0700, John Johansen wrote:
>>>> On 8/19/25 10:47, Stephen Smalley wrote:
>>>>> On Tue, Aug 19, 2025 at 10:56 AM Paul Moore <paul at paul-moore.com>
>>>>> wrote:
>>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> As most of you are likely aware, Stephen Smalley has been working on
>>>>>> adding namespace support to SELinux, and the work has now progressed
>>>>>> to the point where a serious discussion on the API is warranted.  For
>>>>>> those of you who are unfamiliar with the details or Stephen's patchset, or
>>>>>> simply need a refresher, he has some excellent documentation in his
>>>>>> work-in-progress repo:
>>>>>>
>>>>>> * https://github.com/stephensmalley/selinuxns
>>>>>>
>>>>>> Stephen also gave a (pre-recorded) presentation at LSS-NA this year
>>>>>> about SELinux namespacing, you can watch the presentation here:
>>>>>>
>>>>>> * https://www.youtube.com/watch?v=AwzGCOwxLoM
>>>>>>
>>>>>> In the past you've heard me state, rather firmly at times, that I
>>>>>> believe namespacing at the LSM framework layer to be a mistake,
>>>>>> although if there is something that can be done to help facilitate the
>>>>>> namespacing of individual LSMs at the framework layer, I would be
>>>>>> supportive of that.  I think that a single LSM namespace API, similar
>>>>>> to our recently added LSM syscalls, may be such a thing, so I'd like
>>>>>> us to have a discussion to see if we all agree on that, and if so,
>>>>>> what such an API might look like.
>>>>>>
>>>>>> At LSS-NA this year, John Johansen and I had a brief discussion where
>>>>>> he suggested a single LSM wide clone*(2) flag that individual LSM's
>>>>>> could opt into via callbacks.  John is directly CC'd on this mail, so
>>>>>> I'll let him expand on this idea.
>>>>>>
>>>>>> While I agree with John that a fs based API is problematic (see all of
>>>>>> our discussions around the LSM syscalls), I'm concerned that a single
>>>>>> clone*(2) flag will significantly limit our flexibility around how
>>>>>> individual LSMs are namespaced, something I don't want to see happen.
>>>>>> This makes me wonder about the potential for expanding
>>>>>> lsm_set_self_attr(2) to support a new LSM attribute that would support
>>>>>> a namespace "unshare" operation, e.g. LSM_ATTR_UNSHARE.  This would
>>>>>> provide a single LSM framework API for an unshare operation while also
>>>>>> providing a mechanism to pass LSM specific data via the lsm_ctx struct if
>>>>>> needed.  Just as we do with the other LSM_ATTR_* flags today,
>>>>>> individual LSMs can opt-in to the API fairly easily by providing a
>>>>>> setselfattr() LSM callback.
>>>>>>
>>>>>> Thoughts?
>>>>>
>>>>> I think we want to be able to unshare a specific security module
>>>>> namespace without unsharing the others, i.e. just SELinux or just
>>>>> AppArmor.
>>>>
>>>> yes which is part of the problem with the single flag. That choice
>>>> would be entirely at the policy level, without any input from userspace.
>>>
>>> AIUI Paul's suggestion is the user can pre-set the details of which
>>> lsms to unshare and how with the lsm_set_self_attr(), and then a
>>> single CLONE_LSM effects that.
> 
>> yes, I was specifically addressing the conversation I had with Paul at
>> LSS that Paul brought up. That is
>>
>>    At LSS-NA this year, John Johansen and I had a brief discussion where
>>    he suggested a single LSM wide clone*(2) flag that individual LSM's
>>    could opt into via callbacks.
>>
>> the idea there isn't all that different than what Paul proposed. You
>> could have a single flag, if you can provide ancillary information. But
>> a single flag on its own isn't sufficient.
> 
> If one thing has come out of this thread, it would seem to be the fact
> that there is going to be little commonality in the requirements that
> various LSM's will have for the creation of a namespace.
> 

yes

> Given that, the most infrastructure that the LSM should provide would
> be a common API for a resource orchestrator to request namespace
> separation and to provide a framework for configuring the namespace
> prior to when execution begins in the context of the namespace.
> 

hrmmm, certainly a common API. Any task could theoretically use the API;
it doesn't have to be a resource orchestrator, but I suppose you could
call it such.

I also don't know that we need to provide a framework for configuring
the namespace prior to when execution begins in the context of the
namespace. It might be nice to have, but configuration of LSMs is
very LSM specific.

We don't even have a common LSM policy load interface atm, though there
is a proposal. Configuration is a step beyond that. Would it be nice
to have? Sure. Are we going to get that far? I don't know.

> The first issue to resolve would seem to be what namespace separation
> implies.
> 
> John, if I interpret your comments in this discussion correctly, your
> contention is that when namespace separation is requested, all of the
> LSM's that implement namespaces will create a subordinate namespace,
> is that a correct assumption?
> 
No, not necessarily. The task can request to "unshare/create" LSMs,
similar to requesting a set of system namespaces. Then every LSM,
whether part of the request or not, gets to do its thing. If every
LSM agrees, then a transition hook will process and each LSM will
again do its thing. This would likely be what was requested, but it's
possible that an LSM not in the request will do something, based
on its model.

In the end userspace gets to make a request, and each security policy is
responsible for staying within its security model/policy.

> It would seem, consistent with the 'stacking' concept, that any LSM
> with namespace capability that chooses not to separate, will result in
> denial of the separation request.  That in turn will imply the need to

Not necessarily. They could allow and choose not to transition. Or they
could not create a namespace but update some state.

> unwind or delete any namespace context that other LSM's may have
> allocated before the refusal occurred.

The request does need to be split into a permission hook and a
transition hook, similar to exec. If any LSM in the permission hook
denies, the request is denied. If any LSM in the transition hook fails,
the request again fails, and the LSMs would get their regular cleanup
hook called for the associated object.
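
To make the split concrete, here is a minimal userspace sketch of that
two-phase flow. Every name in it (struct lsm_ns_ops, lsm_ns_request,
the sample_* callbacks) is hypothetical and invented for illustration;
nothing like this exists in the kernel today. The explicit unwind loop
is also a simplification standing in for the regular cleanup hook.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-LSM callbacks; none of these names exist in the kernel. */
struct lsm_ns_ops {
	const char *name;
	int (*permission)(unsigned long requested); /* 0 allows, -errno denies */
	int (*transition)(unsigned long requested); /* 0 on success, -errno on failure */
	void (*cleanup)(void);                      /* undo what transition did */
};

/*
 * Two-phase request. Every registered LSM is consulted, whether or not it
 * is named in the "requested" bitmask, and nothing changes until all allow.
 */
static int lsm_ns_request(const struct lsm_ns_ops *ops, size_t n,
			  unsigned long requested)
{
	size_t i;
	int rc;

	assert(ops != NULL || n == 0);

	/* Phase 1: permission. A single denial rejects the whole request. */
	for (i = 0; i < n; i++) {
		rc = ops[i].permission(requested);
		if (rc)
			return rc;
	}

	/* Phase 2: transition. On failure, clean up the LSMs that already ran. */
	for (i = 0; i < n; i++) {
		rc = ops[i].transition(requested);
		if (rc) {
			while (i-- > 0)
				ops[i].cleanup();
			return rc;
		}
	}
	return 0;
}

/* Sample callbacks used only to exercise the sketch. */
static int cleanups;
static int sample_allow(unsigned long r) { (void)r; return 0; }
static int sample_deny(unsigned long r) { (void)r; return -EACCES; }
static int sample_fail(unsigned long r) { (void)r; return -ENOMEM; }
static void sample_cleanup(void) { cleanups++; }
```

Note that an LSM can return 0 from both callbacks without creating a
namespace at all, which matches the "allow and choose not to
transition" case above.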

> 
> This model also implies that the orchestrator requesting the
> separation will need to pass a set of parameters describing the
> characteristics of each namespace, described by the LSM identifier
> that they pertain to.  Since there may be a need to configure multiple
> namespaces there would be a requirement to pass an array or list of
> these parameter sets.
> 
yes, it will require a list/array; see lsm_set_self_attr(2)
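
As a rough illustration of what such a list could look like, the sketch
below packs per-LSM parameter sets back to back, each with a header
modeled on struct lsm_ctx. The struct ns_ctx type, the NS_ID_* values
and ns_ctx_append() are stand-ins made up for this example; real code
would use struct lsm_ctx and the LSM_ID_* constants from <linux/lsm.h>,
and the 8-byte padding is an assumption of the sketch, not a statement
about the final ABI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in modeled on struct lsm_ctx; not the kernel definition. */
struct ns_ctx {
	uint64_t id;      /* which LSM this entry is for */
	uint64_t flags;
	uint64_t len;     /* total entry size, including padding */
	uint64_t ctx_len; /* size of the LSM-specific payload */
	uint8_t  ctx[];
};

/* Stand-in IDs for illustration; use LSM_ID_* from <linux/lsm.h> for real. */
enum { NS_ID_SELINUX = 101, NS_ID_APPARMOR = 104 };

/*
 * Append one entry at offset "off" in "buf". Returns the offset just past
 * the new entry, or 0 if the entry does not fit.
 */
static size_t ns_ctx_append(uint8_t *buf, size_t cap, size_t off,
			    uint64_t id, const void *payload, size_t plen)
{
	struct ns_ctx hdr;
	size_t len = (sizeof(hdr) + plen + 7) & ~(size_t)7; /* pad to 8 bytes */

	assert(off % 8 == 0);
	if (off > cap || len > cap - off)
		return 0;

	memset(buf + off, 0, len);
	hdr.id = id;
	hdr.flags = 0;
	hdr.len = len;
	hdr.ctx_len = plen;
	memcpy(buf + off, &hdr, sizeof(hdr));
	if (plen)
		memcpy(buf + off + sizeof(hdr), payload, plen);
	return off + len;
}
```

A caller would append one entry per LSM it wants to include in the
request, then hand the whole buffer and its final length to whatever
interface ends up carrying the request.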

> There will also be a need to inject, possibly substantial amounts of
> policy or model information into the namespace, before execution in
> the context of the namespace begins.
> 
Allowing for this and requiring this are two different things. Like I
said above we don't even currently have a common policy load interface.
Configuration is another step beyond policy load.

> There will also be a need to decide whether namespace separation
> should occur at the request of the orchestrator or at the next fork,

Or allow both, but yes a decision needs to be made

> the latter model being what the other resource namespaces use.  We
> believe the argument for direct separation can be made by looking at
> the gymnastics that orchestrators need to jump through with the
> 'change-on-fork' model.
>
Looking at current system namespacing, we have clone/unshare, which
really are on fork. setns enters existing namespaces.

We either need to create new variants of clone/unshare or potentially
have an LSM syscall that sets up additional parameters that are then
triggered by clone/unshare. If going the latter route, then it's just
a matter of whether the LSM call returns a handle that can be operated
on or not.

> Case in point, it would seem realistic that a process with sufficient
> privilege, may desire to place itself in a new LSM namespace context
> in a manner that does not require re-execution of itself.
> 
yes, but it is questionable whether security policy should allow that.
At the very least security policy should be consulted and may deny
it.

> With respect to separation, the remaining issue is if a new security
> capability bit needs to be implemented to gate namespace separation.
> John, based on your comments, I believe you would support this need?
> 
No, I don't think a capability (as in posix.1e) per se is needed. I
think an LSM permission request is.

>> You can do a subset with a single flag and only policy directing things,
>> but that would cut container managers out of the decision. Without a
>> universal container identifier that really limits what you can do. In
>> another email I likened it to the MCS label approach to the container
>> where you have a single security policy for the container and each
>> container gets to be a unique instance of that policy. Its not a perfect
>> analogy as with namespace policy can be loaded into the namespace making
>> it unique. I don't think the approach is right because not all namespaces
>> implement a loadable policy, and even when they do I think we can do a
>> better job if the container manager is allowed to provide additional
>> context with the namespacing request.
> 
> In order to be relevant, the configuration of LSM namespaces need to
> be under control of a resource orchestrator or container manager.
> 
No, they must be under the control of the LSMs.

> What we hear from people doing Kubernetes, at scale, is a desire to be
> able to request that a container be run somewhere in the hardware
> resource pool and for that container to implement a security model
> specific to the needs of the workload running in that container.  In a
> manner that is orthogonal from other security policies that may be in
> effect for other workloads, on the host or in other containers.
> 
sure, assuming the host policy allows it. Otherwise it is just a host
policy bypass, which cannot be allowed. K8s people have a specific
use case, and they need to configure the host for that use case. They
cannot expect that use case to work on a host that has been configured
for, say, an MLS security constraint.

> Hopefully the above will be of assistance in furthering discussion.
> 
> Have a good week.
> 
> As always,
> Dr. Greg
> 
> The Quixote Project - Flailing at the Travails of Cybersecurity
>                https://github.com/Quixote-Project