LSM namespacing API

Thu Aug 21 08:07:50 UTC 2025

On 8/20/25 19:05, Serge E. Hallyn wrote:
> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
>> <stephen.smalley.work at gmail.com> wrote:
>>>
>>> I think we want to be able to unshare a specific security module
>>> namespace without unsharing the others, i.e. just SELinux or just
>>> AppArmor.
>>> Not sure if your suggestion above supports that already but wanted to note it.
>>
>> The lsm_set_self_attr(2) approach allows for LSM specific unshare
>> operations.  Take the existing LSM_ATTR_EXEC attribute as an example,
>> two LSMs have implemented support (AppArmor and SELinux), and
>> userspace can independently set the attribute as desired for each LSM.
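A rough sketch of the per-LSM addressing described above, assuming the lsm_set_self_attr(2) interface as merged in Linux 6.8. All ex_/EX_ names are made up for illustration. The numeric constants are copied from my reading of <linux/lsm.h> and should be checked against your headers. The attribute values are LSM-specific, and including the trailing NUL in ctx_len is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Mirrored from <linux/lsm.h> (6.8+) so the sketch builds with older headers. */
#define EX_LSM_ID_SELINUX  101
#define EX_LSM_ID_APPARMOR 104
#define EX_LSM_ATTR_EXEC   101
#ifndef SYS_lsm_set_self_attr
#define SYS_lsm_set_self_attr 460 /* x86-64 / generic syscall table value */
#endif

/* Same layout as the kernel's struct lsm_ctx. */
struct ex_lsm_ctx {
    uint64_t id;      /* which LSM this entry is addressed to */
    uint64_t flags;
    uint64_t len;     /* total entry size: header plus padded ctx */
    uint64_t ctx_len; /* bytes of ctx actually used */
    uint8_t  ctx[];
};

/* Build one entry addressed to a single LSM; caller frees the result. */
struct ex_lsm_ctx *ex_build_ctx(uint64_t lsm_id, const char *value, size_t *total)
{
    assert(value != NULL && total != NULL);
    size_t ctx_len = strlen(value) + 1; /* assumption: NUL is included */
    size_t len = (sizeof(struct ex_lsm_ctx) + ctx_len + 7) & ~(size_t)7;
    struct ex_lsm_ctx *c = calloc(1, len);
    if (!c)
        return NULL;
    c->id = lsm_id;
    c->len = len;
    c->ctx_len = ctx_len;
    memcpy(c->ctx, value, ctx_len);
    *total = len;
    return c;
}

/* Set LSM_ATTR_EXEC for exactly one LSM; other LSMs are left untouched. */
int ex_set_exec_attr(uint64_t lsm_id, const char *value)
{
    size_t total;
    struct ex_lsm_ctx *c = ex_build_ctx(lsm_id, value, &total);
    if (!c) {
        errno = ENOMEM;
        return -1;
    }
    long rc = syscall(SYS_lsm_set_self_attr, (unsigned int)EX_LSM_ATTR_EXEC,
                      c, (uint32_t)total, 0u);
    int saved = errno;
    free(c);
    errno = saved;
    return (int)rc;
}
```

A caller would invoke ex_set_exec_attr() once with EX_LSM_ID_SELINUX and once with EX_LSM_ID_APPARMOR, each with its own value. A per-LSM unshare attribute, if one is added, would presumably be addressed the same way, but no such attribute exists in this sketch.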
> 
> Overall I really like the idea.
> 
>>> Serge pointed out that we also will need an API to attach to an
>>> existing SELinux namespace, which I captured here:
>>> https://github.com/stephensmalley/selinuxns/issues/19
>>> This is handled for other Linux namespaces by opening a pseudo file
>>> under /proc/pid/ns and invoking setns(2), so not sure how we want to
>>> do it.
>>
>> One option would be to have the LSM framework return an LSM namespace
>> "handle" for a given LSM using lsm_get_self_attr(2) and then do a
>> setns(2)-esque operation using lsm_set_self_attr(2) with that
>> "handle".  We would need to figure out what would constitute a
>> "handle" but let's just mark that as TBD for now with this approach (I
>> think better options are available).
> 
> The use case which would be complicated (not blocked) by this, is
> 
> * a runtime creates a process p1
>    * p1 unshares its lsm namespace
> * runtime forks a debug/admin process p2
>    * p2 wants to enter p1's namespace
> 
> Of course the runtime could work around it by, before relinquishing
> control of p1 to a new executable, returning the lsm_get_self_attr()
> data to the runtime over a pipe.
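The pipe workaround could look roughly like the following. This is only a sketch: what a "handle" is remains TBD in this thread, so a plain 64-bit placeholder stands in for whatever lsm_get_self_attr() would return, and ex_collect_child_handle is a made-up name.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Runtime side: fork p1, let it report its (placeholder) namespace handle
 * over a pipe, and store it in *out. Returns 0 on success, -1 on failure.
 */
int ex_collect_child_handle(uint64_t placeholder, uint64_t *out)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* p1: a real child would unshare its LSM namespace and query the
         * handle here; the placeholder stands in for that result. */
        close(fds[0]);
        uint64_t handle = placeholder;
        ssize_t n = write(fds[1], &handle, sizeof(handle));
        /* A real p1 would exec the new executable at this point. */
        _exit(n == (ssize_t)sizeof(handle) ? 0 : 1);
    }

    /* Runtime: read the handle before p1 hands control to new code. */
    close(fds[1]);
    ssize_t n = read(fds[0], out, sizeof(*out));
    close(fds[0]);

    /* The sketch reaps p1 because it exits immediately; a real runtime
     * would keep p1 running and pass the handle on to p2. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (n != (ssize_t)sizeof(*out))
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```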
> 
> Note I don't think we should support setting another task's namespace,
> only getting its namespace ID.
> 
It's not reasonably doable without a significant update to the creds
architecture. Being able to set another task's credentials is an
orthogonal feature, and as such can be saved for another argument. So
very much in agreement, let's not allow that as part of the design.

>> Since we have an existing LSM namespace combination, with processes
>> running inside of it, it might be sufficient to simply support moving
>> into an existing LSM namespace set with setns(2) using only a pidfd
>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>> CLONE_NEWLSM).  This would simply set the LSM namespace set for the
>> setns(2) caller to match that of the target pidfd.  We still wouldn't
>> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
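For reference, the pidfd form of setns(2) that this proposal would build on can be sketched as below. It uses only what exists today (pidfd_open(2), and setns(2) with a pidfd, which needs Linux 5.8 or later). The CLONE_LSMNS/CLONE_NEWLSM flag from the proposal is not defined in any headers I am aware of, so the mask is left as a parameter. ex_join_namespaces_of is a made-up name.

```c
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* same number on most architectures */
#endif

/*
 * Move the caller into the namespaces of `target` selected by
 * `nstype_mask` (a mask of CLONE_NEW* flags). Returns 0 on success,
 * -1 with errno set on failure.
 */
int ex_join_namespaces_of(pid_t target, int nstype_mask)
{
    /* Obtain a stable reference to the target process. */
    int pidfd = (int)syscall(SYS_pidfd_open, target, 0u);
    if (pidfd < 0)
        return -1;

    /* With a pidfd, setns(2) takes a mask of namespace types to join. */
    int rc = (int)syscall(SYS_setns, pidfd, nstype_mask);

    int saved = errno;
    close(pidfd);
    errno = saved;
    return rc;
}
```

Today a caller would pass something like CLONE_NEWUTS | CLONE_NEWIPC from <sched.h>. Under the proposal the LSM flag would be one more bit in that mask, with the caller ending up in the same LSM namespace set as the target.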
> 
> A part of me is telling (another part of) me that being able to setns
> to a subset of the lsms could lead to privilege escapes through
> weird policy configurations for the various LSMs.  In which case,
> an all-or-nothing LSM setns might actually be preferable.
> 
> I haven't thought of a concrete example, though.
> 
Not just potentially, and not just security/LSM namespaces. Really,
the LSMs need to be able to determine whether, and which, namespaces
(including system namespaces) need to move together as a set.