LSM namespacing API

Thu Aug 21 14:18:03 UTC 2025

On 8/21/25 02:56, Mickaël Salaün wrote:
> On Wed, Aug 20, 2025 at 04:47:15PM -0400, Paul Moore wrote:
>> On Wed, Aug 20, 2025 at 10:44 AM Mickaël Salaün <mic at digikod.net> wrote:
>>> On Tue, Aug 19, 2025 at 02:51:00PM -0400, Paul Moore wrote:
>>>> On Tue, Aug 19, 2025 at 1:47 PM Stephen Smalley
>>>> <stephen.smalley.work at gmail.com> wrote:
>>
>> ...
>>
>>>> Since we have an existing LSM namespace combination, with processes
>>>> running inside of it, it might be sufficient to simply support moving
>>>> into an existing LSM namespace set with setns(2) using only a pidfd
>>>> and a new CLONE_LSMNS flag (or similar, upstream might want this as
>>>> CLONE_NEWLSM).  This would simply set the LSM namespace set for the
>>>
>>> Bike shedding but, I would prefer CLONE_NEWSEC or something without LSM
>>> because the goal is not to add a new LSM but a new "security" namespace.
>>
>> I disagree with your statement about the goal.  In fact I would argue
>> that one of the goals is to explicitly *not* create a generic
>> "security" namespace.  Defining a single, LSM-wide namespace, is
>> already an almost impossible task, extending it to become a generic
>> "security" namespace seems maddening.
> 
> I didn't suggest a generic "security" namespace that would include
> non-LSM access checks, just using the name "security" instead of "LSM",
> but never mind.
> 
>>
>>>> setns(2) caller to match that of the target pidfd.  We still wouldn't
>>>> want to support CLONE_LSMNS/CLONE_NEWLSM for clone*().
>>>
>>> Why making clone*() support this flag would be an issue?
>>
>> With the understanding that I'm not going to support a single LSM-wide
>> namespace (see my previous comments), we would need multiple flags for
> 
> I'm confused about the goal of this thread...  When I read namespace I
> think about the user space interface that enables to tie a set of
> processes to ambient kernel objects.  I'm not suggesting to force all
> LSM to handle namespaces, but to have a unified user space interface
> (i.e. namespace flag, file descriptor...) that can be used by user space
> to request a new "context" that may or may not be used by running LSMs.
> 

Yes to a unified interface, no to an LSM-wide namespace. The interface
could ask each LSM to namespace itself, but it's up to the LSM what it
will do: whether it creates a namespace at all and, if it does, whether
that namespace is hierarchical or flat.

At the end of the call you would likely get a proxy object to a set of
individual LSM namespace contexts. That is not so different from the set
of system namespaces you already have: mount, pid, user, ...
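
To make the proxy idea a bit more concrete, here is a rough user-space C
sketch. Every name in it (struct lsm_ns_ctx, struct lsm_nsproxy,
lsm_nsproxy_add(), lsm_nsproxy_find()) is made up for illustration and
is not an existing or proposed kernel interface. It only shows the shape:
one handle that points at whatever context each LSM decided to provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: per-LSM namespace state. A real LSM would define its own. */
struct lsm_ns_ctx {
	const char *lsm;	/* e.g. "selinux", "apparmor" */
	int refcount;		/* number of proxies pointing at this context */
};

#define LSM_NS_MAX 8

/*
 * Hypothetical proxy, loosely analogous to the kernel's struct nsproxy,
 * which holds pointers to a task's mnt, pid, net, ... namespaces.
 */
struct lsm_nsproxy {
	size_t count;
	struct lsm_ns_ctx *ctx[LSM_NS_MAX];
};

/* Attach one LSM's context to the proxy; returns 0, or -1 if it is full. */
int lsm_nsproxy_add(struct lsm_nsproxy *p, struct lsm_ns_ctx *c)
{
	assert(p != NULL && c != NULL);
	if (p->count >= LSM_NS_MAX)
		return -1;
	c->refcount++;
	p->ctx[p->count++] = c;
	return 0;
}

/* Return the context a given LSM contributed, or NULL if it has none. */
struct lsm_ns_ctx *lsm_nsproxy_find(const struct lsm_nsproxy *p,
				    const char *lsm)
{
	assert(p != NULL && lsm != NULL);
	for (size_t i = 0; i < p->count; i++)
		if (strcmp(p->ctx[i]->lsm, lsm) == 0)
			return p->ctx[i];
	return NULL;
}
```

An LSM that chose not to namespace would simply hand back its existing
context, so the same lsm_ns_ctx could sit in more than one proxy.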

>> clone*(), one for each LSM that wanted to implement a namespace.
> 
> My understanding of this proposal was to create a LSM-wide namespace,
> and one of the reason was to avoid one namespace per LSM.  As I

No, each LSM will do its own thing wrt namespacing. The proposal is just
to provide a common API and minimal infra around it.

> explained in my previous email, I think it would make sense and could be
> convincing.
> 
I have to agree with Paul that we won't generically agree on what an LSM
namespace should be.

>> While clone3() has expanded the number of flag bits from clone(),
>> there is still a limitation of 64-bits and I'm fairly certain the
>> other kernel devs are not going to be supportive of a flag for each
>> LSM that wants one.
>>
>> Maybe we could argue for our own u64 in cl_args, or create our own
>> lsm_clone(2) syscall that mimics clone3(2) with better LSM support,
>> but neither of these seem like great ideas at the moment.
> 
> My idea was that using CLONE_NEWLSM would just fork the current/initial
> namespace used by LSMs to tie security policies/configurations to
> processes, but as John already said, it would be the responsibility of
> each LSM to either inherit and keep in sync the parent policy (e.g.
> SELinux) or start with a blank/default one (e.g. Yama).
> 
It's not just these options though. The container manager may want to
"drop/add" an LSM.
E.g. on Fedora/RH booting an Ubuntu container: your host has selinux,
the container wants apparmor.

In reality you have both selinux and apparmor active on the system,
but selinux is in an enforcing state, and apparmor is in a no-policy
state.

selinux could deny creating the namespace, it could return its current
state, or it could mask itself by creating a namespace for the container
with the default unconfined_t policy. In that last case its current state
is still there bounding the container, the container just doesn't see it.

On the AppArmor side, when a new namespace with apparmor is requested,
it needs to decide what to do independent of what selinux does. Yes,
if configured correctly it should set up its policy namespace for the
container, but it has choices just like selinux, and those are driven
by policy as well as by the userspace request for a specific combination
of LSMs for the container.
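
As a toy illustration of "each LSM decides for itself", here is a small
C sketch of the Fedora-host/Ubuntu-container case. The enum values, the
request struct and both toy_*_decide() functions are hypothetical, and
the decision rules are just one possible policy I picked for the example,
not what either LSM does or would have to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bits for "userspace wants this LSM in the container". */
enum toy_want { TOY_WANT_SELINUX = 1u << 0, TOY_WANT_APPARMOR = 1u << 1 };

/* Hypothetical outcomes an LSM could pick for a namespace request. */
enum toy_ns_action {
	TOY_NS_DENY,	/* refuse the request outright */
	TOY_NS_KEEP,	/* return the current state, no new namespace */
	TOY_NS_MASK,	/* new default view; parent policy still bounds it */
	TOY_NS_FRESH,	/* new policy namespace for the container to load */
};

struct toy_ns_request {
	unsigned int wanted;	/* combination of LSMs userspace asked for */
	bool host_allows;	/* this LSM's own host policy verdict */
};

/* One possible selinux choice: mask itself when the container skips it. */
enum toy_ns_action toy_selinux_decide(const struct toy_ns_request *r)
{
	assert(r != NULL);
	if (!r->host_allows)
		return TOY_NS_DENY;
	return (r->wanted & TOY_WANT_SELINUX) ? TOY_NS_KEEP : TOY_NS_MASK;
}

/* One possible apparmor choice, made without looking at selinux. */
enum toy_ns_action toy_apparmor_decide(const struct toy_ns_request *r)
{
	assert(r != NULL);
	if (!r->host_allows)
		return TOY_NS_DENY;
	return (r->wanted & TOY_WANT_APPARMOR) ? TOY_NS_FRESH : TOY_NS_KEEP;
}
```

With wanted = TOY_WANT_APPARMOR and both host policies allowing it, this
toy selinux masks itself while the toy apparmor sets up a fresh policy
namespace, and neither consults the other.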

> One way to configure a newly created namespace could be to load a
> configuration in the parent namespace (e.g. with one of the new LSM
> config syscall and a dedicated flag) that would only be applied to child
> namespaces when they are created, similarly to attr/exec for execve(2).
Host injecting policy into the container certainly could be supported,
but I think that would be a per-LSM thing.

The attr/exec flags Paul was discussing (correct me if I am wrong) were
a way to specify which LSMs should be part of the unshare. So the
whole "I want a container to support Ubuntu or RH and need these LSMs".

> I think this is what you meant with the LSM_UNSHARE flag, right?
> 
Per my above understanding, the LSM_UNSHARE flag is then just a
namespacing flag that indicates you want to unshare the LSM and use the
aforementioned attrs.

I don't think it is actually needed, but maybe desirable for consistency.
If you have already set the above attrs, that already indicates what
you want to do with the namespace at clone/unshare.

This then gets fed into every LSM (whether in the attrs or not), so they
can each make a policy decision against current policy. Then, if allowed,
a second hook is called with the info so that they can each set up and
return with their context set up. Not really all that different from exec.
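
The two-step flow above could look roughly like this in C. This is only
a user-space sketch: struct toy_lsm, the ns_permit/ns_setup callbacks and
toy_lsm_ns_unshare() are invented names, not existing LSM hooks, and the
toy_allow/toy_deny/toy_count_setup helpers exist only to exercise it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-LSM callbacks for a namespace request. */
struct toy_lsm {
	const char *name;
	int (*ns_permit)(unsigned int wanted);	/* phase 1: 0 means allow */
	int (*ns_setup)(unsigned int wanted);	/* phase 2: 0 means done */
};

/*
 * Ask every LSM for permission first. Only if all of them allow the
 * request does any LSM get to set up its own context.
 */
int toy_lsm_ns_unshare(const struct toy_lsm *lsms, size_t n,
		       unsigned int wanted)
{
	assert(lsms != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int rc = lsms[i].ns_permit(wanted);
		if (rc)
			return rc;	/* one denial stops the request */
	}
	for (size_t i = 0; i < n; i++) {
		int rc = lsms[i].ns_setup(wanted);
		if (rc)
			return rc;	/* real code would unwind earlier setups */
	}
	return 0;
}

/* Sample callbacks, for demonstration only. */
static int toy_setup_calls;
static int toy_allow(unsigned int w) { (void)w; return 0; }
static int toy_deny(unsigned int w) { (void)w; return -EPERM; }
static int toy_count_setup(unsigned int w) { (void)w; toy_setup_calls++; return 0; }
```

Note that every LSM is consulted in the first loop, including ones not
named in the request, which matches the "whether in the attrs or not"
point.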

>>
>>>> Any other ideas?
>>>
>>> The goal of a namespace is to configure absolute references (e.g. file
>>> path, network address, PID, time).  I think it would make sense to have
>>> an LSM/MAC/SEC namespace that would enforce a consistent access control
>>> on every processes in this namespace.
>>
>> Once again, I'm not going to support the idea of a namespace at the
>> LSM framework layer, individual LSMs are better suited to implementing
>> their own namespacing concepts.  However, I do support the LSM
>> framework providing an API and/or helpers to help make it easier for
>> individual LSMs and userspace to create/manage individual LSM
>> namespaces.
> 
> Should we still talk about "namespace" or use another name?
> 
It's namespaces for LSMs, just not an LSM namespace.

>>
>>> A related namespace file
>>> descriptor could then be used with an LSM-specific syscall to configure
>>> the policy related to a specific namespace (instead of only the current
>>> namespace)
>>
>> That is a reasonable request, and I think the same underlying solution
>> that we would use for setns(2) could also be used here.
> 
> I'm not sure having a set of namespace file descriptors without related
> clone flags would be acceptable, at least for what we currently call
> Linux "namespace".

Well, Paul did propose a single CLONE_LSMNS (or CLONE_NEWLSM) flag that would cover them ;-).

I agree with Paul that a per-LSM flag is unlikely to be accepted and would
just raise the whole "security is crazy, why can't you agree on one?" fun.