LSM namespacing API

Thu Apr 2 21:04:25 UTC 2026

On Thu, Apr 2, 2026 at 7:00 AM Dr. Greg <greg at enjellic.com> wrote:
> On Sun, Mar 29, 2026 at 08:56:37PM -0400, Paul Moore wrote:
> > On Sun, Mar 29, 2026 at 12:09 PM Dr. Greg <greg at enjellic.com> wrote:
> > > On Tue, Mar 24, 2026 at 05:31:09PM -0400, Paul Moore wrote:
> > > > On Tue, Mar 3, 2026 at 11:46 AM Paul Moore <paul at paul-moore.com> wrote:

...

> Christian had proposed patches for a generic mechanism to create
> LSM security namespace blobs, is implementation of that in scope for
> this effort?

That isn't what Christian proposed, although I can understand how a
quick glance at the patchset would lead you to believe that (I had the
same misunderstanding while skimming my inbox on my phone while
traveling).  I suggest reviewing Christian's post again as well as the
related Landlock patchset which is the first to use the hooks
Christian proposed.

> > > It would seem that the flags variable might be a good option to use to
> > > handle this 2-stage transition, for example LSM_NS_INIT and
> > > LSM_NS_CHANGE, respectively, to specify the initialization and
> > > execution phases of the transition.
>
> > No.  The lsm_unshare() syscall is intended to mimic the existing
> > unshare() syscall as a single step process from a user's
> > perspective.  If it returns successfully the caller will be in a new
> > LSM namespace as defined by the individual LSM specified in the
> > syscall.
>
> OK, we can reason forward with that paradigm.
>
> An orchestrator issues the unshare call for an LSM namespace and upon
> return from the system call the calling task is in a new namespace for
> that particular LSM ...

Yes.

> ... the goal of which is presumably to implement a
> security policy/model different than what had been in force
> previously.

Maybe.  That is dependent on the individual LSM, I don't want to
encode any assumptions on this at the LSM framework layer.

> So the process is in a new LSM specific namespace, but still
> implementing the policy from the previous namespace, until the
> orchestrator can load the new policy and then trigger the LSM to
> change from its previous policy to the newly loaded policy.
>
> Is this consistent with your vision as to how all of this will work?

No.  What an individual LSM does upon creation of a new namespace via
lsm_unshare() is entirely up to that LSM.  The LSM may choose to bound
the new namespace by the parent's policy, or it may choose a
non-hierarchical relationship where the new namespace remains entirely
separate from the parent.  The LSM may start the new namespace in an
uninitialized state (similar to early boot), initialized with a
default policy, initialized with the parent's policy, or something
else.

> > > The other unanswered issue, or perhaps we missed it, are the security
> > > controls that should be associated with the unshare call.
>
> > Each LSM is free to implement whatever access controls it deems
> > necessary in its lsm_unshare() callback.
>
> Just to be clear.
>
> When you refer to 'lsm_unshare() callback' are you referring to a new
> LSM security hook to be implemented that will allow all of the
> active LSM's to pass judgement on whether or not the unshare should be
> allowed to complete successfully?

No.  The lsm_unshare() callback is the individual LSM provided
function that the LSM framework calls when the lsm_unshare() syscall
is invoked.  Put another way, the lsm_unshare() callback is the
function specified by an LSM, using the LSM_HOOK_INIT() macro, that is
called by the lsm_unshare() syscall.

> > > Will there be a new LSM hook that allows other LSM's to veto the
> > > creation of a namespace either for itself or for another LSM?
> >
> > I would expect the lsm_unshare() syscall to operate similarly to the
> > lsm_set_self_attr() syscall in this regard.
>
> The reference to handling this like lsm_set_self_attr() is unclear.
>
> With lsm_set_self_attr() there is no reason for another LSM to deny
> setting what is an LSM specific attribute, as you note above, each LSM
> gets to decide if the request to set an attribute for the LSM should
> be accepted or denied.

No.  LSM "A" gets to decide if LSM "A" can create a new namespace
using the lsm_unshare() syscall, LSM "B" does not get to enforce any
policy on LSM "A"'s decision.
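
As a rough illustration of that dispatch rule, here is a minimal
userspace sketch.  It is not kernel code: every name in it (struct
mock_lsm, mock_lsm_unshare(), the MOCK_LSM_* IDs) is hypothetical, and
the -ENOENT return for an unknown LSM is an assumption.  It only models
the idea that the framework calls the callback of the one LSM named in
the request and consults nobody else.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum { MOCK_LSM_A = 1, MOCK_LSM_B = 2 };

struct mock_lsm {
    int id;
    int (*ns_unshare)(unsigned int flags); /* the LSM's own callback */
    int calls;                             /* how often it was invoked */
};

/* LSM "A" allows the unshare as long as no flags are set. */
static int mock_a_unshare(unsigned int flags)
{
    return flags ? -EINVAL : 0;
}

/* LSM "B" always refuses, but only ever for its own namespace. */
static int mock_b_unshare(unsigned int flags)
{
    (void)flags;
    return -EPERM;
}

static struct mock_lsm mock_lsms[] = {
    { MOCK_LSM_A, mock_a_unshare, 0 },
    { MOCK_LSM_B, mock_b_unshare, 0 },
};

/* Model of the syscall: dispatch to the named LSM and to no other. */
int mock_lsm_unshare(int lsm_id, unsigned int flags)
{
    for (size_t i = 0; i < sizeof(mock_lsms) / sizeof(mock_lsms[0]); i++) {
        struct mock_lsm *l = &mock_lsms[i];

        if (l->id != lsm_id)
            continue;
        /* Every LSM in this sketch registers a callback. */
        assert(l->ns_unshare != NULL);
        l->calls++;
        return l->ns_unshare(flags);
    }
    return -ENOENT; /* assumed error for an unknown LSM id */
}

/* Helper for inspecting the model: number of callback invocations. */
int mock_lsm_calls(int lsm_id)
{
    for (size_t i = 0; i < sizeof(mock_lsms) / sizeof(mock_lsms[0]); i++)
        if (mock_lsms[i].id == lsm_id)
            return mock_lsms[i].calls;
    return -1;
}
```

In this model a request for MOCK_LSM_A succeeds even though "B" would
deny its own unshare, because "B" is never asked.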

> Since lsm_unshare() is changing the overall platform security state,
> it seems consistent with the design of the LSM for other LSM's to be
> able to veto this action.

No.  This is not consistent with either the design or general
conventions associated with LSM development.

> Once again, this seems like an action that would be consistent with
> the notion of the lockdown LSM,

No.

> > > Should there be an option to completely compile LSM namespaces out of
> > > the kernel?
>
> > That doesn't belong in the LSM framework layer, that is up to the
> > individual LSMs.
>
> You noted above the desire for lsm_unshare to be consistent with other
> namespaces.
>
> The current kernel paradigm is to allow classes of namespace
> resources, i.e. CONFIG_UTS_NS, CONFIG_TIME_NS et al., to be compiled in
> or out of the kernel.
>
> It seems that CONFIG_LSM_NS would be consistent with that model.

CONFIG_UTS_NS does not have multiple radically different
implementations underneath it.  Comparing any of the existing Kconfig
namespace knobs to what we are attempting to do with the LSM framework
is going to be difficult due to some inherent differences between the
two things.

The lsm_unshare() syscall is simply an API abstraction intended to
make it easier for userspace to interact with the individual LSMs;
instead of dealing with multiple different namespacing APIs, one for
each LSM, lsm_unshare() provides a single interface to make app devs'
lives easier.

If an individual LSM wants to provide a Kconfig knob to toggle its
namespace support it is welcome to do so; lsm_unshare() should
exist regardless and return an error code if the desired LSM does not
implement namespace support in the particular kernel build.
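
A small userspace sketch of that arrangement, under stated
assumptions: MOCK_FOO_LSM_NS stands in for a per-LSM Kconfig symbol,
"foo" is a made-up LSM, and -EOPNOTSUPP is only a guess at the error
code (the text above says just "an error code").  The point is that
the entry point is always built, while the callback behind it may not
be.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical build-time switch standing in for a per-LSM Kconfig
 * symbol.  It defaults to "namespace support compiled out".
 */
#ifndef MOCK_FOO_LSM_NS
#define MOCK_FOO_LSM_NS 0
#endif

typedef int (*mock_unshare_fn)(unsigned int flags);

#if MOCK_FOO_LSM_NS
/* Only built when the made-up "foo" LSM enables namespace support. */
static int mock_foo_ns_unshare(unsigned int flags)
{
    return flags ? -EINVAL : 0;
}
#define MOCK_FOO_UNSHARE_CB mock_foo_ns_unshare
#else
#define MOCK_FOO_UNSHARE_CB ((mock_unshare_fn)NULL)
#endif

/*
 * The entry point exists in every build.  When the LSM provides no
 * callback, the caller gets an error instead of a missing symbol.
 */
int mock_foo_lsm_unshare(unsigned int flags)
{
    mock_unshare_fn cb = MOCK_FOO_UNSHARE_CB;
    int rc;

    if (cb == NULL)
        return -EOPNOTSUPP; /* assumed error code */

    rc = cb(flags);
    /* Callbacks return 0 or a negative errno value. */
    assert(rc <= 0);
    return rc;
}
```

Building the same file with -DMOCK_FOO_LSM_NS=1 switches the callback
in without changing the caller-visible interface.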

> > > > * Implement /proc/pid/ns/lsm and setns(CLONE_NEWLSM)
> > > >
> > > > As discussed previously, this allows us to move a process into an
> > > > existing, established LSM namespace set.  The caller cannot
> > > > selectively choose which individual LSM namespaces they join from the
> > > > given LSM namespace set, they receive the same LSM namespace
> > > > configuration as the target process.
> > >
> > > As an initial aside.  It would be assumed that a positive result of a
> > > setns call would be to cause the calling process to atomically change
> > > its security namespace set.  This would further suggest the need to
> > > have the security namespace creation process also execute atomically
> > > in a multi-LSM namespace change environment.
>
> > In the setns case no new LSM namespaces should be created, the process
> > simply joins an existing set of LSM namespaces.
>
> The issue isn't about new namespaces being created, the issue is
> atomicity of a change to a new set of security policies.
>
> With setns an atomic transition is implemented.
>
> The proposed lsm_unshare() behavior results in a period of time when
> multiple and varying security policies are active, depending on
> various race issues in the orchestrator implementation.
>
> This opens the door to a raft of potential security issues that we can
> have a new acronym for, Time Of Implementation Time Of Use (TOITOU).

I would expect that any LSM implementing namespaces would have
sufficient protections/locking in place to ensure that processes and
namespaces remain in a consistent state outside of the
protected/locked regions.  It is reasonable for one process to attempt
the creation of a new namespace while another attempts to join the
namespace of the process creating the new namespace.  This is not
really a new problem in systems programming, and is one reason why
synchronization mechanisms exist.  Once again, we do not want to force
any particular solution at the LSM framework layer as the
synchronization mechanisms will likely be very LSM dependent.

> > > ... That is the concept of whether or not a setns
> > > call, for any resource namespace, should also force a security
> > > namespace change if the security namespace of the calling process
> > > differs from that of the target process.
>
> > That decision is left to the individual LSMs.
>
> That is reasonable.
>
> In order to support that model, there would seem to be a need to have
> a new LSM call in the setns code that allows LSM's to determine
> whether or not a change in the active security namespace set should be
> forced, correct?

Possibly.  I think we need to see some RFC code to see how this would
look, but I think the LSM implementation inside the setns() syscall
would need to be done in two stages: the first to "prepare" the join
operation where permissions checks are performed (if desired by the
individual LSM) and any operations that could fail are done; the
second stage would be very basic and simply finish the join operation
without any risk of failure.  An individual LSM could fail the join
operation for a variety of reasons in stage 1, causing the entire
setns() operation to fail, but once we progress to stage 2 the
operation should succeed.
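
The prepare/finish split described above can be sketched in userspace
roughly as follows.  This is an illustration only, not proposed kernel
code: struct mock_task, mock_setns_lsm(), the two mock LSMs, and the
"forbidden" namespace ID are all invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
#define MOCK_NR_LSMS      2
#define MOCK_FORBIDDEN_NS 13 /* arbitrary ID that mock LSM 1 refuses */

struct mock_task {
    int ns[MOCK_NR_LSMS]; /* current namespace ID, one per LSM */
};

typedef int (*mock_prepare_fn)(int current_ns, int target_ns);

/* Mock LSM 0 performs no checks and always allows the join. */
static int mock_lsm0_prepare(int current_ns, int target_ns)
{
    (void)current_ns;
    (void)target_ns;
    return 0;
}

/* Mock LSM 1 applies an illustrative permission check. */
static int mock_lsm1_prepare(int current_ns, int target_ns)
{
    (void)current_ns;
    return target_ns == MOCK_FORBIDDEN_NS ? -EPERM : 0;
}

static const mock_prepare_fn mock_prepare[MOCK_NR_LSMS] = {
    mock_lsm0_prepare,
    mock_lsm1_prepare,
};

int mock_setns_lsm(struct mock_task *task, const int target[MOCK_NR_LSMS])
{
    assert(task != NULL && target != NULL);

    /* Stage 1: everything that can fail happens here. */
    for (int i = 0; i < MOCK_NR_LSMS; i++) {
        int rc = mock_prepare[i](task->ns[i], target[i]);

        if (rc)
            return rc; /* the task has not been modified yet */
    }

    /* Stage 2: plain assignments that cannot fail. */
    for (int i = 0; i < MOCK_NR_LSMS; i++)
        task->ns[i] = target[i];

    return 0;
}
```

Because nothing is written to the task until every prepare step has
passed, a refusal by any one LSM leaves the task in its original set
of namespaces.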

At this point I'm not too bothered by how we do this as it is an
implementation detail buried within the setns() implementation and not
really an API issue.  We could create a single LSM hook that is called
within sys_setns(), or we could leverage the existing two-stage
process within sys_setns() and implement the two LSM stages as two LSM
hooks.  The first option would be more complicated from an LSM
perspective, but cleaner from a nsproxy.c perspective (that alone
could make it the preferable option).  The latter option would
result in cleaner, thinner LSM hooks, but it would likely add
complexity to ns_common and/or nsset.  As I said earlier, this is a
decision that will likely be driven by how the code ends up looking.

> If so, is implementation of this in scope for the lsm_unshare()
> infrastructure?

No.  The lsm_unshare() syscall would only operate on one LSM at a time
so a two stage process isn't needed at the LSM framework layer.  It is
possible that an individual LSM may want to implement a two-stage
transaction in their lsm_unshare() callback, but that is their
decision.

> To close, at the risk of being the devil's advocate.
>
> Given that the sentiment is to force almost all of these
> issues/decisions into the individual LSM's, what is the advantage of
> having a common lsm_unshare() system call?

A single uniform API for userspace applications that wish to make use
of LSM namespaces.  Ideally we want to leverage the existing kernel
APIs, e.g. procfs and setns(), but others, e.g. clone(), remain
impractical due to a combination of technical and political reasons
(we've already discussed some of the former, the latter is a rathole
discussion I'm not going to engage in at the moment).

> In the proposed model, a resource orchestrator is going to need to
> have extensive knowledge over the mechanics of all the LSM's that
> implement namespace functionality.

Maybe.  I don't think orchestrators will need to have "extensive"
knowledge of the individual LSMs, although this largely depends on
what you define as "extensive".

I also want to get ahead of this and say that I have absolutely zero
desire to debate this point with you at the moment.  It's an argument
without end and the discussion is unlikely to yield anything specific
enough to be helpful.

> At a very minimum, intrinsic to
> the concept of security namespaces, there will be a need to load a new
> policy or model into the namespace, an action that will be deeply LSM
> specific.

Possibly, as this is once again very LSM dependent.  Some LSMs may not
need a new policy loaded when they create a new namespace.

I will also, once again, point you at the LSM policy loading syscall
ideas.  While on hold, we've already discussed that they should be
namespace aware and potentially have the ability to trigger new LSM
namespace creation.

> At this point, the only common functionality may be the allocation of
> a new LSM namespace 'blob'.

Now you are starting to get it.  The LSM framework exists primarily as
a multiplexing layer hidden beneath an API.  Originally the API was
only for internal kernel users, but recently we started providing a
userspace syscall API.

-- 
paul-moore.com