[PATCH v2 0/2] Landlock multithreaded enforcement

Günther Noack gnoack at google.com
Wed Oct 1 11:23:14 UTC 2025


This patch set adds the LANDLOCK_RESTRICT_SELF_TSYNC flag to
landlock_restrict_self().  With this flag, the passed Landlock ruleset
is applied not only to the calling thread, but to all threads that
belong to the same process.
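
For illustration, a user space caller might use the flag roughly as
follows.  This is only a sketch: restrict_all_threads() is a
hypothetical helper, the numeric flag value below is an assumption
made so that the example builds against older headers (the
authoritative definition is the one added to
include/uapi/linux/landlock.h), and the syscall number fallback is
the value used on x86_64 and the generic syscall table.

```c
#define _GNU_SOURCE
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__has_include)
#if __has_include(<linux/landlock.h>)
#include <linux/landlock.h>
#endif
#endif

/*
 * ASSUMPTION: illustrative bit value only.  The real definition comes
 * from the uapi header that this series modifies.
 */
#ifndef LANDLOCK_RESTRICT_SELF_TSYNC
#define LANDLOCK_RESTRICT_SELF_TSYNC (1U << 3)
#endif

/* Fallback for libc headers that predate the Landlock syscalls. */
#ifndef SYS_landlock_restrict_self
#define SYS_landlock_restrict_self 446
#endif

/*
 * Hypothetical helper: enforce the ruleset behind ruleset_fd on all
 * threads of the calling process.  Returns 0 on success, or -1 with
 * errno set (for example on kernels that do not know the flag).
 */
static int restrict_all_threads(int ruleset_fd)
{
	/* The calling thread still needs no_new_privs or CAP_SYS_ADMIN. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -1;

	return (int)syscall(SYS_landlock_restrict_self, ruleset_fd,
			    LANDLOCK_RESTRICT_SELF_TSYNC);
}
```

The ruleset file descriptor would come from
landlock_create_ruleset() as usual.  Callers should be prepared for
the call to fail on kernels without this series and fall back to
whatever they did before.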

Motivation
==========

TL;DR: The libpsx/nptl(7) signal hack which we use in user space for
multi-threaded Landlock enforcement is incompatible with Landlock's
signal scoping support.  Landlock can restrict the use of signals
across Landlock domains, but we need signals ourselves in user space
in ways that are not permitted any more under these restrictions.

Enabling Landlock proves to be difficult in processes that are already
multi-threaded at the time of enforcement:

* Enforcement in only one thread is usually a mistake because threads
  do not normally have proper security boundaries between them.

* Also, multithreading is unavoidable in some circumstances, such as
  when using Landlock from a Go program.  Go programs are already
  multithreaded by the time they enter func main().

So far, the approach in Go[1] was to use libpsx[2].  This library
implements the mechanism described in nptl(7) [3]: It keeps track of
all threads with a linker hack and then makes all threads do the same
syscall by registering a signal handler for them and invoking it.

With commit 54a6e6bbf3be ("landlock: Add signal scoping"), Landlock
gained the ability to restrict the use of signals across different
Landlock domains.

Landlock's signal scoping support is incompatible with the libpsx
approach of enabling Landlock:

(1) With libpsx, although all threads enforce the same ruleset object,
    they technically do the operation separately and end up in
    distinct Landlock domains.  This breaks signaling across threads
    when using LANDLOCK_SCOPE_SIGNAL.

(2) Cross-thread signals are themselves needed to enforce further
    nested Landlock domains across multiple threads, so nested
    Landlock policies become impossible once signals are scoped.

In addition to Landlock itself, cross-thread signals are also needed
for other seemingly harmless API calls like setuid(2) [4] and for
the use of libcap (co-developed with libpsx), which have the same
problem: the underlying syscall only applies to the calling thread.

Implementation details
======================

Enforcement prerequisites
-------------------------

Normally, the prerequisite for enforcing a Landlock policy is to
either have CAP_SYS_ADMIN or the no_new_privs flag.  With
LANDLOCK_RESTRICT_SELF_TSYNC, the no_new_privs flag will automatically
be applied for sibling threads if they don't already fulfill these
requirements.
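
The reason this matters is that no_new_privs is documented in
prctl(2) as an attribute of the calling thread.  The following user
space sketch (hypothetical helper names, not part of this series)
starts a sibling thread first, sets the flag in the caller
afterwards, and then lets the sibling read its own flag.  We would
expect the sibling to still report 0 unless the process already
inherited the flag from its parent.

```c
#include <pthread.h>
#include <semaphore.h>
#include <sys/prctl.h>

static sem_t nnp_set;        /* posted once the caller has set the flag */
static int sibling_nnp = -1; /* flag value as seen by the sibling */

static void *sibling(void *arg)
{
	(void)arg;
	sem_wait(&nnp_set);
	/* Reads the flag of this thread, not of the whole process. */
	sibling_nnp = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
	return NULL;
}

/* Returns the sibling's no_new_privs value, or -1 on failure. */
static int sibling_flag_after_late_prctl(void)
{
	pthread_t t;
	int rc;

	if (sem_init(&nnp_set, 0, 0) != 0)
		return -1;
	if (pthread_create(&t, NULL, sibling, NULL) != 0) {
		sem_destroy(&nnp_set);
		return -1;
	}

	/* The sibling already exists when the caller sets its own flag. */
	rc = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);

	sem_post(&nnp_set);
	pthread_join(t, NULL);
	sem_destroy(&nnp_set);
	return rc != 0 ? -1 : sibling_nnp;
}
```

Without kernel support, a multithreaded program would have to run
the prctl(2) call in every thread by itself, which is the same
problem that libpsx solves with signals.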

Pseudo-signals
--------------

Landlock domains are stored in struct cred, and a task's struct cred
can only be modified by the task itself [6].

To change the Landlock domain of sibling threads despite this
restriction, we use task_work_add() to register a pseudo-signal
for each of the affected threads.  At signal execution time, these
tasks coordinate to switch out their Landlock policy in lockstep
with each other, guaranteeing all-or-nothing semantics.

This implementation can be thought of as a kernel-side implementation
of the userspace hack that glibc/NPTL use for setuid(2) [3] [4], and
which libpsx implements for libcap [2].

Finding all sibling threads
---------------------------

In order to avoid grabbing the global task_list_lock, we employ the
scheme proposed by Jann Horn in [7]:

1. Loop through the list of sibling threads
2. Schedule a pseudo-signal for each and make each thread wait in the
   pseudo-signal
3. Go back to 1. and look for more sibling threads that we have not
   seen yet

Repeat this until no new threads are found.  Because all threads
found so far are waiting in their pseudo-signals, they cannot spawn
additional threads, so at that point we have found them all.

Coordination between tasks
--------------------------

As tasks run their pseudo-signal task work, they coordinate through
the following completions:

 - all_prepared (with counter num_preparing)
 
   When done, all new sibling threads in the inner loop(!) of finding
   new threads are now in their pseudo-signal handlers and have
   prepared the struct cred object to commit (or written an error into
   the shared "preparation_error").

   The lifetime of all_prepared is only the inner loop of finding new
   threads.

 - ready_to_commit

   When done, the outer loop of finding new threads is done and all
   sibling threads have prepared their struct cred object.  Marked
   completed by the calling thread.

 - all_finished

   When done, all sibling threads are done executing their
   pseudo-signal handlers.
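
As a rough user space analogy (not the kernel implementation), the
three completions can be modelled with pthreads.  The sketch below
collapses the inner and outer discovery loops into a single round.
The toy_* helpers, the num_unfinished counter and the
failing_thread fault injection are assumptions made for the
example; only all_prepared, num_preparing, ready_to_commit,
all_finished and preparation_error are names from the description
above.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_MAX_THREADS 16

/* Userspace stand-in for a kernel completion; not the kernel API. */
struct toy_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void toy_init(struct toy_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void toy_complete_all(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void toy_wait(struct toy_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

struct toy_shared {
	struct toy_completion all_prepared, ready_to_commit, all_finished;
	atomic_int num_preparing;     /* threads still preparing */
	atomic_int num_unfinished;    /* assumed counter for all_finished */
	atomic_int preparation_error; /* 0 while every prepare succeeded */
	atomic_int committed;         /* threads that applied the change */
	int failing_thread;           /* hypothetical fault injection */
};

struct toy_worker {
	struct toy_shared *shared;
	int id;
};

static void *toy_task_work(void *arg)
{
	struct toy_worker *w = arg;
	struct toy_shared *s = w->shared;

	/* Prepare: on failure, record the error for everyone to see. */
	if (w->id == s->failing_thread)
		atomic_store(&s->preparation_error, 1);
	if (atomic_fetch_sub(&s->num_preparing, 1) == 1)
		toy_complete_all(&s->all_prepared);
	/* Commit only after the caller has seen every thread prepare. */
	toy_wait(&s->ready_to_commit);
	if (atomic_load(&s->preparation_error) == 0)
		atomic_fetch_add(&s->committed, 1);
	if (atomic_fetch_sub(&s->num_unfinished, 1) == 1)
		toy_complete_all(&s->all_finished);
	return NULL;
}

/* Runs n workers through one round; returns how many committed. */
static int toy_run(int n, int failing_thread)
{
	struct toy_shared s = { .failing_thread = failing_thread };
	struct toy_worker workers[TOY_MAX_THREADS];
	pthread_t threads[TOY_MAX_THREADS];

	if (n < 1 || n > TOY_MAX_THREADS)
		return -1;
	toy_init(&s.all_prepared);
	toy_init(&s.ready_to_commit);
	toy_init(&s.all_finished);
	atomic_store(&s.num_preparing, n);
	atomic_store(&s.num_unfinished, n);
	for (int i = 0; i < n; i++) {
		workers[i] = (struct toy_worker){ .shared = &s, .id = i };
		if (pthread_create(&threads[i], NULL, toy_task_work, &workers[i]))
			abort(); /* sketch only: no partial cleanup */
	}
	toy_wait(&s.all_prepared);            /* everyone has prepared */
	toy_complete_all(&s.ready_to_commit); /* done by the caller */
	toy_wait(&s.all_finished);            /* every handler is done */
	for (int i = 0; i < n; i++)
		pthread_join(threads[i], NULL);
	return atomic_load(&s.committed);
}
```

In this model, toy_run(4, -1) lets all four workers commit, while
toy_run(4, 2) injects a preparation error in one worker and none
of them commit, which is the all-or-nothing behaviour that the
completions are meant to provide.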

Use of credentials API
----------------------

Under normal circumstances, sibling threads share the same struct cred
object.  To avoid unnecessary duplication, if we find that a thread
uses the same struct cred as the calling thread, we side-step the
normal use of the credentials API [6] and place a pointer to that
existing struct cred instead of creating a new one using
prepare_creds() in the sibling thread.

Noteworthy discussion points
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* We are side-stepping the normal credentials API [6], by re-wiring an
  existing struct cred object instead of calling prepare_creds().

  We could technically avoid this, but doing so would create
  unnecessary duplicate struct cred objects in multithreaded scenarios.

* I am slightly unhappy with the elaborate memory allocation scheme
  that I built for the task work objects. Ideas are welcome.

Change Log
==========

v2:
 - https://lore.kernel.org/all/20250221184417.27954-2-gnoack3000@gmail.com/
 - Semantics:
   - Threads implicitly set NO_NEW_PRIVS unless they have
     CAP_SYS_ADMIN, to fulfill Landlock policy enforcement
     prerequisites
   - Landlock policy gets unconditionally overridden even if the
     previously established Landlock domains in sibling threads were
     diverging.
 - Restructure discovery of all sibling threads, with the algorithm
   proposed by Jann Horn [7]: Loop through threads multiple times, and
   get them all stuck in the pseudo signal (task work), until no new
   sibling threads show up.
 - Use RCU lock when iterating over sibling threads.
 - Override existing Landlock domains of other threads,
   instead of applying a new Landlock policy on top
 - Directly re-wire the struct cred for sibling threads,
   instead of creating a new one with prepare_creds().
 - Tests:
   - Remove multi_threaded_failure test
     (The only remaining failure case is ENOMEM, and
     there is no good way to provoke that in a selftest)
   - Add test for success despite diverging Landlock domains.

[1] https://github.com/landlock-lsm/go-landlock
[2] https://sites.google.com/site/fullycapable/who-ordered-libpsx
[3] https://man.gnoack.org/7/nptl
[4] https://man.gnoack.org/2/setuid#VERSIONS
[5] https://lore.kernel.org/all/20240805-remove-cred-transfer-v2-0-a2aa1d45e6b8@google.com/
[6] https://www.kernel.org/doc/html/latest/security/credentials.html
[7] https://lore.kernel.org/all/CAG48ez0pWg3OTABfCKRk5sWrURM-HdJhQMcWedEppc_z1rrVJw@mail.gmail.com/

Günther Noack (2):
  landlock: Multithreading support for landlock_restrict_self()
  landlock: selftests for LANDLOCK_RESTRICT_SELF_TSYNC

 include/uapi/linux/landlock.h                 |   4 +
 security/landlock/cred.h                      |  12 +
 security/landlock/limits.h                    |   2 +-
 security/landlock/syscalls.c                  | 433 +++++++++++++++++-
 tools/testing/selftests/landlock/base_test.c  |   6 +-
 tools/testing/selftests/landlock/tsync_test.c |  99 ++++
 6 files changed, 550 insertions(+), 6 deletions(-)
 create mode 100644 tools/testing/selftests/landlock/tsync_test.c

-- 
2.51.0.618.g983fd99d29-goog



