[RFC] Landlock: mutable domains (and supervisor notification uAPI options)

Sun Feb 15 21:23:52 UTC 2026

On Sun, Feb 15, 2026 at 02:54:08AM +0000, Tingmao Wang wrote:
> Hi,
> 
> Recently I have been continuing work on the previously proposed Landlock
> supervise feature (context below).  While I do have some rough PoCs, and
> I'm aware that sometimes code is better than talk, because of the amount
> of work involved, I would like to get some early feedback on the design
> before continuing.
> 
> Scrappy demo (just 2-3 min screencasts):
> 
> - user-space implemented "permissive mode":
>     https://fileshare.maowtm.org/landlock-20260214/demo.mp4
> - mutable domains based on a reloadable config file:
>     https://fileshare.maowtm.org/landlock-20260213/demo.mp4
> 
> While I would be glad to receive reviews from anyone (and I've added
> people who have replied to the previous RFC in CC), Günther, when you are
> not too busy, can you kindly give this a review?  A lot of this has
> already been discussed with Mickaël, in fact a large part of this design
> was from his suggestions.  I apologize in advance for the length of this
> email - please feel free to respond to any part of it, and whenever you
> have time to.
> 
> The PoC code used in the above videos is largely generated, somewhat
> buggy, and unreviewed, but it is available:
> 
> - mutable domains:
>     https://github.com/micromaomao/linux-dev/pull/26/changes
> - supervisor notification:
>     https://github.com/micromaomao/linux-dev/pull/27/changes
> 
> The motivations listed in [1] are still relevant, and to add to that, here
> are some additional examples of things we can do with the supervisor
> feature (all from unprivileged applications):
> 
> - Implementing a version of StemJail [2] which does not rely on bind
>   mounts and LD_PRELOAD (for the notification part, not for access
>   control).  Or in fact, any other uses of LD_PRELOAD for the purpose of
>   finding out what files are accessed.
> 
> - For island [3], some sort of denial logging tied to the context,
>   integrated in the tool itself (rather than through kernel audit) and
>   live config reload.
> 
> - Use in a non-security related context, such as automated build
>   dependency tracking.
> 
> [1]: https://lore.kernel.org/all/cover.1741047969.git.m@maowtm.org/
> [2]: https://github.com/stemjail/stemjail
> [3]: https://github.com/landlock-lsm/island
> 
> 
> Background
> ----------
> 
> A while ago I sent a "Landlock supervise" RFC patch series [1], in which I
> proposed to extend Landlock with additional functionality to support
> "interactive" rule enforcement.  In discussion with Mickaël, we decided to
> split this work into 3 stages:  quiet flag, mutable domains, and finally
> supervisor notification.  Relevant discussions are at [4] and in replies
> to [1].
> 
> The patch for quiet flag [5] has gone through multiple review iterations
> already.  It is useful on its own, but it was also motivated by the
> eventual use in controlling supervisor notification.
> 
> The next stage is to introduce "mutable domains".  The motivation for this
> is twofold:
> 
> 1. This allows the supervisor to allow access to (large) file hierarchies
>    without needing to be woken up again for each access.
> 2. Because we cannot block within security_path_mknod and other
>    directory-modification related hooks [6], the proposal was to return
>    immediately from those hooks after queuing the supervisor notification,
>    then wait in a separate task_work.  This however means that we cannot
>    directly "allow" access (and even if we can, it may introduce TOCTOU
>    problems).  In order to allow access to requested files, the supervisor
>    has to add additional rules to the (now mutable) domain which will
>    allow the required access.
> 
> [1]: https://lore.kernel.org/all/cover.1741047969.git.m@maowtm.org/
> [4]: https://github.com/landlock-lsm/linux/issues/44
> [5]: https://lore.kernel.org/all/cover.1766330134.git.m@maowtm.org/
> [6]: https://lore.kernel.org/all/20250311.Ti7bi9ahshuu@digikod.net/
> 
>
Hello Tingmao,

Thank you for sending this.

I've read the proposal and had some time to gather thoughts on it. I'm
planning to break this feedback into multiple parts.

This first part addresses the intersect flag.

> Proposed changes
> ----------------
> 
> This patchset introduces the concept of "supervisor" and "supervisee"
> rulesets (alternative names for this are "static"/"dynamic",
> "mutable"/"immutable" etc), which are Landlock rulesets that are joined
> together when enforced.  The supervisee ruleset can be thought of as the
> "static" part of a domain, and the supervisor ruleset can be thought of as
> the "dynamic" part.  The two rulesets can have different rules and access
> rights for individual rules, but they internally have the same sets of
> handled access and scope bits.  When an access request is evaluated for
> processes in such domains, the access is allowed if, for each layer,
> either the supervisee or the supervisor ruleset of that domain allows the
> access.
> 
> A Landlock supervisor will first create the supervisor ruleset, which
> internally creates a ref-counted landlock_supervisor which the unmerged
> (and in fact, unmergeable, to prevent accidental misuse) landlock_ruleset
> will point to.  Through a new ioctl, the user can get a supervisee ruleset
> with the attached supervisor (this relationship does not necessarily have
> to be 1-1), which can then be passed to landlock_restrict_self() by a
> child process.  The supervisor can also at any time (before the ioctl,
> before the landlock_restrict_self() call, or after it) modify the
> supervisor ruleset to add or remove (via a new "intersect" flag) rules or
> change access rights, and commit those changes through a flag passed to
> landlock_add_rule() (although maybe this would be better done as an
> ioctl() on the supervisor?), after which the changes start affecting the
> child.
> 
> The supervisee ruleset is immutable; it is basically the current
> landlock_ruleset, and internally we continue to "fold" rules from parents
> into the child's rbtree.  However, since all ancestor supervisor rulesets
> are mutable, we cannot simply fold the supervisor rules from parents into
> its children at enforce time, as they may be removed or changed later at a
> parent layer.  Therefore, if an access is not allowed by any layer's
> supervisee ruleset (which is quick to check thanks to the "folding" of the
> supervisee rules), Landlock will then have to check that the access is
> allowed by the supervisor rulesets of all the denying layers. (The access
> is also denied if any of the denying layers does not have a supervisor
> ruleset, in this case we don't even have to check the other supervisor
> rulesets.)
> 
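To check my understanding of the evaluation described above, here is a
small userspace model of it. This is only a sketch: the struct and
function names are made up for illustration and do not exist in the
kernel, each layer is reduced to two precomputed access masks for one
object (so the path walk and the handled-access bits are not modelled),
and I am assuming that the supervisee and supervisor grants combine per
access bit within a layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only; none of these names exist in the kernel. */
typedef uint64_t access_mask_t;

struct model_layer {
    access_mask_t supervisee_allowed; /* static part, fixed at enforce time */
    bool has_supervisor;              /* false for a plain Landlock layer */
    access_mask_t supervisor_allowed; /* dynamic part, last committed state */
};

/*
 * Returns true if every layer allows `request`.  A layer allows an access
 * bit if its supervisee part or (assumption) its supervisor part grants it.
 */
static bool model_access_allowed(const struct model_layer *layers,
                                 size_t num_layers, access_mask_t request)
{
    assert(layers != NULL || num_layers == 0);

    for (size_t i = 0; i < num_layers; i++) {
        access_mask_t missing = request & ~layers[i].supervisee_allowed;

        if (!missing)
            continue;     /* fast path: the static rules suffice */
        if (!layers[i].has_supervisor)
            return false; /* no supervisor can lift this denial */
        if (missing & ~layers[i].supervisor_allowed)
            return false; /* the supervisor does not cover the rest */
    }
    return true;
}
```

With a supervised layer stacked on a plain one, a request passes only if
the plain layer grants it statically, whatever the supervisor of the
other layer has committed.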
> To enable removing rules from a ruleset, we also implement the
> LANDLOCK_ADD_RULE_INTERSECT flag for landlock_add_rule().  If this is
> passed, instead of adding rules, the corresponding rule, if it exists, is
> updated to be the intersection of the existing access rights and the
> specified access rights.  If the result is zero, the rule is removed.  For
> API consistency, the LANDLOCK_ADD_RULE_INTERSECT flag will be supported
> for both supervisor and supervisee (i.e. existing) rulesets, but it is
> probably only useful for supervisor rulesets.
> 
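For reference while discussing the flag, this is how I read the per-rule
intersect semantics, written as a userspace model. It is a sketch under
assumptions: the types and model_intersect_rule() are hypothetical, an
integer object_id stands in for whatever the rule is keyed on, and a
ruleset is a flat array rather than the kernel's rbtree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t access_mask_t;

#define MODEL_MAX_RULES 16

/* Hypothetical model types, not kernel structures. */
struct model_rule {
    int object_id;        /* stands in for the object a rule is keyed on */
    access_mask_t access; /* allowed access rights for that object */
};

struct model_ruleset {
    struct model_rule rules[MODEL_MAX_RULES];
    size_t num_rules;
};

/*
 * Model of the intersect operation: AND `access` into the matching rule
 * and drop the rule if no access rights are left.  A missing rule is left
 * alone.  Returns the resulting mask (0 if the rule is absent or removed).
 */
static access_mask_t model_intersect_rule(struct model_ruleset *rs,
                                          int object_id,
                                          access_mask_t access)
{
    assert(rs->num_rules <= MODEL_MAX_RULES);

    for (size_t i = 0; i < rs->num_rules; i++) {
        if (rs->rules[i].object_id != object_id)
            continue;
        rs->rules[i].access &= access;
        if (rs->rules[i].access)
            return rs->rules[i].access;
        /* Empty rule: remove it by moving the last rule into its slot. */
        rs->rules[i] = rs->rules[--rs->num_rules];
        return 0;
    }
    return 0;
}
```

Note that the caller cannot tell "rule removed" from "rule never
existed" by the return value alone, which matches the point that a
supervisor has to track what it added anyway.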
> (I'm not very certain about this intersect flag - see below for
> alternative designs)
> 
> Later on, a supervisor notification mechanism can be implemented to allow
> the supervisor to be notified when an access is denied by its supervised
> layer, but this is not in scope for the "mutable domains" feature on its
> own (although it does make it significantly more useful).  This will be
> the step after mutable domains, if we keep with the plan previously
> discussed with Mickaël.
> 
> 
> uAPI example
> ------------
> 
> ```c
> /*
>  * This landlock_ruleset_attr controls the handled/quiet/scope bits for
>  * this layer (internally shared by both the supervisor and supervisee
>  * rulesets).
>  */
> struct landlock_ruleset_attr attr = {
>     .handled_access_fs = ...,
>     /* ... */
> };
> 
> /* supervisor_fd defaults to CLOEXEC */
> int supervisor_fd = landlock_create_ruleset(
>     &attr, sizeof(attr), LANDLOCK_CREATE_RULESET_SUPERVISOR);
> if (supervisor_fd < 0)
>     perror("landlock_create_ruleset");
> 
> /*
>  * supervisor_fd can then be passed to landlock_add_rule, but it does not
>  * work with landlock_restrict_self.  Not working for restrict_self means
>  * that if a sandboxer accidentally passes the supervisor fd to the child,
>  * it would not work in the same way as the supervisee fd, and therefore
>  * the error is more discoverable.
>  */
> if (landlock_add_rule(supervisor_fd, ...) < 0)
>     perror("landlock_add_rule");
> 
> /*
>  * Any changes to the supervisor ruleset must be committed, even before
>  * any child calls landlock_restrict_self().  Without committing, the
>  * supervisor ruleset still behaves as if it is empty.
>  */
> if (landlock_add_rule(supervisor_fd, ..., ...,
>         LANDLOCK_ADD_RULE_COMMIT_SUPERVISOR) < 0)
>     perror("landlock_add_rule(COMMIT)");
> 
> /* Creates the supervisee ruleset */
> int supervisee_fd = ioctl(supervisor_fd,
>         LANDLOCK_IOCTL_GET_SUPERVISEE_RULESET, /* flags= */ 0);
> if (supervisee_fd < 0)
>     perror("ioctl(LANDLOCK_IOCTL_GET_SUPERVISEE_RULESET)");
> 
> pid_t child = fork();
> if (child == 0) {
>     /* The supervisor should not leak supervisor_fd to any untrusted code. */
>     close(supervisor_fd);
>     if (landlock_restrict_self(supervisee_fd, 0) < 0)
>         perror("landlock_restrict_self");
>     execve(...);
>     perror("execve");
> } else {
>     close(supervisee_fd);
>     /*
>      * Here, the supervisor can add rules via landlock_add_rule(), or
>      * remove rules via landlock_add_rule() with
>      * LANDLOCK_ADD_RULE_INTERSECT.
>      *
>      * Added rules don't come into effect until a final
>      * landlock_add_rule() with commit flag (which may also just add a
>      * dummy rule with access=0):
>      */
>     if (landlock_add_rule(supervisor_fd, ..., ...,
>             LANDLOCK_ADD_RULE_COMMIT_SUPERVISOR) < 0)
>         perror("landlock_add_rule(COMMIT)");
> }
> ```
> 
> 
> Discussion on LANDLOCK_ADD_RULE_INTERSECT
> -----------------------------------------
> 
> This was initially proposed by Mickaël, although now after writing some
> example code against it [7], I'm not 100% sure that it is the most useful
> uAPI.  For a supervisor based on some sort of config file, it already has
> to track which rules are added to know what to remove, and thus I feel
> that it would be easier (both to use and to implement) to have an API that
> simply "replaces" a rule, rather than do a bitwise AND on the access.
> 
Instead of intersection being done at the rule level via
landlock_add_rule, would it be better for intersection to be done at the
ruleset_fd/ruleset level?

So instead of intersecting individual rules, you can intersect entire
rulesets, with the added benefit of being able to intersect handled
accesses as well. (so you could handle an access initially, and not
handle it later).

Intersecting at the ruleset level also lets you group the rules to
intersect, so you could create an unenforced ruleset solely for
intersecting with other rulesets, and apply all of its rules at once.

That way, the ruleset fd can be reused later with other supervisees,
instead of repeating the cycle of creating a ruleset and intersecting
individual rules.

I also think that having a function called "landlock_add_rule" remove
accesses (when the intersect flag is passed) is confusing, because we're
not really *add*-ing anything, we're removing.

ALTERNATIVE #1

Maybe the best way is to instead continue treating rulesets as
immutable, but allow composing them at ruleset creation time.

This would look something like:

Ruleset C = Ruleset A & Ruleset B

Ruleset A and B are never modified, but instead a new Ruleset C is
created that is the intersection of A and B. This could be done in a
variety of ways (LANDLOCK_CREATE_RULESET_INTERSECT? new IOCTL?)

An example API for what this might look like:

  struct landlock_ruleset_attr ruleset_attr = {
          /* Other fields for handled accesses must be zero. */
          .left_fd = existing_fd,
          .right_fd = other_existing_fd,
  };
  int new_ruleset_fd = syscall(SYS_landlock_create_ruleset, &ruleset_attr,
      sizeof(ruleset_attr), LANDLOCK_CREATE_RULESET_INTERSECT);

The call would then return a new ruleset that is the intersection of
existing_fd and other_existing_fd.

Similarly, we could have:

  int new_ruleset_fd = syscall(SYS_landlock_create_ruleset, &ruleset_attr,
      sizeof(ruleset_attr), LANDLOCK_CREATE_RULESET_UNION);

which would be convenient for creating unions of rulesets.
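To make "Ruleset C = Ruleset A & Ruleset B" a bit more concrete, here is
a userspace model of the two compositions. It is a sketch, not a proposed
implementation: all names are hypothetical, rules are flat (object_id,
access) pairs with unique keys, so the hierarchy of path-beneath rules is
not modelled, and I am assuming the handled-access masks are simply
ANDed for an intersection and ORed for a union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t access_mask_t;

#define MODEL_MAX_RULES 16

/* Hypothetical model types, not kernel structures. */
struct model_rule {
    int object_id;
    access_mask_t access;
};

struct model_ruleset {
    access_mask_t handled; /* model of the handled_access_* bits */
    struct model_rule rules[MODEL_MAX_RULES];
    size_t num_rules;
};

/* Returns the access mask stored for object_id, or 0 if there is no rule. */
static access_mask_t model_lookup(const struct model_ruleset *rs,
                                  int object_id)
{
    for (size_t i = 0; i < rs->num_rules; i++)
        if (rs->rules[i].object_id == object_id)
            return rs->rules[i].access;
    return 0;
}

static void model_append(struct model_ruleset *rs, int object_id,
                         access_mask_t access)
{
    assert(rs->num_rules < MODEL_MAX_RULES);
    rs->rules[rs->num_rules++] = (struct model_rule){ object_id, access };
}

/* C = A & B: keep only rights that both inputs grant; A and B are untouched. */
static struct model_ruleset model_intersect(const struct model_ruleset *a,
                                            const struct model_ruleset *b)
{
    struct model_ruleset c = { .handled = a->handled & b->handled };

    for (size_t i = 0; i < a->num_rules; i++) {
        access_mask_t both = a->rules[i].access &
                             model_lookup(b, a->rules[i].object_id);
        if (both)
            model_append(&c, a->rules[i].object_id, both);
    }
    return c;
}

/* C = A | B: keep every right that either input grants. */
static struct model_ruleset model_union(const struct model_ruleset *a,
                                        const struct model_ruleset *b)
{
    struct model_ruleset c = *a;

    c.handled |= b->handled;
    for (size_t i = 0; i < b->num_rules; i++) {
        size_t j;

        for (j = 0; j < c.num_rules; j++)
            if (c.rules[j].object_id == b->rules[i].object_id)
                break;
        if (j < c.num_rules)
            c.rules[j].access |= b->rules[i].access;
        else
            model_append(&c, b->rules[i].object_id, b->rules[i].access);
    }
    return c;
}
```

Both functions return a fresh value and take their inputs as const,
which is the property I care about here: the operands stay usable for
later compositions.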

Then, instead of mutating rulesets, we would commit an entirely new
ruleset that replaces the old one:

  ioctl(supervisee_fd, LANDLOCK_IOCTL_COMMIT_RULESET, &new_ruleset_fd);

This has the following benefits:

1. Clearer semantics: "landlock_add_rule" is just for adding rules, not
removing.

2. Intersection of all ruleset attributes, not just individual rule
attributes.

3. Better logical grouping of rules for the purpose of intersection, and
better composition.

It does have drawbacks:

1. Intersecting individual rules requires making an entire ruleset for
that one rule.

2. Users must be responsible for closing the unused/old rulesets that
they might no longer need.

ALTERNATIVE #2

A middle ground is to keep the ruleset mutation via landlock_add_rule,
but have it be done at the ruleset_fd level.

Something like this:

  struct landlock_ruleset_operand intersection = {
    .operand = other_ruleset_fd
  };
  landlock_add_rule(ruleset_fd, LANDLOCK_RULE_INTERSECT_RULESET,
      &intersection, 0);

I think this is also a valid way to do things, and it increases the
reusability of rulesets.

It has drawbacks too:

1. Again, having landlock_add_rule being used to actually remove rules
is confusing.

2. I'm unsure if we can change handled accesses after ruleset creation,
so we might not be able to intersect the handled accesses like we can in
ALTERNATIVE #1.

> Another alternative is to simply have a "clear all rules in this ruleset"
> flag.  This allows the supervisor to not have to track what is already
> allowed - if it reloads the config file, it can simply clear the ruleset,
> re-add all rules based on the config, then commit it.  Although I worry
> that this might make implementing some other use cases more difficult.

At a minimum, it is cumbersome, and I worry about file descriptors
becoming inaccessible (due to bind mounts / namespace changes in the
supervisor's environment).

Of course, the supervisor can just hold those file descriptors open for
future intersections, but this is annoying and error-prone.
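To illustrate the bookkeeping this pushes onto userspace, here is a rough
sketch of a supervisor pinning each path with an O_PATH file descriptor
the first time it is used, so later rule updates can reuse the fd even if
the path is no longer reachable. The pinned_* names and the fixed-size
table are hypothetical and only for illustration; open() with
O_PATH | O_CLOEXEC is the only real interface used, and no Landlock
syscall is called.

```c
#define _GNU_SOURCE /* for O_PATH */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define PINNED_MAX 32

/* Hypothetical supervisor-side bookkeeping, not part of any Landlock API. */
struct pinned_path {
    char path[256];
    int fd;
};

struct pinned_set {
    struct pinned_path entries[PINNED_MAX];
    size_t count;
};

/*
 * Returns an O_PATH fd for `path`, opening it on first use and reusing
 * the cached fd afterwards.  Returns -1 if the table is full, the path is
 * too long, or open() fails.
 */
static int pinned_get(struct pinned_set *set, const char *path)
{
    for (size_t i = 0; i < set->count; i++)
        if (strcmp(set->entries[i].path, path) == 0)
            return set->entries[i].fd;

    if (set->count == PINNED_MAX ||
        strlen(path) >= sizeof(set->entries[0].path))
        return -1;

    int fd = open(path, O_PATH | O_CLOEXEC);
    if (fd < 0)
        return -1;

    strcpy(set->entries[set->count].path, path);
    set->entries[set->count].fd = fd;
    set->count++;
    return fd;
}

/* Closes every pinned fd; forgetting this is the easy mistake to make. */
static void pinned_release_all(struct pinned_set *set)
{
    assert(set->count <= PINNED_MAX);
    for (size_t i = 0; i < set->count; i++)
        close(set->entries[i].fd);
    set->count = 0;
}
```

Every path the supervisor has ever allowed costs one open fd for the
lifetime of the sandbox, and the table has to be kept in sync with the
config by hand, which is the part I find error-prone.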

> [...]