[PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

Tue Mar 16 19:04:09 UTC 2021

On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün <mic at digikod.net> wrote:
> One could argue that chroot(2) is useless without a properly populated
> root hierarchy (i.e. without /dev and /proc).  However, there are
> multiple use cases that don't require the chrooting process to create
> file hierarchies with special files nor mount points, e.g.:
> * A process sandboxing itself, once all its libraries are loaded, may
>   not need files other than regular files, or even no file at all.
> * Some pre-populated root hierarchies could be used to chroot into,
>   provided for instance by development environments or tailored
>   distributions.
> * Processes executed in a chroot may not require access to these special
>   files (e.g. with minimal runtimes, or by emulating some special files
>   with a LD_PRELOADed library or seccomp).
>
> Unprivileged chroot is especially interesting for userspace developers
> wishing to harden their applications.  For instance, chroot(2) and Yama
> enable to build a capability-based security (i.e. remove filesystem
> ambient accesses) by calling chroot/chdir with an empty directory and
> accessing data through dedicated file descriptors obtained with
> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.

I don't entirely understand. Are you writing this with the assumption
that a future change will make it possible to set these RESOLVE flags
process-wide, or something like that?
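
For reference, as far as I can tell these flags are currently passed per call, in the struct open_how handed to openat2(2), so every open in the process would have to opt in. A minimal sketch of one such call (open_beneath() is a hypothetical helper name, not something from the patch; it assumes kernel headers that ship <linux/openat2.h>):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/openat2.h>   /* struct open_how, RESOLVE_* */
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-only, refusing any resolution
 * that would leave the directory referred to by `dirfd`.
 * Returns a file descriptor, or -1 with errno set.
 */
static int open_beneath(int dirfd, const char *path)
{
	struct open_how how;

	memset(&how, 0, sizeof(how));
	how.flags = O_RDONLY | O_CLOEXEC;
	how.resolve = RESOLVE_BENEATH | RESOLVE_NO_MAGICLINKS;

	/* glibc may not provide a wrapper, so go through syscall(2) */
	return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}
```

With RESOLVE_BENEATH, a path such as "../etc/passwd" or "/etc/passwd" is expected to fail (EXDEV on kernels that support openat2(2)), but only for calls that go through this helper; a plain open(2) elsewhere in the process is not restricted.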

As long as that doesn't exist, I think that to make this safe, you'd
have to do something like the following - let a child process set up a
new mount namespace for you, and then chroot() into that namespace's
root:

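/* Sketch only: error handling omitted. */
struct shared_data {
  int root_fd;
};
int helper_fn(void *args) {
  struct shared_data *shared = args;
  /* Runs in the new user and mount namespaces: build an empty tmpfs root. */
  mount("none", "/tmp", "tmpfs", MS_NOSUID|MS_NODEV, "");
  mkdir("/tmp/old_root", 0700);
  /* glibc has no wrapper; in practice this is syscall(SYS_pivot_root, ...). */
  pivot_root("/tmp", "/tmp/old_root");
  /* The old root is now at /old_root; detach it (it may still be busy). */
  umount2("/old_root", MNT_DETACH);
  shared->root_fd = open("/", O_PATH);
  return 0;
}
void setup_chroot() {
  struct shared_data shared = {};
  prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
  /* CLONE_VM + CLONE_VFORK: the child writes root_fd into our memory and
     we resume only after it exits; CLONE_FILES keeps the fd valid here.
     my_stack must point to the top of a stack for the child. */
  clone(helper_fn, my_stack,
        CLONE_VFORK|CLONE_VM|CLONE_FILES|CLONE_NEWUSER|CLONE_NEWNS|SIGCHLD,
        &shared);
  fchdir(shared.root_fd);
  chroot(".");
}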

[...]
> diff --git a/fs/open.c b/fs/open.c
[...]
> +static inline int current_chroot_allowed(void)
> +{
> +       /*
> +        * Changing the root directory for the calling task (and its future
> +        * children) requires that this task has CAP_SYS_CHROOT in its
> +        * namespace, or be running with no_new_privs and not sharing its
> +        * fs_struct and not escaping its current root (cf. create_user_ns()).
> +        * As for seccomp, checking no_new_privs avoids scenarios where
> +        * unprivileged tasks can affect the behavior of privileged children.
> +        */
> +       if (task_no_new_privs(current) && current->fs->users == 1 &&

This read of current->fs->users should use READ_ONCE().

> +                       !current_chrooted())
> +               return 0;
> +       if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> +               return 0;
> +       return -EPERM;
> +}
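
To illustrate the READ_ONCE() comment on the fs->users check: as quoted, the field is read without a lock, so a marked access documents the lockless read and keeps the compiler from tearing or repeating it. The following is only a userspace stand-in under stated assumptions: the macro is a simplified version of the kernel's READ_ONCE(), and struct fs_sketch and fs_unshared() are hypothetical names, not kernel code.

```c
/* Simplified userspace stand-in for the kernel's READ_ONCE() (GCC/Clang). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the one field of struct fs_struct used here. */
struct fs_sketch {
	int users;
};

/* Nonzero if the caller is the sole user of its fs_struct stand-in. */
static int fs_unshared(struct fs_sketch *fs)
{
	/* one marked load, so the compiler cannot tear or repeat the read */
	return READ_ONCE(fs->users) == 1;
}
```

In the patch itself, the equivalent change would presumably be to write the condition as READ_ONCE(current->fs->users) == 1.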
[...]

Overall I think this change is a good idea.