[PATCH v2 1/1] fs: Allow no_new_privs tasks to call chroot(2)

Wed Mar 10 19:33:12 UTC 2021

On Wed, Mar 10, 2021 at 8:23 PM Eric W. Biederman <ebiederm at xmission.com> wrote:
>
> Mickaël Salaün <mic at digikod.net> writes:
>
> > From: Mickaël Salaün <mic at linux.microsoft.com>
> >
> > Being able to easily change root directories eases some development
> > workflows and can be used as a tool to strengthen
> > unprivileged security sandboxes.  chroot(2) is not an access-control
> > mechanism per se, but it can be used to limit the absolute view of the
> > filesystem, and then limit ways to access data and kernel interfaces
> > (e.g. /proc, /sys, /dev, etc.).
> >
> > Users may not wish to expose namespace complexity to potentially
> > malicious processes, or limit their use because of limited resources.
> > The chroot feature is much simpler (and more limited) than the mount
> > namespace, but can still be useful.  As for containers, users of
> > chroot(2) should take care of file descriptors or data accessible by
> > other means (e.g. current working directory, leaked FDs, passed FDs,
> > devices, mount points, etc.).  There is a lot of literature that discusses
> > the limitations of chroot, and users of this feature should be aware of
> > the multiple ways to bypass it.  Using chroot(2) for security purposes
> > can make sense if it is combined with other features (e.g. dedicated
> > user, seccomp, LSM access-controls, etc.).
> >
> > One could argue that chroot(2) is useless without a properly populated
> > root hierarchy (i.e. without /dev and /proc).  However, there are
> > multiple use cases that don't require the chrooting process to create
> > file hierarchies with special files nor mount points, e.g.:
> > * A process sandboxing itself, once all its libraries are loaded, may
> >   not need files other than regular files, or even no file at all.
> > * Some pre-populated root hierarchies could be used to chroot into,
> >   provided for instance by development environments or tailored
> >   distributions.
> > * Processes executed in a chroot may not require access to these special
> >   files (e.g. with minimal runtimes, or by emulating some special files
> >   with a LD_PRELOADed library or seccomp).
> >
> > Allowing a task to change its own root directory is not a threat to the
> > system if we can prevent confused deputy attacks, which could be
> > performed through execution of SUID-like binaries.  This can be
> > prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
> > prctl(2).  To only affect this task, its filesystem information must not
> > be shared with other tasks, which can be achieved by not passing
> > CLONE_FS to clone(2).  A similar no_new_privs check is already used by
> > seccomp to avoid the same kind of security issues.  Furthermore, because
> > of its security use and to avoid giving a new way for attackers to get
> > out of a chroot (e.g. using /proc/<pid>/root), an unprivileged chroot is
> > only allowed if the new root directory is the same or beneath the
> > current one.  This still allows a process to use a subset of its
> > legitimate filesystem to chroot into and then further reduce its view of
> > the filesystem.
> >
> > This change may not impact systems relying on permission models other
> > than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
> > such systems may require updating their security policies.
> >
> > Only the chroot system call is relaxed with this no_new_privs check; the
> > init_chroot() helper doesn't require such change.
> >
> > Allowing unprivileged users to use chroot(2) is one of the initial
> > objectives of no_new_privs:
> > https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
> > This patch is a follow-up of a previous one sent by Andy Lutomirski, but
> > with fewer limitations:
> > https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/
[...]
> Neither is_path_beneath nor path_is_under really helps prevent escapes:
> except for open files and files accessible from proc, chroot already
> disallows going up.  The reason is that the path is resolved with the
> current root before switching to it.

Yeah, this should probably use the same check as the CLONE_NEWUSER
logic, current_chrooted(); that check already guards against the
following syscall sequence, which has similar security properties:
unshare(CLONE_NEWUSER); // gives the current process namespaced capabilities (CAP_SYS_ADMIN, CAP_SYS_CHROOT, ...)
chroot("<...>"); // succeeds because of namespaced CAP_SYS_CHROOT
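
Spelled out as a rough userspace sketch, in case it helps to try the
sequence: the helper name userns_then_chroot() and its return codes are
made up for illustration, and the calls run in a forked child so the
caller keeps its own root and namespaces.  Whether unshare() succeeds
depends on the system (unprivileged user namespaces may be disabled, and
an already chrooted caller is refused by the current_chrooted() check).

```c
#include <assert.h>
#include <linux/sched.h>	/* CLONE_NEWUSER */
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 if both calls succeeded, 2 if unshare() failed,
 * 3 if chroot() failed, -1 if the child could not be created or reaped.
 */
static int userns_then_chroot(const char *path)
{
	assert(path != NULL);

	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Raw syscall so the sketch does not depend on _GNU_SOURCE. */
		if (syscall(SYS_unshare, CLONE_NEWUSER) != 0)
			_exit(2);
		/* Permitted through the capabilities held in the new user ns. */
		if (chroot(path) != 0)
			_exit(3);
		_exit(0);
	}

	int status;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling userns_then_chroot("/") from an unprivileged, non-chrooted
process should return 0 where user namespaces are available, and 2 where
they are not.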

The current_chrooted() check in create_user_ns() is for the same
purpose as the check you're introducing here, so they should use the
same logic.
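
For comparison, the userspace side of the path the patch description
proposes (no_new_privs set, filesystem information not shared) would look
roughly like the sketch below.  The helper name nnp_then_chroot() and its
return codes are hypothetical.  On a kernel without the patch, the
chroot() call is expected to fail with EPERM for an unprivileged caller,
so a return value of 3 is the normal outcome there.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch.
 * Returns 0 if chroot() succeeded, 2 if no_new_privs could not be set,
 * 3 if chroot()/chdir() failed, -1 if the child could not be created
 * or reaped.
 */
static int nnp_then_chroot(const char *path)
{
	assert(path != NULL);

	/* fork() does not pass CLONE_FS, so the child has its own fs info. */
	pid_t pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* no_new_privs is one-way: it cannot be cleared afterwards. */
		if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
		    prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) != 1)
			_exit(2);
		/* Move the working directory inside the new root as well. */
		if (chroot(path) != 0 || chdir("/") != 0)
			_exit(3);
		_exit(0);
	}

	int status;
	if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```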