[PATCH v5 1/1] fs: Allow no_new_privs tasks to call chroot(2)

Tue Mar 30 17:01:57 UTC 2021

Hi,

Is there new comments on this patch? Could we move forward?

Regards,
 Mickaël

On 16/03/2021 21:36, Mickaël Salaün wrote:
> From: Mickaël Salaün <mic at linux.microsoft.com>
> 
> Being able to easily change root directories enables to ease some
> development workflow and can be used as a tool to strengthen
> unprivileged security sandboxes.  chroot(2) is not an access-control
> mechanism per se, but it can be used to limit the absolute view of the
> filesystem, and then limit ways to access data and kernel interfaces
> (e.g. /proc, /sys, /dev, etc.).
> 
> Users may not wish to expose namespace complexity to potentially
> malicious processes, or limit their use because of limited resources.
> The chroot feature is much more simple (and limited) than the mount
> namespace, but can still be useful.  As for containers, users of
> chroot(2) should take care of file descriptors or data accessible by
> other means (e.g. current working directory, leaked FDs, passed FDs,
> devices, mount points, etc.).  There is a lot of literature that discuss
> the limitations of chroot, and users of this feature should be aware of
> the multiple ways to bypass it.  Using chroot(2) for security purposes
> can make sense if it is combined with other features (e.g. dedicated
> user, seccomp, LSM access-controls, etc.).
> 
> One could argue that chroot(2) is useless without a properly populated
> root hierarchy (i.e. without /dev and /proc).  However, there are
> multiple use cases that don't require the chrooting process to create
> file hierarchies with special files nor mount points, e.g.:
> * A process sandboxing itself, once all its libraries are loaded, may
>   not need files other than regular files, or even no file at all.
> * Some pre-populated root hierarchies could be used to chroot into,
>   provided for instance by development environments or tailored
>   distributions.
> * Processes executed in a chroot may not require access to these special
>   files (e.g. with minimal runtimes, or by emulating some special files
>   with a LD_PRELOADed library or seccomp).
> 
> Allowing a task to change its own root directory is not a threat to the
> system if we can prevent confused deputy attacks, which could be
> performed through execution of SUID-like binaries.  This can be
> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
> prctl(2).  To only affect this task, its filesystem information must not
> be shared with other tasks, which can be achieved by not passing
> CLONE_FS to clone(2).  A similar no_new_privs check is already used by
> seccomp to avoid the same kind of security issues.  Furthermore, because
> of its security use and to avoid giving a new way for attackers to get
> out of a chroot (e.g. using /proc/<pid>/root, or chroot/chdir), an
> unprivileged chroot is only allowed if the calling process is not
> already chrooted.  This limitation is the same as for creating user
> namespaces.
> 
> This change may not impact systems relying on other permission models
> than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
> such systems may require to update their security policies.
> 
> Only the chroot system call is relaxed with this no_new_privs check; the
> init_chroot() helper doesn't require such change.
> 
> Allowing unprivileged users to use chroot(2) is one of the initial
> objectives of no_new_privs:
> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
> This patch is a follow-up of a previous one sent by Andy Lutomirski:
> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/
> 
> Cc: Al Viro <viro at zeniv.linux.org.uk>
> Cc: Andy Lutomirski <luto at amacapital.net>
> Cc: Christian Brauner <christian.brauner at ubuntu.com>
> Cc: Christoph Hellwig <hch at lst.de>
> Cc: David Howells <dhowells at redhat.com>
> Cc: Dominik Brodowski <linux at dominikbrodowski.net>
> Cc: Eric W. Biederman <ebiederm at xmission.com>
> Cc: James Morris <jmorris at namei.org>
> Cc: Jann Horn <jannh at google.com>
> Cc: John Johansen <john.johansen at canonical.com>
> Cc: Kentaro Takeda <takedakn at nttdata.co.jp>
> Cc: Serge Hallyn <serge at hallyn.com>
> Cc: Tetsuo Handa <penguin-kernel at i-love.sakura.ne.jp>
> Signed-off-by: Mickaël Salaün <mic at linux.microsoft.com>
> Reviewed-by: Kees Cook <keescook at chromium.org>
> Link: https://lore.kernel.org/r/20210316203633.424794-2-mic@digikod.net
> ---
> 
> Changes since v4:
> * Use READ_ONCE(current->fs->users) (found by Jann Horn).
> * Remove ambiguous example in commit description.
> * Add Reviewed-by Kees Cook.
> 
> Changes since v3:
> * Move the new permission checks to a dedicated helper
>   current_chroot_allowed() to make the code easier to read and align
>   with user_path_at(), path_permission() and security_path_chroot()
>   calls (suggested by Kees Cook).
> * Remove now useless included file.
> * Extend commit description.
> * Rebase on v5.12-rc3 .
> 
> Changes since v2:
> * Replace path_is_under() check with current_chrooted() to gain the same
>   protection as create_user_ns() (suggested by Jann Horn). See commit
>   3151527ee007 ("userns:  Don't allow creation if the user is chrooted")
> 
> Changes since v1:
> * Replace custom is_path_beneath() with existing path_is_under().
> ---
>  fs/open.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/open.c b/fs/open.c
> index e53af13b5835..480010a551b2 100644
> --- a/fs/open.c
> +++ b/fs/open.c
> @@ -532,6 +532,24 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
>  	return error;
>  }
>  
> +static inline int current_chroot_allowed(void)
> +{
> +	/*
> +	 * Changing the root directory for the calling task (and its future
> +	 * children) requires that this task has CAP_SYS_CHROOT in its
> +	 * namespace, or be running with no_new_privs and not sharing its
> +	 * fs_struct and not escaping its current root (cf. create_user_ns()).
> +	 * As for seccomp, checking no_new_privs avoids scenarios where
> +	 * unprivileged tasks can affect the behavior of privileged children.
> +	 */
> +	if (task_no_new_privs(current) && READ_ONCE(current->fs->users) == 1 &&
> +			!current_chrooted())
> +		return 0;
> +	if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> +		return 0;
> +	return -EPERM;
> +}
> +
>  SYSCALL_DEFINE1(chroot, const char __user *, filename)
>  {
>  	struct path path;
> @@ -546,9 +564,10 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
>  	if (error)
>  		goto dput_and_out;
>  
> -	error = -EPERM;
> -	if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
> +	error = current_chroot_allowed();
> +	if (error)
>  		goto dput_and_out;
> +
>  	error = security_path_chroot(&path);
>  	if (error)
>  		goto dput_and_out;
>