[PATCH v5 1/1] fs: Allow no_new_privs tasks to call chroot(2)

Tue Mar 30 17:19:12 UTC 2021

On 3/30/2021 10:01 AM, Mickaël Salaün wrote:
> Hi,
>
> Are there new comments on this patch? Could we move forward?

I don't see that new comments are necessary when I don't see
that you've provided compelling counters to some of the old ones.
It's possible to use minimal privilege with CAP_SYS_CHROOT.
It looks like namespaces provide alternatives for all your
use cases. The constraints required to make this work are quite
limiting. Where is the real value add?
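
To illustrate the minimal-privilege point, here is a rough sketch of a
task shedding every capability except CAP_SYS_CHROOT before it calls
chroot(2). The helper name keep_only_cap_sys_chroot() is made up for
this example, and the raw capget/capset syscalls are used only to avoid
assuming libcap is installed. An unprivileged caller simply ends up
with empty capability sets.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/capability.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): reduce the calling thread's
 * capability sets to at most CAP_SYS_CHROOT.
 * Returns 0 on success or a negative errno value.
 */
int keep_only_cap_sys_chroot(void)
{
	struct __user_cap_header_struct hdr;
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	int idx = CAP_TO_INDEX(CAP_SYS_CHROOT);
	int i;

	memset(&hdr, 0, sizeof(hdr));
	hdr.version = _LINUX_CAPABILITY_VERSION_3;
	hdr.pid = 0;	/* 0 means the calling thread */

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -errno;

	for (i = 0; i < _LINUX_CAPABILITY_U32S_3; i++) {
		/* Keep only the CAP_SYS_CHROOT bit, if it was held at all. */
		__u32 mask = (i == idx) ? CAP_TO_MASK(CAP_SYS_CHROOT) : 0;

		data[i].permitted &= mask;
		data[i].effective &= mask;
		data[i].inheritable = 0;
	}

	if (syscall(SYS_capset, &hdr, data) != 0)
		return -errno;
	return 0;
}
```

A privileged launcher would call this once, then chroot(2) and chdir("/"),
and could drop the remaining capability afterwards.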

>
> Regards,
>  Mickaël
>
>
> On 16/03/2021 21:36, Mickaël Salaün wrote:
>> From: Mickaël Salaün <mic at linux.microsoft.com>
>>
>> Being able to easily change root directories eases some development
>> workflows and can be used as a tool to strengthen
>> unprivileged security sandboxes.  chroot(2) is not an access-control
>> mechanism per se, but it can be used to limit the absolute view of the
>> filesystem, and then limit ways to access data and kernel interfaces
>> (e.g. /proc, /sys, /dev, etc.).
>>
>> Users may not wish to expose namespace complexity to potentially
>> malicious processes, or limit their use because of limited resources.
>> The chroot feature is much simpler (and more limited) than the mount
>> namespace, but can still be useful.  As for containers, users of
>> chroot(2) should take care of file descriptors or data accessible by
>> other means (e.g. current working directory, leaked FDs, passed FDs,
>> devices, mount points, etc.).  There is a lot of literature that discusses
>> the limitations of chroot, and users of this feature should be aware of
>> the multiple ways to bypass it.  Using chroot(2) for security purposes
>> can make sense if it is combined with other features (e.g. dedicated
>> user, seccomp, LSM access-controls, etc.).
>>
>> One could argue that chroot(2) is useless without a properly populated
>> root hierarchy (i.e. without /dev and /proc).  However, there are
>> multiple use cases that don't require the chrooting process to create
>> file hierarchies with special files nor mount points, e.g.:
>> * A process sandboxing itself, once all its libraries are loaded, may
>>   not need files other than regular files, or even no file at all.
>> * Some pre-populated root hierarchies could be used to chroot into,
>>   provided for instance by development environments or tailored
>>   distributions.
>> * Processes executed in a chroot may not require access to these special
>>   files (e.g. with minimal runtimes, or by emulating some special files
>>   with a LD_PRELOADed library or seccomp).
>>
>> Allowing a task to change its own root directory is not a threat to the
>> system if we can prevent confused deputy attacks, which could be
>> performed through execution of SUID-like binaries.  This can be
>> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
>> prctl(2).  To only affect this task, its filesystem information must not
>> be shared with other tasks, which can be achieved by not passing
>> CLONE_FS to clone(2).  A similar no_new_privs check is already used by
>> seccomp to avoid the same kind of security issues.  Furthermore, because
>> of its security use and to avoid giving a new way for attackers to get
>> out of a chroot (e.g. using /proc/<pid>/root, or chroot/chdir), an
>> unprivileged chroot is only allowed if the calling process is not
>> already chrooted.  This limitation is the same as for creating user
>> namespaces.
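
For reference, the userspace sequence described in the paragraph above
could look roughly like the sketch below. The helper name
sandbox_chroot() is hypothetical. The sketch assumes a task that does
not share its fs_struct (no CLONE_FS, no other threads) and is not
already chrooted. On a kernel without this patch, the chroot(2) call
still fails with EPERM unless the caller holds CAP_SYS_CHROOT.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): set no_new_privs on the
 * calling task, then change its root directory to new_root.
 * Returns 0 on success or a negative errno value.
 */
int sandbox_chroot(const char *new_root)
{
	/* Irreversible: execve(2) can no longer grant new privileges. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -errno;

	/* Permitted without CAP_SYS_CHROOT only with the patch applied. */
	if (chroot(new_root) != 0)
		return -errno;

	/* Move the working directory inside the new root. */
	if (chdir("/") != 0)
		return -errno;

	return 0;
}
```

Callers still have to close or audit any file descriptors opened
before the call, since those keep referring to the old hierarchy.
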
>>
>> This change may not impact systems relying on other permission models
>> than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
>> such systems may require updating their security policies.
>>
>> Only the chroot system call is relaxed with this no_new_privs check; the
>> init_chroot() helper doesn't require such change.
>>
>> Allowing unprivileged users to use chroot(2) is one of the initial
>> objectives of no_new_privs:
>> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
>> This patch is a follow-up of a previous one sent by Andy Lutomirski:
>> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/
>>
>> Cc: Al Viro <viro at zeniv.linux.org.uk>
>> Cc: Andy Lutomirski <luto at amacapital.net>
>> Cc: Christian Brauner <christian.brauner at ubuntu.com>
>> Cc: Christoph Hellwig <hch at lst.de>
>> Cc: David Howells <dhowells at redhat.com>
>> Cc: Dominik Brodowski <linux at dominikbrodowski.net>
>> Cc: Eric W. Biederman <ebiederm at xmission.com>
>> Cc: James Morris <jmorris at namei.org>
>> Cc: Jann Horn <jannh at google.com>
>> Cc: John Johansen <john.johansen at canonical.com>
>> Cc: Kentaro Takeda <takedakn at nttdata.co.jp>
>> Cc: Serge Hallyn <serge at hallyn.com>
>> Cc: Tetsuo Handa <penguin-kernel at i-love.sakura.ne.jp>
>> Signed-off-by: Mickaël Salaün <mic at linux.microsoft.com>
>> Reviewed-by: Kees Cook <keescook at chromium.org>
>> Link: https://lore.kernel.org/r/20210316203633.424794-2-mic@digikod.net
>> ---
>>
>> Changes since v4:
>> * Use READ_ONCE(current->fs->users) (found by Jann Horn).
>> * Remove ambiguous example in commit description.
>> * Add Reviewed-by Kees Cook.
>>
>> Changes since v3:
>> * Move the new permission checks to a dedicated helper
>>   current_chroot_allowed() to make the code easier to read and align
>>   with user_path_at(), path_permission() and security_path_chroot()
>>   calls (suggested by Kees Cook).
>> * Remove now useless included file.
>> * Extend commit description.
>> * Rebase on v5.12-rc3.
>>
>> Changes since v2:
>> * Replace path_is_under() check with current_chrooted() to gain the same
>>   protection as create_user_ns() (suggested by Jann Horn). See commit
>>   3151527ee007 ("userns:  Don't allow creation if the user is chrooted")
>>
>> Changes since v1:
>> * Replace custom is_path_beneath() with existing path_is_under().
>> ---
>>  fs/open.c | 23 +++++++++++++++++++++--
>>  1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/open.c b/fs/open.c
>> index e53af13b5835..480010a551b2 100644
>> --- a/fs/open.c
>> +++ b/fs/open.c
>> @@ -532,6 +532,24 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
>>  	return error;
>>  }
>>  
>> +static inline int current_chroot_allowed(void)
>> +{
>> +	/*
>> +	 * Changing the root directory for the calling task (and its future
>> +	 * children) requires that this task has CAP_SYS_CHROOT in its
>> +	 * namespace, or be running with no_new_privs and not sharing its
>> +	 * fs_struct and not escaping its current root (cf. create_user_ns()).
>> +	 * As for seccomp, checking no_new_privs avoids scenarios where
>> +	 * unprivileged tasks can affect the behavior of privileged children.
>> +	 */
>> +	if (task_no_new_privs(current) && READ_ONCE(current->fs->users) == 1 &&
>> +			!current_chrooted())
>> +		return 0;
>> +	if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>> +		return 0;
>> +	return -EPERM;
>> +}
>> +
>>  SYSCALL_DEFINE1(chroot, const char __user *, filename)
>>  {
>>  	struct path path;
>> @@ -546,9 +564,10 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
>>  	if (error)
>>  		goto dput_and_out;
>>  
>> -	error = -EPERM;
>> -	if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>> +	error = current_chroot_allowed();
>> +	if (error)
>>  		goto dput_and_out;
>> +
>>  	error = security_path_chroot(&path);
>>  	if (error)
>>  		goto dput_and_out;
>>