[PATCH v5 1/1] fs: Allow no_new_privs tasks to call chroot(2)

Tue Mar 30 18:40:27 UTC 2021

On 3/30/2021 11:11 AM, Mickaël Salaün wrote:
> On 30/03/2021 19:19, Casey Schaufler wrote:
>> On 3/30/2021 10:01 AM, Mickaël Salaün wrote:
>>> Hi,
>>>
>>> Are there new comments on this patch? Could we move forward?
>> I don't see that new comments are necessary when I don't see
>> that you've provided compelling counters to some of the old ones.
> Which ones? I don't buy your argument about the beauty of CAP_SYS_CHROOT.

CAP_SYS_CHROOT, namespaces. Bind mounts. The restrictions on
"unprivileged" chroot being sufficiently onerous to make it
unlikely to be usable.

>> It's possible to use minimal privilege with CAP_SYS_CHROOT.
> CAP_SYS_CHROOT can lead to privilege escalation.

Not when used in conjunction with the same set of
restrictions you're requiring for "unprivileged" chroot. 

>> It looks like namespaces provide alternatives for all your
>> use cases.
> I explained in the commit message why it is not the case. In a nutshell,
> namespaces bring complexity which may not be required.

So? I can use a Swiss Army Knife to cut a string even though it
has a corkscrew.

>  When designing a
> secure system, we want to avoid giving access to such complexity to
> untrusted processes (i.e. more complexity leads to more bugs).

If you're *really* designing a secure system you can design it to
use existing mechanisms, like CAP_SYS_CHROOT!

>  An
> unprivileged chroot would provide just the minimum feature needed to
> drop some accesses. Of course it is not enough on its own, but it can be
> combined with existing (and future) security features.

Like NO_NEW_PRIVS, namespaces and capabilities!
You don't need anything new!

>> The constraints required to make this work are quite
>> limiting. Where is the real value add?
> As explained in the commit message, it is useful when hardening
> applications (e.g. network services, browsers, parsers, etc.). We don't
> want an untrusted (or compromised) application to have CAP_SYS_CHROOT
> nor (complex) namespace access.

If you can ensure that an unprivileged application is
always run with NO_NEW_PRIVS you could also ensure that
it runs with only CAP_SYS_CHROOT or in an appropriate
namespace. I believe that it would be easier for your
particular use case. I don't believe that is sufficient.
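
For illustration, a minimal sketch of the "only CAP_SYS_CHROOT plus
NO_NEW_PRIVS" setup mentioned above. The helper name
keep_only_cap_sys_chroot() is hypothetical, and the sketch assumes a
launcher that starts with CAP_SYS_CHROOT in its permitted set; it uses
the raw capget(2)/capset(2) system calls rather than libcap and leaves
the bounding and ambient sets untouched.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <linux/capability.h>
#include <string.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: drop every capability except CAP_SYS_CHROOT, then
 * set no_new_privs.  Returns 0 on success or a negative errno value.
 */
static int keep_only_cap_sys_chroot(void)
{
	struct __user_cap_header_struct hdr = {
		.version = _LINUX_CAPABILITY_VERSION_3,
		.pid = 0,	/* 0 means the calling thread */
	};
	struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];
	const __u32 keep = 1U << CAP_SYS_CHROOT;	/* lives in the first word */

	if (syscall(SYS_capget, &hdr, data) != 0)
		return -errno;

	/* Only ever shrink the sets: keep CAP_SYS_CHROOT if we had it. */
	data[0].permitted &= keep;
	data[0].effective &= keep;
	data[0].inheritable = 0;
	memset(&data[1], 0, sizeof(data[1]));

	if (syscall(SYS_capset, &hdr, data) != 0)
		return -errno;

	/* Same restriction the patch requires for "unprivileged" chroot. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -errno;
	return 0;
}
```

A task with an empty permitted set simply ends up with no capabilities
at all; the helper never adds anything.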

>>> Regards,
>>>  Mickaël
>>>
>>>
>>> On 16/03/2021 21:36, Mickaël Salaün wrote:
>>>> From: Mickaël Salaün <mic at linux.microsoft.com>
>>>>
>>>> Being able to easily change root directories eases some
>>>> development workflows and can be used as a tool to strengthen
>>>> unprivileged security sandboxes.  chroot(2) is not an access-control
>>>> mechanism per se, but it can be used to limit the absolute view of the
>>>> filesystem, and then limit ways to access data and kernel interfaces
>>>> (e.g. /proc, /sys, /dev, etc.).
>>>>
>>>> Users may not wish to expose namespace complexity to potentially
>>>> malicious processes, or limit their use because of limited resources.
>>>> The chroot feature is much simpler (and more limited) than the mount
>>>> namespace, but can still be useful.  As for containers, users of
>>>> chroot(2) should take care of file descriptors or data accessible by
>>>> other means (e.g. current working directory, leaked FDs, passed FDs,
>>>> devices, mount points, etc.).  There is a lot of literature that discusses
>>>> the limitations of chroot, and users of this feature should be aware of
>>>> the multiple ways to bypass it.  Using chroot(2) for security purposes
>>>> can make sense if it is combined with other features (e.g. dedicated
>>>> user, seccomp, LSM access-controls, etc.).
>>>>
>>>> One could argue that chroot(2) is useless without a properly populated
>>>> root hierarchy (i.e. without /dev and /proc).  However, there are
>>>> multiple use cases that don't require the chrooting process to create
>>>> file hierarchies with special files nor mount points, e.g.:
>>>> * A process sandboxing itself, once all its libraries are loaded, may
>>>>   not need files other than regular files, or even no file at all.
>>>> * Some pre-populated root hierarchies could be used to chroot into,
>>>>   provided for instance by development environments or tailored
>>>>   distributions.
>>>> * Processes executed in a chroot may not require access to these special
>>>>   files (e.g. with minimal runtimes, or by emulating some special files
>>>>   with a LD_PRELOADed library or seccomp).
>>>>
>>>> Allowing a task to change its own root directory is not a threat to the
>>>> system if we can prevent confused deputy attacks, which could be
>>>> performed through execution of SUID-like binaries.  This can be
>>>> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
>>>> prctl(2).  To only affect this task, its filesystem information must not
>>>> be shared with other tasks, which can be achieved by not passing
>>>> CLONE_FS to clone(2).  A similar no_new_privs check is already used by
>>>> seccomp to avoid the same kind of security issues.  Furthermore, because
>>>> of its security use and to avoid giving a new way for attackers to get
>>>> out of a chroot (e.g. using /proc/<pid>/root, or chroot/chdir), an
>>>> unprivileged chroot is only allowed if the calling process is not
>>>> already chrooted.  This limitation is the same as for creating user
>>>> namespaces.
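
The userspace side of the sequence described above can be sketched as
follows. This is only an illustration: try_unprivileged_chroot() is a
hypothetical helper name, and the sketch assumes a single-threaded
process created with plain fork(2), which therefore does not share its
fs_struct.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success or a negative errno value. */
static int try_unprivileged_chroot(const char *new_root)
{
	/* Irreversibly forbid gaining privileges through execve(2). */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
		return -errno;

	/*
	 * Threads created with CLONE_FS share their fs_struct; with the
	 * proposed check such a caller would still need CAP_SYS_CHROOT.
	 * A task lacking CAP_SYS_CHROOT gets EPERM here on kernels
	 * without this patch.
	 */
	if (chroot(new_root) != 0)
		return -errno;

	/* Do not keep a working directory outside the new root. */
	if (chdir("/") != 0)
		return -errno;
	return 0;
}
```

The caller is still responsible for closing any file descriptors that
point outside the new root before running untrusted code.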
>>>>
>>>> This change may not impact systems relying on other permission models
>>>> than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
>>>> such systems may require updating their security policies.
>>>>
>>>> Only the chroot system call is relaxed with this no_new_privs check; the
>>>> init_chroot() helper doesn't require such change.
>>>>
>>>> Allowing unprivileged users to use chroot(2) is one of the initial
>>>> objectives of no_new_privs:
>>>> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
>>>> This patch is a follow-up of a previous one sent by Andy Lutomirski:
>>>> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/
>>>>
>>>> Cc: Al Viro <viro at zeniv.linux.org.uk>
>>>> Cc: Andy Lutomirski <luto at amacapital.net>
>>>> Cc: Christian Brauner <christian.brauner at ubuntu.com>
>>>> Cc: Christoph Hellwig <hch at lst.de>
>>>> Cc: David Howells <dhowells at redhat.com>
>>>> Cc: Dominik Brodowski <linux at dominikbrodowski.net>
>>>> Cc: Eric W. Biederman <ebiederm at xmission.com>
>>>> Cc: James Morris <jmorris at namei.org>
>>>> Cc: Jann Horn <jannh at google.com>
>>>> Cc: John Johansen <john.johansen at canonical.com>
>>>> Cc: Kentaro Takeda <takedakn at nttdata.co.jp>
>>>> Cc: Serge Hallyn <serge at hallyn.com>
>>>> Cc: Tetsuo Handa <penguin-kernel at i-love.sakura.ne.jp>
>>>> Signed-off-by: Mickaël Salaün <mic at linux.microsoft.com>
>>>> Reviewed-by: Kees Cook <keescook at chromium.org>
>>>> Link: https://lore.kernel.org/r/20210316203633.424794-2-mic@digikod.net
>>>> ---
>>>>
>>>> Changes since v4:
>>>> * Use READ_ONCE(current->fs->users) (found by Jann Horn).
>>>> * Remove ambiguous example in commit description.
>>>> * Add Reviewed-by Kees Cook.
>>>>
>>>> Changes since v3:
>>>> * Move the new permission checks to a dedicated helper
>>>>   current_chroot_allowed() to make the code easier to read and align
>>>>   with user_path_at(), path_permission() and security_path_chroot()
>>>>   calls (suggested by Kees Cook).
>>>> * Remove now useless included file.
>>>> * Extend commit description.
>>>> * Rebase on v5.12-rc3.
>>>>
>>>> Changes since v2:
>>>> * Replace path_is_under() check with current_chrooted() to gain the same
>>>>   protection as create_user_ns() (suggested by Jann Horn). See commit
>>>>   3151527ee007 ("userns:  Don't allow creation if the user is chrooted")
>>>>
>>>> Changes since v1:
>>>> * Replace custom is_path_beneath() with existing path_is_under().
>>>> ---
>>>>  fs/open.c | 23 +++++++++++++++++++++--
>>>>  1 file changed, 21 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/fs/open.c b/fs/open.c
>>>> index e53af13b5835..480010a551b2 100644
>>>> --- a/fs/open.c
>>>> +++ b/fs/open.c
>>>> @@ -532,6 +532,24 @@ SYSCALL_DEFINE1(fchdir, unsigned int, fd)
>>>>  	return error;
>>>>  }
>>>>  
>>>> +static inline int current_chroot_allowed(void)
>>>> +{
>>>> +	/*
>>>> +	 * Changing the root directory for the calling task (and its future
>>>> +	 * children) requires that this task has CAP_SYS_CHROOT in its
>>>> +	 * namespace, or be running with no_new_privs and not sharing its
>>>> +	 * fs_struct and not escaping its current root (cf. create_user_ns()).
>>>> +	 * As for seccomp, checking no_new_privs avoids scenarios where
>>>> +	 * unprivileged tasks can affect the behavior of privileged children.
>>>> +	 */
>>>> +	if (task_no_new_privs(current) && READ_ONCE(current->fs->users) == 1 &&
>>>> +			!current_chrooted())
>>>> +		return 0;
>>>> +	if (ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>>>> +		return 0;
>>>> +	return -EPERM;
>>>> +}
>>>> +
>>>>  SYSCALL_DEFINE1(chroot, const char __user *, filename)
>>>>  {
>>>>  	struct path path;
>>>> @@ -546,9 +564,10 @@ SYSCALL_DEFINE1(chroot, const char __user *, filename)
>>>>  	if (error)
>>>>  		goto dput_and_out;
>>>>  
>>>> -	error = -EPERM;
>>>> -	if (!ns_capable(current_user_ns(), CAP_SYS_CHROOT))
>>>> +	error = current_chroot_allowed();
>>>> +	if (error)
>>>>  		goto dput_and_out;
>>>> +
>>>>  	error = security_path_chroot(&path);
>>>>  	if (error)
>>>>  		goto dput_and_out;
>>>>
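
To make the decision in current_chroot_allowed() easier to reason about,
here is a userspace model of the same logic. It is not kernel code:
struct task_model and chroot_allowed_model() are made-up names, and each
field merely stands in for the corresponding kernel check
(task_no_new_privs(), current->fs->users, current_chrooted(),
ns_capable()).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task state the patch inspects. */
struct task_model {
	bool no_new_privs;	/* task_no_new_privs(current) */
	int fs_users;		/* current->fs->users */
	bool chrooted;		/* current_chrooted() */
	bool cap_sys_chroot;	/* ns_capable(..., CAP_SYS_CHROOT) */
};

/* Mirrors the order of the checks in the patch; 0 means allowed. */
static int chroot_allowed_model(const struct task_model *t)
{
	/* Unprivileged path: all three conditions must hold. */
	if (t->no_new_privs && t->fs_users == 1 && !t->chrooted)
		return 0;
	/* Privileged path: unchanged from the existing behavior. */
	if (t->cap_sys_chroot)
		return 0;
	return -EPERM;
}
```

The model makes the restrictions visible: a task that shares its
fs_struct, or that is already chrooted, still needs CAP_SYS_CHROOT.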