[PATCH v4 1/1] fs: Allow no_new_privs tasks to call chroot(2)

Tue Mar 16 19:25:38 UTC 2021

On 16/03/2021 20:24, Kees Cook wrote:
> On Tue, Mar 16, 2021 at 08:04:09PM +0100, Jann Horn wrote:
>> On Tue, Mar 16, 2021 at 6:02 PM Mickaël Salaün <mic at digikod.net> wrote:
>>> One could argue that chroot(2) is useless without a properly populated
>>> root hierarchy (i.e. without /dev and /proc).  However, there are
>>> multiple use cases that don't require the chrooting process to create
>>> file hierarchies with special files nor mount points, e.g.:
>>> * A process sandboxing itself, once all its libraries are loaded, may
>>>   not need files other than regular files, or even no file at all.
>>> * Some pre-populated root hierarchies could be used to chroot into,
>>>   provided for instance by development environments or tailored
>>>   distributions.
>>> * Processes executed in a chroot may not require access to these special
>>>   files (e.g. with minimal runtimes, or by emulating some special files
>>>   with a LD_PRELOADed library or seccomp).
>>>
>>> Unprivileged chroot is especially interesting for userspace developers
>>> wishing to harden their applications.  For instance, chroot(2) and Yama
>>> enable to build a capability-based security (i.e. remove filesystem
>>> ambient accesses) by calling chroot/chdir with an empty directory and
>>> accessing data through dedicated file descriptors obtained with
>>> openat2(2) and RESOLVE_BENEATH/RESOLVE_IN_ROOT/RESOLVE_NO_MAGICLINKS.
>>
>> I don't entirely understand. Are you writing this with the assumption
>> that a future change will make it possible to set these RESOLVE flags
>> process-wide, or something like that?
> 
> I thought it meant "open all out-of-chroot dirs as fds using RESOLVE_...
> flags then chroot". As in, there's no way to then escape "up" for the
> old opens, and the new opens stay in the chroot.

Yes, that was the idea.

> 
>> [...]
>>> diff --git a/fs/open.c b/fs/open.c
>> [...]
>>> +static inline int current_chroot_allowed(void)
>>> +{
>>> +       /*
>>> +        * Changing the root directory for the calling task (and its future
>>> +        * children) requires that this task has CAP_SYS_CHROOT in its
>>> +        * namespace, or be running with no_new_privs and not sharing its
>>> +        * fs_struct and not escaping its current root (cf. create_user_ns()).
>>> +        * As for seccomp, checking no_new_privs avoids scenarios where
>>> +        * unprivileged tasks can affect the behavior of privileged children.
>>> +        */
>>> +       if (task_no_new_privs(current) && current->fs->users == 1 &&
>>
>> this read of current->fs->users should be using READ_ONCE()
> 
> Ah yeah, good call. I should remember this when I think "can this race?"
> :P
>