[PATCH v2 1/1] fs: Allow no_new_privs tasks to call chroot(2)

Thu Mar 11 10:37:57 UTC 2021

On 10/03/2021 20:33, Jann Horn wrote:
> On Wed, Mar 10, 2021 at 8:23 PM Eric W. Biederman <ebiederm at xmission.com> wrote:
>>
>> Mickaël Salaün <mic at digikod.net> writes:
>>
>>> From: Mickaël Salaün <mic at linux.microsoft.com>
>>>
>>> Being able to easily change root directories can ease some
>>> development workflows and can be used as a tool to strengthen
>>> unprivileged security sandboxes.  chroot(2) is not an access-control
>>> mechanism per se, but it can be used to limit the absolute view of the
>>> filesystem, and then limit ways to access data and kernel interfaces
>>> (e.g. /proc, /sys, /dev, etc.).
>>>
>>> Users may not wish to expose namespace complexity to potentially
>>> malicious processes, or limit their use because of limited resources.
>>> The chroot feature is much simpler (and more limited) than the mount
>>> namespace, but can still be useful.  As for containers, users of
>>> chroot(2) should take care of file descriptors or data accessible by
>>> other means (e.g. current working directory, leaked FDs, passed FDs,
>>> devices, mount points, etc.).  There is a lot of literature discussing
>>> the limitations of chroot, and users of this feature should be aware of
>>> the multiple ways to bypass it.  Using chroot(2) for security purposes
>>> can make sense if it is combined with other features (e.g. dedicated
>>> user, seccomp, LSM access-controls, etc.).
>>>
>>> One could argue that chroot(2) is useless without a properly populated
>>> root hierarchy (i.e. without /dev and /proc).  However, there are
>>> multiple use cases that don't require the chrooting process to create
>>> file hierarchies with special files nor mount points, e.g.:
>>> * A process sandboxing itself, once all its libraries are loaded, may
>>>   not need files other than regular files, or even no file at all.
>>> * Some pre-populated root hierarchies could be used to chroot into,
>>>   provided for instance by development environments or tailored
>>>   distributions.
>>> * Processes executed in a chroot may not require access to these special
>>>   files (e.g. with minimal runtimes, or by emulating some special files
>>>   with a LD_PRELOADed library or seccomp).
>>>
>>> Allowing a task to change its own root directory is not a threat to the
>>> system if we can prevent confused deputy attacks, which could be
>>> performed through execution of SUID-like binaries.  This can be
>>> prevented if the calling task sets PR_SET_NO_NEW_PRIVS on itself with
>>> prctl(2).  To only affect this task, its filesystem information must not
>>> be shared with other tasks, which can be achieved by not passing
>>> CLONE_FS to clone(2).  A similar no_new_privs check is already used by
>>> seccomp to avoid the same kind of security issues.  Furthermore, because
>>> of its security use and to avoid giving a new way for attackers to get
>>> out of a chroot (e.g. using /proc/<pid>/root), an unprivileged chroot is
>>> only allowed if the new root directory is the same or beneath the
>>> current one.  This still allows a process to use a subset of its
>>> legitimate filesystem to chroot into and then further reduce its view of
>>> the filesystem.
>>>
>>> This change may not impact systems relying on permission models other
>>> than POSIX capabilities (e.g. Tomoyo).  Being able to use chroot(2) on
>>> such systems may require updating their security policies.
>>>
>>> Only the chroot system call is relaxed with this no_new_privs check; the
>>> init_chroot() helper doesn't require such change.
>>>
>>> Allowing unprivileged users to use chroot(2) is one of the initial
>>> objectives of no_new_privs:
>>> https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html
>>> This patch is a follow-up of a previous one sent by Andy Lutomirski, but
>>> with fewer limitations:
>>> https://lore.kernel.org/lkml/0e2f0f54e19bff53a3739ecfddb4ffa9a6dbde4d.1327858005.git.luto@amacapital.net/
> [...]
>> Neither is_path_beneath nor path_is_under really helps prevent escapes,
>> as, except for open files and files accessible from proc, chroot already
>> disallows going up.  The reason is that the path is resolved with the
>> current root before switching to it.
> 
> Yeah, this probably should use the same check as the CLONE_NEWUSER
> logic, current_chrooted(); that check is already used for guarding
> against the following syscall sequence, which has similar security
> properties:
> unshare(CLONE_NEWUSER); // gives the current process namespaced CAP_SYS_ADMIN
> chroot("<...>"); // succeeds because of namespaced CAP_SYS_CHROOT
> 
> The current_chrooted() check in create_user_ns() is for the same
> purpose as the check you're introducing here, so they should use the
> same logic.
> 

I don't know how I missed this, but current_chrooted() is definitely the
right approach.
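
For reference, here is a rough userspace sketch of the sequence the commit
message describes: set no_new_privs on the calling task, then call
chroot(2).  It is only an illustration, not part of the patch.  The helper
names set_no_new_privs() and try_unprivileged_chroot() are hypothetical.
On a kernel without this change, the chroot(2) call is expected to fail
with EPERM for a task lacking CAP_SYS_CHROOT, so the helper reports the
error instead of assuming success.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set no_new_privs on the calling task so that
 * later execve(2) calls cannot gain privileges (e.g. through SUID
 * binaries).  Returns 0 on success, -1 with errno set on failure.
 */
static int set_no_new_privs(void)
{
	return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/*
 * Hypothetical helper: try to chroot into new_root as a no_new_privs
 * task, then move the working directory inside the new root.
 * Returns 0 on success or a negative errno value.  Without the
 * proposed patch, an unprivileged caller should get -EPERM from the
 * chroot(2) step (or a lookup error such as -ENOENT if the path does
 * not resolve).
 */
static int try_unprivileged_chroot(const char *new_root)
{
	assert(new_root != NULL);

	if (set_no_new_privs() != 0)
		return -errno;
	if (chroot(new_root) != 0)
		return -errno;
	/* Avoid keeping a working directory outside the new root. */
	if (chdir("/") != 0)
		return -errno;
	return 0;
}
```

A caller would invoke try_unprivileged_chroot() from a task that does not
share its filesystem information with others (i.e. not created with
CLONE_FS), and treat a negative return value as "not supported or not
permitted here".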