[PATCH 1/2] fs/exec: Explicitly unshare fs_struct on exec

Tue May 13 13:05:45 UTC 2025

On Thu, Oct 06, 2022 at 08:25:01AM -0700, Kees Cook wrote:
> On October 6, 2022 7:13:37 AM PDT, Jann Horn <jannh at google.com> wrote:
> >On Thu, Oct 6, 2022 at 11:05 AM Christian Brauner <brauner at kernel.org> wrote:
> >> On Thu, Oct 06, 2022 at 01:27:34AM -0700, Kees Cook wrote:
> >> > The check_unsafe_exec() counting of n_fs would not add up under a heavily
> >> > threaded process trying to perform a suid exec, causing the suid portion
> >> > to fail. This counting error appears to be unneeded, but to catch any
> >> > possible conditions, explicitly unshare fs_struct on exec, if it ends up
> >>
> >> Isn't this a potential uapi break? Afaict, before this change a call to
> >> clone{3}(CLONE_FS) followed by an exec in the child would have the
> >> parent and child share fs information. So if the child e.g., changes the
> >> working directory post exec it would also affect the parent. But after
> >> this change here this would no longer be true. So a child changing a
> >> workding directoro would not affect the parent anymore. IOW, an exec is
> >> accompanied by an unshare(CLONE_FS). Might still be worth trying ofc but
> >> it seems like a non-trivial uapi change but there might be few users
> >> that do clone{3}(CLONE_FS) followed by an exec.
> >
> >I believe the following code in Chromium explicitly relies on this
> >behavior, but I'm not sure whether this code is in active use anymore:
> >
> >https://source.chromium.org/chromium/chromium/src/+/main:sandbox/linux/suid/sandbox.c;l=101?q=CLONE_FS&sq=&ss=chromium
> 
> Oh yes. I think I had tried to forget this existed. Ugh. Okay, so back to the drawing board, I guess. The counting will need to be fixed...
> 
> It's possible we can move the counting after dethread -- it seems the early count was just to avoid setting flags after the point of no return, but it's not an error condition...
> 

I landed here from git blame.

I was looking at sanitizing shared fs vs suid handling, but the entire
ordeal is so convoluted I'm confident the best way forward is to whack
the problem to begin with.

Per the above link, the notion of a shared fs struct across different
processes is depended on so merely unsharing is a no-go.

However, the shared state is only a problem for suid/sgid.

Here is my proposal: *deny* exec of suid/sgid binaries if fs_struct is
shared. This will have to be checked for after the execing proc becomes
single-threaded ofc.

While technically speaking this does introduce a change in behavior,
there is precedent for doing it and seeing if anyone yells.

With this in place there is no point maintainig ->in_exec or checking
the flag.

There is the known example of depending on shared fs_struct across exec.
Hopefully there is no example of depending on execing a suid/sgid binary
in such a setting -- it would be quite a weird setup given that for
security reasons the perms must not be changed.

The upshot of this method is that any breakage will be immediately
visible in the form of a failed exec.

Another route would be to do the mandatory unshare but only for
suid/sgid, except that would have a hidden failure (if you will).

Comments?