[PATCH] security/commoncap: don't assume "setid" if all ids are identical

Wed May 7 06:33:25 UTC 2025

On Wed, May 7, 2025 at 5:16 AM Andrew G. Morgan <morgan at kernel.org> wrote:
> If a setuid program execs itself, does the presence of this code undo
> any protection the kernel afforded it on its first invocation?

What protection do you mean, and what behavior do you expect when
setid execs itself? I see this affects:

1. reset effective ids to real ids (only affects NO_NEW_PRIVS)
2. new cap_permitted cannot be higher than old cap_permitted
3. clear cap_ambient
4. clear pdeath_signal (in begin_new_exec)
5. reset stack limits (in begin_new_exec)

About these (from my very limited knowledge of this part of the kernel):

1. is my primary goal, and really no new privs gained by allowing the
process to keep existing ids
2. only ever changes anything if new cap_permitted is higher, but if
that's the case, the is_setid check is irrelevant because __cap_gained
is true, therefore no change with my patch
3. as I already described, the kernel is wrong (or the documentation
is wrong), and my patch adjusts kernel to behave as documented
4. I don't see how this is dangerous for anything regarding re-exec;
if pdeath_signal wasn't reset on the first exec, it's safe to keep it
after the re-exec, too
5. same as 4, I think

Did I miss anything?

> FWIW I ran the libcap quicktest.sh script against your change and it
> doesn't break any capability thing I test for when making libcap
> releases.

Thanks for taking the time to run these tests. I'm glad the existing
tests didn't find any obvious bugs. If we identify an actual problem
with my patch, let's write a new test that fails with my patch!

> That being said, in general this whole zoo of *[ug]id values
> and their subtle behavior is not casually obvious. I'd recommend using
> a userspace workaround for your use case, and not changing the legacy
> behavior of the kernel.

What userspace workaround would be possible? The only thing that comes
to my mind is to apply NO_NEW_PRIVS in the child process after exec -
but that's too late, because arbitrary untrusted code has already been
executed at this point. This means I lose NO_NEW_PRIVS completely,
because the attacker can simply skip the call.

Max