[PATCH] LSM: add SafeSetID module that gates setid calls

Fri Nov 9 00:30:13 UTC 2018

On Thu, Nov 8, 2018 at 1:34 PM Casey Schaufler <casey at schaufler-ca.com> wrote:
>
> On 11/8/2018 12:53 PM, Micah Morton wrote:
> > It seems like the CAP_SETUID_RANGE idea proposed by Serge is mainly a
> > way to avoid silently breaking programs that run with CAP_SETUID,
> > which could cause security vulnerabilities.
>
> It wouldn't be "silent". You'd have to fix anything that needs
> the new capability instead of the old.
>
> > Serge, does Casey's
> > suggestion (killing processes that try to perform unapproved
> > transitions) make sense to you as an alternate way to safeguard
> > against this? Sure there could be regressions, but they would only
> > happen to users for which whitelist policies had been configured, and
> > killing processes should be an effective way of identifying any
> > missing whitelist policies on the system for some restricted user.
>
> I'll point out that seccomp uses the "kill what you deny" approach.
> It would be one thing if it was reasonable to assume that programs
> are checking their error returns, but we know they're not.

Totally agree

>
> > One
> > less attractive thing about adding a CAP_SETUID_RANGE capability would
> > be that more of the common kernel code would have to be modified to
> > add a new capability, whereas currently the LSM just uses the LSM
> > hooks.
>
> That's got to be a secondary consideration. If CAP_SETUID_RANGE
> were the right solution the additional work would be worth doing.
>
> > The other unresolved aspect here (discussed by Stephen above) is how
> > to "know" that ns_capable(..., CAP_SETUID) was called by sys_set*uid
> > and not some other kernel code path.
>
> It sounds like you want a new LSM hook (security_uid_change?) to
> put in that/those code path/paths. That would be a lot cleaner than
> trying to infer the syscall.

Providing a new LSM hook would be tricky, because there are places
(besides sys_set*id) that call ns_capable(..., CAP_SETUID) which we
would want to fail if there were whitelist restrictions for the
current uid: https://elixir.bootlin.com/linux/latest/ident/CAP_SETUID.
Even if we changed all non-sys_set*id code paths to call some hook
that will deny them if the current uid is restricted, I'm not sure
there would be any way to ensure that code paths added in the future
would call the hook to restrict themselves in the event that the user
is restricted. I guess our security_capable hook could always deny the
CAP_SETUID check for restricted users but then change the lines in
kernel/sys.c (e.g.
https://elixir.bootlin.com/linux/latest/source/kernel/sys.c#L584) to
do something like 'if (ns_capable(old->user_ns, CAP_SETUID) ||
security_uid_change(old->uid, new->uid))'. But that would be weird
since security_uid_change would have to redo all the checks that
ns_capable did (to basically say _yes_ if the current user has
CAP_SETUID _and_ the transition is allowed in the whitelist).

I might be missing something, but Stephen's idea seems to make more
sense to me. We already pass the
SECURITY_CAP_AUDIT/SECURITY_CAP_NOAUDIT flags to the security_capable
hook to let it know whether to write an audit message for the check.
Seems like it would be easy to extend this to define more bits (e.g.
SECURITY_SETID_TRANSITION) in that flag or have a more general
mechanism where we pass some kind of struct to security_capable that
contains these flags and possibly other args like Stephen was
mentioning. I'll share a patch for that approach, assuming I don't
realize there is some reason that actually wouldn't be a good idea.

>
> > The reason we need to know this
> > is to be able to distinguish id transitions from other privileged
> > actions (e.g. create/modify/enter user namespace). Certain transitions
> > should be allowed for whitelisted users, but the other privileged
> > actions should be denied (or else the security hardening provided by
> > this LSM is significantly weakened). Do people think the current
> > reliance on comparing the return value of syscall_get_nr() to
> > arch-specific syscall constants (e.g. __NR_setuid) is a deal-breaker
> > and we should find an arch-independent solution such as the one
> > proposed by Stephen?
>
> The problem with inferring the syscall is that the mechanism for
> doing so is subject to change and architectural magic.
>
> > Or is checking against arch-specific constants
> > not a big deal and the code can stay as is?
>
> It's bad enough when we have to make LSM code check things
> like what filesystem an inode relates to. I would say that
> you really want a different approach.
>
> > On Fri, Nov 2, 2018 at 12:22 PM Serge E. Hallyn <serge at hallyn.com> wrote:
> >> Quoting Casey Schaufler (casey at schaufler-ca.com):
> >>> On 11/2/2018 11:30 AM, Serge E. Hallyn wrote:
> >>>> Quoting Casey Schaufler (casey at schaufler-ca.com):
> >>>>
> >>>>> Let me suggest a change to the way your LSM works
> >>>>> that would reduce my concerns. Rather than refusing to
> >>>>> make a UID change that isn't on your whitelist, kill a
> >>>>> process that makes a prohibited request. This mitigates
> >>>>> the problem where a process doesn't check for an error
> >>>>> return. Sure, your system will be harder to get running
> >>>>> until your whitelist is complete, but you'll avoid a
> >>>>> whole category of security bugs.
> >>>> Might also consider not restricting CAP_SETUID, but instead adding a
> >>>> new CAP_SETUID_RANGE capability.  That way you can be sure there will be
> >>>> no regressions with any programs which run with CAP_SETUID.
> >>>>
> >>>> Though that violates what Casey was just arguing halfway up the email.
> >>> I know that it's hard to believe 20 years after the fact,
> >>> but the POSIX group worked very hard to ensure that the granularity
> >>> of capabilities was correct for the security policy that the
> >>> interfaces defined in P1003.1. What would CAP_SETUID_RANGE mean?
> >> CAP_SETUID would mean you can switch to any uid.
> >>
> >> CAP_SETUID_RANGE would mean you could make the transitions which have
> >> been defined through <handwave> some mechanism.  Be it prctl, or some
> >> new security.uidrange xattr, or the mechanism Micah proposed, except
> >> it only applies for CAP_SETUID_RANGE not CAP_SETUID.
> >>
> >> -serge
>