Possible mistake in commit 3ca459eaba1b ("tun: fix group permission check")

Tue Jan 28 22:59:32 UTC 2025

stsp wrote:
> 28.01.2025 18:04, Willem de Bruijn пишет:
> > Ondrej Mosnacek wrote:
> >> On Mon, Jan 27, 2025 at 3:50 PM Willem de Bruijn
> >> <willemdebruijn.kernel at gmail.com> wrote:
> >>> stsp wrote:
> >>>> 27.01.2025 12:10, Ondrej Mosnacek пишет:
> >>>>> Hello,
> >>>>>
> >>>>> It looks like the commit in $SUBJ may have introduced an unintended
> >>>>> change in behavior. According to the commit message, the intent was to
> >>>>> require just one of {user, group} to match instead of both, which
> >>>>> sounds reasonable, but the commit also changes the behavior for when
> >>>>> neither of tun->owner and tun->group is set. Before the commit the
> >>>>> access was always allowed, while after the commit CAP_NET_ADMIN is
> >>>>> required in this case.
> >>>>>
> >>>>> I'm asking because the tun_tap subtest of selinux-testuite [1] started
> >>>>> to fail after this commit (it assumed CAP_NET_ADMIN was not needed),
> >>>>> so I'm trying to figure out if we need to change the test or if it
> >>>>> needs to be fixed in the kernel.
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> [1] https://github.com/SELinuxProject/selinux-testsuite/
> >>>>>
> >>>> Hi, IMHO having the persistent
> >>>> TAP device inaccessible by anyone
> >>>> but the CAP_NET_ADMIN is rather
> >>>> useless, so the compatibility should
> >>>> be restored on the kernel side.
> >>>> I'd raise the questions about adding
> >>>> the CAP_NET_ADMIN checks into
> >>>> TUNSETOWNER and/or TUNSETPERSIST,
> >>>> but this particular change to TUNSETIFF,
> >>>> at least on my side, was unintentional.
> >>>>
> >>>> Sorry about that. :(
> >>> Thanks for the report Ondrej.
> >>>
> >>> Agreed that we need to reinstate this. I suggest this explicit
> >>> extra branch after the more likely cases:
> >>>
> >>>          @@ -585,6 +585,9 @@ static inline bool tun_capable(struct tun_struct *tun)
> >>>                          return 1;
> >>>                  if (gid_valid(tun->group) && in_egroup_p(tun->group))
> >>>                          return 1;
> >>>          +       if (!uid_valid(tun->owner) && !gid_valid(tun->group))
> >>>          +               return 1;
> >>>          +
> >>>                  return 0;
> >>>           }
> >> That could work, but the semantics become a bit weird, actually: When
> >> you set both uid and gid, one of them needs to match. If you unset
> >> uid/gid, you get a stricter condition (gid/uid must match).
> > I don't follow this point.
> >
> > Judging from the history, the intent was that
> >
> > - if user is set, then it must match.
> > - if group is set, then it must match.
> >
> > And I think the group constraint was added with the idea that no one
> > would try to use both constraints at the same time.
> >
> > The referenced patch intended to (only) relax the condition when both
> > are set after all.
> >
> >> And if you
> >> then also unset the other one, you suddenly get a less strict
> >> condition than the first two - nothing has to match. Might be
> >> acceptable, but it may confuse people unless well documented.
> > I find that ownership is optional and must be set explicitly through
> > TUNSETOWNER and TUNSETGROUP quite surprising too.
> >
> > But this is only reverting to long established behavior.
> >
> >> Also there is another smaller issue in the new code that I forgot to
> >> mention - with LSMs (such as SELinux) the ns_capable() call will
> >> produce an audit record when the capability is denied by an LSM. These
> >> audit records are meant to indicate that the permission was needed but
> >> denied and that the policy was either breached or needs to be
> >> adjusted. Therefore, the ns_capable() call should ideally only happen
> >> after the user/group checks so that only accesses that actually
> >> wouldn't succeed without the capability yield an audit record.
> >>
> >> So I would suggest this version:
> >>
> >> static inline bool tun_capable(struct tun_struct *tun)
> >> {
> >>      const struct cred *cred = current_cred();
> >>      struct net *net = dev_net(tun->dev);
> >>
> >>      if (uid_valid(tun->owner) && uid_eq(cred->euid, tun->owner))
> >>          return 1;
> >>      if (gid_valid(tun->group) && in_egroup_p(tun->group))
> >>          return 1;
> >>      if (!uid_valid(tun->owner) && !gid_valid(tun->group))
> >>          return 1;
> >>      return ns_capable(net->user_ns, CAP_NET_ADMIN);
> >> }
> > Improvement makes sense, thanks.
> >
> > One more point, based on the problem description in the referenced
> > patch:
> >
> >      Currently tun checks the group permission even if the user have matched.
> >      Besides going against the usual permission semantic, this has a
> >      very interesting implication: if the tun group is not among the
> >      supplementary groups of the tun user, then effectively no one can
> >      access the tun device.
> >
> > The intent was to skip the group check if the user matches. Not
> > necessarily the reverse.
> >
> > To minimize the impact of the patch, perhaps it can still always deny
> > if tun->owner is set and does not match. That keeps the group check
> > iff the owner is not explicitly set as well.
> 
> Doesn't this mean, if the user
> is set then group is completely
> ignored?

Yes

> By doing that you indeed avoid
> the problem of "completely
> inaccessible tap". However, that
> breaks my setup, as I really
> intended to provide tap to the
> owner and the unrelated group.
> This is because, eg when setting
> a CI job, you can add the needed
> user to the needed group, but
> you also need to re-login, which
> is not always possible. :(

Could you leave tun->owner unset?

> Also completely ignoring group
> when the user is set, is somewhat
> questionable. At the very least,
> perhaps then you need to explicitly
> clear the group when the user
> is set, to avoid the confusion.
> Having "either user or group"
> sounds like a sensible semantic,
> but its a different semantic.

True. I think that would have satisfied the intent of adding the
group check at the time, and would have avoided this situation.

But we indeed cannot retroactively restrict allowed behavior.
As that will break users.

Conversely, it might be that an existing user out there depends on
the prior behavior that only a process that matches both user and
group can use the device. Which might be reason for reverting the
patch entirely.