[PATCH] vfs: allow unprivileged whiteout creation

Mon May 4 15:38:32 UTC 2020

On Mon, May 4, 2020 at 7:18 AM Miklos Szeredi <miklos at szeredi.hu> wrote:
>
> On Fri, May 1, 2020 at 8:40 PM Stephen Smalley
> <stephen.smalley.work at gmail.com> wrote:
> >
> > On Fri, May 1, 2020 at 3:34 AM Miklos Szeredi <miklos at szeredi.hu> wrote:
> > >
> > > On Fri, May 01, 2020 at 05:14:44AM +0100, Al Viro wrote:
> > > > On Thu, Apr 09, 2020 at 11:28:59PM +0200, Miklos Szeredi wrote:
> > > > > From: Miklos Szeredi <mszeredi at redhat.com>
> > > > >
> > > > > Whiteouts, unlike real device node should not require privileges to create.
> > > > >
> > > > > The general concern with device nodes is that opening them can have side
> > > > > effects.  The kernel already avoids zero major (see
> > > > > Documentation/admin-guide/devices.txt).  To be on the safe side the patch
> > > > > explicitly forbids registering a char device with 0/0 number (see
> > > > > cdev_add()).
> > > > >
> > > > > This guarantees that a non-O_PATH open on a whiteout will fail with ENODEV;
> > > > > i.e. it won't have any side effect.
> > > >
> > > > >  int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
> > > > >  {
> > > > > +   bool is_whiteout = S_ISCHR(mode) && dev == WHITEOUT_DEV;
> > > > >     int error = may_create(dir, dentry);
> > > > >
> > > > >     if (error)
> > > > >             return error;
> > > > >
> > > > > -   if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD))
> > > > > +   if ((S_ISCHR(mode) || S_ISBLK(mode)) && !capable(CAP_MKNOD) &&
> > > > > +       !is_whiteout)
> > > > >             return -EPERM;
> > > >
> > > > Hmm...  That exposes vfs_whiteout() to LSM; are you sure that you won't
> > > > end up with regressions for overlayfs on sufficiently weird setups?
> > >
> > > You're right.  OTOH, what can we do?  We can't fix the weird setups, only the
> > > distros/admins can.
> > >
> > > Can we just try this, and revert to calling ->mknod directly from overlayfs if
> > > it turns out to be a problem that people can't fix easily?
> > >
> > > I guess we could add a new ->whiteout security hook as well, but I'm not sure
> > > it's worth it.  Cc: LMS mailing list; patch re-added for context.
> >
> > I feel like I am still missing context but IIUC this change is
> > allowing unprivileged userspace to explicitly call mknod(2) with the
> > whiteout device number and skip all permission checks (except the LSM
> > one). And then you are switching vfs_whiteout() over to using
> > vfs_mknod() internally since it no longer does permission checking and
> > that was why vfs_whiteout() was separate originally to avoid imposing
> > any checks on overlayfs-internal creation of whiteouts?
> >
> > If that's correct, then it seems problematic since we have no way in
> > the LSM hook to distinguish the two cases (userspace invocation of
> > mknod(2) versus overlayfs-internal operation).  Don't know offhand
> > what credential is in effect in the overlayfs case (mounter or
> > current) but regardless Android seems to use current regardless, and
> > that could easily fail.
>
> The major point is: whiteouts are *not* device files, not in the real
> sense, it just happens that whiteouts are represented by the file
> having a char/0/0 type.
>
> Also the fact that overlayfs invocation is indistinguishable from
> userspace invocation is very much on purpose.  Whiteout creation was
> the exception before this change, not the rule.
>
> If you consider the above, how should this be handled from an LSM perspective?

In that case, I guess you can leave the patch as is aside from moving
the capable() check last, and we will just need to allow creation of
these files to the mounter context for overlayfs-internal usage. It
doesn't appear to be safe to skip the hook call altogether for the
general case (e.g. userspace mknod(2)).