[PATCH 09/16] fs: add vfs_set_fscaps()

Seth Forshee (DigitalOcean) sforshee at kernel.org
Thu Dec 7 14:42:58 UTC 2023


[Adding Mimi for insights on EVM questions]

On Fri, Dec 01, 2023 at 12:18:00PM -0600, Seth Forshee (DigitalOcean) wrote:
> On Fri, Dec 01, 2023 at 06:39:18PM +0100, Christian Brauner wrote:
> > > +/**
> > > + * vfs_set_fscaps - set filesystem capabilities
> > > + * @idmap: idmap of the mount the inode was found from
> > > + * @dentry: the dentry on which to set filesystem capabilities
> > > + * @caps: the filesystem capabilities to be written
> > > + * @flags: setxattr flags to use when writing the capabilities xattr
> > > + *
> > > + * This function writes the supplied filesystem capabilities to the dentry.
> > > + *
> > > + * Return: 0 on success, a negative errno on error.
> > > + */
> > > +int vfs_set_fscaps(struct mnt_idmap *idmap, struct dentry *dentry,
> > > +		   const struct vfs_caps *caps, int flags)
> > > +{
> > > +	struct inode *inode = d_inode(dentry);
> > > +	struct inode *delegated_inode = NULL;
> > > +	struct vfs_ns_cap_data nscaps;
> > > +	int size, error;
> > > +
> > > +	/*
> > > +	 * Unfortunately EVM wants to have the raw xattr value to compare to
> > > +	 * the on-disk version, so we need to pass the raw xattr to the
> > > +	 * security hooks. But we also want to do security checks before
> > > +	 * breaking leases, so that means a conversion to the raw xattr here
> > > +	 * which will usually be reduntant with the conversion we do for
> > > +	 * writing the xattr to disk.
> > > +	 */
> > > +	size = vfs_caps_to_xattr(idmap, i_user_ns(inode), caps, &nscaps,
> > > +				 sizeof(nscaps));
> > > +	if (size < 0)
> > > +		return size;
> > 
> > Oh right, I remember that. Slight eyeroll. See below though...
> > 
> > > +
> > > +retry_deleg:
> > > +	inode_lock(inode);
> > > +
> > > +	error = xattr_permission(idmap, inode, XATTR_NAME_CAPS, MAY_WRITE);
> > > +	if (error)
> > > +		goto out_inode_unlock;
> > > +	error = security_inode_setxattr(idmap, dentry, XATTR_NAME_CAPS, &nscaps,
> > > +					size, flags);
> > > +	if (error)
> > > +		goto out_inode_unlock;
> > 
> > For posix acls I added dedicated security hooks that take the struct
> > posix_acl stuff and then plumb that down into the security modules. You
> > could do the same thing here and then just force EVM and others to do
> > their own conversion from in-kernel to xattr format, instead of forcing
> > the VFS to do this.
> > 
> > Because right now we make everyone pay the price all the time when
> > really EVM should pay that price and this whole unpleasantness.
> 
> Good point, I'll do that.

I've been reconsidering various approaches here. One thing I noticed is
that for the non-generic case (iow overlayfs) I missed calling
security_inode_post_setxattr(), where EVM also wants the raw xattr, so
that would require another conversion. That got me wondering whether the
setxattr security hooks really matter when writing fscaps to overlayfs.
And it seems like they might not: the LSMs only look for their own
xattrs, and IMA doesn't do anything with fscaps xattrs. EVM does, but
what it does for a xattr write to an overlayfs indoe seems at least
partially if not completely redundant with what it will do when the
xattr is written to the upper filesystem.

So could we push these security calls down to the generic fscaps
implementations just before/after writing the raw xattr data and just
skip them for overlayfs? If so we can get away with doing the vfs_caps
to xattr conversion only once.

The trade offs are that filesystems which implement fscaps inode
operations become responsible for calling the security hooks if needed,
and if something changes such that we need to call those security hooks
for fscaps on overlayfs this solution would no longer work.



More information about the Linux-security-module-archive mailing list