lsm sb_delete hook, was Re: [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes()

Dave Chinner david at fromorbit.com
Mon Oct 7 23:33:06 UTC 2024


On Mon, Oct 07, 2024 at 01:37:19PM -0700, Linus Torvalds wrote:
> On Thu, 3 Oct 2024 at 04:57, Jan Kara <jack at suse.cz> wrote:
> >
> > Fair enough. If we go with the iterator variant I've suggested to Dave in
> > [1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and
> > Landlocks hook_sb_delete() into a single iteration relatively easily. But
> > I'd wait with that convertion until this series lands.
> 
> Honza, I looked at this a bit more, particularly with an eye of "what
> happens if we just end up making the inode lifetimes subject to the
> dentry lifetimes" as suggested by Dave elsewhere.

....

> which makes the fsnotify_inode_delete() happen when the inode is
> removed from the dentry.

There may be other inode references being held that make
the inode live longer than the dentry cache. When should the
fsnotify marks be removed from the inode in that case? Do they need
to remain until, e.g, writeback completes?

> Then at umount time, the dentry shrinking will deal with all live
> dentries, and at most the fsnotify layer would send the FS_UNMOUNT to
> just the root dentry inodes?

I don't think even that is necessary, because
shrink_dcache_for_umount() drops the sb->s_root dentry after
trimming the dentry tree. Hence the dcache drop would cleanup all
inode references, roots included.

> Wouldn't that make things much cleaner, and remove at least *one* odd
> use of the nasty s_inodes list?

Yes, it would, but someone who knows exactly when the fsnotify
marks can be removed needs to chime in here...

> I have this feeling that maybe we can just remove the other users too
> using similar models. I think the LSM layer use (in landlock) is bogus
> for exactly the same reason - there's really no reason to keep things
> around for a random cached inode without a dentry.

Perhaps, but I'm not sure what the landlock code is actually trying
to do. It seems to be trying to avoid races between syscalls
releasing inode references and unmount calling security_sb_delete()
to clean up inode references that it has leaked. This implies that
it's not a) tracking inodes itself, and b) not cleaning up internal
state early enough in unmount.

Hence, to me, the lifecycle and reference counting of inode related
objects in landlock doesn't seem quite right, and the use of the
security_sb_delete() callout appears to be papering over an internal
lifecycle issue.

I'd love to get rid of it altogether.

> And I wonder if the quota code (which uses the s_inodes list to enable
> quotas on already mounted filesystems) could for all the same reasons
> just walk the dentry tree instead (and remove_dquot_ref similarly
> could just remove it at dentry_unlink_inode() time)?

I don't think that will work because we have to be able to modify
quota in evict() processing. This is especially true for unlinked
inodes being evicted from cache, but also the dquots need to stay
attached until writeback completes.

Hence I don't think we can remove the quota refs from the inode
before we call iput_final(), and so I think quotaoff (at least)
still needs to iterate inodes...

> It really feels like most (all?) of the s_inode list users are
> basically historical, and shouldn't use that list at all. And there
> aren't _that_ many of them. I think Dave was right in just saying that
> this list should go away entirely (or was it somebody else who made
> that comment?)

Yeah, I said that it should go away entirely.

My view of this whole s_inodes list is that subsystems that are
taking references to inodes *must* track or manage the references to
the inodes themselves.

The canonical example is the VFS itself: evict_inodes() doesn't need
to iterate s_inodes at all. It can walk the inode LRU to purge all
the unreferenced cached inodes from memory. iput_final() guarantees
that all unreferenced inodes are either put on the LRU or torn down
immediately.

Hence I think that it is a poor architectural decision to require
superblock teardown to clean up inode references random subsystems
have *leaked* to prevent UAFs.  It forces the sb to track all
inodes whether the VFS actually needs to track them or not.

-Dave.
-- 
Dave Chinner
david at fromorbit.com



More information about the Linux-security-module-archive mailing list