lsm sb_delete hook, was Re: [PATCH 4/7] vfs: Convert sb->s_inodes iteration to super_iter_inodes()

Jan Kara jack at suse.cz
Tue Oct 8 11:23:44 UTC 2024


On Tue 08-10-24 10:57:22, Amir Goldstein wrote:
> On Tue, Oct 8, 2024 at 1:33 AM Dave Chinner <david at fromorbit.com> wrote:
> >
> > On Mon, Oct 07, 2024 at 01:37:19PM -0700, Linus Torvalds wrote:
> > > On Thu, 3 Oct 2024 at 04:57, Jan Kara <jack at suse.cz> wrote:
> > > >
> > > > Fair enough. If we go with the iterator variant I've suggested to Dave in
> > > > [1], we could combine the evict_inodes(), fsnotify_unmount_inodes() and
> > > > Landlocks hook_sb_delete() into a single iteration relatively easily. But
> > > > I'd wait with that convertion until this series lands.
> > >
> > > Honza, I looked at this a bit more, particularly with an eye of "what
> > > happens if we just end up making the inode lifetimes subject to the
> > > dentry lifetimes" as suggested by Dave elsewhere.
> >
> > ....
> >
> > > which makes the fsnotify_inode_delete() happen when the inode is
> > > removed from the dentry.
> >
> > There may be other inode references being held that make
> > the inode live longer than the dentry cache. When should the
> > fsnotify marks be removed from the inode in that case? Do they need
> > to remain until, e.g, writeback completes?
> >
> 
> fsnotify inode marks remain until explicitly removed or until sb
> is unmounted (*), so other inode references are irrelevant to
> inode mark removal.
> 
> (*) fanotify has "evictable" inode marks, which do not hold inode
> reference and go away on inode evict, but those mark evictions
> do not generate any event (i.e. there is no FAN_UNMOUNT).

Yes. Amir beat me with the response so let me just add that FS_UMOUNT event
is for inotify which guarantees that either you get an event about somebody
unlinking the inode (e.g. IN_DELETE_SELF) or event about filesystem being
unmounted (IN_UMOUNT) if you place mark on some inode. I also don't see how
we would maintain this behavior with what Linus proposes.

> > > Then at umount time, the dentry shrinking will deal with all live
> > > dentries, and at most the fsnotify layer would send the FS_UNMOUNT to
> > > just the root dentry inodes?
> >
> > I don't think even that is necessary, because
> > shrink_dcache_for_umount() drops the sb->s_root dentry after
> > trimming the dentry tree. Hence the dcache drop would cleanup all
> > inode references, roots included.
> >
> > > Wouldn't that make things much cleaner, and remove at least *one* odd
> > > use of the nasty s_inodes list?
> >
> > Yes, it would, but someone who knows exactly when the fsnotify
> > marks can be removed needs to chime in here...

So fsnotify needs a list of inodes for the superblock which have marks
attached and for which we hold inode reference. We can keep it inside
fsnotify code although it would practically mean another list_head for the
inode for this list (probably in our fsnotify_connector structure which
connects list of notification marks to the inode). If we actually get rid
of i_sb_list in struct inode, this will be a win for the overall system,
otherwise it is a net loss IMHO. So if we can figure out how to change
other s_inodes owners we can certainly do this fsnotify change.

> > > And I wonder if the quota code (which uses the s_inodes list to enable
> > > quotas on already mounted filesystems) could for all the same reasons
> > > just walk the dentry tree instead (and remove_dquot_ref similarly
> > > could just remove it at dentry_unlink_inode() time)?
> >
> > I don't think that will work because we have to be able to modify
> > quota in evict() processing. This is especially true for unlinked
> > inodes being evicted from cache, but also the dquots need to stay
> > attached until writeback completes.
> >
> > Hence I don't think we can remove the quota refs from the inode
> > before we call iput_final(), and so I think quotaoff (at least)
> > still needs to iterate inodes...

Yeah, I'm not sure how to get rid of the s_inodes use in quota code. One of
the things we need s_inodes list for is during quotaoff on a mounted
filesystem when we need to iterate all inodes which are referencing quota
structures and free them.  In theory we could keep a list of inodes
referencing quota structures but that would require adding list_head to
inode structure for filesystems that support quotas. Now for the sake of
full context I'll also say that enabling / disabling quotas on a mounted
filesystem is a legacy feature because it is quite easy that quota
accounting goes wrong with it. So ext4 and f2fs support for quite a few
years a mode where quota tracking is enabled on mount and disabled on
unmount (if appropriate fs feature is enabled) and you can only enable /
disable enforcement of quota limits during runtime.  So I could see us
deprecating this functionality altogether although jfs never adapted to
this new way we do quotas so we'd have to deal with that somehow.  But one
way or another it would take a significant amount of time before we can
completely remove this so it is out of question for this series.

I see one problem with the idea "whoever has a need to iterate inodes needs
to keep track of inodes it needs to iterate through". It is fine
conceptually but with s_inodes list we pay the cost only once and multiple
users benefit. With each subsystem tracking inodes we pay the cost for each
user (both in terms of memory and CPU). So if you don't use any of the
subsystems that need iteration, you win, but if you use two or more of
these subsystems, in particular those which need to track significant
portion of all inodes, you are losing.

								Honza
-- 
Jan Kara <jack at suse.com>
SUSE Labs, CR



More information about the Linux-security-module-archive mailing list