[RFC PATCH v1 1/7] fs: Add inode_get_ino() and implement get_ino() for NFS

Mon Oct 21 14:04:45 UTC 2024

On Thu, Oct 17, 2024 at 01:05:54PM -0400, Jeff Layton wrote:
> On Thu, 2024-10-17 at 11:15 -0400, Paul Moore wrote:
> > On Thu, Oct 17, 2024 at 10:58 AM Christoph Hellwig <hch at infradead.org> wrote:
> > > On Thu, Oct 17, 2024 at 10:54:12AM -0400, Paul Moore wrote:
> > > > Okay, good to know, but I was hoping that we could come up with
> > > > an explicit list of filesystems that maintain their own private inode
> > > > numbers outside of inode->i_ino.
> > > 
> > > Anything using iget5_locked is a good start.  Add to that file systems
> > > implementing their own inode cache (at least xfs and bcachefs).
> > 
> > Also good to know, thanks.  However, at this point the lack of a clear
> > answer is making me wonder a bit more about inode numbers in the view
> > of VFS developers; do you folks care about inode numbers?  I'm not
> > asking to start an argument, it's a genuine question so I can get a
> > better understanding about the durability and sustainability of
> > inode->i_ino.  If all of you (the VFS folks) aren't concerned about
> > inode numbers, I suspect we are going to have similar issues in the
> > future and we (the LSM folks) likely need to move away from reporting
> > inode numbers as they aren't reliably maintained by the VFS layer.
> > 
> 
> Like Christoph said, the kernel doesn't care much about inode numbers.
> 
> People care about them though, and sometimes we have things in the
> kernel that report them in some fashion (tracepoints, procfiles, audit
> events, etc.). Having those match what the userland stat() st_ino field
> tells you is ideal, and for the most part that's the way it works.
> 
> The main exception is when people use 32-bit interfaces (somewhat rare
> these days), or they have a 32-bit kernel with a filesystem that has a
> 64-bit inode number space (NFS being one of those). The NFS client has
> basically hacked around this for years by tracking its own fileid field
> in its inode. That's really a waste though. That could be converted
> over to use i_ino instead if it were always wide enough.
> 
> It'd be better to stop with these sorts of hacks and just fix this the
> right way once and for all, by making i_ino 64 bits everywhere.
> 
> A lot of the changes can probably be automated via coccinelle. I'd
> probably start by turning all of the direct i_ino accesses into static
> inline wrapper function calls. The hard part will be parceling out that
> work into digestible chunks. If you can avoid "flag day" changes, then
> that's ideal.  You'd want a patch per subsystem so you can collect
> ACKs. 
> 
> The hardest part will probably be the format string changes. I'm not
> sure you can easily use coccinelle for that, so that may need to be
> done by hand or scripted with python or something.

Dealing with 64 bit inode numbers even on 32 bit systems is one problem,
and porting struct inode to use a 64 bit type for i_ino is a good thing
that I agree we should explore.

I'm still not sure how that would solve the audit problem though. The
inode numbers that audit reports, even if always 64 bit, are not unique
with e.g., btrfs and bcachefs. Audit would need to report additional
information for both filesystems, such as the subvolume number, which
would make this consistent.

We should port to 64 bit and yes that'll take some time. Then audit may
want to grow support to report additional information such as the
subvolume number. And that can be done in various ways without having to
burden struct inode_operations with any of this. Or, document the
20-year reality that audit inode numbers aren't unique on some filesystems.