[RFC PATCH v1 1/7] fs: Add inode_get_ino() and implement get_ino() for NFS
Trond Myklebust
trondmy at hammerspace.com
Thu Oct 17 21:06:42 UTC 2024
On Thu, 2024-10-17 at 13:59 -0400, Jeff Layton wrote:
> On Thu, 2024-10-17 at 17:09 +0000, Trond Myklebust wrote:
> > On Thu, 2024-10-17 at 13:05 -0400, Jeff Layton wrote:
> > > On Thu, 2024-10-17 at 11:15 -0400, Paul Moore wrote:
> > > > On Thu, Oct 17, 2024 at 10:58 AM Christoph Hellwig
> > > > <hch at infradead.org> wrote:
> > > > > On Thu, Oct 17, 2024 at 10:54:12AM -0400, Paul Moore wrote:
> > > > > > Okay, good to know, but I was hoping that we could come up
> > > > > > with an explicit list of filesystems that maintain their own
> > > > > > private inode numbers outside of inode->i_ino.
> > > > >
> > > > > Anything using iget5_locked is a good start. Add to that file
> > > > > systems implementing their own inode cache (at least xfs and
> > > > > bcachefs).
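
(Inline aside for readers who don't live in fs/: below is a minimal,
made-up sketch of the pattern Christoph is pointing at. The "myfs"
names are hypothetical; the point is that iget5_locked() keys the
inode cache on filesystem-private data, so inode->i_ino is not what
identifies the inode in the cache.)

/* Illustrative only: "myfs" is made up; the pattern mirrors what
 * filesystems using iget5_locked() do.  The inode cache is keyed on
 * a filesystem-private 64-bit identifier, not on inode->i_ino. */
struct myfs_inode {
	u64		fileid;		/* full-width identifier */
	struct inode	vfs_inode;
};

static inline struct myfs_inode *MYFS_I(struct inode *inode)
{
	return container_of(inode, struct myfs_inode, vfs_inode);
}

static int myfs_match(struct inode *inode, void *data)
{
	/* compare against the private fileid, not inode->i_ino */
	return MYFS_I(inode)->fileid == *(u64 *)data;
}

static int myfs_init(struct inode *inode, void *data)
{
	MYFS_I(inode)->fileid = *(u64 *)data;
	return 0;
}

static struct inode *myfs_iget(struct super_block *sb, u64 fileid)
{
	/* the hash value is an unsigned long, so on a 32-bit kernel it
	 * already truncates the 64-bit fileid; only the test/set
	 * callbacks ever see the full value */
	struct inode *inode = iget5_locked(sb, (unsigned long)fileid,
					   myfs_match, myfs_init, &fileid);

	if (!inode)
		return ERR_PTR(-ENOMEM);
	if (inode->i_state & I_NEW) {
		/* newly allocated inode: fill in the rest here ... */
		unlock_new_inode(inode);
	}
	return inode;
}
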
> > > >
> > > > Also good to know, thanks. However, at this point the lack of a
> > > > clear answer is making me wonder a bit more about inode numbers
> > > > in the view of VFS developers; do you folks care about inode
> > > > numbers? I'm not asking to start an argument, it's a genuine
> > > > question so I can get a better understanding of the durability
> > > > and sustainability of inode->i_ino. If all of you (the VFS folks)
> > > > aren't concerned about inode numbers, I suspect we are going to
> > > > have similar issues in the future and we (the LSM folks) likely
> > > > need to move away from reporting inode numbers, as they aren't
> > > > reliably maintained by the VFS layer.
> > > >
> > >
> > > Like Christoph said, the kernel doesn't care much about inode
> > > numbers.
> > >
> > > People care about them though, and sometimes we have things in the
> > > kernel that report them in some fashion (tracepoints, procfiles,
> > > audit events, etc.). Having those match what the userland stat()
> > > st_ino field tells you is ideal, and for the most part that's the
> > > way it works.
> > >
> > > The main exception is when people use 32-bit interfaces (somewhat
> > > rare these days), or they have a 32-bit kernel with a filesystem
> > > that has a 64-bit inode number space (NFS being one of those). The
> > > NFS client has basically hacked around this for years by tracking
> > > its own fileid field in its inode. That's really a waste though.
> > > It could be converted over to use i_ino instead if that were
> > > always wide enough.
> > >
> > > It'd be better to stop with these sorts of hacks and just fix this
> > > the right way once and for all, by making i_ino 64 bits everywhere.
> >
> > Nope.
> >
> > That won't fix glibc, which is the main problem NFS has to work
> > around.
> >
>
> True, but that's really a separate problem.
Currently, the problem where the kernel needs to use one inode number
in iget5() and a different one when replying to stat() is limited to
the set of 64-bit kernels that can run a 32-bit userland in
compatibility mode, which mainly means x86_64 kernels set up to run
i386 binaries.

If you now decree that all kernels will use 64-bit inode numbers
internally, then you've suddenly expanded the problem to encompass all
the remaining 32-bit kernels. In order to avoid stat() returning
EOVERFLOW to applications, they too will have to start generating
separate 32-bit inode numbers.
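
To make the cost concrete: wherever ino_t is only 32 bits wide, the
full-width fileid has to be folded down before it can be reported,
while the full value is still needed for inode cache lookups. A rough
sketch of that kind of fold follows (the helper name is illustrative;
the NFS client has carried something along these lines for years):

/* Fold a 64-bit fileid down to whatever ino_t can hold.  On a 64-bit
 * kernel this is the identity; on a 32-bit kernel the top and bottom
 * halves get XORed together, so two distinct fileids can collide and
 * the full 64-bit value still has to be tracked separately for inode
 * cache lookups. */
static inline ino_t fileid_to_ino(u64 fileid)
{
	ino_t ino = (ino_t)fileid;

	if (sizeof(ino_t) < sizeof(u64))
		ino ^= fileid >> (sizeof(u64) - sizeof(ino_t)) * 8;
	return ino;
}

Generating and handing out that folded value is exactly the "separate
32-bit inode number" mentioned above; skip it, and legacy 32-bit
stat() callers get EOVERFLOW instead.
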
>
> It also doesn't inform how we track inode numbers inside the kernel.
> Inode numbers have been 64 bits for years on "real" filesystems. If
> we were designing this today, i_ino would be a u64, and we'd only
> hash that down to 32 bits when necessary.
"I'm doing a (free) operating system (just a hobby, won't be big and
professional like gnu) for 386(486) AT clones."
History is a bitch...
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust at hammerspace.com