[PATCH 00/13] VFS: Filesystem information [ver #19]

Wed Apr 1 05:22:38 UTC 2020

On Wed, 2020-03-18 at 17:05 +0100, Miklos Szeredi wrote:
> On Wed, Mar 18, 2020 at 4:08 PM David Howells <dhowells at redhat.com>
> wrote:
> 
> > ============================
> > WHY NOT USE PROCFS OR SYSFS?
> > ============================
> > 
> > Why is it better to go with a new system call rather than adding more
> > magic stuff to /proc or /sysfs for each superblock object and each
> > mount object?
> > 
> >  (1) It can be targeted.  It makes it easy to query directly by path.
> >      procfs and sysfs cannot do this easily.
> > 
> >  (2) It's more efficient as we can return specific binary data rather
> >      than making huge text dumps.  Granted, sysfs and procfs could
> >      present the same data, though as lots of little files which have
> >      to be individually opened, read, closed and parsed.
> 
> Asked this a number of times, but you haven't answered yet:  what
> application would require such a high efficiency?

Umm ... systemd and udisks2 and about 4 others.

A problem I've had with autofs for years is that direct mount maps of
any appreciable size cause several key user space applications to
consume all available CPU while autofs is starting or stopping, which
takes a fair while with a very large mount table. I have also seen a
couple of applications affected purely by the size of the mount table,
though not as badly as when autofs is starting or stopping.

Maps of 5,000 to 10,000 entries, which are not uncommon for heavy
autofs users in spite of the problem, can almost be handled, but go
much larger than that and you've got a serious problem.
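
As a rough illustration of why this scales so badly, here is a minimal
Python sketch. It assumes (this is not stated above) that a watching
application re-reads and re-parses the whole of /proc/self/mountinfo
every time the mount table changes. The function names, the synthetic
autofs entries and the device numbers are made up for the example; the
field layout follows the mountinfo description in proc(5).

```python
def parse_mountinfo_line(line):
    """Split one /proc/self/mountinfo line into the fields listed in proc(5)."""
    # Fields before " - " are per-mount; fields after it are per-superblock.
    left, _, right = line.partition(" - ")
    head = left.split()
    tail = right.split()
    return {
        "mount_id": int(head[0]),
        "parent_id": int(head[1]),
        "dev": head[2],
        "root": head[3],
        "mount_point": head[4],
        "options": head[5],
        "optional": head[6:],  # zero or more tags such as "master:1"
        "fstype": tail[0],
        "source": tail[1],
        "super_options": tail[2] if len(tail) > 2 else "",
    }


def lines_parsed_while_mounting(n):
    """Count the lines parsed by a watcher that rereads the whole table
    after each of n mounts (ASSUMED behaviour, see the text above)."""
    table = []
    parsed = 0
    for i in range(n):
        # Hypothetical autofs direct-mount entry, for illustration only.
        table.append(
            f"{i + 100} 1 0:{i + 50} / /mnt/direct{i} rw,relatime"
            " - autofs /etc/auto.direct rw,direct"
        )
        for line in table:  # full reparse on every change
            parse_mountinfo_line(line)
            parsed += 1
    return parsed
```

Under that assumption, mounting n direct-mount entries one at a time
costs n(n+1)/2 line parses per watcher, so a 10,000 entry map works out
to roughly 50 million line parses in each affected application.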

There are problems with expiration as well, but that's more an autofs
problem that I need to fix.

To be clear, it's not autofs that needs the improvement (I need to
deal with this in autofs itself); it's the effect that these large
mount tables have on the rest of user space, and that's quite
significant.

I can't even think about resolving my autofs problem until this
problem is resolved, and handling very large numbers of mounts
as efficiently as possible must be part of that solution for me,
and I think for the OS overall too.

Ian
> 
> Nobody's suggesting we move stat(2) to proc interfaces, and AFAIK
> nobody suggested we move /proc/PID/* to a binary syscall interface.
> Each one has its place, and I strongly feel that mount info belongs in
> the latter category.  Feel free to prove the opposite.
> 
> >  (3) We wouldn't have the overhead of open and close (even adding a
> >      self-contained readfile() syscall has to do that internally
> 
> Busted: add f_op->readfile() and be done with all that.   For example
> DEFINE_SHOW_ATTRIBUTE() could be trivially moved to that interface.
> 
> We could optimize existing proc, sys, etc. interfaces, but it's not
> been an issue, apparently.
> 
> >  (4) Opening a file in procfs or sysfs has a pathwalk overhead for
> >      each file accessed.  We can use an integer attribute ID instead
> >      (yes, this is similar to ioctl) - but could also use a string ID
> >      if that is preferred.
> > 
> >  (5) Can easily query cross-namespace if, say, a container manager
> >      process is given an fs_context that hasn't yet been mounted into
> >      a namespace - or hasn't even been fully created yet.
> 
> Works with my patch.
> 
> >  (6) Don't have to create/delete a bunch of sysfs/procfs nodes each
> >      time a mount happens or is removed - and since systemd makes much
> >      use of mount namespaces and mount propagation, this will create a
> >      lot of nodes.
> 
> Not true.
> 
> > The argument for doing this through procfs/sysfs/somemagicfs is that
> > someone using a shell can just query the magic files using ordinary
> > text tools, such as cat - and that has merit - but it doesn't solve
> > the query-by-pathname problem.
> > 
> > The suggested way around the query-by-pathname problem is to open the
> > target file O_PATH and then look in a magic directory under procfs
> > corresponding to the fd number to see a set of attribute files[*]
> > laid out.  Bash, however, can't open by O_PATH or O_NOFOLLOW as
> > things stand...
> 
> Bash doesn't have fsinfo(2) either, so that's not really a good
> argument.
> 
> Implementing a utility to show mount attribute(s) by path is trivial
> for the file based interface, while it would need to be updated for
> each extension of fsinfo(2).   Same goes for libc, language bindings,
> etc.
> 
> Thanks,
> Miklos