[RFC PATCH] getting misc stats/attributes via xattr API

Amir Goldstein amir73il at gmail.com
Mon May 9 14:20:50 UTC 2022


On Mon, May 9, 2022 at 3:48 PM Christian Brauner <brauner at kernel.org> wrote:
>
> On Tue, May 03, 2022 at 02:23:23PM +0200, Miklos Szeredi wrote:
> > This is a simplification of the getvalues(2) prototype and moving it to the
> > getxattr(2) interface, as suggested by Dave.
> >
> > The patch itself just adds the possibility to retrieve a single line of
> > /proc/$$/mountinfo (which was the basic requirement from which the fsinfo
> > patchset grew out of).
> >
> > But this should be able to serve Amir's per-sb iostats, as well as a host of
> > other cases where some statistic needs to be retrieved from some object.  Note:
> > a filesystem object often represents other kinds of objects (such as processes
> > in /proc) so this is not limited to fs attributes.
> >
> > This also opens up the interface to setting attributes via setxattr(2).
> >
> > After some pondering I made the namespace so:
> >
> > : - root
> > bar - an attribute
> > foo: - a folder (can contain attributes and/or folders)
> >
> > The contents of a folder is represented by a null separated list of names.
> >
> > Examples:
> >
> > $ getfattr -etext -n ":" .
> > # file: .
> > :="mnt:\000mntns:"
> >
> > $ getfattr -etext -n ":mnt:" .
> > # file: .
> > :mnt:="info"
> >
> > $ getfattr -etext -n ":mnt:info" .
> > # file: .
> > :mnt:info="21 1 254:0 / / rw,relatime - ext4 /dev/root rw\012"
>
> Hey Miklos,
>
> One comment about this. We really need to have this interface support
> giving us mount options like "relatime" back in numeric form (I assume
> this will be possible.). It is royally annoying having to maintain a
> mapping table in userspace just to do:
>
> relatime -> MS_RELATIME/MOUNT_ATTR_RELATIME
> ro       -> MS_RDONLY/MOUNT_ATTR_RDONLY
>
> A library shouldn't be required to use this interface. Conservative
> low-level software that keeps its shared library dependencies minimal
> will need to be able to use that interface without having to go to an
> external library that transforms text-based output to binary form (Which
> I'm very sure will need to happen if we go with a text-based
> interface.).
>

No need for a library.
We can export:

:mnt:attr:flags (in hex format)

> >
> > $ getfattr -etext -n ":mntns:" .
> > # file: .
> > :mntns:="21:\00022:\00024:\00025:\00023:\00026:\00027:\00028:\00029:\00030:\00031:"
> >
> > $ getfattr -etext -n ":mntns:28:" .
> > # file: .
> > :mntns:28:="info"
> >
> > Comments?
>
> I'm not a fan of text-based APIs and I'm particularly not a fan of the
> xattr APIs. But at this point I'm ready to compromise on a lot as long
> as it gets us values out of the kernel in some way. :)
>
> I had to use xattrs extensively in various low-level userspace projects
> and they continue to be a source of races and memory bugs.
>
> A few initial questions:
>
> * The xattr APIs often require the caller to do sm like (copying some go
>   code quickly as I have that lying around):
>
>         for _, x := range split {
>                 xattr := string(x)
>                 // Call Getxattr() twice: First, to determine the size of the
>                 // buffer we need to allocate to store the extended attributes,
>                 // second, to actually store the extended attributes in the
>                 // buffer. Also, check if the size of the extended attribute
>                 // hasn't increased between the two calls.
>                 pre, err = unix.Getxattr(path, xattr, nil)
>                 if err != nil || pre < 0 {
>                         return nil, err
>                 }
>
>                 dest = make([]byte, pre)
>                 post := 0
>                 if pre > 0 {
>                         post, err = unix.Getxattr(path, xattr, dest)
>                         if err != nil || post < 0 {
>                                 return nil, err
>                         }
>                 }
>
>                 if post > pre {
>                         return nil, fmt.Errorf("Extended attribute '%s' size increased from %d to %d during retrieval", xattr, pre, post)
>                 }
>
>                 xattrs[xattr] = string(dest)
>         }
>
>   This pattern of requesting the size first by passing empty arguments,
>   then allocating the buffer and then passing down that buffer to
>   retrieve that value is really annoying to use and error prone (I do
>   of course understand why it exists.).
>
>   For real xattrs it's not that bad because we can assume that these
>   values don't change often and so the race window between
>   getxattr(GET_SIZE) and getxattr(GET_VALUES) often doesn't matter. But
>   fwiw, the post > pre check doesn't exist for no reason; we do indeed
>   hit that race.

It is not really a race, you can do {} while (errno != ERANGE) and there
will be no race.

>
>   In addition, it is costly having to call getxattr() twice. Again, for
>   retrieving xattrs it often doesn't matter because it's not a super
>   common operation but for mount and other info it might matter.
>

samba and many other projects that care about efficiency solved this
a long time ago with an opportunistic buffer - never start with NULL buffer
most values will fit in a 1K buffer.

>   Will we have to use the same pattern for mnt and other info as well?
>   If so, I worry that the race is way more likely than it is for real
>   xattrs.
>
> * Would it be possible to support binary output with this interface?
>   I really think users would love to have an interfact where they can
>   get a struct with binary info back. I'm not advocating to make the
>   whole interface binary but I wouldn't mind having the option to
>   support it.
>   Especially for some information at least. I'd really love to have a
>   way go get a struct mount_info or whatever back that gives me all the
>   details about a mount encompassed in a single struct.
>

I suggested that up thread and Greg has explicitly and loudly
NACKed it - so you will have to take it up with him

>   Callers like systemd will have to parse text and will end up
>   converting everything from text into binary anyway; especially for
>   mount information. So giving them an option for this out of the box
>   would be quite good.
>
>   Interfaces like statx aim to be as fast as possible because we exptect
>   them to be called quite often. Retrieving mount info is quite costly
>   and is done quite often as well. Maybe not for all software but for a
>   lot of low-level software. Especially when starting services in
>   systemd a lot of mount parsing happens similar when starting
>   containers in runtimes.
>

This API is not for *everything*. Obviously it does not replace statx and some
info (like the cifs OFFLINE flag) should be added to statx.
Whether or not mount info needs to get a special treatment like statx
is not proven.
Miklos claims this is a notification issue-
With David Howells' mount notification API, systemd can be pointed at the new
mount that was added/removed/changed and then systemd will rarely need to
parse thousands of mounts info.

> * If we decide to go forward with this interface - and I think I
>   mentioned this in the lsfmm session - could we please at least add a
>   new system call? It really feels wrong to retrieve mount and other
>   information through the xattr interfaces. They aren't really xattrs.
>
>   Imho, xattrs are a bit like a wonky version of streams already (One of
>   the reasons I find them quite unpleasant.). Making mount and other
>   information retrievable directly through the getxattr() interface will
>   turn them into a full-on streams implementation imho. I'd prefer not
>   to do that (Which is another reason I'd prefer at least a separate
>   system call.).

If you are thinking about a read() like interface for xattr or any alternative
data stream then Linus has NACKed it times before.

However, we could add getxattr_multi() as Dave Chinner suggested for
enumerating multiple keys+ (optional) values.
In contrast to listxattr(), getxattr_multi() could allow a "short read"
at least w.r.t the size of the vector.

Thanks,
Amir.



More information about the Linux-security-module-archive mailing list