[RFC PATCH 2/3] add statmnt(2) syscall

Tue Sep 19 21:28:38 UTC 2023

On Tue, Sep 19, 2023 at 4:02 AM Miklos Szeredi <miklos at szeredi.hu> wrote:
> On Tue, 19 Sept 2023 at 02:38, Matthew House <mattlloydhouse at gmail.com> wrote:
>
> > One natural solution is to set either of the two lengths to the expected
> > size if the provided buffer are too small. That way, the caller learns both
> > which of the buffers is too small, and how large they need to be. Replacing
> > a provided size with an expected size in this way already has precedent in
> > existing syscalls:
>
> This is where the thread started.  Knowing the size of the buffer is
> no good, since the needed buffer could change between calls.

As Brauner mentioned, this does not change with the single-buffer
interface. And since changes are not likely to occur extremely frequently,
I feel like it would be better for the caller to only need one retry in the
common case rather than N retries for however many doublings it takes to
fit the whole buffer.

> We are trying to create a simple interface, no?  My proposal would
> need a helper like this:
>
> struct statmnt *statmount(uint64_t mnt_id, uint64_t mask, unsigned int flags)
> {
>         size_t bufsize = 1 << 15;
>         void *buf;
>         int ret;
>
>         for (;;) {
>                 buf = malloc(bufsize <<= 1);
>                 if (!buf)
>                         return NULL;
>                 ret = syscall(__NR_statmnt, mnt_id, mask, buf, bufsize, flags);
>                 if (!ret)
>                         return buf;
>                 free(buf);
>                 if (errno != EOVERFLOW)
>                         return NULL;
>         }
> }
>
> Christian's would be (ignoring .fs_type for now):
>
> int statmount(uint64_t mnt_id, uint64_t mask, struct statmnt *st,
> unsigned int flags)
> {
>         int ret;
>
>         st->mnt_root_size = 1 << 15;
>         st->mountpoint_size = 1 << 15;
>         for (;;) {
>                 st->mnt_root = malloc(st->mnt_root_size <<= 1);
>                 st->mountpoint = malloc(st->mountpoint <<= 1);
>                 if (!st->mnt_root || !st->mountpoint) {
>                         free(st->mnt_root);
>                         free(st->mountpoint);
>                         return -1;
>                 }
>                 ret = syscall(__NR_statmnt, mnt_id, mask, st,
> sizeof(*st), flags);
>                 if (!ret || errno != EOVERFLOW)
>                         return ret;
>                 free(st->mnt_root);
>                 free(st->mountpoint);
>         }
> }
>
> It's not hugely more complex, but more complex nonetheless.
>
> Also having the helper allocate buffers inside the struct could easily
> result in leaks since it's not obvious what the caller needs to free,
> while in the first example it is.

There's nothing stopping the userspace helper from exposing a contiguous
buffer that can be easily freed, even if the kernel API uses a separate-
buffer interface internally. It just takes a bit of addition in the helper
to calculate the correct pointers. To wit:

struct statmnt *statmount(uint64_t mnt_id, uint64_t mask, unsigned int flags)
{
        uint32_t mnt_root_size = PATH_MAX;
        uint32_t mountpoint_size = PATH_MAX;
        struct statmnt *st;
        int ret;

        for (;;) {
                st = malloc(sizeof(*st) + mnt_root_size + mountpoint_size);
                if (!st)
                        return NULL;
                st->mnt_root = (char *)st + sizeof(*st);
                st->mountpoint = (char *)st + sizeof(*st) + mnt_root_size;
                st->mnt_root_size = mnt_root_size;
                st->mountpoint_size = mountpoint_size;
                ret = syscall(__NR_statmnt, mnt_id, mask, st, sizeof(*st),
                              flags);
                if (ret) {
                        free(st);
                        return NULL;
                }
                if (st->mnt_root_size <= mnt_root_size &&
                    st->mountpoint_size <= mountpoint_size)
                        return st;
                mnt_root_size = st->mnt_root_size;
                mountpoint_size = st->mountpoint_size;
                free(st);
        }
}

(This is also far more helpful for users of the returned struct statmnt *,
since they can just dereference the two pointers instead of having to
decode the offsets by hand.)

More generally speaking, the biggest reason I dislike the current single-
buffer interface is that the output is "all or nothing": either the caller
has enough space in the buffer to store every single string, or it's unable
to get any fields at all, just an -EOVERFLOW. There's no room for the
caller to say that it just wants the integer fields and doesn't care about
the strings. Thus, to reliably call statmnt() on an arbitrary mount, the
ability to dynamically allocate memory is effectively mandatory. The only
real solution to this would be additional statx-like flags to select the
returned strings.

Meanwhile, with a separate-buffer interface, where the caller provides a
pointer and capacity for each string, granular output would be trivial: the
caller could just specify NULL/0 for any string it doesn't want, and still
successfully retrieve all the integer fields. This would also work well if
the caller, e.g., wants to set a hard cap of PATH_MAX bytes for each string
(since it's using static buffers), but nonetheless wants to retrieve the
integer fields if a string is too long.

Besides that, if the caller is written in standard C but doesn't want to
use malloc(3) to allocate the buffer, then its helper function must be
written very carefully (with a wrapper struct around the header and data)
to satisfy the aliasing rules, which forbid programs from using a struct
statmnt * pointer to read from a declared char[N] array. In practice,
callers tend to very rarely exercise this proper care with existing single-
buffer interfaces, such as recvmsg(2)'s msg_control buffer, and I would not
be very happy if statmnt() further contributed to this widespread issue.

Thank you,
Matthew House