[RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)

Tingmao Wang m at maowtm.org
Sun Apr 6 20:43:01 UTC 2025


Hi 9p-fs maintainers,

(CC'ing Landlock and Fanotify/inotify people as this affects both use
cases, although most of my investigation has been on Landlock)

Previously [1], I noticed that when using 9pfs filesystems, the Landlock
LSM is blocking access even for files / directories allowed by rules, and
that this has something to do with 9pfs creating new inodes despite
Landlock holding a reference to the existing one.  Because Landlock uses
inodes' in-memory state (i_security) to identify allowed fs
objects/hierarchies, this causes Landlock to partially break on 9pfs, at
least in uncached mode, which is the default:

    # mount -t 9p -o trans=virtio test /mnt
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    bash: cannot set terminal process group (164): Inappropriate ioctl for device
    bash: no job control in this shell
    # cat /mnt/readme
    cat: /mnt/readme: Permission denied

This, however, works if somebody is holding onto the dentry (and it also
works with cache=loose), as in both cases the inode is reused:

    # tail -f /mnt/readme &
    [1] 196
    # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
    Executing the sandboxed command...
    bash: cannot set terminal process group (164): Inappropriate ioctl for device
    bash: no job control in this shell
    # cat /mnt/readme
    aa

It also works on directories if one have a shell that cd into the
directory.  Note that this means only certain usage of Landlock are
affected - for example, sandboxing applications that takes a list of files
to allow, landlocks itself, then evecve.  On the other hand, this does not
affect applications that opens a file, then Landlocks itself while keeping
the file it needs open.

While the above is a very simple example, this is problematic in
real-world use cases if Landlock is used to sandox applications on system
that has files mounted via 9pfs, or use 9pfs as the root filesystem.  In
addition, this also affects fanotify / inotify when using inode mark (for
local access):

    root at d8c28a676d72:/# ./fanotify-basic-open /readme & # on virtiofs
    [1] 173
    root at d8c28a676d72:/# cat readme
    aa
    FAN_OPEN: File /readme
    root at d8c28a676d72:/# mount -t 9p -o trans=virtio test /mnt
    root at d8c28a676d72:/# ./fanotify-basic-open /mnt/readme & # on 9pfs
    [2] 176
    root at d8c28a676d72:/# cat /mnt/readme
    aa
    root at d8c28a676d72:/#

Same can be demonstrated with inotifywait.  The source code for
fanotify-basic-open, adopted from the fanotify man page, is available at
the end of this email.

Note that this is not a security bug for Landlock since it can only cause
legitimate access to be denied, but might be a problem for fanotify perm
(although I do recognize that using perm on individual inodes is already
perhaps a bit unreliable?)

It seems that there was an attempt at making 9pfs reuse inodes, based on
qid.path, however it was reverted [2] due to issues with servers that
present duplicate qids, for example on a QEMU host that has multiple
filesystems mounted under a single 9pfs export without multidevs=remap, or
in the case of other servers that doesn't necessarily support remapping
qids ([3] and more).  I've done some testing on v6.12-rc4 which has the
simplified 9pfs inode code before it was reverted, and found that Landlock
works (however, we of course then have the issue demonstrated in [2]).

Unrelated to the above problem, it also seems like even with the revert in
[2], because in cached mode inode are still reused based on qid (and type,
version (aka mtime), etc), the setup mentioned in [2] still causes
problems in th latest kernel with cache=loose:

    host # cd /tmp/linux-test
    host # mkdir m1 m2
    host # mount -t tmpfs tmpfs m1
    host # mount -t tmpfs tmpfs m2
    host # mkdir m1/dir m2/dir  # needs to be done together so that they have the same mtime
    host # echo foo > m1/dir/foo
    host # echo bar > m2/dir/bar

    guest # mount -t 9p -o trans=virtio,cache=loose test /mnt
    guest # cd /mnt/m1/dir
    qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
    guest # ls
    foo
    guest # ls /mnt/m2/dir
    foo # <- should be bar
    guest # uname -a
    Linux d8c28a676d72 6.14.0-dev #92 SMP PREEMPT_DYNAMIC Sun Apr  6 18:47:54 BST 2025 x86_64 GNU/Linux

With the above in mind, I have a proposal for 9pfs to:
1. Reuse inodes even in uncached mode
2. However, reuse them based on qid.path AND the actual pathname, by doing
   the appropriate testing in v9fs_test(_new)?_inode(_dotl)?

The main problem here is how to store the pathname in a sensible way and
tie it to the inode.  For now I opted with an array of names acquired with
take_dentry_name_snapshot, which reuses the same memory as the dcache to
store the actual strings, but doesn't tie the lifetime of the dentry with
the inode (I thought about holding a reference to the dentry in the
v9fs_inode, but it seemed like a wrong approach and would cause dentries
to not be evicted/released).

Maybe ideally there could be a general way for filesystems to tell
VFS/dcache to "pin" a dentry as long as the inode is alive, but still
allows the dentry and inode to be evicted based on memory pressure?  In
fact, if the dentry is alive, we might not even need to do this in 9pfs,
as we will automatically get the same inode pointed to by the dentry.

Currently this pathname is immutable once attached to an inode, and
therefore it is not protected by locks or RCU, but this might have to
change for us to support renames that preserve the inode on next access.
This is not in this patchset yet.  Also let me know if I've missed any
locks etc (I'm new to VFS, or for that matter, the kernel).

Storing one pathname per inode also means we don't reuse the same inode
for hardlinks -- maybe this can be fixed as well in a future version, if
this approach sounds good?

>From some QEMU documentation I read [4] it seems like there is a plan to
resolve these kind of problems in a new version of the protocol, by
expanding the qid to include the filesystem identifier of a file on the
host, so maybe this can be disabled after a successful protocol version
check with the host?  For now this is implemented as a default option,
inodeident=path, which can be set to 'none' to instead get the previous
behaviour.

This patchset is still a bit of a work in progress, and I've not tested
the performance impact.  It currently uses strncmp to compare paths but
this might be able to be optimized into a hash comparison first.  It
should also normally only need to do it for one pair of filenames, as the
test is only done if qid.path matches in the first place.

Also, currently the inode is not reused in cached mode if mtime changed
AND the dentry was evicted -- I considered removing the qid.version test
in v9fs_test_inode_dotl but I think perhaps care needs to be taken to
ensure we can refresh an inode that potentially has data cached?  This is
a TODO for this patchset.

Another TODO is to maybe add support for case-insensitive matching?

For this patch series I've tested modifying files on both host and guest,
changing a file from regular file to dir then back while preserving ino,
keeping a file open in the guest with a fd, and using Landlock (which
results in an ihold but does not keep the dentry) then trying to access
the file from both inside and outside the Landlocked shell.

Let me know what's your thought on this -- if this is a viable approach
I'm happy to work on it more and do more testing.  The main motivation
behind this is getting Landlock to work "out of the box" on 9pfs.

This patch series was based on, and tested on v6.14 + [5]

Kind regards,
Tingmao

[1]: https://github.com/landlock-lsm/linux/issues/45
[2]: https://lore.kernel.org/all/20241024-revert_iget-v1-4-4cac63d25f72@codewreck.org/
[3]: https://lore.kernel.org/all/20240923100508.GA32066@willie-the-truck/
[4]: https://wiki.qemu.org/Documentation/9p#Protocol_Plans
[5]: https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/

fanotify-basic-open.c:

    #define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */
    #include <errno.h>
    #include <fcntl.h>
    #include <limits.h>
    #include <poll.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/fanotify.h>
    #include <unistd.h>

    /* Read all available fanotify events from the file descriptor 'fd'. */

    static void handle_events(int fd)
    {
        const struct fanotify_event_metadata *metadata;
        struct fanotify_event_metadata buf[200];
        ssize_t size;
        char path[PATH_MAX];
        ssize_t path_len;
        char procfd_path[PATH_MAX];
        struct fanotify_response response;

        for (;;) {
            size = read(fd, buf, sizeof(buf));
            if (size == -1 && errno != EAGAIN) {
                perror("read");
                exit(EXIT_FAILURE);
            }

            /* Check if end of available data reached. */

            if (size <= 0)
                break;

            /* Point to the first event in the buffer. */

            metadata = buf;

            /* Loop over all events in the buffer. */

            while (FAN_EVENT_OK(metadata, size)) {
                if (metadata->fd >= 0) {
                    if (metadata->mask & FAN_OPEN) {
                        printf("FAN_OPEN: ");
                    }

                    /* Retrieve and print pathname of the accessed file. */

                    snprintf(procfd_path, sizeof(procfd_path),
                        "/proc/self/fd/%d", metadata->fd);
                    path_len = readlink(procfd_path, path,
                                sizeof(path) - 1);
                    if (path_len == -1) {
                        perror("readlink");
                        exit(EXIT_FAILURE);
                    }

                    path[path_len] = '\0';
                    printf("File %s\n", path);

                    /* Close the file descriptor of the event. */

                    close(metadata->fd);
                }

                /* Advance to next event. */

                metadata = FAN_EVENT_NEXT(metadata, size);
            }
        }
    }

    int main(int argc, char *argv[])
    {
        char buf;
        int fd, poll_num;
        nfds_t nfds;
        struct pollfd fds[2];

        /* Check mount point is supplied. */

        if (argc != 2) {
            fprintf(stderr, "Usage: %s FILE\n", argv[0]);
            exit(EXIT_FAILURE);
        }

        fd = fanotify_init(0, O_RDONLY | O_LARGEFILE);
        if (fd == -1) {
            perror("fanotify_init");
            exit(EXIT_FAILURE);
        }

        if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_INODE, FAN_OPEN, AT_FDCWD,
                argv[1]) == -1) {
            perror("fanotify_mark");
            exit(EXIT_FAILURE);
        }

        while (1) {
            handle_events(fd);
        }
    }

Tingmao Wang (6):
  fs/9p: Add ability to identify inode by path for .L
  fs/9p: add default option for path-based inodes
  fs/9p: Hide inodeident=path from show_options as it is the default
  fs/9p: Add ability to identify inode by path for non-.L
  fs/9p: .L: Refresh stale inodes on reuse
  fs/9p: non-.L: Refresh stale inodes on reuse

 fs/9p/Makefile         |   3 +-
 fs/9p/ino_path.c       | 114 ++++++++++++++++++++++++++++++++++++++
 fs/9p/v9fs.c           |  33 ++++++++++-
 fs/9p/v9fs.h           |  63 +++++++++++++++------
 fs/9p/vfs_inode.c      | 122 +++++++++++++++++++++++++++++++++++------
 fs/9p/vfs_inode_dotl.c | 108 +++++++++++++++++++++++++++++++++---
 fs/9p/vfs_super.c      |  10 +++-
 7 files changed, 406 insertions(+), 47 deletions(-)
 create mode 100644 fs/9p/ino_path.c


base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
prerequisite-patch-id: 12dc6676db52ff32eed082b1e5d273f297737f61
prerequisite-patch-id: 93ab54c52a41fa44b8d0baf55df949d0ad27e99a
prerequisite-patch-id: 5f558bf969e6eaa3d011c98de0806ca8ad369efe
--
2.39.5



More information about the Linux-security-module-archive mailing list