[RFC PATCH 0/6] fs/9p: Reuse inode based on path (in addition to qid)

Fri Jul 4 18:37:06 UTC 2025

Hi,

I tested this patch series and I confirm that it fixes the issue for
Landlock.  Tingmao explained that it also fixes other subsystems such as
fanotify, and 9p itself.  I sent a patch to test this fix with Landlock
and I'll merge it once 9p is fixed:
https://lore.kernel.org/all/20250704171345.1393451-1-mic@digikod.net/

Could you please give it a try in linux-next and consider it for the
next merge window?

Regards,
 Mickaël

On Sun, Apr 06, 2025 at 09:43:01PM +0100, Tingmao Wang wrote:
> Hi 9p-fs maintainers,
> 
> (CC'ing Landlock and Fanotify/inotify people as this affects both use
> cases, although most of my investigation has been on Landlock)
> 
> Previously [1], I noticed that when using 9pfs filesystems, the Landlock
> LSM is blocking access even for files / directories allowed by rules, and
> that this has something to do with 9pfs creating new inodes despite
> Landlock holding a reference to the existing one.  Because Landlock uses
> inodes' in-memory state (i_security) to identify allowed fs
> objects/hierarchies, this causes Landlock to partially break on 9pfs, at
> least in uncached mode, which is the default:
> 
>     # mount -t 9p -o trans=virtio test /mnt
>     # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
>     Executing the sandboxed command...
>     bash: cannot set terminal process group (164): Inappropriate ioctl for device
>     bash: no job control in this shell
>     # cat /mnt/readme
>     cat: /mnt/readme: Permission denied
> 
> This, however, works if somebody is holding onto the dentry (and it also
> works with cache=loose), as in both cases the inode is reused:
> 
>     # tail -f /mnt/readme &
>     [1] 196
>     # env LL_FS_RO=/etc:/usr:/bin:/lib:/mnt/readme LL_FS_RW= /sandboxer bash
>     Executing the sandboxed command...
>     bash: cannot set terminal process group (164): Inappropriate ioctl for device
>     bash: no job control in this shell
>     # cat /mnt/readme
>     aa
> 
> It also works on directories if one have a shell that cd into the
> directory.  Note that this means only certain usage of Landlock are
> affected - for example, sandboxing applications that takes a list of files
> to allow, landlocks itself, then evecve.  On the other hand, this does not
> affect applications that opens a file, then Landlocks itself while keeping
> the file it needs open.
> 
> While the above is a very simple example, this is problematic in
> real-world use cases if Landlock is used to sandox applications on system
> that has files mounted via 9pfs, or use 9pfs as the root filesystem.  In
> addition, this also affects fanotify / inotify when using inode mark (for
> local access):
> 
>     root at d8c28a676d72:/# ./fanotify-basic-open /readme & # on virtiofs
>     [1] 173
>     root at d8c28a676d72:/# cat readme
>     aa
>     FAN_OPEN: File /readme
>     root at d8c28a676d72:/# mount -t 9p -o trans=virtio test /mnt
>     root at d8c28a676d72:/# ./fanotify-basic-open /mnt/readme & # on 9pfs
>     [2] 176
>     root at d8c28a676d72:/# cat /mnt/readme
>     aa
>     root at d8c28a676d72:/#
> 
> Same can be demonstrated with inotifywait.  The source code for
> fanotify-basic-open, adopted from the fanotify man page, is available at
> the end of this email.
> 
> Note that this is not a security bug for Landlock since it can only cause
> legitimate access to be denied, but might be a problem for fanotify perm
> (although I do recognize that using perm on individual inodes is already
> perhaps a bit unreliable?)
> 
> It seems that there was an attempt at making 9pfs reuse inodes, based on
> qid.path, however it was reverted [2] due to issues with servers that
> present duplicate qids, for example on a QEMU host that has multiple
> filesystems mounted under a single 9pfs export without multidevs=remap, or
> in the case of other servers that doesn't necessarily support remapping
> qids ([3] and more).  I've done some testing on v6.12-rc4 which has the
> simplified 9pfs inode code before it was reverted, and found that Landlock
> works (however, we of course then have the issue demonstrated in [2]).
> 
> Unrelated to the above problem, it also seems like even with the revert in
> [2], because in cached mode inode are still reused based on qid (and type,
> version (aka mtime), etc), the setup mentioned in [2] still causes
> problems in th latest kernel with cache=loose:
> 
>     host # cd /tmp/linux-test
>     host # mkdir m1 m2
>     host # mount -t tmpfs tmpfs m1
>     host # mount -t tmpfs tmpfs m2
>     host # mkdir m1/dir m2/dir  # needs to be done together so that they have the same mtime
>     host # echo foo > m1/dir/foo
>     host # echo bar > m2/dir/bar
> 
>     guest # mount -t 9p -o trans=virtio,cache=loose test /mnt
>     guest # cd /mnt/m1/dir
>     qemu-system-x86_64: warning: 9p: Multiple devices detected in same VirtFS export, which might lead to file ID collisions and severe misbehaviours on guest! You should either use a separate export for each device shared from host or use virtfs option 'multidevs=remap'!
>     guest # ls
>     foo
>     guest # ls /mnt/m2/dir
>     foo # <- should be bar
>     guest # uname -a
>     Linux d8c28a676d72 6.14.0-dev #92 SMP PREEMPT_DYNAMIC Sun Apr  6 18:47:54 BST 2025 x86_64 GNU/Linux
> 
> With the above in mind, I have a proposal for 9pfs to:
> 1. Reuse inodes even in uncached mode
> 2. However, reuse them based on qid.path AND the actual pathname, by doing
>    the appropriate testing in v9fs_test(_new)?_inode(_dotl)?
> 
> The main problem here is how to store the pathname in a sensible way and
> tie it to the inode.  For now I opted with an array of names acquired with
> take_dentry_name_snapshot, which reuses the same memory as the dcache to
> store the actual strings, but doesn't tie the lifetime of the dentry with
> the inode (I thought about holding a reference to the dentry in the
> v9fs_inode, but it seemed like a wrong approach and would cause dentries
> to not be evicted/released).
> 
> Maybe ideally there could be a general way for filesystems to tell
> VFS/dcache to "pin" a dentry as long as the inode is alive, but still
> allows the dentry and inode to be evicted based on memory pressure?  In
> fact, if the dentry is alive, we might not even need to do this in 9pfs,
> as we will automatically get the same inode pointed to by the dentry.

Christian, any though?

> 
> Currently this pathname is immutable once attached to an inode, and
> therefore it is not protected by locks or RCU, but this might have to
> change for us to support renames that preserve the inode on next access.
> This is not in this patchset yet.  Also let me know if I've missed any
> locks etc (I'm new to VFS, or for that matter, the kernel).
> 
> Storing one pathname per inode also means we don't reuse the same inode
> for hardlinks -- maybe this can be fixed as well in a future version, if
> this approach sounds good?
> 
> From some QEMU documentation I read [4] it seems like there is a plan to
> resolve these kind of problems in a new version of the protocol, by
> expanding the qid to include the filesystem identifier of a file on the
> host, so maybe this can be disabled after a successful protocol version
> check with the host?  For now this is implemented as a default option,
> inodeident=path, which can be set to 'none' to instead get the previous
> behaviour.
> 
> This patchset is still a bit of a work in progress, and I've not tested
> the performance impact.  It currently uses strncmp to compare paths but
> this might be able to be optimized into a hash comparison first.  It
> should also normally only need to do it for one pair of filenames, as the
> test is only done if qid.path matches in the first place.
> 
> Also, currently the inode is not reused in cached mode if mtime changed
> AND the dentry was evicted -- I considered removing the qid.version test
> in v9fs_test_inode_dotl but I think perhaps care needs to be taken to
> ensure we can refresh an inode that potentially has data cached?  This is
> a TODO for this patchset.
> 
> Another TODO is to maybe add support for case-insensitive matching?
> 
> For this patch series I've tested modifying files on both host and guest,
> changing a file from regular file to dir then back while preserving ino,
> keeping a file open in the guest with a fd, and using Landlock (which
> results in an ihold but does not keep the dentry) then trying to access
> the file from both inside and outside the Landlocked shell.
> 
> Let me know what's your thought on this -- if this is a viable approach
> I'm happy to work on it more and do more testing.  The main motivation
> behind this is getting Landlock to work "out of the box" on 9pfs.
> 
> This patch series was based on, and tested on v6.14 + [5]
> 
> Kind regards,
> Tingmao
> 
> [1]: https://github.com/landlock-lsm/linux/issues/45
> [2]: https://lore.kernel.org/all/20241024-revert_iget-v1-4-4cac63d25f72@codewreck.org/
> [3]: https://lore.kernel.org/all/20240923100508.GA32066@willie-the-truck/
> [4]: https://wiki.qemu.org/Documentation/9p#Protocol_Plans
> [5]: https://lore.kernel.org/all/cover.1743956147.git.m@maowtm.org/
> 
> fanotify-basic-open.c:
> 
>     #define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */
>     #include <errno.h>
>     #include <fcntl.h>
>     #include <limits.h>
>     #include <poll.h>
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <sys/fanotify.h>
>     #include <unistd.h>
> 
>     /* Read all available fanotify events from the file descriptor 'fd'. */
> 
>     static void handle_events(int fd)
>     {
>         const struct fanotify_event_metadata *metadata;
>         struct fanotify_event_metadata buf[200];
>         ssize_t size;
>         char path[PATH_MAX];
>         ssize_t path_len;
>         char procfd_path[PATH_MAX];
>         struct fanotify_response response;
> 
>         for (;;) {
>             size = read(fd, buf, sizeof(buf));
>             if (size == -1 && errno != EAGAIN) {
>                 perror("read");
>                 exit(EXIT_FAILURE);
>             }
> 
>             /* Check if end of available data reached. */
> 
>             if (size <= 0)
>                 break;
> 
>             /* Point to the first event in the buffer. */
> 
>             metadata = buf;
> 
>             /* Loop over all events in the buffer. */
> 
>             while (FAN_EVENT_OK(metadata, size)) {
>                 if (metadata->fd >= 0) {
>                     if (metadata->mask & FAN_OPEN) {
>                         printf("FAN_OPEN: ");
>                     }
> 
>                     /* Retrieve and print pathname of the accessed file. */
> 
>                     snprintf(procfd_path, sizeof(procfd_path),
>                         "/proc/self/fd/%d", metadata->fd);
>                     path_len = readlink(procfd_path, path,
>                                 sizeof(path) - 1);
>                     if (path_len == -1) {
>                         perror("readlink");
>                         exit(EXIT_FAILURE);
>                     }
> 
>                     path[path_len] = '\0';
>                     printf("File %s\n", path);
> 
>                     /* Close the file descriptor of the event. */
> 
>                     close(metadata->fd);
>                 }
> 
>                 /* Advance to next event. */
> 
>                 metadata = FAN_EVENT_NEXT(metadata, size);
>             }
>         }
>     }
> 
>     int main(int argc, char *argv[])
>     {
>         char buf;
>         int fd, poll_num;
>         nfds_t nfds;
>         struct pollfd fds[2];
> 
>         /* Check mount point is supplied. */
> 
>         if (argc != 2) {
>             fprintf(stderr, "Usage: %s FILE\n", argv[0]);
>             exit(EXIT_FAILURE);
>         }
> 
>         fd = fanotify_init(0, O_RDONLY | O_LARGEFILE);
>         if (fd == -1) {
>             perror("fanotify_init");
>             exit(EXIT_FAILURE);
>         }
> 
>         if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_INODE, FAN_OPEN, AT_FDCWD,
>                 argv[1]) == -1) {
>             perror("fanotify_mark");
>             exit(EXIT_FAILURE);
>         }
> 
>         while (1) {
>             handle_events(fd);
>         }
>     }
> 
> Tingmao Wang (6):
>   fs/9p: Add ability to identify inode by path for .L
>   fs/9p: add default option for path-based inodes
>   fs/9p: Hide inodeident=path from show_options as it is the default
>   fs/9p: Add ability to identify inode by path for non-.L
>   fs/9p: .L: Refresh stale inodes on reuse
>   fs/9p: non-.L: Refresh stale inodes on reuse
> 
>  fs/9p/Makefile         |   3 +-
>  fs/9p/ino_path.c       | 114 ++++++++++++++++++++++++++++++++++++++
>  fs/9p/v9fs.c           |  33 ++++++++++-
>  fs/9p/v9fs.h           |  63 +++++++++++++++------
>  fs/9p/vfs_inode.c      | 122 +++++++++++++++++++++++++++++++++++------
>  fs/9p/vfs_inode_dotl.c | 108 +++++++++++++++++++++++++++++++++---
>  fs/9p/vfs_super.c      |  10 +++-
>  7 files changed, 406 insertions(+), 47 deletions(-)
>  create mode 100644 fs/9p/ino_path.c
> 
> 
> base-commit: 38fec10eb60d687e30c8c6b5420d86e8149f7557
> prerequisite-patch-id: 12dc6676db52ff32eed082b1e5d273f297737f61
> prerequisite-patch-id: 93ab54c52a41fa44b8d0baf55df949d0ad27e99a
> prerequisite-patch-id: 5f558bf969e6eaa3d011c98de0806ca8ad369efe
> --
> 2.39.5
>