[RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L
Mickaël Salaün
mic at digikod.net
Fri Aug 8 08:32:44 UTC 2025
On Fri, Jul 11, 2025 at 08:11:44PM +0100, Tingmao Wang wrote:
> Hi Al, thanks for the review :) I haven't had the chance to properly
> think about this until today, so apologies for the delay.
>
> On 7/5/25 01:19, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> >
> >> +struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> >> +{
> >> + struct v9fs_ino_path *path;
> >> + size_t path_components = 0;
> >> + struct dentry *curr = dentry;
> >> + ssize_t i;
> >> +
> >> + lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> + rcu_read_lock();
> >> +
> >> + /* Don't include the root dentry */
> >> + while (curr->d_parent != curr) {
> >> + path_components++;
> >> + curr = curr->d_parent;
> >> + }
> >> + if (WARN_ON(path_components > SSIZE_MAX)) {
>
> (Looking at this again I think this check is a bit bogus. I don't know
> how would it be possible at all for us to have > SSIZE_MAX deep
> directories especially since each level requires a dentry allocation, but
> even if this check is actually useful, it should be in the while loop,
> before each path_components++)
WARN_ON_ONCE() would be better, especially in a while loop. I avoid
using WARN_ON(), especially when that can be triggered by users.
>
> >> + rcu_read_unlock();
> >> + return NULL;
> >> + }
> >> +
> >> + path = kmalloc(struct_size(path, names, path_components),
> >> + GFP_KERNEL);
> >
> > Blocking allocation under rcu_read_lock().
>
> I think my first instinct of how to fix this, if the original code is
> correct barring this allocation issue, would be to take rcu read lock
> twice (first walk to calculate how much to allocate, then second walk to
> actually take the snapshots). We should be safe to rcu_read_unlock() in
> the middle as long as caller has a reference to the target dentry (this
> needs to be true even if we just do one rcu_read_lock() anyway), and we
> can start a parent walk again. The v9fs rename_sem should ensure we see
> the same path again.
>
> Alternatively, we can use dget_parent to do the walk, and not lock RCU at
> all. We still need to walk twice tho, to know how much to allocate. But
> for now I will keep the current approach.
>
> New version:
>
> /*
> * Must hold rename_sem due to traversing parents. Caller must hold
> * reference to dentry.
> */
> struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> {
> struct v9fs_ino_path *path;
> size_t path_components = 0;
> struct dentry *curr = dentry;
> ssize_t i;
>
> lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> might_sleep(); /* Allocation below might block */
>
> rcu_read_lock();
>
> /* Don't include the root dentry */
> while (curr->d_parent != curr) {
> if (WARN_ON(path_components >= SSIZE_MAX)) {
> rcu_read_unlock();
> return NULL;
> }
> path_components++;
> curr = curr->d_parent;
> }
>
> /*
> * Allocation can block so don't do it in RCU (and because the
> * allocation might be large, since name_snapshot leaves space for
> * inline str, not worth trying GFP_ATOMIC)
> */
> rcu_read_unlock();
>
> path = kmalloc(struct_size(path, names, path_components), GFP_KERNEL);
> if (!path) {
> rcu_read_unlock();
This unlock is wrong.
> return NULL;
> }
>
> path->nr_components = path_components;
> curr = dentry;
>
> rcu_read_lock();
> for (i = path_components - 1; i >= 0; i--) {
> take_dentry_name_snapshot(&path->names[i], curr);
> curr = curr->d_parent;
> }
> WARN_ON(curr != curr->d_parent);
> rcu_read_unlock();
> return path;
> }
>
> How does this look?
Looks good to me overall. Please sent a new patch series.
>
> On 7/5/25 01:25, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> >> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
> >> + struct dentry *dentry)
> >> +{
> >> + struct dentry *curr = dentry;
> >> + struct qstr *curr_name;
> >> + struct name_snapshot *compare;
> >> + ssize_t i;
> >> +
> >> + lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> + rcu_read_lock();
> >> + for (i = ino_path->nr_components - 1; i >= 0; i--) {
> >> + if (curr->d_parent == curr) {
> >> + /* We're supposed to have more components to walk */
> >> + rcu_read_unlock();
> >> + return false;
> >> + }
> >> + curr_name = &curr->d_name;
> >> + compare = &ino_path->names[i];
> >> + /*
> >> + * We can't use hash_len because it is salted with the parent
> >> + * dentry pointer. We could make this faster by pre-computing our
> >> + * own hashlen for compare and ino_path outside, probably.
> >> + */
> >> + if (curr_name->len != compare->name.len) {
> >> + rcu_read_unlock();
> >> + return false;
> >> + }
> >> + if (strncmp(curr_name->name, compare->name.name,
> >> + curr_name->len) != 0) {
> >
> > ... without any kind of protection for curr_name. Incidentally,
> > what about rename()? Not a cross-directory one, just one that
> > changes the name of a subdirectory within the same parent?
>
> As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
> both same-parent and different parent renames, so I think we're safe here
> (and hopefully for any v9fs dentries, nobody should be causing d_name to
> change except for ourselves when we call d_move in v9fs_vfs_rename? If
> yes then because we also take v9ses->rename_sem, in theory we should be
> fine here...?)
A lockdep_assert_held() or similar and a comment would make this clear.
>
> (Let me know if I missed anything. I'm assuming only the filesystem
> "owning" a dentry should d_move/d_exchange the dentry.)
>
> However, I see that there is a d_same_name function in dcache.c which is
> slightly more careful (but still requires the caller to check the dentry
> seqcount, which we do not need to because of the reasoning above), and in
> hindsight I think that is probably the more proper way to do this
> comparison (and will also handle case-insensitivity, although I've not
> explored if this is applicable to 9pfs).
>
> New version:
>
> /*
> * Must hold rename_sem due to traversing parents
> */
> bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
> {
> struct dentry *curr = dentry;
> struct name_snapshot *compare;
> ssize_t i;
>
> lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>
> rcu_read_lock();
> for (i = ino_path->nr_components - 1; i >= 0; i--) {
> if (curr->d_parent == curr) {
> /* We're supposed to have more components to walk */
> rcu_read_unlock();
> return false;
> }
> compare = &ino_path->names[i];
> if (!d_same_name(curr, curr->d_parent, &compare->name)) {
> rcu_read_unlock();
> return false;
> }
> curr = curr->d_parent;
> }
> rcu_read_unlock();
> if (curr != curr->d_parent) {
> /* dentry is deeper than ino_path */
> return false;
> }
> return true;
> }
I like this new version.
>
> If you think this is not enough, can you suggest what would be needed?
> I'm thinking maybe we can check dentry seqcount to be safe, but from
> earlier discussion in "bpf path iterator" my impression is that that is
> VFS internal data - can we use it here (if needed)?
>
> (I assume, from looking at the code, just having a reference does not
> prevent d_name from changing)
>
>
More information about the Linux-security-module-archive
mailing list