[RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L

Fri Aug 8 08:32:44 UTC 2025

On Fri, Jul 11, 2025 at 08:11:44PM +0100, Tingmao Wang wrote:
> Hi Al, thanks for the review :)  I haven't had the chance to properly
> think about this until today, so apologies for the delay.
> 
> On 7/5/25 01:19, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> > 
> >> +struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> >> +{
> >> +	struct v9fs_ino_path *path;
> >> +	size_t path_components = 0;
> >> +	struct dentry *curr = dentry;
> >> +	ssize_t i;
> >> +
> >> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> +	rcu_read_lock();
> >> +
> >> +    /* Don't include the root dentry */
> >> +	while (curr->d_parent != curr) {
> >> +		path_components++;
> >> +		curr = curr->d_parent;
> >> +	}
> >> +	if (WARN_ON(path_components > SSIZE_MAX)) {
> 
> (Looking at this again I think this check is a bit bogus.  I don't know
> how would it be possible at all for us to have > SSIZE_MAX deep
> directories especially since each level requires a dentry allocation, but
> even if this check is actually useful, it should be in the while loop,
> before each path_components++)

WARN_ON_ONCE() would be better, especially in a while loop.  I avoid
using WARN_ON(), especially when that can be triggered by users.

> 
> >> +		rcu_read_unlock();
> >> +		return NULL;
> >> +	}
> >> +
> >> +	path = kmalloc(struct_size(path, names, path_components),
> >> +		       GFP_KERNEL);
> > 
> > Blocking allocation under rcu_read_lock().
> 
> I think my first instinct of how to fix this, if the original code is
> correct barring this allocation issue, would be to take rcu read lock
> twice (first walk to calculate how much to allocate, then second walk to
> actually take the snapshots).  We should be safe to rcu_read_unlock() in
> the middle as long as caller has a reference to the target dentry (this
> needs to be true even if we just do one rcu_read_lock() anyway), and we
> can start a parent walk again.  The v9fs rename_sem should ensure we see
> the same path again.
> 
> Alternatively, we can use dget_parent to do the walk, and not lock RCU at
> all.  We still need to walk twice tho, to know how much to allocate.  But
> for now I will keep the current approach.
> 
> New version:
> 
> /*
>  * Must hold rename_sem due to traversing parents.  Caller must hold
>  * reference to dentry.
>  */
> struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
> {
> 	struct v9fs_ino_path *path;
> 	size_t path_components = 0;
> 	struct dentry *curr = dentry;
> 	ssize_t i;
> 
> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> 	might_sleep(); /* Allocation below might block */
> 
> 	rcu_read_lock();
> 
> 	/* Don't include the root dentry */
> 	while (curr->d_parent != curr) {
> 		if (WARN_ON(path_components >= SSIZE_MAX)) {
> 			rcu_read_unlock();
> 			return NULL;
> 		}
> 		path_components++;
> 		curr = curr->d_parent;
> 	}
> 
> 	/*
> 	 * Allocation can block so don't do it in RCU (and because the
> 	 * allocation might be large, since name_snapshot leaves space for
> 	 * inline str, not worth trying GFP_ATOMIC)
> 	 */
> 	rcu_read_unlock();
> 
> 	path = kmalloc(struct_size(path, names, path_components), GFP_KERNEL);
> 	if (!path) {
> 		rcu_read_unlock();

This unlock is wrong.

> 		return NULL;
> 	}
> 
> 	path->nr_components = path_components;
> 	curr = dentry;
> 
> 	rcu_read_lock();
> 	for (i = path_components - 1; i >= 0; i--) {
> 		take_dentry_name_snapshot(&path->names[i], curr);
> 		curr = curr->d_parent;
> 	}
> 	WARN_ON(curr != curr->d_parent);
> 	rcu_read_unlock();
> 	return path;
> }
> 
> How does this look?

Looks good to me overall.  Please sent a new patch series.

> 
> On 7/5/25 01:25, Al Viro wrote:
> > On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> >> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
> >> +			     struct dentry *dentry)
> >> +{
> >> +	struct dentry *curr = dentry;
> >> +	struct qstr *curr_name;
> >> +	struct name_snapshot *compare;
> >> +	ssize_t i;
> >> +
> >> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> >> +
> >> +	rcu_read_lock();
> >> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> >> +		if (curr->d_parent == curr) {
> >> +			/* We're supposed to have more components to walk */
> >> +			rcu_read_unlock();
> >> +			return false;
> >> +		}
> >> +		curr_name = &curr->d_name;
> >> +		compare = &ino_path->names[i];
> >> +		/*
> >> +		 * We can't use hash_len because it is salted with the parent
> >> +		 * dentry pointer.  We could make this faster by pre-computing our
> >> +		 * own hashlen for compare and ino_path outside, probably.
> >> +		 */
> >> +		if (curr_name->len != compare->name.len) {
> >> +			rcu_read_unlock();
> >> +			return false;
> >> +		}
> >> +		if (strncmp(curr_name->name, compare->name.name,
> >> +			    curr_name->len) != 0) {
> > 
> > ... without any kind of protection for curr_name.  Incidentally,
> > what about rename()?  Not a cross-directory one, just one that
> > changes the name of a subdirectory within the same parent?
> 
> As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
> both same-parent and different parent renames, so I think we're safe here
> (and hopefully for any v9fs dentries, nobody should be causing d_name to
> change except for ourselves when we call d_move in v9fs_vfs_rename?  If
> yes then because we also take v9ses->rename_sem, in theory we should be
> fine here...?)

A lockdep_assert_held() or similar and a comment would make this clear.

> 
> (Let me know if I missed anything.  I'm assuming only the filesystem
> "owning" a dentry should d_move/d_exchange the dentry.)
> 
> However, I see that there is a d_same_name function in dcache.c which is
> slightly more careful (but still requires the caller to check the dentry
> seqcount, which we do not need to because of the reasoning above), and in
> hindsight I think that is probably the more proper way to do this
> comparison (and will also handle case-insensitivity, although I've not
> explored if this is applicable to 9pfs).
> 
> New version:
> 
> /*
>  * Must hold rename_sem due to traversing parents
>  */
> bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
> {
> 	struct dentry *curr = dentry;
> 	struct name_snapshot *compare;
> 	ssize_t i;
> 
> 	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
> 
> 	rcu_read_lock();
> 	for (i = ino_path->nr_components - 1; i >= 0; i--) {
> 		if (curr->d_parent == curr) {
> 			/* We're supposed to have more components to walk */
> 			rcu_read_unlock();
> 			return false;
> 		}
> 		compare = &ino_path->names[i];
> 		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
> 			rcu_read_unlock();
> 			return false;
> 		}
> 		curr = curr->d_parent;
> 	}
> 	rcu_read_unlock();
> 	if (curr != curr->d_parent) {
> 		/* dentry is deeper than ino_path */
> 		return false;
> 	}
> 	return true;
> }

I like this new version.

> 
> If you think this is not enough, can you suggest what would be needed?
> I'm thinking maybe we can check dentry seqcount to be safe, but from
> earlier discussion in "bpf path iterator" my impression is that that is
> VFS internal data - can we use it here (if needed)?
> 
> (I assume, from looking at the code, just having a reference does not
> prevent d_name from changing)
> 
>