[RFC PATCH 1/6] fs/9p: Add ability to identify inode by path for .L

Fri Jul 11 19:11:44 UTC 2025

Hi Al, thanks for the review :)  I haven't had the chance to properly
think about this until today, so apologies for the delay.

On 7/5/25 01:19, Al Viro wrote:
> On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
> 
>> +struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
>> +{
>> +	struct v9fs_ino_path *path;
>> +	size_t path_components = 0;
>> +	struct dentry *curr = dentry;
>> +	ssize_t i;
>> +
>> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>> +
>> +	rcu_read_lock();
>> +
>> +    /* Don't include the root dentry */
>> +	while (curr->d_parent != curr) {
>> +		path_components++;
>> +		curr = curr->d_parent;
>> +	}
>> +	if (WARN_ON(path_components > SSIZE_MAX)) {

(Looking at this again I think this check is a bit bogus.  I don't know
how would it be possible at all for us to have > SSIZE_MAX deep
directories especially since each level requires a dentry allocation, but
even if this check is actually useful, it should be in the while loop,
before each path_components++)

>> +		rcu_read_unlock();
>> +		return NULL;
>> +	}
>> +
>> +	path = kmalloc(struct_size(path, names, path_components),
>> +		       GFP_KERNEL);
> 
> Blocking allocation under rcu_read_lock().

I think my first instinct of how to fix this, if the original code is
correct barring this allocation issue, would be to take rcu read lock
twice (first walk to calculate how much to allocate, then second walk to
actually take the snapshots).  We should be safe to rcu_read_unlock() in
the middle as long as caller has a reference to the target dentry (this
needs to be true even if we just do one rcu_read_lock() anyway), and we
can start a parent walk again.  The v9fs rename_sem should ensure we see
the same path again.

Alternatively, we can use dget_parent to do the walk, and not lock RCU at
all.  We still need to walk twice tho, to know how much to allocate.  But
for now I will keep the current approach.

New version:

/*
 * Must hold rename_sem due to traversing parents.  Caller must hold
 * reference to dentry.
 */
struct v9fs_ino_path *make_ino_path(struct dentry *dentry)
{
	struct v9fs_ino_path *path;
	size_t path_components = 0;
	struct dentry *curr = dentry;
	ssize_t i;

	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
	might_sleep(); /* Allocation below might block */

	rcu_read_lock();

	/* Don't include the root dentry */
	while (curr->d_parent != curr) {
		if (WARN_ON(path_components >= SSIZE_MAX)) {
			rcu_read_unlock();
			return NULL;
		}
		path_components++;
		curr = curr->d_parent;
	}

	/*
	 * Allocation can block so don't do it in RCU (and because the
	 * allocation might be large, since name_snapshot leaves space for
	 * inline str, not worth trying GFP_ATOMIC)
	 */
	rcu_read_unlock();

	path = kmalloc(struct_size(path, names, path_components), GFP_KERNEL);
	if (!path) {
		rcu_read_unlock();
		return NULL;
	}

	path->nr_components = path_components;
	curr = dentry;

	rcu_read_lock();
	for (i = path_components - 1; i >= 0; i--) {
		take_dentry_name_snapshot(&path->names[i], curr);
		curr = curr->d_parent;
	}
	WARN_ON(curr != curr->d_parent);
	rcu_read_unlock();
	return path;
}

How does this look?

On 7/5/25 01:25, Al Viro wrote:
> On Sun, Apr 06, 2025 at 09:43:02PM +0100, Tingmao Wang wrote:
>> +bool ino_path_compare(struct v9fs_ino_path *ino_path,
>> +			     struct dentry *dentry)
>> +{
>> +	struct dentry *curr = dentry;
>> +	struct qstr *curr_name;
>> +	struct name_snapshot *compare;
>> +	ssize_t i;
>> +
>> +	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);
>> +
>> +	rcu_read_lock();
>> +	for (i = ino_path->nr_components - 1; i >= 0; i--) {
>> +		if (curr->d_parent == curr) {
>> +			/* We're supposed to have more components to walk */
>> +			rcu_read_unlock();
>> +			return false;
>> +		}
>> +		curr_name = &curr->d_name;
>> +		compare = &ino_path->names[i];
>> +		/*
>> +		 * We can't use hash_len because it is salted with the parent
>> +		 * dentry pointer.  We could make this faster by pre-computing our
>> +		 * own hashlen for compare and ino_path outside, probably.
>> +		 */
>> +		if (curr_name->len != compare->name.len) {
>> +			rcu_read_unlock();
>> +			return false;
>> +		}
>> +		if (strncmp(curr_name->name, compare->name.name,
>> +			    curr_name->len) != 0) {
> 
> ... without any kind of protection for curr_name.  Incidentally,
> what about rename()?  Not a cross-directory one, just one that
> changes the name of a subdirectory within the same parent?

As far as I can tell, in v9fs_vfs_rename, v9ses->rename_sem is taken for
both same-parent and different parent renames, so I think we're safe here
(and hopefully for any v9fs dentries, nobody should be causing d_name to
change except for ourselves when we call d_move in v9fs_vfs_rename?  If
yes then because we also take v9ses->rename_sem, in theory we should be
fine here...?)

(Let me know if I missed anything.  I'm assuming only the filesystem
"owning" a dentry should d_move/d_exchange the dentry.)

However, I see that there is a d_same_name function in dcache.c which is
slightly more careful (but still requires the caller to check the dentry
seqcount, which we do not need to because of the reasoning above), and in
hindsight I think that is probably the more proper way to do this
comparison (and will also handle case-insensitivity, although I've not
explored if this is applicable to 9pfs).

New version:

/*
 * Must hold rename_sem due to traversing parents
 */
bool ino_path_compare(struct v9fs_ino_path *ino_path, struct dentry *dentry)
{
	struct dentry *curr = dentry;
	struct name_snapshot *compare;
	ssize_t i;

	lockdep_assert_held_read(&v9fs_dentry2v9ses(dentry)->rename_sem);

	rcu_read_lock();
	for (i = ino_path->nr_components - 1; i >= 0; i--) {
		if (curr->d_parent == curr) {
			/* We're supposed to have more components to walk */
			rcu_read_unlock();
			return false;
		}
		compare = &ino_path->names[i];
		if (!d_same_name(curr, curr->d_parent, &compare->name)) {
			rcu_read_unlock();
			return false;
		}
		curr = curr->d_parent;
	}
	rcu_read_unlock();
	if (curr != curr->d_parent) {
		/* dentry is deeper than ino_path */
		return false;
	}
	return true;
}

If you think this is not enough, can you suggest what would be needed?
I'm thinking maybe we can check dentry seqcount to be safe, but from
earlier discussion in "bpf path iterator" my impression is that that is
VFS internal data - can we use it here (if needed)?

(I assume, from looking at the code, just having a reference does not
prevent d_name from changing)