[PATCH v5 bpf-next 0/5] bpf path iterator
Song Liu
songliubraving at meta.com
Mon Jul 7 18:50:12 UTC 2025
Hi Christian,
Thanks for your comments!
> On Jul 7, 2025, at 4:17 AM, Christian Brauner <brauner at kernel.org> wrote:
[...]
>>> 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback.
>>
>> I think that's fine.
>
> Ok, sorry for the delay but there's a lot of different things going on
> right now and this one isn't exactly an easy thing to solve.
>
> I mentioned this before and so did Neil: the lookup implementation
> supports two modes sleeping and non-sleeping. That api is abstracted
> away as heavily as possible by the VFS so that non-core code will not be
> exposed to it other than in exceptional circumstances and doesn't have
> to care about it.
>
> It is a conceptual dead-end to expose these two modes via separate APIs
> and leak this implementation detail into non-core code. It will not
> happen as far as I'm concerned.
>
> I very much understand the urge to get the refcount step-by-step thing
> merged asap. Everyone wants their APIs merged fast. And if it's
> reasonable to move fast we will (see the kernfs xattr thing).
>
> But here are two use-cases that ask for the same thing with different
> constraints that closely mirror our unified approach. Merging one
> quickly just to have something and then later bolting the other one on
> top, augmenting, or replacing, possible having to deprecate the old API
> is just objectively nuts. That's how we end up with a spaghetthi helper
> collection. We want as little helper fragmentation as possible.
>
> We need a unified API that serves both use-cases. I dislike
> callback-based APIs generally but we have precedent in the VFS for this
> for cases where the internal state handling is delicate enough that it
> should not be exposed (see __iterate_supers() which does exactly work
> like Neil suggested down to the flag argument itself I added).
>
> So I'm open to the callback solution.
>
> (Note for really absurd perf requirements you could even make it work
> with static calls I'm pretty sure.)
I guess we will go with Mickaël’s idea:
> int vfs_walk_ancestors(struct path *path,
> bool (*walk_cb)(const struct path *ancestor, void *data),
> void *data, int flags)
>
> The walk continue while walk_cb() returns true. walk_cb() can then
> check if @ancestor is equal to a @root, or other properties. The
> walk_cb() return value (if not bool) should not be returned by
> vfs_walk_ancestors() because a walk stop doesn't mean an error.
If necessary, we hide “root" inside @data. This is good.
> @path would be updated with latest ancestor path (e.g. @root).
Update @path to the last ancestor and hold proper references.
I missed this part earlier. With this feature, vfs_walk_ancestors
should work usable with open-codeed bpf path iterator.
I have a question about this behavior with RCU walk. IIUC, RCU
walk does not hold reference to @ancestor when calling walk_cb().
If walk_cb() returns false, shall vfs_walk_ancestors() then
grab a reference on @ancestor? This feels a bit weird to me.
Maybe “updating @path to the last ancestor” should only apply to
LOOKUP_RCU==false case?
> @flags could contain LOOKUP_RCU or not, which enables us to have
> walk_cb() not-RCU compatible.
>
> When passing LOOKUP_RCU, if the first call to vfs_walk_ancestors()
> failed with -ECHILD, the caller can restart the walk by calling
> vfs_walk_ancestors() again but without LOOKUP_RCU.
Given we want callers to handle -ECHILD and call vfs_walk_ancestors
again without LOOKUP_RCU, I think we should keep @path not changed
With LOOKUP_RCU==true, and only update it to the last ancestor
when LOOKUP_RCU==false.
With this behavior, landlock code will be like:
/* Assume we hold reference on “path”.
* With LOOKUP_RCU, path will not change, we don’t need
* extra reference on “path”.
*/
err = vfs_walk_ancestors(path, ll_cb, data, LOOKUP_RCU);
/*
* At this point, whether err is 0 or not, path is not
* changed.
*/
if (err == -ECHILD) {
struct path walk_path = *path;
/* reset any data changed by the walk */
reset_data(data);
/* get a reference on walk_path. */
path_get(&walk_path);
err = vfs_walk_ancestors(&walk_path, ll_cb, data, 0);
/* Now, walk_path might be updated */
/* Always release reference on walk_path */
path_put(&walk_path);
}
BPF path iterator sode will look like:
static bool bpf_cb(const struct path *ancestor, void *data)
{
return false;
}
struct path *bpf_iter_path_next(struct bpf_iter_path *it)
{
struct bpf_iter_path_kern *kit = (void *)it;
if (vfs_walk_ancestors(&kit->path, bpf_cb, NULL))
return NULL;
return &kit->path;
}
Does this sound reasonable to every body?
Thanks,
Song
More information about the Linux-security-module-archive
mailing list