[PATCH v5 bpf-next 0/5] bpf path iterator
Christian Brauner
brauner at kernel.org
Mon Jul 7 10:46:41 UTC 2025
On Fri, Jun 27, 2025 at 08:51:21AM +1000, NeilBrown wrote:
> On Fri, 27 Jun 2025, Song Liu wrote:
> >
> >
> > > On Jun 26, 2025, at 3:22 AM, NeilBrown <neil at brown.name> wrote:
> >
> > [...]
> >
> > >> I guess I misunderstood the proposal of vfs_walk_ancestors()
> > >> initially, so some clarification:
> > >>
> > >> I think vfs_walk_ancestors() is good for the rcu-walk, and some
> > >> rcu-then-ref-walk. However, I don’t think it fits all use cases.
> > >> A reliable step-by-step ref-walk, like this set, works well with
> > >> BPF, and we want to keep it.
> > >
> > > The distinction between rcu-walk and ref-walk is an internal
> > > implementation detail. You as a caller shouldn't need to think about
> > > the difference. You just want to walk. Note that LOOKUP_RCU is
> > > documented in namei.h as "semi-internal". The only uses outside of
> > > core-VFS code is in individual filesystem's d_revalidate handler - they
> > > are checking if they are allowed to sleep or not. You should never
> > > expect to pass LOOKUP_RCU to an VFS API - no other code does.
> > >
> > > It might be reasonable for you as a caller to have some control over
> > > whether the call can sleep or not. LOOKUP_CACHED is a bit like that.
> > > But for dotdot lookup the code will never sleep - so that is not
> > > relevant.
> >
> > Unfortunately, the BPF use case is more complicated. In some cases,
> > the callback function cannot be call in rcu critical sections. For
> > example, the callback may need to read xatter. For these cases, we
> > we cannot use RCU walk at all.
>
> I really think you should stop using the terms RCU walk and ref-walk. I
> think they might be focusing your thinking in an unhelpful direction.
Thank you! I really appreciate you helping to shape this API and it
aligns a lot with my thinking.
> The key issue about reading xattrs is that it might need to sleep.
> Focusing on what might need to sleep and what will never need to sleep
> is a useful approach - the distinction is wide spread in the kernel and
> several function take a flag indicating if they are permitted to sleep,
> or if failure when sleeping would be required.
>
> So your above observation is better described as
>
> The vfs_walk_ancestors() API has an (implicit) requirement that the
> callback mustn't sleep. This is a problem for some use-cases
> where the call back might need to sleep - e.g. for accessing xattrs.
>
> That is a good and useful observation. I can see three possibly
> responses:
>
> 1/ Add a vfs_walk_ancestors_maysleep() API for which the callback is
> always allowed to sleep. I don't particularly like this approach.
Agreed.
>
> 2/ Use repeated calls to vfs_walk_parent() when the handling of each
> ancestor might need to sleep. I see no problem with supporting both
> vfs_walk_ancestors() and vfs_walk_parent(). There is plenty of
> precedent for having different interfaces for different use cases.
Meh.
>
> 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback.
I think that's fine.
More information about the Linux-security-module-archive
mailing list