[PATCH v5 bpf-next 0/5] bpf path iterator

Christian Brauner brauner at kernel.org
Mon Jul 7 10:46:41 UTC 2025


On Fri, Jun 27, 2025 at 08:51:21AM +1000, NeilBrown wrote:
> On Fri, 27 Jun 2025, Song Liu wrote:
> > 
> > 
> > > On Jun 26, 2025, at 3:22 AM, NeilBrown <neil at brown.name> wrote:
> > 
> > [...]
> > 
> > >> I guess I misunderstood the proposal of vfs_walk_ancestors() 
> > >> initially, so some clarification:
> > >> 
> > >> I think vfs_walk_ancestors() is good for the rcu-walk, and some 
> > >> rcu-then-ref-walk. However, I don’t think it fits all use cases. 
> > >> A reliable step-by-step ref-walk, like this set, works well with 
> > >> BPF, and we want to keep it.
> > > 
> > > The distinction between rcu-walk and ref-walk is an internal
> > > implementation detail.  You as a caller shouldn't need to think about
> > > the difference.  You just want to walk.  Note that LOOKUP_RCU is
> > > documented in namei.h as "semi-internal".  The only uses outside of
> > > core-VFS code is in individual filesystem's d_revalidate handler - they
> > > are checking if they are allowed to sleep or not.  You should never
> > > expect to pass LOOKUP_RCU to an VFS API - no other code does.
> > > 
> > > It might be reasonable for you as a caller to have some control over
> > > whether the call can sleep or not.  LOOKUP_CACHED is a bit like that.
> > > But for dotdot lookup the code will never sleep - so that is not
> > > relevant.
> > 
> > Unfortunately, the BPF use case is more complicated. In some cases, 
> > the callback function cannot be call in rcu critical sections. For 
> > example, the callback may need to read xatter. For these cases, we
> > we cannot use RCU walk at all. 
> 
> I really think you should stop using the terms RCU walk and ref-walk.  I
> think they might be focusing your thinking in an unhelpful direction.

Thank you! I really appreciate you helping to shape this API and it
aligns a lot with my thinking.

> The key issue about reading xattrs is that it might need to sleep.
> Focusing on what might need to sleep and what will never need to sleep
> is a useful approach - the distinction is wide spread in the kernel and
> several function take a flag indicating if they are permitted to sleep,
> or if failure when sleeping would be required.
> 
> So your above observation is better described as 
> 
>    The vfs_walk_ancestors() API has an (implicit) requirement that the
>    callback mustn't sleep.  This is a problem for some use-cases
>    where the call back might need to sleep - e.g. for accessing xattrs.
> 
> That is a good and useful observation.  I can see three possibly
> responses:
> 
> 1/ Add a vfs_walk_ancestors_maysleep() API for which the callback is
>    always allowed to sleep.  I don't particularly like this approach.

Agreed.

> 
> 2/ Use repeated calls to vfs_walk_parent() when the handling of each
>    ancestor might need to sleep.  I see no problem with supporting both
>    vfs_walk_ancestors() and vfs_walk_parent().  There is plenty of
>    precedent for having different  interfaces for different use cases.

Meh.

> 
> 3/ Extend vfs_walk_ancestors() to pass a "may sleep" flag to the callback.

I think that's fine.



More information about the Linux-security-module-archive mailing list