[PATCH bpf-next 3/4] bpf: Introduce path iterator

Thu May 29 19:46:00 UTC 2025

On Thu, May 29, 2025 at 11:35 AM Al Viro <viro at zeniv.linux.org.uk> wrote:
>
> On Thu, May 29, 2025 at 11:00:51AM -0700, Song Liu wrote:
> > On Thu, May 29, 2025 at 10:38 AM Al Viro <viro at zeniv.linux.org.uk> wrote:
> > >
> > > On Thu, May 29, 2025 at 09:53:21AM -0700, Song Liu wrote:
> > >
> > > > Current version of path iterator only supports walking towards the root,
> > > > with helper path_parent. But the path iterator API can be extended
> > > > to cover other use cases.
> > >
> > > Clarify the last part, please - call me paranoid, but that sounds like
> > > a beginning of something that really should be discussed upfront.
> >
> > We don't have any plan with future use cases yet. The only example
> > I mentioned in the original version of the commit log is "walk the
> > mount tree". IOW, it is similar to the current iterator, but skips non
> > mount point iterations.
> >
> > Since we call it "path iterator", it might make sense to add ways to
> > iterate the VFS tree in different patterns. For example, we may
> > have an iterator that iterates all files within a directory. Again, we
> > don't see urgent use cases other than the current "walk to root"
> > iterator.
>
> What kinds of locking environments can that end up used in?

This will start with a referenced "struct path", in a sleepable context.

> The reason why I'm getting more and more unhappy with this thing is
> that it sounds like a massive headache for any correctness analysis in
> VFS work.
>
> Going straight to the root starting at a point you already have pinned
> is relatively mild - you can't do path_put() in any blocking contexts,
> obviously, and you'd better be careful with what you are doing on
> mountpoint traversal (e.g. combined with "now let's open that directory
> and read it" it's an instant "hell, no" - you could easily bypass MNT_LOCKED
> restrictions that way), but if there's a threat of that getting augmented
> with other things (iterating through all files in directory would be
> a very different beast from the locking POV, if nothing else)... ouch.

We are fully aware that a "files in the directory" iterator may need
different locking. This is exactly why we want to provide this logic
as an iterator in the kernel: to get the locking and related details
right in one place, so that users can avoid making mistakes.

> Basically, you are creating a spot we will need to watch very carefully
> from now on.  And the rationale appears to include "so that we could
> expose that to random out-of-tree code that decided to call itself LSM",
> so pardon me for being rather suspicious about the details.

No matter what we call them, these use cases exist, out-of-tree or
in-tree, as BPF programs or kernel modules. We are learning from
Landlock here, simply because it is probably the best way to achieve
this.

This particular set introduces a safer API than combinations of
existing APIs (follow_up(), dget_parent(), etc.). It guarantees that
all memory accesses are to properly referenced kernel objects; it
also guarantees that all acquired references are released.
Therefore, I don't see how it adds risk in any sense.
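
To illustrate the reference guarantee, here is a rough userspace sketch
of the "walk to root" pattern. It is a toy model, not the code in this
patch: toy_node, toy_iter, and the toy_* functions are hypothetical
names, and a plain int counter stands in for real kernel refcounting.
The iterator holds exactly one reference to its current position, takes
the parent's reference before dropping the child's, and drops its last
reference in destroy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a referenced object; not a kernel type. */
struct toy_node {
	struct toy_node *parent;	/* NULL at the root */
	int refcount;			/* plain counter, for illustration only */
};

static void toy_get(struct toy_node *n)
{
	n->refcount++;
}

static void toy_put(struct toy_node *n)
{
	assert(n->refcount > 0);	/* catch an unbalanced put */
	n->refcount--;
}

/* The iterator owns one reference to 'cur' while 'cur' is non-NULL. */
struct toy_iter {
	struct toy_node *cur;
};

/* Start iterating: take the iterator's own reference on 'start'. */
static void toy_iter_new(struct toy_iter *it, struct toy_node *start)
{
	toy_get(start);
	it->cur = start;
}

/*
 * Step to the parent. Returns the new position, or NULL once the root
 * has been reached. The parent is referenced before the child is
 * released, so the iterator never points at an unreferenced node.
 */
static struct toy_node *toy_iter_next(struct toy_iter *it)
{
	struct toy_node *parent;

	if (!it->cur)
		return NULL;
	parent = it->cur->parent;
	if (!parent)
		return NULL;		/* at the root; keep holding it */
	toy_get(parent);
	toy_put(it->cur);
	it->cur = parent;
	return parent;
}

/* Finish iterating: drop whatever reference the iterator still holds. */
static void toy_iter_destroy(struct toy_iter *it)
{
	if (it->cur) {
		toy_put(it->cur);
		it->cur = NULL;
	}
}
```

A caller pairs toy_iter_new() with toy_iter_destroy() and calls
toy_iter_next() in between; after destroy, every counter is back to its
starting value, whether or not the walk reached the root.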

Thanks,
Song