[PATCH bpf-next 3/4] bpf: Introduce path iterator

Al Viro viro at zeniv.linux.org.uk
Fri May 30 18:43:48 UTC 2025


On Fri, May 30, 2025 at 02:20:39PM +0200, Mickaël Salaün wrote:

> Without access to mount_lock, what would be the best way to fix this
> Landlock issue while making it backportable?
> 
> > 
> > If we update path_parent in this patchset with choose_mountpoint(),
> > and use it in Landlock, we will close this race condition, right?
> 
> choose_mountpoint() is currently private, but if we add a new filesystem
> helper, I think the right approach would be to expose follow_dotdot(),
> updating its arguments with public types.  This way the intermediates
> mount points will not be exposed, RCU optimization will be leveraged,
> and usage of this new helper will be simplified.

IMO anything that involves struct nameidata should remain inside
fs/namei.c - something public might share helpers with it, but that's
it.  We had more than enough pain on changes in there, and I'm pretty
sure that we are not done yet; in the area around atomic_open, but not
only there.  Parts of that are still too subtle, IMO - it got a lot
better over the years, but I would really prefer to avoid the need
to bring more code into analysis for any further massage.

Are you sure that follow_dotdot() behaviour is what you really want?

Note that it's not quite how the pathname resolution works.  There we
have the result of follow_dotdot() fed to step_into(), and that changes
things.  Try this:

mkdir /tmp/foo
mkdir /tmp/foo/bar
cd /tmp/foo/bar
mount -t tmpfs none /tmp/foo
touch /tmp/foo/x
ls -Uldi . .. /tmp/foo ../.. /tmp ../x

and think about the results.  Traversing .. is basically "follow_up as much
as possible, then to parent, then follow_down as much as possible" and
the last part (../x) explains why we do it that way.

Which objects would you want to iterate through when dealing with the
current directory in the experiment above?  Simulation of pathwalk
would have the root of overmounting filesystem as the second object
visited; follow_dotdot() would yield the directory overmounted by
that instead.

I'm not saying that either behaviour is right for your case - just that
they are not identical and it's something that needs to be consciously
chosen.



More information about the Linux-security-module-archive mailing list