[RFC/PATCH v2 bpf-next fanotify 7/7] selftests/bpf: Add test for BPF based fanotify fastpath handler

Alexei Starovoitov alexei.starovoitov at gmail.com
Fri Nov 15 20:04:02 UTC 2024


On Thu, Nov 14, 2024 at 11:26 PM Amir Goldstein <amir73il at gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 9:14 PM Alexei Starovoitov
> <alexei.starovoitov at gmail.com> wrote:
> >
> > On Thu, Nov 14, 2024 at 12:44 AM Song Liu <song at kernel.org> wrote:
> > >
> > > +
> > > +       if (bpf_is_subdir(dentry, v->dentry))
> > > +               ret = FAN_FP_RET_SEND_TO_USERSPACE;
> > > +       else
> > > +               ret = FAN_FP_RET_SKIP_EVENT;
> >
> > It seems to me that all these patches and feature additions
> > to fanotify, new kfuncs, etc are done just to do the above
> > filtering by subdir ?
> >
> > If so, just hard code this logic as an extra flag to fanotify ?
> > So it can filter all events by subdir.
> > bpf programmability makes sense when it needs to express
> > user space policy. Here it's just a filter by subdir.
> > bpf hammer doesn't look like the right tool for this use case.
>
> Good question.
>
> Speaking as someone who has made several attempts to design
> efficient subtree filtering in fanotify, it is not as easy as it sounds.
>
> I recently implemented a method that could be used for "practical"
> subdir filtering in userspace, not before Jan has questioned if we
> should go directly to subtree filtering with bpf [1].
>
> This is not the only filter that was proposed for fanotify, where bpf
> filter came as an alternative proposal [2], but subtree filtering is by far
> the most wanted filter.

Interesting :)
Thanks for the links. Long threads back then.
Looks like bpf filtering was proposed 2 and 4 years ago and
Song's current attempt is the first real prototype that actually
implements what was discussed for so long?
Cool. I guess it has legs then.

> The problem with implementing a naive is_subtree() filter in fanotify
> is the unbounded cost to be paid by every user for every fs access
> when M such filters are installed deep in the fs tree.

Agree that the concern is real.
I just don't see how filtering in bpf vs filtering in the kernel
is going to be any different.
is_subdir() is fast, so either kernel or bpf prog calling it
on every file_open doesn't seem like a big deal.
Are you saying that 'naive is_subdir' would call is_subdir
M times for every such filter ?
I guess it can be optimized. When these M filters are installed
the kernel can check whether they're subdirs of each other
and filter a subset of M filters with a single is_subdir() call.
Same logic can be in either bpf prog or in the kernel.

> Making this more efficient then becomes a matter of trading of
> memory (inode/path cache size) and performance and depends
> on the size and depth of the watched filesystem.
> This engineering decision *is* the userspace policy that can be
> expressed by a bpf program.
>
> As you may know, Linux is lagging behind Win and MacOS w.r.t
> subtree filtering for fs events.
>
> MacOS/FreeBSD took the userspace approach with fseventsd [3].
> If you Google "fseventsd", you will get results with "High CPU and
> Memory Usage" for as far as the browser can scroll.

Yeah. Fun.

Looking at your 2 year old attempt it seems the key part was to
make sure inodes are not pinned for filtering purpose?
How would you call is_subdir() then if parent dentry is not dget() ?
Or meaning of 'pinning' is different?

> I hope the bpf-aided early subtree filtering technology would be
> able to reduce some of this overhead to facilitate a better engineering
> solution, but that remains to be proven...

Sure. Let's explore this further.

> Thanks,
> Amir.
>
> [1] https://lore.kernel.org/linux-fsdevel/20220228140556.ae5rhgqsyzm5djbp@quack3.lan/
> [2] https://lore.kernel.org/linux-fsdevel/20200828084603.GA7072@quack2.suse.cz/
> [3] https://developer.apple.com/library/archive/documentation/Darwin/Conceptual/FSEvents_ProgGuide/TechnologyOverview/TechnologyOverview.html#//apple_ref/doc/uid/TP40005289-CH3-SW1



More information about the Linux-security-module-archive mailing list