[PATCH v3 fanotify 2/2] samples/fanotify: Add a sample fanotify fiter

Mon Dec 2 17:38:34 UTC 2024

> On Nov 28, 2024, at 9:10 AM, Jan Kara <jack at suse.cz> wrote:
> 
> On Wed 27-11-24 02:16:09, Song Liu wrote:
>>> On Nov 26, 2024, at 4:50 PM, Alexei Starovoitov <alexei.starovoitov at gmail.com> wrote:
>>> 
>> 
>> [...]
>> 
>>>>> +
>>>>> +static void sample_filter_free(struct fanotify_filter_hook *filter_hook)
>>>>> +{
>>>>> +       struct fan_filter_sample_data *data = filter_hook->data;
>>>>> +
>>>>> +       path_put(&data->subtree_path);
>>>>> +       kfree(data);
>>>>> +}
>>>>> +
>>>> 
>>>> Hi Song,
>>>> 
>>>> This example looks fine but it raises a question.
>>>> This filter will keep the mount of subtree_path busy until the group is closed
>>>> or the filter is detached.
>>>> This is probably fine for many services that keep the mount busy anyway.
>>>> 
>>>> But what if this wasn't the intention?
>>>> What if an Anti-malware engine that watches all mounts wanted to use that
>>>> for configuring some ignore/block subtree filters?
>>>> 
>>>> One way would be to use a is_subtree() variant that looks for a
>>>> subtree root inode
>>>> number and then verifies it with a subtree root fid.
>>>> A production subtree filter will need to use a variant of is_subtree()
>>>> anyway that
>>>> looks for a set of subtree root inodes, because doing a loop of is_subtree() for
>>>> multiple paths is a no go.
>>>> 
>>>> Don't need to change anything in the example, unless other people
>>>> think that we do need to set a better example to begin with...
>>> 
>>> I think we have to treat this patch as a real filter and not as an example
>>> to make sure that the whole approach is workable end to end.
>>> The point about not holding path/dentry is very valid.
>>> The algorithm needs to support that.
>> 
>> Hmm.. I am not sure whether we cannot hold a refcount. If that is a 
>> requirement, the algorithm will be more complex.
> 
> Well, for production use that would certainly be a requirement. Many years
> ago dnotify (the first fs notification subsystem) was preventing
> filesystems from being unmounted because it required open file and it was a
> pain.
> 
>> IIUC, fsnotify_mark on a inode does not hold a refcount to inode.
> 
> The connector (head of the mark list) does hold inode reference. But we
> have a hook in the unmount path (fsnotify_unmount_inodes()) which drops all
> the marks and connectors for the filesystem.
> 
>> And when the inode is evicted, the mark is freed. I guess this 
>> requires the user space, the AntiVirus scanner for example, to 
>> hold a reference to the inode? If this is the case, I think it 
>> is OK for the filter, either bpf or kernel module, to hold a 
>> reference to the subtree root.
> 
> No, fsnotify pins the inodes in memory (which if fine) but releases them
> when unmount should happen. Userspace doesn't need to pin anything.

To get similar logic for the filter (module or BPF), we will need 
another call back. We should add this, though we may not use it 
in the subtree monitoring case. 

> 
>>> It may very well turn out that the logic of handling many filters
>>> without a loop and not grabbing a path refcnt is too complex for bpf.
>>> Then this subtree filtering would have to stay as a kernel module
>>> or extra flag/feature for fanotify.
>> 
>> Handling multiple subtrees is indeed an issue. Since we rely on 
>> the mark in the SB, multiple subtrees under the same SB will share
>> that mark. Unless we use some cache, accessing a file will 
>> trigger multiple is_subdir() calls. 
>> 
>> One possible solution is that have a new helper that checks
>> is_subdir() for a list of parent subtrees with a single series
>> of dentry walk. IOW, something like:
>> 
>> bool is_subdir_of_any(struct dentry *new_dentry, 
>>                      struct list_head *list_of_dentry).
>> 
>> For BPF, one possible solution is to walk the dentry tree 
>> up to the root, under bpf_rcu_read_lock().
> 
> I can see two possible issues with this. Firstly, you don't have list_head
> in a dentry you could easily use to pass dentries to a function like this.
> Probably you'll need an external array with dentry pointers or something
> like that.
> 
> Second issue is more inherent in the BPF filter approach - if there would
> be more notification groups each watching for some subtree (like users
> watching their home dirs, apps watching their subtrees with data etc.), then
> we'd still end up traversing the directory tree for each such notification
> group. That seems suboptimal but I have to think how much we care how we
> could possibly avoid that.

For the subtree monitoring example, maybe "monitor the whole sb 
and filter the events" is not the right approach. Maintaining a 
mark on each directory under the subtree might be a better approach. 
Any comments or suggestions on this?

Thanks,
Song