[PATCH bpf-next 2/4] bpf: Make bpf inode storage available to tracing program

Fri Nov 15 17:35:39 UTC 2024

Hi Jan, 

> On Nov 15, 2024, at 3:19 AM, Jan Kara <jack at suse.cz> wrote:

[...]

>> AFAICT, we need to modify how lsm blob are managed with 
>> CONFIG_BPF_SYSCALL=y && CONFIG_BPF_LSM=n case. The solution, even
>> if it gets accepted, doesn't really save any memory. Instead of 
>> growing struct inode by 8 bytes, the solution will allocate 8
>> more bytes to inode->i_security. So the total memory consumption
>> is the same, but the memory is more fragmented.
> 
> I guess you've found a better solution for this based on James' suggestion.
> 
>> Therefore, I think we should really step back and consider adding
>> the i_bpf_storage to struct inode. While this does increase the
>> size of struct inode by 8 bytes, it may end up with less overall
>> memory consumption for the system. This is why. 
>> 
>> When the user cannot use inode local storage, the alternative is 
>> to use hash maps (use inode pointer as key). AFAICT, all hash maps 
>> comes with non-trivial overhead, in memory consumption, in access 
>> latency, and in extra code to manage the memory. OTOH, inode local 
>> storage doesn't have these issue, and is usually much more efficient: 
>> - memory is only allocated for inodes with actual data, 
>> - O(1) latency, 
>> - per inode data is freed automatically when the inode is evicted. 
>> Please refer to [1] where Amir mentioned all the work needed to 
>> properly manage a hash map, and I explained why we don't need to 
>> worry about these with inode local storage.
> 
> Well, but here you are speaking of a situation where bpf inode storage
> space gets actually used for most inodes. Then I agree i_bpf_storage is the
> most economic solution. But I'd also expect that for vast majority of
> systems the bpf inode storage isn't used at all and if it does get used, it
> is used only for a small fraction of inodes. So we are weighting 8 bytes
> per inode for all those users that don't need it against more significant
> memory savings for users that actually do need per inode bpf storage. A
> factor in this is that a lot of people are running some distribution kernel
> which generally enables most config options that are at least somewhat
> useful. So hiding the cost behind CONFIG_FOO doesn't really help such
> people.

Agreed that an extra pointer will be used if there is no actual users
of it. However, in longer term, "most users do not use bpf inode
storage" may not be true. As kernel engineers, we may not always notice 
when user space is using some BPF features. For example, systemd has
a BPF LSM program "restrict_filesystems" [1]. It is enabled if the 
user have lsm=bpf in kernel args. I personally noticed it as a 
surprise when we enabled lsm=bpf. 

> I'm personally not *so* hung up about a pointer in struct inode but I can
> see why Christian is and I agree adding a pointer there isn't a win for
> everybody.

I can also understand Christian's motivation. However, I am a bit
frustrated because similar approach (adding a pointer to the struct) 
worked fine for other popular data structures: task_struct, sock, 
cgroup. 

> Longer term, I think it may be beneficial to come up with a way to attach
> private info to the inode in a way that doesn't cost us one pointer per
> funcionality that may possibly attach info to the inode. We already have
> i_crypt_info, i_verity_info, i_flctx, i_security, etc. It's always a tough
> call where the space overhead for everybody is worth the runtime &
> complexity overhead for users using the functionality...

It does seem to be the right long term solution, and I am willing to 
work on it. However, I would really appreciate some positive feedback
on the idea, so that I have better confidence my weeks of work has a 
better chance to worth it. 

Thanks,
Song

[1] https://github.com/systemd/systemd/blob/main/src/core/bpf/restrict_fs/restrict-fs.bpf.c