[RFC v2] lsm: fs: Use inode_free_callback to free i_security in RCU callback

Thu Dec 19 03:03:46 UTC 2024

Hi Dave, 

Thanks for your comments!

> On Dec 18, 2024, at 5:47 PM, Dave Chinner <david at fromorbit.com> wrote:
> 
> On Wed, Dec 18, 2024 at 01:16:15PM -0800, Song Liu wrote:
>> inode->i_security needes to be freed from RCU callback. A rcu_head was
>> added to i_security to call the RCU callback. However, since struct inode
>> already has i_rcu, the extra rcu_head is wasteful. Specifically, when any
>> LSM uses i_security, a rcu_head (two pointers) is allocated for each
>> inode.
>> 
>> Rename i_callback to inode_free_callback and call security_inode_free_rcu
>> from it to free i_security so that a rcu_head is saved for each inode.
>> Special care are needed for file systems that provide a destroy_inode()
>> callback, but not a free_inode() callback. Specifically, the following
>> logic are added to handle such cases:
>> 
>> - XFS recycles inode after destroy_inode. The inodes are freed from
>>   recycle logic. Let xfs_inode_free_callback() call inode_free_callback.
> 
> NACK. That's a massive layering violation.
> 
> LSMs are VFS layer functionality. They *must* be removed from the
> VFS inode before ->destroy_inode() is called. If a filesystem
> supplies ->destroy_inode(), then it -owns- the inode exclusively
> from that point onwards. All VFS and 3rd party stuff hanging off the
> inode must be removed from the inode before ->destroy_inode() is
> called.

To be honest, I am not sure this rule is true. Most filesystems 
provide a free_inode() callback:

$ grep "\.free_inode.*=" fs/*/*.c | wc
     54     221    2808

For all these cases, VFS layer is still in charge of freeing the
inode after a RCU grace period. From vfs.rst:

``free_inode``
        this method is called from RCU callback. If you use call_rcu()
        in ->destroy_inode to free 'struct inode' memory, then it's
        better to release memory in this method.

> Them's the rules, and hacking around LSM object allocation/freeing
> to try to handle how filesystems manage inodes -underneath- the VFS
> is just asking for problems. LSM object management needs to be
> handled entirely by the generic LSM VFS hooks, not tightly coupled
> to VFS and/or low level filesystem inode lifecycle management.

Unfortunately, many filesystems already call various security_* 
hooks directly:

$ grep "include <linux/security.h>" fs/*/*.c -rI | wc
     46      92    2144
(46 files in various filesystems call into the security layer.)

Therefore, I don't feel this change alone is massive layering 
violation. Instead, it is probably making existing layering 
violation a bit worse. I personally think it is not making
things too worse. On the other hand, saving a rcu_head per 
inode is non-trivial savings. 

It worth note that many systems allocate memory on i_security, 
even the user is not using any LSM. This is because some LSMs 
can only be disabled at compile time. Distro kernels are more 
likely to enable them to fit various users' needs. Therefore, 
the i_security->rcu_head "tax" is not somebody else's problem. 
We probably just don't know we are paying it. 

That being said, I am not too attached to this change. If folks
think allocating an extra rcu_head per inode is the better 
trade-off. I am OK with that answer. I put "RFC" there for 
comments. 

I hope this makes sense. 

Thanks again for your comments. 

Song