[PATCH v10 18/27] integrity/ima: Define ns_status for storing namespaced iint data

Thu Feb 24 02:49:35 UTC 2022

On 2/23/22 21:21, Stefan Berger wrote:
>
> On 2/23/22 11:12, Mimi Zohar wrote:
>> On Tue, 2022-02-01 at 15:37 -0500, Stefan Berger wrote:
>>> From: Mehmet Kayaalp <mkayaalp at linux.vnet.ibm.com>
>>>
>>> Add an rbtree to the IMA namespace structure that stores a namespaced
>>> version of iint->flags in ns_status struct. Similar to the
>>> integrity_iint_cache, both the iint and ns_status are looked up 
>>> using the
>>> inode pointer value. The lookup, allocate, and insertion code is also
>>> similar.
>>>
>>> In subsequent patches we will have to find all ns_status entries an 
>>> iint
>>> is being used in and reset flags there. To do this, connect a list of
>>> ns_status to the integrity_iint_cache and provide a reader-writer
>>> lock in the integrity_iint_cache to lock access to the list.
>>>
>>> To simplify the code in the non-namespaces case embed an ns_status in
>>> the integrity_iint_cache and have it linked into the iint's 
>>> ns_status list
>>> when calling ima_get_ns_status().
>>>
>>> When getting an ns_status first try to find it in the RB tree. Here 
>>> we can
>>> run into the situation that an ns_status found in the RB tree has a
>>> different iint associated with it for the same inode. In this case 
>>> we need
>>> to delete the ns_status structure and get a new one.
>>>
>>> There are two cases for freeing:
>>> - when the iint is freed (inode deletion): Walk the list of ns_status
>>>    entries and disconnect each ns_status from the list; take the
>>>    writer lock to protect access to the list; also, take the item 
>>> off the
>>>    per-namespace rbtree
>>>
>>> - when the ima_namepace is freed: While walking the rbtree, remove the
>>>    ns_status from the list while also holding the iint's writer lock;
>>>    to be able to grab the lock we have to have a pointer to the iint on
>>>    the ns_status structure.
>>>
>>> To avoid an ns_status to be freed by the two cases concurrently, 
>>> prevent
>>> these two cases to run concurrently. Therefore, groups of threads
>>> deleting either inodes or ima_namespaces are allowed to run 
>>> concurrently
>>> but no two threads may run and one delete an inode and the other an
>>> ima_namespace.
>> The locking involved here is really complex.  I'm sure you thought
>> about it a lot, but isn't there a better alternative?
>
> I am afraid this is a difficult question and a short and concise 
> answer is not possible...
>
> The complexity of the locking is driven by concurrency and the data 
> structures that are involved. The data structures (existing global 
> iint rbtree, ns_status structure, and per namespace rbtree for 
> ns_status) and how they are organized and connected (via linked lists) 
> are a consequence of the fact that we need to be able to handle files 
> shared between IMA namespaces (and the host) so that re-auditing, 
> re-measuring and re-appraisal of files after file modifications or 
> modifications of the security.ima xattr (by any namespaces) can be 
> done efficiently. Furthermore, it helps to efficiently remove all the 
> status information that an IMA namespace has kept for files it 
> audited/measured/appraised. The goal was to make this as scalable as 
> possible by having each namespace get out of the way of other 
> namespaces by preventing them from locking each other out too much. 
> The single biggest problem are files shared between IMA namespaces.
>
> The best argument for the design I can come up with is the 'Big O 
> notation' describing the time complexity of operations.
>
>
> The existing global iint rbtree maintains IMA status information for 
> each inode. Lookup and insertion of data into the gloab iint rbtree  
> is O(log(n)), thus optimal.
>
> To accommodate re-auditing/re-measurement/re-appraisal, which is 
> driven by resetting status flags, I connected a list of ns_status 
> structures, in which each namespace maintains its status information 
> for each inode, to the iint maintained in that global rbtree. The 
> resetting of status flags is fast because traversing the list after a 
> lookup in the tree is O(n). Lookup + resetting the flags therefore is 
> O(log(n) + n). If the list didn't exist we would have to search all 
> IMA namespaces for the inode to be able to reset the flags, resulting 
> in O(n * log(n)) time complexity, which is of course much worse. So, 
> the list of ns_status linked to an iint has a good reason: better time 
> complexity to traverse the list and reset status flags. Beside  that 
> it also supports fast handling of deletion of files where the iint has 
> to be delete from the global rbtree and the ns_status list it holds 
> must also be deleted (each ns_status also needs to be delete from a 
> per IMA-namespace rbtree then)
>
>
> There's also a per-IMA namespace rbtree for each inode that serves two 
> purposes:
>
> a) Fast lookup of ns_status (O(log(n)) for an IMA namespace; at least 
> to insert an ns_status into this tree we need not write-lock the iint 
> tree but the initial iint creation required the write-locking of the 
> iint tree
>
> b) Maintaining a collection of inodes that the namespace has 
> audited/measured/appraised for efficient deletion upon IMA namespace 
> teardown: We can traverse this tree in O(n) time and determine which 
> iints have no more namespace users and delete them from the iint tree.
>
>
> Now the dilemma with this is that an ns_status structure is connected 
> to a list hanging off the iint and on top of this it is part of an 
> rbtree. And this is where the 'group locking' is coming from. What we 
> need to prevent is that an ns_status is deleted from its iint list 
> (when a file is deleted) while it is also deleted from the per-IMA 
> namespace rbtree (when the namespace is deleted). Both must not be 
> done concurrently. What is possible is that a group of threads may 
> tear down namespaces and the other group may act on file deletion, but 
> threads from both groups must not run concurrently.
>
>
> Now we can at least look at two alternatives for the per-IMA namespace 
> rbtree.
>
> 1) One alternative is to use a list instead of an rbtree. We would 
> loose the fast lookup via the per IMA namespace tree and get O(n) 
> lookup times but quick insertion into the list [O(1)]. We still would 
> have the collection of inodes. And we would still have the dilemma 
> that an ns_status would be connected to two lists, thus requiring the 
> group locking. I don't think using a list instead of an rbtree is a 
> solution.
>
> 2) We could try to get rid of the per-IMA namespace rbtree altogether 
> and just use the global iint rbtree that exists today with a list of 
> ns_status connected to its iints. If we do this we would loose the 
> knowledge of which inodes a namespace has an ns_status structure for. 
> The only way we would find this is by traversing the global iint tree 
> (O(n)) and follow each iint list (O(m)) to see whether we find an 
> ns_status holding information about the iint. The time complexity for 
> this would be O(n*m) but much less than O(n^2). A downside would also 
> be that we would have to keep a lock on the global iint rbtree while 
> traversing it, thus locking out those that want to add inodes to the 
> tree. On the upside it would allow us to get rid of the group locking. 
> Lookup of an ns_status in the global iint tree would be O(n) + O(m) 
> and insertion would be O(n) + O(1).
>
>
> Certainly, the alternative is 2) with its own trade-offs. My guess is 
> some sort of yielding could probably also be helpful there then to 
> avoid blocking higher priority operations than deleting of a namespace.

I forgot to mention: It makes a difference if one has to walk the global 
iint tree to find the few ns_status for the namespace among possibly 
thousands of entries in that tree than having a per-IMA namespace rbtree 
that has these few ns_status right there. So walking the iint tree is 
more like O(N) versus O(n) walking the per-IMA namespace rbtree.