[RFC PATCH 02/10] landlock/hash: define (dynamic, non-resizable) hash table helpers

Tingmao Wang m at maowtm.org
Mon Jun 2 13:26:42 UTC 2025


On 6/2/25 14:20, Tingmao Wang wrote:
> On 5/27/25 12:00, Mickaël Salaün wrote:
>> On Wed, May 21, 2025 at 08:31:58PM +0100, Tingmao Wang wrote:
>>> While there is already include/linux/hash.h, it relies on the static size
>>> of the array as the size of the hash table, and thus is inconvenient to
>>> use for this case where we dynamically compute how many slots we need.
>>>
>>> There is also the relativistic hash tables in rhashtable.h which supports
>>> dynamic resizes etc, but is more complicated and might be slower to access?
>>>
>>> However, on second thought, I'm wondering if we should just use hash
>>> tables for both domains and not-yet-merged rulesets anyway (which
>>> saves us from having a union in landlock_rule).  If we do that, then
>>> we should indeed just use rhashtable.
>>
>> Thinking more about it, the important properties are that we can have a
>> lot of domains/tables (i.e. sandboxed processes) with a few
>> entries/rules (e.g. ~30 rules seems reasonable for now).  We should then
>> try to minimize the total amount of memory while still making access
>> checks quick.  As you noted, the cost of hashing should also not be
>> ignored.
>>
>> Instead of a hash table per domain, what about flat arrays with binary
>> search?  Here is a proposal:
> 
> I think this makes sense in terms of reducing memory footprint.
> 
> My understanding of the slab allocator is that objects are allocated
> from power-of-two sized caches, so with a single-layer, 44-byte
> landlock_rule, we will still use 64 bytes.  Putting the rules in an
> array, especially with your proposed structure, would reduce this.
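> 
> For illustration, a hypothetical debug snippet (ksize() reports the
> usable size of the slab object backing an allocation):
> 
>     /* Error handling omitted; prints "requested 44, got 64" here. */
>     void *p = kmalloc(44, GFP_KERNEL);
> 
>     pr_info("requested 44, got %zu\n", ksize(p));
>     kfree(p);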
> 
> I also checked how the allocations are actually distributed in the
> current implementation.  Dumping the addresses of the allocated
> landlock_rule objects and sorting them by address, we see:
> 
> # env LL_FS_RO=/usr:/bin:/etc:/lib:/sys:/dev:/home/mao/.config:/home/mao/linux LL_FS_RW=/tmp:/proc:/dev/tty:/dev/null:/dev/zero \
>> ./sandboxer bash
> Executing the sandboxed command...
> bash: /home/mao/.bashrc: Permission denied
> mao@linux-devbox-vm:~$ cat /proc/dump_landlock_domain
> Ruleset: ffff9268489451e0
>   Inode tree:
>     ffff92684e767100: inode 5374060 in fs ext4
>     ffff92684e767380 (+640 (rule itself was 44 bytes)): inode 2883604 in fs ext4
>     ffff92684e767440 (+192 (rule itself was 44 bytes)): inode 2883992 in fs ext4
>     ffff92684e767580 (+320 (rule itself was 44 bytes)): inode 11 in fs devtmpfs
>     ffff92684e767740 (+448 (rule itself was 44 bytes)): inode 5377989 in fs ext4
>     ffff92684e767800 (+192 (rule itself was 44 bytes)): inode 1 in fs devtmpfs
>     ffff92684e7678c0 (+192 (rule itself was 44 bytes)): inode 1 in fs tmpfs
>     ffff92684e767980 (+192 (rule itself was 44 bytes)): inode 6 in fs devtmpfs
>     ffff92684e767b40 (+448 (rule itself was 44 bytes)): inode 4 in fs devtmpfs
>     ffff92684e767bc0 (+128 (rule itself was 44 bytes)): inode 2883585 in fs ext4
>     ffff92684e767c40 (+128 (rule itself was 44 bytes)): inode 1 in fs sysfs
>     ffff92684e767cc0 (+128 (rule itself was 44 bytes)): inode 6815745 in fs ext4
>     ffff92684e767f80 (+704 (rule itself was 44 bytes)): inode 1 in fs proc
> 
> Note: source code for this at
> https://github.com/micromaomao/linux-dev/commit/16c318fcbc64c23b87ca67836771f710d0d0ccf1
> 
> (this is a typical result out of several repeated runs)
> (KASLR etc disabled, no_hash_pointers)

Sorry, I meant KASAN; KASLR is intact here.

> 
> Because all landlock rules are copied at domain creation time, I
> previously thought that they would likely be allocated close together
> (perhaps with some fragmentation due to a ruleset extending an existing
> rule in the parent domain), but it looks like they are still spread out
> a bit for some reason.  If the allocations are sparse at domain
> creation time, this situation won't fix itself and could slow down all
> access checks.  (Maybe on a single-core system, with nothing else
> touching the kmem cache while the domain is being merged, they would be
> allocated together?)
> 
> I'm not sure about the performance of binary search vs rbtrees.  I can
> see that the improved locality should help, but going by my benchmark
> of the hashtable domains, I'm not sure how much of a difference it
> makes, or whether it would be worth the added complexity.  I could try
> implementing this flat array + binary search approach and run the
> existing benchmarks on it, though (as of writing this I haven't).
> 
>>
>> struct landlock_domain_index {
>>     union landlock_key key;
>>     u32 shift; // value to add to this array's index to jump to the set
>>                // of mapped landlock_layer
> 
> Can you clarify the comment?  Is it the index into the layers array, or
> are you suggesting adding this shift value to the index of the current
> landlock_domain_index, and using that to index into the layers array?
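> 
> To make the second interpretation concrete, here is the kind of lookup
> I'm imagining (hypothetical code, using the fs_indexes/fs_layers arrays
> from your proposal below):
> 
>     /* i is the position found by binary search in fs_indexes. */
>     const struct landlock_layer *layers =
>         &dom->fs_layers[i + dom->fs_indexes[i].shift];
>     const u32 num_layers = dom->fs_indexes[i].num_layers;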
> 
> While it's probably an unlikely situation, what if there are more than
> 2^29 rules, each having 16 layers?  That is 2^29 * 16 = 2^33 layer
> entries, which would overflow this u32 even if it is an offset, and
> this is allowed since LANDLOCK_MAX_NUM_RULES is currently defined as
> U32_MAX.
> 
> (Unrelated to this patch, but I think once we have the supervise
> feature in, I can see a (probably badly implemented) supervisor getting
> close to this limit if the sandboxed application creates, deletes, and
> recreates lots of temporary files under /tmp.)
> 
>>     u32 num_layers;
>> }; // 128-bit (or 96-bit) aligned
> 
> I guess this has to be 128-bit aligned if we have a u32 num_layers.
> Because we allow 16 layers, if we made num_layers a u16, it would
> actually need to store the number of layers minus 1.  However, maybe
> that is overcomplicating it and we should just make this 128-bit
> aligned (otherwise we would have to deal with unaligned reads of
> landlock_key->object too).
> 
>>
>> // Theoretical landlock_domain
>> struct landlock_domain {
>>     struct landlock_hierarchy *hierarchy;
>>     union {
>>         // See landlock_ruleset's union
>>     };
>>     u32 num_fs; // number of FS indexes
>>     u32 num_net; // number of net indexes
>>     struct access_masks access_masks[];
>>     struct landlock_domain_index fs_indexes[num_fs];
>>     struct landlock_layer fs_layers[sum of FS rules' layers];
>>     struct landlock_domain_index net_indexes[num_net];
>>     struct landlock_layer net_layers[sum of net rules' layers];
>> };
> 
> (Unrelated to this patch specifically, but I would probably like to use
> this domain struct for the mutable part of a domain as well later.  In
> that case the hierarchy pointer would be unused.)
> 
>>
>> // Real landlock_domain
>> struct landlock_domain {
>>     struct landlock_hierarchy *hierarchy;
>>     union {
>>         // See landlock_ruleset's union
>>     };
>>     u32 num_fs; // number of FS indexes
>>     u32 num_net; // number of net indexes
>>     u32 rules[]; // underlying typed arrays accessed via helpers
>> };
>>
>> The landlock_domain is a contiguously allocated memory buffer containing
>> variable-size arrays, improving locality.  Another advantage is that we
>> would not get any potential allocation errors once the buffer is
>> allocated, which should simplify domain creation.  Also, we avoid the
>> union in landlock_rule (patch 5) and only use landlock_rule in
>> landlock_ruleset.
> 
> I think doing this definitely has positives, especially in terms of
> memory; however, I'm worried about the complexity here.  Since we will
> have to do various index calculations and use them to access
> variable-sized arrays, I'm somewhat afraid that any mistake could end
> up either leaking kernel pointers or, at worst, corrupting memory.
> 
> We should probably have a len_rules, and check (with WARN_ON_ONCE,
> returning -EINVAL on failure) that any computed index lies within range
> before accessing it.
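> 
> Something like this sketch (made-up names, and ignoring the
> access_masks part of the layout for brevity):
> 
>     static inline const struct landlock_domain_index *
>     dom_fs_index(const struct landlock_domain *const dom, const u32 i)
>     {
>         /* Reject out-of-range indices instead of reading past the buffer. */
>         if (WARN_ON_ONCE(i >= dom->num_fs))
>             return NULL;
>         return &((const struct landlock_domain_index *)dom->rules)[i];
>     }
> 
> with callers translating a NULL return into -EINVAL.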
> 
>>
>> An alternative approach would be to use a hash table instead of the
>> array (extending landlock_domain_index with a pointer/shift to the next
>> collision) but still with this big buffer.  I'm not sure the perf impact
>> would be noticeable but it might be worth a try.
> 
> I can give that a try too and measure benchmarks.
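> 
> As a starting point, I imagine the extended index entry would chain
> collisions by array index rather than by pointer, something like this
> (hypothetical):
> 
>     struct landlock_domain_index {
>         union landlock_key key;
>         u32 shift;          /* offset into the layers array */
>         u32 num_layers;
>         u32 next_collision; /* index of the next entry hashing to the
>                              * same bucket, or U32_MAX at chain end */
>     };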
> 
> BTW, going by the direction this discussion is going, I assume we're on
> board with a separate landlock_domain, and don't plan to change how
> unmerged landlock_rulesets are stored (at least for now)?  If so, I can
> probably start work on quiet rules, removing access bits / rules by
> intersect, etc. as discussed in
> https://github.com/landlock-lsm/linux/issues/44#issuecomment-2876500918
> 
>>
>>>
>>> Signed-off-by: Tingmao Wang <m@maowtm.org>
>>> ---
>>>  security/landlock/hash.h | 117 +++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 117 insertions(+)
>>>  create mode 100644 security/landlock/hash.h
>>>
>>> diff --git a/security/landlock/hash.h b/security/landlock/hash.h
>>> new file mode 100644
>>> index 000000000000..955c5756d4d9
>>> --- /dev/null
>>> +++ b/security/landlock/hash.h
>>> @@ -0,0 +1,117 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Landlock - Domain hashtable mainpulation
>>
>> typo
> 
> ack
> 
>>
>>> + *
>>> + * Copyright © 2025      Tingmao Wang <m@maowtm.org>
>>> + */
>>> +
>>> +#ifndef _SECURITY_LANDLOCK_HASH_H
>>> +#define _SECURITY_LANDLOCK_HASH_H
>>> +
>>> +#include <linux/slab.h>
>>> +#include <linux/hash.h>
>>> +
>>> +#include "ruleset.h"
>>> +
>>> +struct landlock_hashtable {
>>> +	struct hlist_head *hlist;
>>
>> A simple linked-list would be enough.
> 
> I'm not sure I understand what you meant by a "simple linked-list" - do
> you mean an array of `landlock_rule`s, each of which is part of a
> linked list of all the rules that hashed to the same key?
> 
> Anyway, since we identified a flat array approach (whether we hash or
> binary search), I'm going to try that instead.
> 
>>
>>> +
>>> +	/**
>>> +	 * @hash_bits: Number of bits in this hash index (i.e. @hlist has
>>> +	 * 2^hash_bits elements).
>>> +	 */
>>> +	int hash_bits;
>>> +};
>>> +
>>> +#define landlock_hash_for_each(rule, ht, i)                \
>>> +	for (i = 0; i < (1ULL << (ht)->hash_bits); i += 1) \
>>> +		hlist_for_each_entry(rule, &(ht)->hlist[i], hlist)
>>> +
>>> +#define landlock_hash_for_each_safe(rule, tmp, ht, i)      \
>>> +	for (i = 0; i < (1ULL << (ht)->hash_bits); i += 1) \
>>> +		hlist_for_each_entry_safe(rule, tmp, &(ht)->hlist[i], hlist)
>>> +
>>> +static inline int landlock_hash_init(const size_t expected_num_entries,
>>> +				     struct landlock_hashtable *out_ht)
>>> +{
>>> +	size_t table_sz = 1;
>>
>> Please use variables with full name when they are not too big:
>> table_size in this case.
>>
>>> +	int hash_bits = 0;
>>> +
>>> +	if (likely(expected_num_entries > 0)) {
>>> +		table_sz = roundup_pow_of_two(expected_num_entries);
>>
>> With roundup_pow_of_two() we'll get a few collisions but a big table.
>> What would be the impact of using roundup_pow_of_two() instead to have a
>> compact hash table (with more collisions)?
> 
> I assume you meant using rounddown_pow_of_two - I can give it a quick test
> too with the flat array approach.
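> 
> (To spell out the tradeoff: with e.g. 33 expected entries,
> roundup_pow_of_two() gives 64 buckets (hash_bits = 6) at ~0.5 entries
> per bucket on average, while rounddown_pow_of_two() gives 32 buckets
> (hash_bits = 5) at ~1 entry per bucket - half the table memory for
> slightly longer chains.)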
> 
>>
>>> +		hash_bits = fls_long(table_sz - 1);
>>> +	}
>>> +
>>> +	/*
>>> +	 * We allocate a table even if expected_num_entries == 0 to avoid
>>> +	 * unnecessary branching in lookup code
>>> +	 */
>>> +
>>> +	out_ht->hash_bits = hash_bits;
>>> +	out_ht->hlist = kcalloc(table_sz, sizeof(struct hlist_head),
>>> +				GFP_KERNEL_ACCOUNT);
>>> +	if (!out_ht->hlist) {
>>> +		return -ENOMEM;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static inline void landlock_hash_free(struct landlock_hashtable *ht,
>>> +				      const enum landlock_key_type key_type)
>>> +{
>>> +	struct landlock_rule *rule;
>>> +	struct hlist_node *tmp;
>>> +	size_t i;
>>> +
>>> +	if (key_type == LANDLOCK_KEY_INODE)
>>> +		might_sleep();
>>> +
>>> +	if (!ht->hlist)
>>> +		return;
>>> +
>>> +	landlock_hash_for_each_safe(rule, tmp, ht, i)
>>> +	{
>>> +		free_rule(rule, key_type);
>>> +	}
>>> +	kfree(ht->hlist);
>>> +	ht->hlist = NULL;
>>> +}
>>> +
>>> +static inline u32 landlock_hash_key(const union landlock_key key,
>>> +				    const int hash_bits)
>>> +{
>>> +	return hash_ptr((void *)key.data, hash_bits);
>>> +}
>>> +
>>> +static inline struct landlock_rule *
>>> +landlock_hash_find(const struct landlock_hashtable *const ht,
>>> +		   const union landlock_key key)
>>> +{
>>> +	struct hlist_head *head;
>>> +	struct landlock_rule *rule;
>>> +
>>> +	head = &ht->hlist[landlock_hash_key(key, ht->hash_bits)];
>>> +
>>> +	hlist_for_each_entry(rule, head, hlist) {
>>> +		if (rule->key.data == key.data)
>>> +			return rule;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>> +
>>> +/**
>>> + * landlock_hash_count() - Return the number of entries in the hashtable.
>>> + */
>>> +static inline size_t landlock_hash_count(const struct landlock_hashtable *ht)
>>> +{
>>> +	size_t num_entries = 0;
>>> +	struct landlock_rule *rule;
>>> +	size_t i;
>>> +	landlock_hash_for_each(rule, ht, i)
>>> +	{
>>> +		num_entries += 1;
>>> +	}
>>> +	return num_entries;
>>> +}
>>> --
>>> 2.49.0
>>>
>>>
> 



