[PATCH 0/2] apparmor unaligned memory fixes

david laight david.laight at runbox.com
Wed Nov 26 14:22:01 UTC 2025


On Wed, 26 Nov 2025 12:03:03 +0100
Helge Deller <deller at gmx.de> wrote:

> On 11/26/25 11:44, david laight wrote:
...   
> >> diff --git a/security/apparmor/match.c b/security/apparmor/match.c
> >> index 26e82ba879d44..3dcc342337aca 100644
> >> --- a/security/apparmor/match.c
> >> +++ b/security/apparmor/match.c
> >> @@ -71,10 +71,10 @@ static struct table_header *unpack_table(char *blob, size_t bsize)
> >>    				     u8, u8, byte_to_byte);  
> > 
> > Is that just memcpy()?  
> 
> No, it's memcpy() only on big-endian machines.

You've misread the quoting...
The 'data8' case that was only half there is a memcpy().

> On little-endian machines it converts from big-endian
> 16/32-bit ints to little-endian 16/32-bit ints.
> 
> But I see some potential for optimization here:
> a) on big-endian machines just use memcpy()

true

> b) on little-endian machines use memcpy() to copy from possibly-unaligned
>     memory to a known-to-be-aligned destination. Then use a loop with
>     be32_to_cpu() instead of get_unaligned_xx() as it's faster.

There is already a function that does a loop byteswap over a buffer - no
reason to re-invent it.
But I doubt it is always (if ever) faster to do a copy and then byteswap:
the extra loop control and extra memory accesses kill performance.
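For reference, a user-space sketch of the two options being compared; the
helper names here are mine, not the kernel's. In the kernel the "direct"
loop would use get_unaligned_be32() and the "copy" loop would be memcpy()
followed by be32_to_cpu() on each (by then aligned) word:

```c
#include <stdint.h>
#include <string.h>

/* Assemble a big-endian u32 from bytes; works at any alignment
 * and on any host endianness. */
static uint32_t be32_from_bytes(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* (a) direct: one unaligned big-endian load per word */
static void decode_be32_direct(uint32_t *dst, const unsigned char *src,
			       size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		dst[i] = be32_from_bytes(src + 4 * i);
}

/* (b) copy then convert: memcpy() to the aligned destination first,
 * then convert each word in place using aligned accesses only */
static void decode_be32_copy(uint32_t *dst, const unsigned char *src,
			     size_t n)
{
	size_t i;

	memcpy(dst, src, n * sizeof(uint32_t));
	for (i = 0; i < n; i++)
		dst[i] = be32_from_bytes((const unsigned char *)&dst[i]);
}
```

Both produce the same words; the difference is purely in how many passes
over the data (and how many loads/stores) each one costs.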

Not that I've seen a fast get_unaligned() - I don't think gcc or clang
generate optimal code. For LE I think it is something like:
	low = *(addr & ~3);
	high = *((addr + 3) & ~3);
	shift = (addr & 3) * 8;
	value = low >> shift | high << (32 - shift);
(with a special case for shift == 0, since a shift by 32 is undefined).
Note that it is only two aligned memory reads - even for 64-bit.
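The same trick, written out as a runnable little-endian sketch (the helper
name is mine, and the pointer casts assume the usual GCC/Clang leniency
about aliasing):

```c
#include <stdint.h>

/* Read an unaligned 32-bit value using only two aligned loads:
 * fetch the aligned words containing the first and last byte of
 * the value and stitch them together with shifts.  A shift by 32
 * is undefined in C, so the already-aligned case (shift == 0) is
 * handled explicitly. */
static uint32_t load_u32_two_aligned(const unsigned char *addr)
{
	uintptr_t a = (uintptr_t)addr;
	const uint32_t *lo_p = (const uint32_t *)(a & ~(uintptr_t)3);
	const uint32_t *hi_p = (const uint32_t *)((a + 3) & ~(uintptr_t)3);
	unsigned int shift = (unsigned int)(a & 3) * 8;

	if (shift == 0)			/* aligned: one load is enough */
		return *lo_p;
	/* LE: the lower-address word holds the least-significant bytes */
	return (*lo_p >> shift) | (*hi_p << (32 - shift));
}
```

The same shape extends to 64-bit loads: still two aligned reads, just with
wider words and a 64-bit shift.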

	David

More information about the Linux-security-module-archive mailing list