[PATCH] Makefile: Introduce CONFIG_ZERO_CALL_USED_REGS

Mon May 10 22:01:48 UTC 2021

On Mon, May 10, 2021 at 02:45:03PM +0100, Mark Rutland wrote:
> About 31% of this seems to be due to GCC (almost) always clearing x16
> and x17 (see further down for numbers). I suspect that's because GCC has
> to assume that any (non-static) functions might be reached via a PLT
> which would clobber x16 and x17 with specific values.

Wheee.

> We also have a bunch of small functions with multiple returns, where
> each return path gets the full complement of zeroing instructions, e.g.
> 
> Stock:
> 
> | <fpsimd_sync_to_sve>:
> |        d503245f        bti     c
> |        f9400001        ldr     x1, [x0]
> |        7209003f        tst     w1, #0x800000
> |        54000040        b.eq    ffff800010014cc4 <fpsimd_sync_to_sve+0x14>  // b.none
> |        d65f03c0        ret
> |        d503233f        paciasp
> |        a9bf7bfd        stp     x29, x30, [sp, #-16]!
> |        910003fd        mov     x29, sp
> |        97fffdac        bl      ffff800010014380 <fpsimd_to_sve>
> |        a8c17bfd        ldp     x29, x30, [sp], #16
> |        d50323bf        autiasp
> |        d65f03c0        ret
> 
> With zero-call-regs:
> 
> | <fpsimd_sync_to_sve>:
> |        d503245f        bti     c
> |        f9400001        ldr     x1, [x0]
> |        7209003f        tst     w1, #0x800000
> |        540000c0        b.eq    ffff8000100152a8 <fpsimd_sync_to_sve+0x24>  // b.none
> |        d2800000        mov     x0, #0x0                        // #0
> |        d2800001        mov     x1, #0x0                        // #0
> |        d2800010        mov     x16, #0x0                       // #0
> |        d2800011        mov     x17, #0x0                       // #0
> |        d65f03c0        ret
> |        d503233f        paciasp
> |        a9bf7bfd        stp     x29, x30, [sp, #-16]!
> |        910003fd        mov     x29, sp
> |        97fffd17        bl      ffff800010014710 <fpsimd_to_sve>
> |        a8c17bfd        ldp     x29, x30, [sp], #16
> |        d50323bf        autiasp
> |        d2800000        mov     x0, #0x0                        // #0
> |        d2800001        mov     x1, #0x0                        // #0
> |        d2800010        mov     x16, #0x0                       // #0
> |        d2800011        mov     x17, #0x0                       // #0
> |        d65f03c0        ret
> 
> ... where we go from 12 instructions to 20, which is a ~67% bloat.

Yikes. Yeah, so that is likely a good example of missed optimization
opportunity.

> We have a bunch of cases like the above. Also note that per the AAPCS a
> function can clobber x0-17 (and x18 if it's not reserved for something
> like SCS), and I see a few places that clobber x1-x17.

Ah, gotcha. I wasn't quite sure which registers might qualify.

> [...]
> That's 441301 new MOVs, and the equivalent of 442511 new instructions
> overall. There are 135728 new MOVs to x16 and x17 specifically, which
> account for ~31% of that.

I assume the x16/x17 case could be addressed by the compiler if it
examined the need for PLTs, or is that too late (in the sense that the
linker is doing that phase)?

Regardless, I will update the documentation on this feature. :)

-- 
Kees Cook