[RFC] security: replace indirect calls with static calls
mathieu.desnoyers at efficios.com
Fri Feb 5 15:09:26 UTC 2021
On 20-Aug-2020 06:47:53 PM, Brendan Jackman wrote:
> From: Paul Renauld <renauld at google.com>
> LSMs have high overhead due to indirect function calls through
> retpolines. This RFC proposes to replace these with static calls.
> This overhead is especially significant for the "bpf" LSM which supports
> the implementation of LSM hooks with eBPF programs (security/bpf). In
> order to facilitate this, the "bpf" LSM provides a default nop callback for
> all LSM hooks. When enabled, the "bpf" LSM incurs an unnecessary /
> avoidable indirect call to this nop callback.
> The performance impact on a simple syscall eventfd_write (which triggers
> the file_permission hook) was measured with and without "bpf" LSM
> enabled. Activating the LSM resulted in an overhead of 4%.
> This overhead prevents the adoption of bpf LSM on performance critical
> systems, and also, in general, slows down all LSMs.
> Currently, the LSM hook callbacks are stored in a linked list and
> dispatched as indirect calls. Using static calls can remove this overhead
> by replacing all indirect calls with direct calls.
> During the discussion of the "bpf" LSM patch-set it was proposed to special
> case BPF LSM to avoid the overhead by using static keys. This was however
> not accepted and it was decided to:
> - Not special-case the "bpf" LSM.
> - Implement a general solution benefitting the whole LSM framework.
> This is based on the static call branch.
So I reviewed this quickly, and hopefully my understanding is correct.
AFAIU, your approach is limited to scenarios where the callbacks are
known at compile-time. It also appears to add the overhead of a
switch/case for every function call on the fast-path.
I am the original author of the tracepoint infrastructure in the Linux
kernel, which also needs to iterate over an array of callbacks. Recently,
Steven Rostedt pushed a change which accelerates the single-callback
case using static calls to reduce retpoline mitigation overhead, but I
would prefer if we could accelerate the multiple-callback case as well.
Note that for tracepoints, the callbacks are not known at compile-time.
This is where I think we could come up with a generic solution that
would fit both LSM and tracepoint use-cases.
Here is what I have in mind. Let's say we generate code to accelerate up
to N calls, and after that we have a fallback using indirect calls.
Then we should be able to generate the following using static keys as a
jump table and N static calls:
jump <static key label target>
<iteration and indirect calls>
So the static keys would be used to jump to the appropriate label (using
a static branch, which has pretty much 0 overhead). Static calls would
be used to implement each of the calls.
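For illustration only, the layout described above can be approximated in
userspace C. This is a rough sketch under loud assumptions, not kernel
code: the switch on the callback count stands in for the static-key jump,
the unrolled cases stand in for the N static call sites, and the default
case is the indirect-call fallback. All names here (N_FAST, hook_site,
dispatch, run_hooks, the add_* callbacks) are hypothetical. Plain C
function pointers are still indirect calls, so this shows only the control
flow, not the retpoline savings that patched static calls would give.

```c
#include <stddef.h>

#define N_FAST 3 /* hypothetical N: number of unrolled call slots */

typedef void (*hook_fn)(int *acc);

/* Toy callbacks; each leaves a distinct mark in the accumulator. */
static void add_one(int *acc)      { *acc += 1; }
static void add_ten(int *acc)      { *acc += 10; }
static void add_hundred(int *acc)  { *acc += 100; }
static void add_thousand(int *acc) { *acc += 1000; }

struct hook_site {
    const hook_fn *cbs; /* registered callbacks, in call order */
    size_t nr;          /* number of registered callbacks */
};

static void dispatch(const struct hook_site *s, int *acc)
{
    const hook_fn *cb = s->cbs;

    /* Stand-in for "jump <static key label target>". */
    switch (s->nr) {
    case N_FAST:           /* label_3: first of three unrolled calls */
        (*cb++)(acc);
        /* fall through */
    case 2:                /* label_2 */
        (*cb++)(acc);
        /* fall through */
    case 1:                /* label_1 */
        (*cb++)(acc);
        /* fall through */
    case 0:                /* label_0: nothing left to call */
        break;
    default:               /* more than N_FAST: iterate with indirect calls */
        for (size_t i = 0; i < s->nr; i++)
            s->cbs[i](acc);
        break;
    }
}

/* Runs the first nr callbacks (nr <= 4) and returns the accumulator. */
int run_hooks(size_t nr)
{
    static const hook_fn all[] = {
        add_one, add_ten, add_hundred, add_thousand
    };
    struct hook_site s = { .cbs = all, .nr = nr };
    int acc = 0;

    dispatch(&s, &acc);
    return acc;
}
```

Here run_hooks(2) enters at the second unrolled slot and calls add_one and
add_ten in order, while run_hooks(4) exceeds N_FAST and takes the fallback
loop. In the kernel, the entry point would be selected by patching a branch
rather than by evaluating a switch at run time.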