[RFC] security: replace indirect calls with static calls

Mon Aug 24 14:09:09 UTC 2020

On Thu, 20 Aug 2020 at 23:46, Kees Cook <keescook at chromium.org> wrote:
>
> On Thu, Aug 20, 2020 at 06:47:53PM +0200, Brendan Jackman wrote:
> > From: Paul Renauld <renauld at google.com>
> >
> > LSMs have high overhead due to indirect function calls through
> > retpolines. This RPC proposes to replace these with static calls [1]
>
> typo: RFC

Oops, thanks - I also meant to have the [RFC] subject prefix.

>
> > instead.
>
> Yay! :)
>
> > [...]
> > This overhead prevents the adoption of bpf LSM on performance critical
> > systems, and also, in general, slows down all LSMs.
>
> I'd be curious to see other workloads too. (Your measurements are a bit
> synthetic, mostly showing "worst case": one short syscall in a tight
> loop. I'm curious how much performance gain can be had -- we should
> still do it, it'll be a direct performance improvement, but I'm curious
> about "real world" impact too.)
>

Sounds good - I'll gather some more data and get back.

(I would also reiterate what KP said in response
to James: the "worst case" relative indirect call overhead (i.e. the case
where the hook callback does minimal work) is exactly the case we care
about here. If the callback is doing enough work that the indirect call overhead
becomes negligible, that callback is probably anyway too heavyweight for the
use cases that motivated this work).

> > [...]
> > Previously, the code for this hook would have looked like this:
> >
> >       ret = DEFAULT_RET;
> >
> >         for each cb in [A, B, C]:
> >                 ret = cb(args); <--- costly indirect call here
> >                 if ret != 0:
> >                         break;
> >
> >         return ret;
> >
> > Static calls are defined at build time and are initially empty (NOP
> > instructions). When the LSMs are initialized, the slots are filled as
> > follows:
> >
> >  slot idx     content
> >            |-----------|
> >     0      |           |
> >            |-----------|
> >     1      |           |
> >            |-----------|
> >     2      |   call A  | <-- base_slot_idx = 2
> >            |-----------|
> >     3      |   call B  |
> >            |-----------|
> >     4      |   call C  |
> >            |-----------|
> >
> > The generated code will unroll the foreach loop to have a static call for
> > each possible LSM:
> >
> >         ret = DEFAULT_RET;
> >         switch(base_slot_idx):
> >
> >                 case 0:
> >                         NOP
> >                         if ret != 0:
> >                                 break;
> >                         // fallthrough
> >                 case 1:
> >                         NOP
> >                         if ret != 0:
> >                                 break;
> >                         // fallthrough
> >                 case 2:
> >                         ret = A(args); <--- direct call, no retpoline
> >                         if ret != 0:
> >                                 break;
> >                         // fallthrough
> >                 case 3:
> >                         ret = B(args); <--- direct call, no retpoline
> >                         if ret != 0:
> >                                 break;
> >                         // fallthrough
> >
> >                 [...]
> >
> >                 default:
> >                         break;
> >
> >         return ret;
> >
> > A similar logic is applied for void hooks.
> >
> > Why this trick with a switch statement? The table of static call is defined
> > at compile time. The number of hook callbacks that will be defined is
> > unknown at that time, and the table cannot be resized at runtime.  Static
> > calls do not define a conditional execution for a non-void function, so the
> > executed slots must be non-empty.  With this use of the table and the
> > switch, it is possible to jump directly to the first used slot and execute
> > all of the slots after. This essentially makes the entry point of the table
> > dynamic. Instead, it would also be possible to start from 0 and break after
> > the final populated slot, but that would require an additional conditional
> > after each slot.
>
> Instead of just "NOP", having the static branches perform a jump would
> solve this pretty cleanly, yes? Something like:
>
>         ret = DEFAULT_RET;
>
>         ret = A(args); <--- direct call, no retpoline
>         if ret != 0:
>                 goto out;
>
>         ret = B(args); <--- direct call, no retpoline
>         if ret != 0:
>                 goto out;
>
>         goto out;
>         if ret != 0:
>                 goto out;
>
> out:
>         return ret;

Hmm yeah that's a cool idea. This would either need to be implemented
with custom
code-modification logic for the LSM hooks, or we'd need to think of a
way to express it
in a sensible addition to the static_call API. I do wonder if the
latter could take
the form of a generic system for arrays of static calls.

It would also need to handle the fact that IIUC at the moment the last
static_call may be a tail
call, so we'd be patching an existing jump into a jump to a different
target, I don't know if we
can do that atomically.

More research required on my side here, on both points.

 > [...]
> > Signed-off-by: Paul Renauld <renauld at google.com>
> > Signed-off-by: KP Singh <kpsingh at google.com>
> > Signed-off-by: Brendan Jackman <jackmanb at google.com>
>
> This implies a maintainership chain, with Paul as the sole author. If
> you mean all of you worked on the patch, include Co-developed-by: as
> needed[1].

Yep, this is intentional - Paul is the sole author so far (I suppose
KP's sign-off
is not technically required since he's also at the Google).