new seccomp mode aims to improve performance

Kees Cook keescook at chromium.org
Mon Jun 1 18:16:14 UTC 2020


On Sun, May 31, 2020 at 10:19:15AM -0700, Alexei Starovoitov wrote:
> Thank you for crafting a benchmark.
> The only thing is that it's not doing a fair comparison.
> The problem is that patch [1] is using:
> 
> static noinline u32 __seccomp_benchmark(struct bpf_prog *prog,
>                                         const struct seccomp_data *sd)
> {
>         return SECCOMP_RET_ALLOW;
> }
> 
> as a benchmarking function.
> The 'noinline' keyword tells the compiler to keep the body of the function, but
> the compiler is still doing full control and data flow analysis through this
> function and it is smart enough to optimize its usage in seccomp_run_filters()
> and in __seccomp_filter() because all functions are in a single .c file.
> Lots of code gets optimized away when 'f->benchmark' is on.
> 
> To make it into a fair comparison, I've added the following patch
> on top of your [1].
> 
> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> index 2fdbf5ad8372..86204422e096 100644
> --- a/kernel/seccomp.c
> +++ b/kernel/seccomp.c
> @@ -244,7 +244,7 @@ static int seccomp_check_filter(struct sock_filter *filter, unsigned int flen)
>         return 0;
>  }
> 
> -static noinline u32 __seccomp_benchmark(struct bpf_prog *prog,
> +__weak noinline u32 __seccomp_benchmark(struct bpf_prog *prog,
>                                         const struct seccomp_data *sd)
> 
> Please take a look at 'make kernel/seccomp.s' before and after to see the difference
> the __weak keyword makes.

Ah yeah, thanks. That does bring it up to the same overhead. Nice!
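
For anyone following along, here's a minimal standalone C sketch of
the effect (illustrative userspace code, not the kernel source; the
function names are made up). With plain 'noinline' the compiler still
sees that the body always returns the same constant and can fold the
call site away; marking the function weak (which is what the kernel's
__weak expands to) means the definition may be replaced at link time,
so the call has to survive:

#include <stdio.h>

/*
 * 'noinline' keeps the function body, but within a single
 * translation unit the compiler still knows the return value is
 * always 0 and can optimize the caller accordingly.
 */
static __attribute__((noinline)) unsigned int bench_noinline(void)
{
        return 0;
}

/*
 * 'weak' allows another object file to override this definition at
 * link time, so the compiler must treat the call as opaque and
 * actually emit it.
 */
__attribute__((weak, noinline)) unsigned int bench_weak(void)
{
        return 0;
}

int main(void)
{
        /* Compare the two call sites with: gcc -O2 -S file.c */
        printf("%u\n", bench_noinline());
        printf("%u\n", bench_weak());
        return 0;
}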

> And here is what seccomp_benchmark now reports:
> 
> Benchmarking 33554432 samples...
> 22.618269641 - 15.030812794 = 7587456847
> getpid native: 226 ns
> 30.792042986 - 22.619048831 = 8172994155
> getpid RET_ALLOW 1 filter: 243 ns
> 39.451435038 - 30.792836778 = 8658598260
> getpid RET_ALLOW 2 filters: 258 ns
> 47.616011529 - 39.452190830 = 8163820699
> getpid BPF-less allow: 243 ns
> Estimated total seccomp overhead for 1 filter: 17 ns
> Estimated total seccomp overhead for 2 filters: 32 ns
> Estimated seccomp per-filter overhead: 15 ns
> Estimated seccomp entry overhead: 2 ns
> Estimated BPF overhead per filter: 0 ns
> 
> [...]
> 
> > So, with the layered nature of seccomp filters there's a reasonable gain
> > to be seen for an O(1) bitmap lookup to skip running even a single filter,
> > even for the fastest BPF mode.
> 
> This is not true.
> The O(1) bitmap implemented as kernel C code will have exactly the same speed
> as an O(1) bitmap implemented as an eBPF program.

Yes, that'd be true if it were the first (and only) filter. What I'm
trying to provide is a mechanism to speed up syscalls for all
attached filters (i.e. create a seccomp fast-path; see the sketch
below). The reality of seccomp usage is that it's very layered:
systemd sets some (or many!), then the container runtime sets some,
then the process itself might set some.
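
To make the fast-path shape concrete, here's a rough sketch (plain C
with hypothetical names like 'allow_bitmap' and 'cache_allows'; this
is not the actual patch): consult a per-task bitmap of syscalls that
every attached filter has been shown to always allow, and only fall
back to running the BPF programs on a miss:

#include <stdbool.h>
#include <stdint.h>

#define NR_SYSCALLS     512
#define BITS_PER_LONG   (8 * sizeof(unsigned long))

struct seccomp_cache {
        /*
         * One bit per syscall number: set only if *every* attached
         * filter returns SECCOMP_RET_ALLOW for that syscall
         * regardless of its arguments.
         */
        unsigned long allow_bitmap[NR_SYSCALLS / BITS_PER_LONG];
};

static bool cache_allows(const struct seccomp_cache *cache, int nr)
{
        if (nr < 0 || nr >= NR_SYSCALLS)
                return false;
        return cache->allow_bitmap[nr / BITS_PER_LONG] &
               (1UL << (nr % BITS_PER_LONG));
}

static uint32_t run_filters(const struct seccomp_cache *cache, int nr)
{
        /* O(1) fast path: one bit test, no matter how many filters
         * are stacked. */
        if (cache_allows(cache, nr))
                return 0x7fff0000U;     /* SECCOMP_RET_ALLOW */

        /* Slow path: walk and run every attached BPF filter,
         * keeping the most restrictive result (elided here). */
        return 0;
}

The point is that the bit test costs the same whether one filter or
five are attached, while the per-filter cost measured above (258 - 243
= 15 ns per additional filter) scales linearly with the depth of the
stack.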

-- 
Kees Cook


