Re: Re: new seccomp mode aims to improve performance
Kees Cook
keescook at chromium.org
Tue Jun 2 18:32:25 UTC 2020
On Tue, Jun 02, 2020 at 11:34:04AM +0000, zhujianwei (C) wrote:
> In many scenarios, the requirement for syscall filtering is simple and
> does not need complex filter rules; for example, just configuring a
> syscall black or white list. However, we have noticed that seccomp has
> a performance overhead that cannot be ignored even in this simple
> scenario. For example, referring to Kees's test data, this cost is
> about 41/636 = 6.5%, and Alex's data is 17/226 = 7.5%, based on a
> single filter rule (getpid). Our data for this overhead is 19.8%
> (refer to the previous 'original' test results), filtering based on our
> 20 rules (unixbench syscall).
I wonder if aarch64 has higher overhead for calling into the TIF_WORK
trace stuff? (Or if aarch64's BPF JIT is not as efficient as x86?)
> // kernel modification
> --- linux-5.7-rc7_1/arch/arm64/kernel/ptrace.c 2020-05-25 06:32:54.000000000 +0800
> +++ linux-5.7-rc7/arch/arm64/kernel/ptrace.c 2020-06-02 12:35:04.412000000 +0800
> @@ -1827,6 +1827,46 @@
> regs->regs[regno] = saved_reg;
> }
>
> +#define PID_MAX 1000000
> +#define SYSNUM_MAX 0x220
You can use NR_syscalls here, I think.
> +
> +/* all zero*/
> +bool g_light_filter_switch[PID_MAX] = {0};
> +bool g_light_filter_bitmap[PID_MAX][SYSNUM_MAX] = {0};
These can be static, and I would double-check your allocation size -- I
suspect this is allocating a byte for each bool. I would recommend
DECLARE_BITMAP() and friends.
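Something in this direction, as an untested sketch (still keeping your
PoC-sized global tables, and assuming NR_syscalls is visible from
arm64's ptrace.c):

#include <linux/bitmap.h>	/* DECLARE_BITMAP(), BITS_TO_LONGS(), test_bit() */

/* one bit per pid: is the light filter enabled? */
static DECLARE_BITMAP(g_light_filter_switch, PID_MAX);
/* one bit per (pid, syscall nr): is the syscall denied? */
static unsigned long g_light_filter_bitmap[PID_MAX][BITS_TO_LONGS(NR_syscalls)];

static int __light_syscall_filter(void)
{
        int pid = current->pid;
        int this_syscall = syscall_get_nr(current, task_pt_regs(current));

        if (test_bit(this_syscall, g_light_filter_bitmap[pid])) {
                pr_err("light syscall filter: syscall %d denied.\n", this_syscall);
                return -1;
        }
        return 0;
}

That packs each per-pid row into NR_syscalls bits instead of 0x220 bytes,
and the switch array into PID_MAX bits instead of PID_MAX bytes.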
> +static int __light_syscall_filter(void) {
> + int pid;
> + int this_syscall;
> +
> + pid = current->pid;
> + this_syscall = syscall_get_nr(current, task_pt_regs(current));
> +
> + if(g_light_filter_bitmap[pid][this_syscall] == true) {
> + printk(KERN_ERR "light syscall filter: syscall num %d denied.\n", this_syscall);
> + goto skip;
> + }
> +
> + return 0;
> +skip:
> + return -1;
> +}
> +
> +static inline int light_syscall_filter(void) {
> + if (unlikely(test_thread_flag(TIF_SECCOMP))) {
> + return __light_syscall_filter();
> + }
> +
> + return 0;
> +}
> +
> int syscall_trace_enter(struct pt_regs *regs)
> {
> unsigned long flags = READ_ONCE(current_thread_info()->flags);
> @@ -1837,9 +1877,10 @@
> return -1;
> }
>
> - /* Do the secure computing after ptrace; failures should be fast. */
> - if (secure_computing() == -1)
> + /* light check for syscall-num-only rule. */
> + if (light_syscall_filter() == -1) {
> return -1;
> + }
>
> if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
> trace_sys_enter(regs, regs->syscallno);
Given that you're still doing this in syscall_trace_enter(), I imagine
it could live in secure_computing().
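Something like this, very roughly (untested; it moves the check behind
secure_computing() into kernel/seccomp.c and passes the already-decoded
syscall number instead of re-deriving it, so light_syscall_filter()
would grow an argument):

/* kernel/seccomp.c */
int __secure_computing(const struct seccomp_data *sd)
{
        int this_syscall = sd ? sd->nr :
                syscall_get_nr(current, task_pt_regs(current));

        /* syscall-number-only fast path, before the filter walk */
        if (light_syscall_filter(this_syscall) == -1)
                return -1;

        /*
         * ... the existing mode handling (SECCOMP_MODE_STRICT /
         * SECCOMP_MODE_FILTER) continues unchanged from here ...
         */
}

Then arm64's syscall_trace_enter() could keep calling secure_computing()
exactly as it does today.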
Anyway, the functionality here is similar to what I've been working
on for bitmaps (having a global preallocated bitmap isn't going to be
upstreamable, but it's good for a PoC). The complications are in handling
differing architectures (for compat systems), tracking/choosing between
the various basic SECCOMP_RET_* behaviors, etc.
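For flavor only, the rough shape I mean (every name below is made up
for illustration and matches nothing real):

#include <linux/bitmap.h>
#include <linux/seccomp.h>	/* struct seccomp_data, SECCOMP_RET_* */

/* hypothetical per-filter accelerator: one "known allowed" bitmap per
 * audit arch the task can use; none of these names exist upstream */
struct seccomp_bitmap_cache {
        DECLARE_BITMAP(allow_native, NR_syscalls);
#ifdef CONFIG_COMPAT
        DECLARE_BITMAP(allow_compat, NR_syscalls); /* sized to the compat table in reality */
#endif
};

/*
 * Only syscalls whose result is provably SECCOMP_RET_ALLOW can be cached
 * like this; anything returning SECCOMP_RET_ERRNO/TRAP/TRACE/KILL still
 * has to run the filter to pick up the return value data.
 */
static bool seccomp_cache_allows(const struct seccomp_bitmap_cache *cache,
                                 const struct seccomp_data *sd, bool compat)
{
#ifdef CONFIG_COMPAT
        if (compat)
                return test_bit(sd->nr, cache->allow_compat);
#endif
        return test_bit(sd->nr, cache->allow_native);
}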
-Kees
--
Kees Cook