Re: Re: new seccomp mode aims to improve performance

Kees Cook keescook at chromium.org
Tue Jun 2 18:32:25 UTC 2020


On Tue, Jun 02, 2020 at 11:34:04AM +0000, zhujianwei (C) wrote:
> And in many scenarios, the requirement for syscall filter is usually
> simple, and does not need complex filter rules, for example, just
> configure a syscall black or white list. However, we have noticed that
> seccomp will have a performance overhead that cannot be ignored in this
> simple scenario. For example, referring to Kees's test data, this cost
> is almost 41/636 = 6.5%, and Alex's data is 17/226 = 7.5%, based on
> single rule of filtering (getpid); Our data for this overhead is 19.8%
> (refer to the previous 'orignal' test results), filtering based on our
> 20 rules (unixbench syscall).

I wonder if aarch64 has higher overhead for calling into the TIF_WORK
trace stuff? (Or if aarch64's BPF JIT is not as efficient as x86?)

> // kernel modification
> --- linux-5.7-rc7_1/arch/arm64/kernel/ptrace.c	2020-05-25 06:32:54.000000000 +0800
> +++ linux-5.7-rc7/arch/arm64/kernel/ptrace.c	2020-06-02 12:35:04.412000000 +0800
> @@ -1827,6 +1827,46 @@
>  	regs->regs[regno] = saved_reg;
>  }
>  
> +#define PID_MAX    1000000
> +#define SYSNUM_MAX 0x220

You can use NR_syscalls here, I think.

> +
> +/* all zero*/
> +bool g_light_filter_switch[PID_MAX] = {0};
> +bool g_light_filter_bitmap[PID_MAX][SYSNUM_MAX] = {0};

These can be static, and I would double-check your allocation size -- I
suspect this is allocating a byte for each bool. I would recommend
DECLARE_BITMAP() and friends.
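
For a sense of scale, and assuming sizeof(bool) is 1, the second array
above comes to PID_MAX * SYSNUM_MAX = 1,000,000 * 544 bytes, or roughly
544 MB; one bit per syscall would need about an eighth of that. A minimal
userspace sketch of the bitmap layout follows. It is not kernel code: the
SKETCH_* macros only imitate DECLARE_BITMAP(), set_bit() and test_bit(),
and the struct and function names are made up for illustration. Treating
out-of-range numbers as denied is a choice of this sketch, not something
taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for NR_syscalls; 0x220 matches SYSNUM_MAX in the patch. */
#define SKETCH_NR_SYSCALLS 0x220

#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n) \
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Userspace imitation of the kernel's DECLARE_BITMAP(name, bits). */
#define SKETCH_DECLARE_BITMAP(name, bits) \
	unsigned long name[SKETCH_BITS_TO_LONGS(bits)]

/* One deny bitmap per filter: one bit per syscall number instead of one bool. */
struct sketch_filter {
	SKETCH_DECLARE_BITMAP(denied, SKETCH_NR_SYSCALLS);
};

/* Mark syscall number nr as denied (imitates set_bit()). */
static inline void sketch_deny(struct sketch_filter *f, unsigned int nr)
{
	assert(nr < SKETCH_NR_SYSCALLS);
	f->denied[nr / SKETCH_BITS_PER_LONG] |= 1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Return true if nr is denied (imitates test_bit()), with a bounds check. */
static inline bool sketch_is_denied(const struct sketch_filter *f, unsigned int nr)
{
	if (nr >= SKETCH_NR_SYSCALLS)
		return true; /* sketch choice: reject numbers outside the bitmap */
	return (f->denied[nr / SKETCH_BITS_PER_LONG] >>
		(nr % SKETCH_BITS_PER_LONG)) & 1UL;
}
```

The bounds check matters because the index comes from syscall_get_nr(),
which can return values outside the table.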

> +static int __light_syscall_filter(void) {
> +   int pid;
> +	int this_syscall;
> +
> +   pid = current->pid;
> +	this_syscall = syscall_get_nr(current, task_pt_regs(current));
> +
> +   if(g_light_filter_bitmap[pid][this_syscall] == true) {
> +       printk(KERN_ERR "light syscall filter: syscall num %d denied.\n", this_syscall);
> +		goto skip;
> +   }
> +
> +	return 0;
> +skip:	
> +	return -1;
> +}
> +
> +static inline int light_syscall_filter(void) {
> +	if (unlikely(test_thread_flag(TIF_SECCOMP))) {
> +                 return __light_syscall_filter();
> +        }
> +
> +	return 0;
> +}
> +
>  int syscall_trace_enter(struct pt_regs *regs)
>  {
>  	unsigned long flags = READ_ONCE(current_thread_info()->flags);
> @@ -1837,9 +1877,10 @@
>  			return -1;
>  	}
>  
> -	/* Do the secure computing after ptrace; failures should be fast. */
> -	if (secure_computing() == -1)
> +	/* light check for syscall-num-only rule. */
> +	if (light_syscall_filter() == -1) {
>  		return -1;
> +	}
>  
>  	if (test_thread_flag(TIF_SYSCALL_TRACEPOINT))
>  		trace_sys_enter(regs, regs->syscallno);

Given that you're still doing this in syscall_trace_enter(), I imagine
it could live in secure_computing().

Anyway, the functionality here is similar to what I've been working
on for bitmaps (having a global preallocated bitmap isn't going to be
upstreamable, but it's good for PoC). The complications are with handling
differing architectures (for compat systems), tracking/choosing between
the various basic SECCOMP_RET_* behaviors, etc.
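
One way these complications might be handled, offered only as an
illustration and not as the design referred to above: keep one bitmap per
architecture, cache only the "always allow" outcome, and fall back to the
full filter for everything else. The sketch below is userspace C, and the
enum values, struct and function names are all hypothetical.

```c
#include <assert.h>
#include <limits.h>

#define SKETCH_NR_SYSCALLS 0x220 /* hypothetical upper bound, as in the patch */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_LONGS \
	((SKETCH_NR_SYSCALLS + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Hypothetical arch identifiers; a real filter would compare AUDIT_ARCH_* values. */
enum sketch_arch { SKETCH_ARCH_NATIVE, SKETCH_ARCH_COMPAT, SKETCH_ARCH_COUNT };

/* Hypothetical outcomes: only "allow" is cached, anything else is deferred. */
enum sketch_verdict { SKETCH_ALLOW, SKETCH_RUN_FULL_FILTER };

struct sketch_cache {
	/* One bitmap per architecture: the same number can name different syscalls. */
	unsigned long allow[SKETCH_ARCH_COUNT][SKETCH_LONGS];
};

/* Record that (arch, nr) is always allowed by the installed filter. */
static void sketch_mark_allowed(struct sketch_cache *c, enum sketch_arch arch,
				unsigned int nr)
{
	assert((unsigned int)arch < SKETCH_ARCH_COUNT && nr < SKETCH_NR_SYSCALLS);
	c->allow[arch][nr / SKETCH_BITS_PER_LONG] |=
		1UL << (nr % SKETCH_BITS_PER_LONG);
}

/* Fast path: allow if the bit is set, otherwise defer to the full filter. */
static enum sketch_verdict sketch_check(const struct sketch_cache *c,
					enum sketch_arch arch, unsigned int nr)
{
	if ((unsigned int)arch >= SKETCH_ARCH_COUNT || nr >= SKETCH_NR_SYSCALLS)
		return SKETCH_RUN_FULL_FILTER;
	if ((c->allow[arch][nr / SKETCH_BITS_PER_LONG] >>
	     (nr % SKETCH_BITS_PER_LONG)) & 1UL)
		return SKETCH_ALLOW;
	return SKETCH_RUN_FULL_FILTER;
}
```

Deferring every non-allow case keeps the fast path from having to encode
each SECCOMP_RET_* action itself.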

-Kees

-- 
Kees Cook


