[RFC][PATCH] net/bpfilter: Remove this broken and apparently unmaintained

Fri Jun 26 04:58:35 UTC 2020

On 2020/06/26 10:51, Alexei Starovoitov wrote:
> On Thu, Jun 25, 2020 at 06:36:34PM -0700, Linus Torvalds wrote:
>> On Thu, Jun 25, 2020 at 12:34 PM David Miller <davem at davemloft.net> wrote:
>>>
>>> It's kernel code executing in userspace.  If you don't trust the
>>> signed code you don't trust the signed code.
>>>
>>> Nothing is magic about a piece of code executing in userspace.
>>
>> Well, there's one real issue: the most likely thing that code is going
>> to do is execute llvm to generate more code.

Wow! Are we going to allow execution of such complicated programs?

I was hoping that fork_usermode_blob() accepts only simple programs,
like the content of "hello64" generated by

----------
; nasm -f elf64 hello64.asm && ld -s -m elf_x86_64 -o hello64 hello64.o
section .text
global _start

_start:
  mov rax, 1        ; write(
  mov rdi, 1        ;   1,
  mov rsi, msg      ;   "Hello world\n",
  mov rdx, 12       ;   12
  syscall           ; );
  mov rax, 231      ; _exit(
  mov rdi, 0        ;   0
  syscall           ; );

section .rodata
  msg: db "Hello world", 0x0a
----------

which can be contained by mechanisms like seccomp; there is no pathname
resolution, no networking access etc.
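
As a rough illustration of the kind of containment meant here, the following
is a minimal sketch, assuming seccomp strict mode is available on the running
kernel. The helper name run_contained() and its "misbehave" parameter are
hypothetical and exist only for this example. Strict mode permits read(),
write(), exit() and sigreturn() only; note that it does not permit
exit_group(), so a blob like "hello64" above would need exit() (60) instead
of exit_group() (231) to terminate cleanly under it.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run a tiny "blob" in a child under seccomp strict
 * mode. If misbehave is nonzero, the child also attempts getpid(), which
 * strict mode does not allow.
 * Returns the raw wait status of the child, or -1 if fork()/waitpid() failed.
 */
static int run_contained(int misbehave)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        ssize_t n;

        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);               /* strict mode unavailable here */
        n = write(1, "Hello world\n", 12);
        (void)n;
        if (misbehave)
            syscall(SYS_getpid);    /* not allowed: child gets SIGKILL */
        syscall(SYS_exit, 0);       /* exit(), not exit_group() */
        for (;;)
            ;                       /* not reached */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Calling run_contained(0) should report a normal exit, while run_contained(1)
should report that the child was killed by SIGKILL, provided the prctl() call
succeeded. A real blob would more likely need a seccomp filter than strict
mode; this only shows the principle.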

>>
>> And that's I think the real security issue here: the context in which
>> the code executes. It may be triggered in one namespace, but what
>> namespaces and what rules should the thing actually then execute in.
>>
>> So no, trying to dismiss this as "there are no security issues" is
>> bogus. There very much are security issues.
> 
> I think you're referring to:
> 
>>>   We might need to invent built-in "protected userspace" because existing
>>>   "unprotected userspace" is not trustworthy enough to run kernel modules.
>>>   That's not just inventing fork_usermode_blob().
> 
> Another root process can modify the memory of usermode_blob process.

I'm not familiar with ptrace(); I only use /usr/bin/strace and /usr/bin/ltrace.
What worries me is that some root process could tamper with the memory which initially
contained "hello64" above in order to make that memory behave differently.
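
To make the concern concrete, here is a minimal sketch of one process changing
another process's memory through ptrace(). The names tamper_demo() and
"decision" are hypothetical, and for simplicity the child volunteers to be
traced with PTRACE_TRACEME; a root process would presumably use PTRACE_ATTACH
on an unrelated process instead. This is only an illustration of the
mechanism, not of anything fork_usermode_blob() does today.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for state that the blob's logic depends on. */
static volatile long decision = 0;

/*
 * Fork a child that exits with the value of "decision", and overwrite
 * that value from the parent via ptrace() before the child reads it.
 * Returns the child's exit code, or -1 on error.
 */
static int tamper_demo(long new_value)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(100);
        raise(SIGSTOP);             /* stop so that the parent can poke us */
        _exit((int)decision);       /* reads whatever the parent wrote */
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    /* fork() preserves addresses, so &decision is valid in the child too. */
    if (ptrace(PTRACE_POKEDATA, pid, (void *)&decision,
               (void *)new_value) < 0 ||
        ptrace(PTRACE_DETACH, pid, NULL, NULL) < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The child never assigns "decision" itself, yet it exits with whatever value
the parent poked in. The same mechanism can rewrite code pages, not just data.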

For example, if a usermode process started by fork_usermode_blob() which initially
contained

----------
while (read(0, &uid, sizeof(uid)) == sizeof(uid)) {
    if (uid == 0)
        write(1, "OK\n", 3);
    else
        write(1, "NG\n", 3);
}
----------

can somehow be tampered with to become

----------
while (read(0, &uid, sizeof(uid)) == sizeof(uid)) {
    if (uid != 0)
        write(1, "OK\n", 3);
    else
        write(1, "NG\n", 3);
}
----------

due to interference from the rest of the system, how can we say "we trust kernel
code executing in userspace" ?

My question is: how is the byte array (which was copied from kernel space) kept secure/intact
in an environment where "root can poke into kernel or any process memory"? It is obvious that
we can't say "we trust kernel code executing in userspace" without some mechanism for that.

Currently fork_usermode_blob() does not provide a security context for the byte array to be
executed. We could modify fork_usermode_blob() to provide a security context for LSMs, but
I would be happier if we could implement that mechanism without relying on in-tree LSMs,
because SELinux is too complicated to support.

> I think that's Tetsuo's point about lack of LSM hooks in kernel_sock_shutdown().
> Obviously, kernel_sock_shutdown() can be called by kernel only.

I don't follow what you mean. The kernel code executing in userspace uses the syscall
interface (e.g. the SYSCALL_DEFINE2(shutdown, int, fd, int, how) path), doesn't it?

> I suspect he's imagining a hypothetical situation where kernel bits of kernel module
> interact with userblob bits of kernel module.
> Then another root process tampers with memory of userblob.

Yes, how to protect the memory of the userblob is a concern. The fact that the memory of
the userblob can interfere with (or be interfered with by) the rest of the system is a problem.

> Then userblob interaction with kernel module can do kernel_sock_shutdown()
> on something that initial design of kernel+userblob module didn't intend.

I don't follow what you mean.

> I think this is trivially enforceable without creating new features.
> Existing security_ptrace_access_check() LSM hook can prevent tampering with
> memory of userblob.

There is a security_ptrace_access_check() LSM hook, but no zero-configuration
method is available.

> 
> As far as userblob calling llvm and other things in sequence.
> That is no different from systemd calling things.

Right.

> security label can carry that execution context.

If files get a chance to be associated with an appropriate pathname and
security label.

> 
>> My personally strongest argument for removing this kernel code is
>> that it's been there for a couple of years now, and it has never
>> actually done anything useful, and there's no actual sign that it ever
>> will, or that there is a solid plan in place for it.
> 
> you probably missed the detailed plan:
> https://lore.kernel.org/bpf/20200609235631.ukpm3xngbehfqthz@ast-mbp.dhcp.thefacebook.com/
> 
> Project #3 in the above is the one we're working on right now.
> It should be ready to post in a week.

I have a question about project #3. Given that "cat /sys/fs/bpf/my_ipv6_route"
produces the same human-readable output as "cat /proc/net/ipv6_route", how can the
security checks which are done for "cat /proc/net/ipv6_route" be enforced for
"cat /sys/fs/bpf/my_ipv6_route"? Unless the same security checks (e.g. permission
to read /proc/net/ipv6_route) are enforced, such bpf usage sounds like a method
for bypassing existing security mechanisms.