[QUESTION] Full user space process isolation?

Tue Jul 4 15:18:43 UTC 2023

On 7/3/2023 5:28 PM, Roberto Sassu wrote:
> On Mon, 2023-07-03 at 17:06 +0200, Jann Horn wrote:
>> On Thu, Jun 22, 2023 at 4:45 PM Roberto Sassu
>> <roberto.sassu at huaweicloud.com> wrote:
>>> I wanted to execute some kernel workloads in a fully isolated user
>>> space process, started from a binary statically linked with klibc,
>>> connected to the kernel only through a pipe.
>>
>> FWIW, the kernel has some infrastructure for this already, see
>> CONFIG_USERMODE_DRIVER and kernel/usermode_driver.c, with a usage
>> example in net/bpfilter/.
> 
> Thanks, I actually took that code to make a generic UMD management
> library, that can be used by all use cases:
> 
> https://lore.kernel.org/linux-kernel/20230317145240.363908-1-roberto.sassu@huaweicloud.com/
> 
>>> I also wanted that, for the root user, tampering with that process is
>>> as hard as if the same code runs in kernel space.
>>
>> I believe that actually making it that hard would probably mean that
>> you'd have to ensure that the process doesn't use swap (in other
>> words, it would have to run with all memory locked), because root can
>> choose where swapped pages are stored. Other than that, if you mark it
>> as a kthread so that no ptrace access is allowed, you can probably get
>> pretty close. But if you do anything like that, please leave some way
>> (like a kernel build config option or such) to enable debugging for
>> these processes.
> 
> I didn't think about the swapping part... thanks!
> 
> Ok to enable debugging with a config option.
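
For illustration, the "all memory locked" part could look roughly like the
sketch below from the user space side. This is only a sketch under
assumptions: lock_all_memory() is a hypothetical helper name that does not
appear in the UMD patches, and a kernel-managed UMD might instead enforce
the locking from the kernel side. It only shows the standard mlockall()
call, which needs CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK to succeed.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not taken from the thread or the UMD patches):
 * pin all pages currently mapped by the calling process, and all pages
 * mapped in the future, so that none of them can be written to swap.
 * Returns 0 on success or a negative errno value on failure.
 */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -errno;
    return 0;
}
```

The helper would be called once, early in main(), before any sensitive data
is read from the pipe. A failure should probably be treated as fatal, since
continuing would leave pages eligible for swap.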
> 
>> But I'm not convinced that it makes sense to try to draw a security
>> boundary between fully-privileged root (with the ability to mount
>> things and configure swap and so on) and the kernel - my understanding
>> is that some kernel subsystems don't treat root-to-kernel privilege
>> escalation issues as security bugs that have to be fixed.
> 
> Yes, that is unfortunately true, and in that case the trustworthy UMD
> would not make things worse. On the other hand, on systems where that
> separation is defined, the advantage would be to run more exploitable
> code in user space, leaving the kernel safe.
> 
> I'm thinking about all the cases where the code had to be included in
> the kernel to run at the same privilege level, but would not use any of
> the kernel facilities (e.g. parsers).

Thanks for reminding me of kexec-tools. The complete image for booting a
new kernel was originally prepared in user space. With kernel lockdown,
all this code had to move into the kernel, adding a new syscall and lots
of complexity to build purgatory code, etc. Yet, this new implementation
in the kernel does not offer all features of kexec-tools, so both code
bases continue to exist and are happily diverging...

> If the boundary is extended to user space, some of these components
> could be moved away from the kernel, and the functionality would be the
> same without decreasing the security.

All right, AFAICS your idea is limited to relatively simple cases for
now. I mean, running kexec-tools in user space is not easily possible
when UID 0 is not trusted, because kexec needs to open various files
and make various other syscalls, which would require a complex LSM
policy. It looks technically possible to write one, but then the big
question is whether it would be simpler to review and maintain than
adding more kexec-tools features to the kernel.

Anyway, I can sense a general desire to run less code in the most
privileged system environment. Roberto's proposal is one of the few that
go in this direction. What are the alternatives?

Petr T