[RFC PATCH v2 18/19] heki: x86: Protect guest kernel memory using the KVM hypervisor

Mon Nov 27 20:03:08 UTC 2023

On Mon, Nov 27, 2023 at 11:05:23AM -0600, Madhavan T. Venkataraman wrote:
> Apologies for the late reply. I was on vacation. Please see my response below:
> 
> On 11/13/23 02:54, Peter Zijlstra wrote:
> > On Sun, Nov 12, 2023 at 09:23:25PM -0500, Mickaël Salaün wrote:
> >> From: Madhavan T. Venkataraman <madvenka at linux.microsoft.com>
> >>
> >> Implement a hypervisor function, kvm_protect_memory() that calls the
> >> KVM_HC_PROTECT_MEMORY hypercall to request the KVM hypervisor to
> >> set specified permissions on a list of guest pages.
> >>
> >> Using the protect_memory() function, set proper EPT permissions for all
> >> guest pages.
> >>
> >> Use the MEM_ATTR_IMMUTABLE property to protect the kernel static
> >> sections and the boot-time read-only sections. This enables to make sure
> >> a compromised guest will not be able to change its main physical memory
> >> page permissions. However, this also disable any feature that may change
> >> the kernel's text section (e.g., ftrace, Kprobes), but they can still be
> >> used on kernel modules.
> >>
> >> Module loading/unloading, and eBPF JIT is allowed without restrictions
> >> for now, but we'll need a way to authenticate these code changes to
> >> really improve the guests' security. We plan to use module signatures,
> >> but there is no solution yet to authenticate eBPF programs.
> >>
> >> Being able to use ftrace and Kprobes in a secure way is a challenge not
> >> solved yet. We're looking for ideas to make this work.
> >>
> >> Likewise, the JUMP_LABEL feature cannot work because the kernel's text
> >> section is read-only.
> > 
> > What is the actual problem? As is the kernel text map is already RO and
> > never changed.
> 
> For the JUMP_LABEL optimization, the text needs to be patched at some point.
> That patching requires a writable mapping of the text page at the time of
> patching.
> 
> In this Heki feature, we currently lock down the kernel text at the end of
> kernel boot just before kicking off the init process. The lockdown is
> implemented by setting the permissions of a text page to R_X in the extended
> page table and not allowing write permissions in the EPT after that. So, jump label
> patching during kernel boot is not a problem. But doing it after kernel
> boot is a problem.

But you see, that's exactly what the kernel already does with the normal
permissions. They get set to RX after init and are never changed.

See the previous patch, we establish a read-write alias and write there.

You seem to lack basic understanding of how the kernel works in this
regard, which makes me very nervous about you touching any of this.

I must also say I really dislike your extra/random permssion calls all
over the place. They don't really get us anything afaict. Why can't you
plumb into the existing set_memory_*() family?