[RFC PATCH v3 3/5] KVM: x86: Add notifications for Heki policy configuration and violation

Fri May 3 14:03:21 UTC 2024

On Fri, May 03, 2024, Mickaël Salaün wrote:
> Add an interface for user space to be notified about guests' Heki policy
> and related violations.
> 
> Extend the KVM_ENABLE_CAP IOCTL with KVM_CAP_HEKI_CONFIGURE and
> KVM_CAP_HEKI_DENIAL. Each one takes as its first argument a bitmask that can
> contain KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4. The
> returned value is the bitmask of known Heki exit reasons, for now:
> KVM_HEKI_EXIT_REASON_CR0 and KVM_HEKI_EXIT_REASON_CR4.
> 
> If KVM_CAP_HEKI_CONFIGURE is set, a VM exit will be triggered for each
> KVM_HC_LOCK_CR_UPDATE hypercall that targets one of the requested control
> registers. This lets the VMM learn about the restrictions the guest
> imposes on itself.
> 
> If KVM_CAP_HEKI_DENIAL is set, a VM exit will be triggered for each
> pinned CR violation. This enables the VMM to react to a policy
> violation.
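The same hypothetical model can sketch the exit decision described in the two paragraphs above. heki_should_exit() and enum heki_event are made-up names, and the bit values are the same placeholders. The sketch only decides whether to exit. It says nothing about how KVM itself detects a violation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values: the patch does not show them. */
#define HEKI_EXIT_REASON_CR0 (1ULL << 0)
#define HEKI_EXIT_REASON_CR4 (1ULL << 1)

/* Which kind of event occurred for a given control register. */
enum heki_event {
	HEKI_EVENT_CONFIGURE, /* guest issued KVM_HC_LOCK_CR_UPDATE */
	HEKI_EVENT_DENIAL,    /* guest write violated a pinned CR */
};

/* Returns true if the event should cause an exit to userspace. */
static bool heki_should_exit(uint64_t configure_mask, uint64_t denial_mask,
			     enum heki_event event, int cr)
{
	uint64_t reason;

	assert(event == HEKI_EVENT_CONFIGURE || event == HEKI_EVENT_DENIAL);

	if (cr == 0)
		reason = HEKI_EXIT_REASON_CR0;
	else if (cr == 4)
		reason = HEKI_EXIT_REASON_CR4;
	else
		return false; /* only CR0 and CR4 are covered */

	if (event == HEKI_EVENT_CONFIGURE)
		return (configure_mask & reason) != 0;
	return (denial_mask & reason) != 0;
}
```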
> 
> Cc: Borislav Petkov <bp at alien8.de>
> Cc: Dave Hansen <dave.hansen at linux.intel.com>
> Cc: H. Peter Anvin <hpa at zytor.com>
> Cc: Ingo Molnar <mingo at redhat.com>
> Cc: Kees Cook <keescook at chromium.org>
> Cc: Madhavan T. Venkataraman <madvenka at linux.microsoft.com>
> Cc: Paolo Bonzini <pbonzini at redhat.com>
> Cc: Sean Christopherson <seanjc at google.com>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets at redhat.com>
> Cc: Wanpeng Li <wanpengli at tencent.com>
> Signed-off-by: Mickaël Salaün <mic at digikod.net>
> Link: https://lore.kernel.org/r/20240503131910.307630-4-mic@digikod.net
> ---
> 
> Changes since v1:
> * New patch. Making user space aware of Heki properties was requested by
>   Sean Christopherson.

No, I suggested having userspace _control_ the pinning[*], not merely be notified
of pinning.

 : IMO, manipulation of protections, both for memory (this patch) and CPU state
 : (control registers in the next patch) should come from userspace.  I have no
 : objection to KVM providing plumbing if necessary, but I think userspace needs
 : to have full control over the actual state.
 : 
 : One of the things that caused Intel's control register pinning series to stall
 : out was how to handle edge cases like kexec() and reboot.  Deferring to userspace
 : means the kernel doesn't need to define policy, e.g. when to unprotect memory,
 : and avoids questions like "should userspace be able to overwrite pinned control
 : registers".
 : 
 : And like the confidential VM use case, keeping userspace in the loop is a big
 : benefit, e.g. the guest can't circumvent protections by coercing userspace into
 : writing to protected memory.

I stand by that suggestion, because I don't see a sane way to handle things like
kexec() and reboot without having a _much_ more sophisticated policy than would
ever be acceptable in KVM.

I think that can be done without KVM having any awareness of CR pinning whatsoever.
E.g. userspace just needs the ability to intercept CR writes and inject #GPs.  Off
the cuff, I suspect the uAPI could look very similar to MSR filtering.  E.g. I bet
userspace could enforce MSR pinning without any new KVM uAPI at all.

[*] https://lore.kernel.org/all/ZFUyhPuhtMbYdJ76@google.com