[RFC v2 01/13] x86/mktme: Document the MKTME APIs

Wed Dec 5 23:35:36 UTC 2018

>> On Dec 5, 2018, at 11:22 AM, Alison Schofield <alison.schofield at intel.com> wrote:
>>
>> On Wed, Dec 05, 2018 at 10:11:18AM -0800, Andy Lutomirski wrote:
>>
>>
>>> On Dec 3, 2018, at 11:39 PM, Alison Schofield <alison.schofield at intel.com> wrote:
>>
>> I realize you’re writing code to expose hardware behavior, but I’m not sure this
>> really makes sense in this context.
>
> Your observation is accurate. The Usage defined here is very closely
> aligned to the Intel MKTME Architecture spec. That's a starting point,
> but not the ending point. We need to implement the feature set that
> makes sense. More below...
>
>>> +
>>> +    type=
>>> +        *user*    User will supply the encryption key data. Use this
>>> +                type to directly program a hardware encryption key.
>>> +
>>
>> I think that “user” probably makes sense as a “key service” key, but I don’t think it is at all useful for non-persistent memory.  Even if we take for granted that MKTME for anonymous memory is useful at all, “cpu” seems to be better in all respects.
>>
>>
>> Perhaps support for “user” should be tabled until there’s a design for how to use this for pmem?  I imagine it would look quite a bit like dm-crypt.  Advanced pmem filesystems could plausibly use different keys for different files, I suppose.
>>
>> If “user” is dropped, I think a lot of the complexity goes away. Hotplug becomes automatic, right?
>
> Dropping 'user' type removes a great deal of complexity.
>
> Let me follow up in 2 ways:
> 1) Find out when MKTME support for pmem is required.
> 2) Go back to the requirements and get the justification for user
> type.
>
>>
>>> +        *cpu*    User requests a CPU generated encryption key.
>>
>> Okay, maybe, but it’s still unclear to me exactly what the intended benefit is.
> *cpu* is the RANDOM key generated by the cpu. If there were no other
> options, then this would be the default, and the type would go away.
>
>>> +        *clear* User requests that a hardware encryption key be
>>> +                cleared. This will clear the encryption key from
>>> +                the hardware. On execution this hardware key gets
>>> +                TME behavior.
>>> +
>>
>> Why is this a key type?  Shouldn’t the API to select a key just have an option to ask for no key to be used?
>
> The *clear* key has been requested in order to clear/erase the user's
> key data that has been programmed into a hardware slot. The user does
> not want to leave a slot programmed with their encryption data when
> they are done with it.

Can’t you just clear the key when the key is deleted by the user?
Asking the user to allocate a *new* key and hope that it somehow ends
up in the same spot seems like a poor design, especially if future
hardware gains support for key slot virtualization in some way that
makes the slot allocation more dynamic.
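
To make the alternative concrete, here is a toy sketch in plain
userspace C. It is not kernel code and not the MKTME interface: the
slot table and every toy_* name are made up for illustration. The
point is only that erasing the slot can be a side effect of deleting
the key, so nobody has to allocate a second key and hope it lands in
the same slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_SLOTS  4   /* hypothetical number of hardware key slots */
#define TOY_KEY_BYTES 16  /* hypothetical key size */

/* Hypothetical model of one hardware key slot. */
struct toy_slot {
	bool in_use;
	uint8_t key[TOY_KEY_BYTES];
};

static struct toy_slot toy_slots[TOY_NR_SLOTS];

/* Program key material into the first free slot; return its index or -1. */
static int toy_key_add(const uint8_t key[TOY_KEY_BYTES])
{
	for (int i = 0; i < TOY_NR_SLOTS; i++) {
		if (!toy_slots[i].in_use) {
			memcpy(toy_slots[i].key, key, TOY_KEY_BYTES);
			toy_slots[i].in_use = true;
			return i;
		}
	}
	return -1;
}

/*
 * Deleting the key erases the slot it occupied. Real code would need a
 * clear that the compiler cannot optimize away; memset is enough here
 * because the table is read back afterwards.
 */
static void toy_key_del(int slot)
{
	assert(slot >= 0 && slot < TOY_NR_SLOTS);
	memset(toy_slots[slot].key, 0, TOY_KEY_BYTES);
	toy_slots[slot].in_use = false;
}

/* True if the slot is free and holds no leftover key bytes. */
static bool toy_slot_is_clear(int slot)
{
	assert(slot >= 0 && slot < TOY_NR_SLOTS);
	for (int i = 0; i < TOY_KEY_BYTES; i++) {
		if (toy_slots[slot].key[i] != 0)
			return false;
	}
	return !toy_slots[slot].in_use;
}
```

With this shape the caller never needs to know which slot it got:
toy_key_del() is the only operation that has to touch the slot, and it
keeps working if the slot allocation becomes more dynamic.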

>
>>> +        *no-encrypt*
>>> +                 User requests that hardware does not encrypt
>>> +                 memory when this key is in use.
>>
>> Same as above.  If there’s a performance benefit, then there could be a way to ask for cleartext memory.  Similarly, some pmem users may want a way to keep their pmem unencrypted.
>
> So, this is the way to ask for cleartext memory.
> The entire system will be encrypted with the system wide TME Key.
> A subset of that will be protected with MKTME Keys.
> If the user wants no encryption, this *no-encrypt* type is the way to do it.
>

Understood.  I’m saying that having a *key* (in the add_key sense) for
it seems unnecessary.  Whatever the final API for controlling the use
of keys, adding an option to ask for clear text seems reasonable.
This actually seems more useful for anonymous memory than the
CPU-generated keys are, IMO.

I do think that, before you invest too much time in perfecting the
series with the current design, you should identify the use cases,
make sure the use cases are valid, and figure out whether your API
design is appropriate.  After considerable head-scratching, I haven’t
thought of a reason that explicit CPU-generated keys are any better
than the default TME key, at least in the absence of additional
hardware support for locking down what code can use what key.  The
sole exception is that a key can be removed, which is probably faster
than directly zeroing large amounts of data.

I understand that it would be very nice to say "hey, cloud customer,
your VM has all its memory encrypted with a key that is unique to your
VM", but that seems to be more or less just a platitude with no actual
effect.  Anyone who snoops the memory bus or steals a DIMM learns
nothing unless they also take control of the CPU and can replay all
the data into the CPU.  On the other hand, anyone who can get the CPU
to read from a given physical address (which seems like the most
likely threat) can just get the CPU to decrypt any tenant's data.  So,
for example, if someone manages to write a couple of words to the EPT
for one VM, then they can easily read another VM's data, MKTME or no
MKTME, because the memory controller has no clue which VM initiated
the access.
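
A toy model of that last point, again in plain userspace C with
made-up toy_* names. It assumes the key ID travels in the upper bits
of the physical address, and it uses a one-byte XOR as a stand-in for
the real cipher. The read path takes an address and nothing else, so
it has no way to tell which VM is asking.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_KEYID_SHIFT 8    /* assumption: key ID sits above the offset bits */
#define TOY_NR_KEYIDS   4
#define TOY_MEM_BYTES   256

/* One toy "key" per key ID; XOR stands in for the real cipher. */
static const uint8_t toy_keys[TOY_NR_KEYIDS] = { 0x00, 0x5a, 0xc3, 0x7e };

/* What a bus snooper or a stolen DIMM would see: ciphertext only. */
static uint8_t toy_dram[TOY_MEM_BYTES];

/* Build a toy physical address from a key ID and a byte offset. */
static uint16_t toy_paddr(unsigned int keyid, uint8_t offset)
{
	assert(keyid < TOY_NR_KEYIDS);
	return (uint16_t)((keyid << TOY_KEYID_SHIFT) | offset);
}

/* Memory controller write: encrypt with the key named by the address. */
static void toy_mc_write(uint16_t paddr, uint8_t val)
{
	unsigned int keyid = paddr >> TOY_KEYID_SHIFT;

	assert(keyid < TOY_NR_KEYIDS);
	toy_dram[paddr & 0xff] = val ^ toy_keys[keyid];
}

/*
 * Memory controller read: decrypt with the key named by the address.
 * Note that there is no "who is asking" argument at all.
 */
static uint8_t toy_mc_read(uint16_t paddr)
{
	unsigned int keyid = paddr >> TOY_KEYID_SHIFT;

	assert(keyid < TOY_NR_KEYIDS);
	return toy_dram[paddr & 0xff] ^ toy_keys[keyid];
}
```

In this model, whoever can cause toy_mc_read(toy_paddr(1, off)) to be
issued gets key ID 1's plaintext back, regardless of which tenant they
are. Reading the same offset through a different key ID returns
garbage, but an attacker who can edit the mapping picks the key ID too.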

I suppose there's some smallish value in rotating the key every now
and then to make old data non-replayable, but an attack that
compromises the memory bus and only later compromises the CPU is a
strange threat model.