[RFC PATCH v2 2/5] x86/sgx: Require userspace to define enclave pages' protection bits

Wed Jun 12 00:09:28 UTC 2019

On Jun 10, 2019, at 3:28 PM, Xing, Cedric <cedric.xing at intel.com> wrote:

>> From: Andy Lutomirski [mailto:luto at kernel.org]
>> Sent: Monday, June 10, 2019 12:15 PM
>> 
>> On Mon, Jun 10, 2019 at 11:29 AM Xing, Cedric <cedric.xing at intel.com>
>> wrote:
>>> 
>>>> From: Christopherson, Sean J
>>>> Sent: Wednesday, June 05, 2019 7:12 PM
>>>> 
>>>> +/**
>>>> + * sgx_map_allowed - check vma protections against the associated
>>>> enclave page
>>>> + * @encl:    an enclave
>>>> + * @start:   start address of the mapping (inclusive)
>>>> + * @end:     end address of the mapping (exclusive)
>>>> + * @prot:    protection bits of the mapping
>>>> + *
>>>> + * Verify a userspace mapping to an enclave page would not violate
>>>> +the security
>>>> + * requirements of the *kernel*.  Note, this is in no way related
>>>> +to the
>>>> + * page protections enforced by hardware via the EPCM.  The EPCM
>>>> +protections
>>>> + * can be directly extended by the enclave, i.e. cannot be relied
>>>> +upon by the
>>>> + * kernel for security guarantees of any kind.
>>>> + *
>>>> + * Return:
>>>> + *   0 on success,
>>>> + *   -EACCES if the mapping is disallowed
>>>> + */
>>>> +int sgx_map_allowed(struct sgx_encl *encl, unsigned long start,
>>>> +                 unsigned long end, unsigned long prot) {
>>>> +     struct sgx_encl_page *page;
>>>> +     unsigned long addr;
>>>> +
>>>> +     prot &= (VM_READ | VM_WRITE | VM_EXEC);
>>>> +     if (!prot || !encl)
>>>> +             return 0;
>>>> +
>>>> +     mutex_lock(&encl->lock);
>>>> +
>>>> +     for (addr = start; addr < end; addr += PAGE_SIZE) {
>>>> +             page = radix_tree_lookup(&encl->page_tree, addr >>
>>>> PAGE_SHIFT);
>>>> +
>>>> +             /*
>>>> +              * Do not allow R|W|X to a non-existent page, or
>> protections
>>>> +              * beyond those of the existing enclave page.
>>>> +              */
>>>> +             if (!page || (prot & ~page->prot))
>>>> +                     return -EACCES;
>>> 
>>> In SGX2, pages will be "mapped" before being populated.
>>> 
>>> Here's a brief summary for those who don't have enough background on
>> how new EPC pages could be added to a running enclave in SGX2:
>>>  - There are 2 new instructions - EACCEPT and EAUG.
>>>  - EAUG is used by SGX module to add (augment) a new page to an
>> existing enclave. The newly added page is *inaccessible* until the
>> enclave *accepts* it.
>>>  - EACCEPT is the instruction for an enclave to accept a new page.
>>> 
>>> And the s/w flow for an enclave to request new EPC pages is expected
>> to be something like the following:
>>>  - The enclave issues EACCEPT at the linear address that it would
>> like a new page.
>>>  - EACCEPT results in #PF, as there's no page at the linear address
>> above.
>>>  - SGX module is notified about the #PF, in form of its vma->vm_ops-
>>> fault() being called by kernel.
>>>  - SGX module EAUGs a new EPC page at the fault address, and resumes
>> the enclave.
>>>  - EACCEPT is reattempted, and succeeds at this time.
>> 
>> This seems like an odd workflow.  Shouldn't the #PF return back to
>> untrusted userspace so that the untrusted user code can make its own
>> decision as to whether it wants to EAUG a page there as opposed to, say,
>> killing the enclave or waiting to keep resource usage under control?
> 
> This may seem odd to some at the first glance. But if you can think of how static heap (pre-allocated by EADD before EINIT) works, the load parses the "metadata" coming with the enclave to decide the address/size of the heap, EADDs it, and calls it done. In the case of "dynamic" heap (allocated dynamically by EAUG after EINIT), the same thing applies - the loader determines the range of the heap, tells the SGX module about it, and calls it done. Everything else is the between the enclave and the SGX module.
> 
> In practice, untrusted code usually doesn't know much about enclaves, just like it doesn't know much about the shared objects loaded into its address space either. Without the necessary knowledge, untrusted code usually just does what it is told (via o-calls, or return value from e-calls), without judging that's right or wrong. 
> 
> When it comes to #PF like what I described, of course a signal could be sent to the untrusted code but what would it do then? Usually it'd just come back asking for a page at the fault address. So we figured it'd be more efficient to just have the kernel EAUG at #PF. 
> 
> Please don't get me wrong though, as I'm not dictating what the s/w flow shall be. It's just going to be a choice offered to user mode. And that choice was planned to be offered via mprotect() - i.e. a writable vma causes kernel to EAUG while a non-writable vma will result in a signal (then the user mode could decide whether to EAUG). The key point is flexibility - as we want to allow all reasonable s/w flows instead of dictating one over others. We had similar discussions on vDSO API before. And I think you accepted my approach because of its flexibility. Am I right?

As long as user code can turn this off, I have no real objection. But it might make sense to have it be more explicit — have an ioctl set up a range as “EAUG-on-demand”.

But this is all currently irrelevant. We can argue about it when the patches show up. :)