[RFC PATCH v4 01/12] x86/sgx: Use mmu_notifier.release() instead of per-vma refcounting
Sean Christopherson
sean.j.christopherson at intel.com
Mon Jul 8 14:57:07 UTC 2019
On Fri, Jun 21, 2019 at 12:03:36AM +0300, Jarkko Sakkinen wrote:
> On Wed, Jun 19, 2019 at 03:23:50PM -0700, Sean Christopherson wrote:
> > Using per-vma refcounting to track mm_structs associated with an enclave
> > requires hooking .vm_close(), which in turn prevents the mm from merging
> > vmas (precisely to allow refcounting).
>
> Why having sgx_vma_close() prevents that? I do not understand the
> problem statement.
vmas that define .vm_close() cannot be merged.
/*
* If the vma has a ->close operation then the driver probably needs to release
* per-vma resources, so we don't attempt to merge those.
*/
static inline int is_mergeable_vma(struct vm_area_struct *vma,
struct file *file, unsigned long vm_flags,
struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
{
...
if (vma->vm_ops && vma->vm_ops->close)
return 0;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
return 0;
return 1;
}
>
> > Avoid refcounting encl_mm altogether by registering an mmu_notifier at
> > .mmap(), removing the dying encl_mm at mmu_notifier.release() and
> > protecting mm_list during reclaim via a per-enclave SRCU.
>
> Right, there is the potential collision with my changes:
>
> 1. Your patch: enclave life-cycle equals life-cycle of all processes
> that are associated with the enclave.
> 2. My (yet be sent) patch: enclave life-cycle equals the life cycle.
>
> I won't rush with my patch. I rather merge neither at this point and
> you can review mine after you come back from your vacation.
>
> > Removing refcounting/vm_close() allows merging of enclave vmas, at the
> > cost of delaying removal of encl_mm structs from mm_list, i.e. an mm is
> > disassociated from an enclave when the mm exits or the enclave dies, as
> > opposed to when the last vma (in a given mm) is closed.
> >
> > The impact of delying encl_mm removal is its memory footprint and
> > whatever overhead is incurred during EPC reclaim (to walk an mm's vmas).
> > Practically speaking, a stale encl_mm will exist for a meaningful amount
> > of time if and only if the enclave is mapped in a long-lived process and
> > then passed off to another long-lived process. It is expected that the
> > vast majority of use cases will not encounter this condition, e.g. even
> > using a daemon to build enclaves should not result in a stale encl_mm as
> > the builder should never need to mmap() the enclave.
>
> This paragraph speaks only about "well behaving" software.
Malicious software isn't all that interesting as there are far easier ways
to waste system resources. That being said, the encl_mm allocation can
use GFP_KERNEL_ACCOUNT.
> > Even if there are scenarios that lead to defunct encl_mms, the cost is
> > likely far outweighed by the benefits of reducing the number of vmas
> > across all enclaves.
> >
> > Note, using SRCU to protect mm_list is not strictly necessary, i.e. the
> > existing walker with encl_mm refcounting could be massaged to work with
> > mmu_notifier.release(), but the resulting code is subtle and fragile (I
> > never actually got it working). The primary issue is that an encl_mm
> > can't be moved off the list until its refcount goes to zero, otherwise
> > the custom walker goes off into the weeds. The refcount requirement
> > then prevents using mm_list to identify if an mmu_notifier.release()
> > has fired, i.e. another mechanism is needed to guard against races
> > between exit_mmap() and sgx_release().
>
> Is it really impossible to send a separate SRCU patch?
I can split out the SRCU as a precursor. It'll likely take me a few days
to get it sent.
> I fully agree with the SRCU whereas rest of this patch is still
> under debate.
>
> If you could do that, I can merge it in no time. It is a small
> step into better direction.
>
> > Cc: Dave Hansen <dave.hansen at intel.com>
> > Cc: Andy Lutomirski <luto at kernel.org>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson at intel.com>
>
> Needs to be rebased because the master missing your earlier bug fix.
...
> > diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
> > index 9566eb72d417..c6436bbd4a68 100644
> > --- a/arch/x86/kernel/cpu/sgx/encl.c
> > +++ b/arch/x86/kernel/cpu/sgx/encl.c
> > @@ -132,103 +132,125 @@ static struct sgx_encl_page *sgx_encl_load_page(struct sgx_encl *encl,
> > return entry;
> > }
> >
> > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> > - struct mm_struct *mm)
> > +static void sgx_encl_mm_release_wq(struct work_struct *work)
> > +{
> > + struct sgx_encl_mm *encl_mm =
> > + container_of(work, struct sgx_encl_mm, release_work);
> > +
> > + sgx_encl_mm_release(encl_mm);
> > +}
> > +
> > +/*
> > + * Being a call_srcu() callback, this needs to be short, and sgx_encl_release()
> > + * is anything but short. Do the final freeing in yet another async callback.
> > + */
> > +static void sgx_encl_mm_release_delayed(struct rcu_head *rcu)
>
> Would rename this either as *_tail() or *_deferred().
Deferred works for me.
> > +{
> > + struct sgx_encl_mm *encl_mm =
> > + container_of(rcu, struct sgx_encl_mm, rcu);
> > +
> > + INIT_WORK(&encl_mm->release_work, sgx_encl_mm_release_wq);
> > + schedule_work(&encl_mm->release_work);
> > +}
> > +
...
> > @@ -118,11 +123,13 @@ void sgx_encl_destroy(struct sgx_encl *encl);
> > void sgx_encl_release(struct kref *ref);
> > pgoff_t sgx_encl_get_index(struct sgx_encl *encl, struct sgx_encl_page *page);
> > struct page *sgx_encl_get_backing_page(struct sgx_encl *encl, pgoff_t index);
> > -struct sgx_encl_mm *sgx_encl_next_mm(struct sgx_encl *encl,
> > - struct sgx_encl_mm *encl_mm, int *iter);
> > -struct sgx_encl_mm *sgx_encl_mm_add(struct sgx_encl *encl,
> > - struct mm_struct *mm);
> > -void sgx_encl_mm_release(struct kref *ref);
> > +int sgx_encl_mm_add(struct sgx_encl *encl, struct mm_struct *mm);
> > +static inline void sgx_encl_mm_release(struct sgx_encl_mm *encl_mm)
> > +{
> > + kref_put(&encl_mm->encl->refcount, sgx_encl_release);
> > +
> > + kfree(encl_mm);
> > +}
>
> Please just open code this to the two call sites. Makes the code hard to
> follow.
Heh, I waffled between a helper and open coding. I chose poorly :-)
> Right now I did not find anything else questionable from the code
> changes. Repeating myself but if it is by any means possible before
> going away, can you construct a pure SRCU patch.
>
> I could then reconstruct my changes on top off that, which would
> make evalution of both heck a lot easier.
>
> /Jarkko
More information about the Linux-security-module-archive
mailing list