[RFC PATCH v2 1/3] x86/sgx: Add SGX specific LSM hooks

Tue Jul 2 06:35:42 UTC 2019

> From: Andy Lutomirski [mailto:luto at kernel.org]
> Sent: Monday, July 1, 2019 7:29 PM
> 
> On Mon, Jul 1, 2019 at 12:56 PM Xing, Cedric <cedric.xing at intel.com> wrote:
> >
> > > From: Andy Lutomirski [mailto:luto at kernel.org]
> > > Sent: Monday, July 01, 2019 12:36 PM
> > >
> > > On Mon, Jul 1, 2019 at 11:31 AM Xing, Cedric <cedric.xing at intel.com>
> > > wrote:
> > > > I intended to say the major reason I objected Sean's approach was its
> > > inability to support SGX2 smoothly - as #PF driven EAUG requires non-
> > > existent pages to be mmap()'ed, otherwise vm_ops->fault wouldn't be
> > > dispatched so EAUG couldn't be issued in response to #PF.
> > >
> > > I still think that, if the kernel wants to support #PF-driven EAUG, it
> > > should be an opt-in thing.  It would be something like
> > > SGX_IOC_ADD_LAZY_EAUG_PAGES or similar.  If it's done that way, then
> > > the driver needs to learn how to track ranges of pages efficiently,
> > > which is another reason to consider leaving all the fancy page / page
> > > range tracking in the driver.
> > >
> > > I don't think it's a good idea for a page fault on any non-EADDed page
> > > in ELRANGE to automatically populate the page.
> >
> > I'm with you. The user code shall be explicit on which range to EAUG pages upon #PF.
> What I'm saying is, a range has to be mapped before the driver could receive #PFs (in the
> form of vm_ops->fault callbacks). But Sean's series doesn’t support that because no pages
> can be mapped before coming into existence.
> >
> > For "page tracking", what information to track is LSM dependent, so it may run into
> problems if different LSMs want to track different things. And that's the major reason I
> think it should be done inside LSM.
> >
> > Besides, the current page tracking structure in the driver is page oriented and its sole
> purpose is to serve #PFs. Page protection is better tracked using range oriented
> structures. Those 2 are orthogonal. It wouldn't reduce the complexity of the whole kernel
> by moving it into SGX driver.
> 
> It seems to me that the driver is going to need to improve its data
> structures to track ranges of pages regardless of any LSM issues.  If
> we're going to have an enclave with a huge ELRANGE and we're going to
> mark some large subset of the full ELRANGE as allocate-on-demand, then
> we are going to want to track that range in some efficient way.  It
> could be a single extent or a set of power-of-two-sized extents (i.e.
> radix tree entries), or something else, but a list of pages, of which
> some are marked not-yet-allocated, isn't going to cut it.
> 
> Once that happens, it seems natural to put whatever permission
> tracking we need into the same data structure.  That's why my proposal
> had the driver getting coarse-grained info from LSM ("may execute
> dirty page", for example) rather than asking the LSM to track the
> whole state machine.
> 
> Does that make sense?

The driver will eventually need some range oriented structures for managing EAUGs. But it doesn't necessarily have to be in the same structure as other per-page information. After all, they are touched by different components of the driver and indeed pretty orthogonal. Evils are always in the details. It may be counter-intuitive but per our prototype years ago, it would be simpler to just keep them separate. 

IIUC, your idea is in fact keeping a FSM inside SGX driver and using return values from security_enclave_load() to argument it. That means LSM has to work quite "closely" with SGX driver (i.e. LSM needs to understand the FSM in SGX driver), which is quite different than all other existing hooks, which basically make binary decisions only. And I'm not sure how to chain LSMs if there are multiple active at the same time. Auditing is also a problem, as you can't generate audit log at the time a real mprotect() request is approved/denied. UAPI is also unpleasant as the enclave loader has to "predict" the maximal protection, which is not always available to the loader at load time, or significant changes to enclave build tools would be necessary. I think the FSM is really part of the policy and internal to LSM (or more particularly, SELinux, as different LSM modules may have different FSMs), so it still makes more sense to me to keep "LSM internals" internal to LSM.