[RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

Xing, Cedric cedric.xing at intel.com
Thu Jun 13 21:02:58 UTC 2019


> From: linux-sgx-owner at vger.kernel.org [mailto:linux-sgx-owner at vger.kernel.org] On Behalf Of Stephen Smalley
> 
> On 6/11/19 6:55 PM, Xing, Cedric wrote:
> >> From: linux-sgx-owner at vger.kernel.org [mailto:linux-sgx-owner at vger.kernel.org] On Behalf Of Stephen Smalley
> >> Sent: Tuesday, June 11, 2019 6:40 AM
> >>
> >>>
> >>> +#ifdef CONFIG_INTEL_SGX
> >>> +	rc = sgxsec_mprotect(vma, prot);
> >>> +	if (rc <= 0)
> >>> +		return rc;
> >>
> >> Why are you skipping the file_map_prot_check() call when rc == 0?
> >> What would SELinux check if you didn't do so -
> >> FILE__READ|FILE__WRITE|FILE__EXECUTE to /dev/sgx/enclave?  Is it a
> >> problem to let SELinux proceed with that check?
> >
> > We can continue the check. But in practice, all FILE__{READ|WRITE|EXECUTE}
> > are needed for every enclave, so what's the point of checking them?
> > FILE__EXECMOD may be the only flag that has a meaning, but it's kind of
> > redundant because the sigstruct file was already checked against it.
> 
> I don't believe FILE__EXECMOD will be checked since it is a shared file
> mapping.  We'll check at least FILE__READ and FILE__WRITE anyway upon
> open(), and possibly FILE__EXECUTE upon mmap() unless that is never
> PROT_EXEC.  We want the policy to accurately reflect the operations of
> the system, even when an operation "must" be allowed, and even here this
> only needs to be allowed to processes authorized as enclave loaders, not
> to all processes.
> 
> I don't think there are other examples where we skip a SELinux check
> like this.  If we were to do so here, we would at least need a comment
> explaining that it was intentional and why.  The risk would be that
> future checking added into file_map_prot_check() would be unwittingly
> bypassed for these mappings.  A warning there would also be advisable if
> we skip it for these mappings.

You are right! The code was written assuming file_map_prot_check() wouldn't object if sgxsec_mprotect() approves it, but that may not always be the case if new checks are added in the future. I'll add the check back.
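For reference, a minimal userspace sketch of the flow I have in mind, assuming the return convention from this patch (negative = deny, 0 = enclave VMA approved, positive = not an enclave VMA). struct fake_vma, sgxsec_mprotect_stub() and file_map_prot_check_stub() are hypothetical stand-ins, not kernel APIs; the only point is that a 0 result no longer skips the generic check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VMA: only the fields this sketch needs. */
struct fake_vma {
	bool is_enclave;      /* backed by /dev/sgx/enclave? */
	bool enclave_allows;  /* cached SGX decision for this range */
	bool file_allows;     /* outcome of the generic FILE__* checks */
};

/* Models sgxsec_mprotect(): <0 deny, 0 enclave approved, >0 not an enclave. */
static int sgxsec_mprotect_stub(const struct fake_vma *vma)
{
	if (!vma->is_enclave)
		return 1;
	return vma->enclave_allows ? 0 : -EACCES;
}

/* Models file_map_prot_check(): the generic per-file checks. */
static int file_map_prot_check_stub(const struct fake_vma *vma)
{
	return vma->file_allows ? 0 : -EACCES;
}

static int mprotect_hook(const struct fake_vma *vma)
{
	int rc = sgxsec_mprotect_stub(vma);

	/* Only an explicit denial stops here; 0 and >0 both fall through. */
	if (rc < 0)
		return rc;

	/* Always run the generic check so later additions are not bypassed. */
	return file_map_prot_check_stub(vma);
}
```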
 
> 
> >
> >>> +static int selinux_enclave_load(struct file *encl, unsigned long addr,
> >>> +				unsigned long size, unsigned long prot,
> >>> +				struct vm_area_struct *source)
> >>> +{
> >>> +	if (source) {
> >>> +		/**
> >>> +		 * Adding page from source => EADD request
> >>> +		 */
> >>> +		int rc = selinux_file_mprotect(source, prot, prot);
> >>> +		if (rc)
> >>> +			return rc;
> >>> +
> >>> +		if (!(prot & VM_EXEC) &&
> >>> +		    selinux_file_mprotect(source, VM_EXEC, VM_EXEC))
> >>
> >> I wouldn't conflate VM_EXEC with PROT_EXEC even if they happen to be
> >> defined with the same values currently.  Elsewhere the kernel appears
> >> to explicitly translate them ala calc_vm_prot_bits().
> >
> > Thanks! I'd change them to PROT_EXEC in the next version.
> >
> >>
> >> Also, this will mean that we will always perform an execute check on
> >> all sources, thereby triggering audit denial messages for any EADD
> >> sources that are only intended to be data.  Depending on the source,
> >> this could trigger PROCESS__EXECMEM or FILE__EXECMOD or
> >> FILE__EXECUTE.  In a world where users often just run any denials
> >> they see through audit2allow, they'll end up always allowing them
> >> all.  How can they tell whether it was needed? It would be preferable
> >> if we could only trigger execute checks when there is some
> >> probability that execute will be requested in the future.
> >> Alternatives would be to silence the audit of these permission checks
> >> always via use of _noaudit() interfaces or to silence audit of these
> >> permissions via dontaudit rules in policy, but the latter would hide
> >> all denials of the permission by the process, not just those
> >> triggered from security_enclave_load().  And if we silence them, then
> >> we won't see them even if they were needed.
> >
> > *_noaudit() is exactly what I wanted. But I couldn't find
> > selinux_file_mprotect_noaudit()/file_has_perm_noaudit(), and I'm
> > reluctant to duplicate code. Any suggestions?
> 
> I would have no objection to adding _noaudit() variants of these, either
> duplicating code (if sufficiently small/simple) or creating a common
> helper with a bool audit flag that gets used for both. But the larger
> issue would be to resolve how to ultimately ensure that a denial is
> audited later if the denied permission is actually requested and blocked
> via sgxsec_mprotect().

The idea here is to precompute the answers as if a certain request were received, so that we don't have to store all inputs to the precomputation. sgxsec_mprotect(), if coded correctly, would make the same decision regardless of whether it was precomputed or computed at the time of the real request. Auditing, however, requires more information than the decision itself, such as the file path and when the request was made. I'm reluctant to keep the source files open just for audit logs. I'll need a closer look at the auditing code to figure out an appropriate way.
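One possible shape for this, sketched in userspace under stated assumptions: compute the answers at EADD time without auditing, keep only a small snapshot (a copied path string and a timestamp) instead of a reference to the source file, and emit the audit record from that snapshot only if a later mprotect() actually asks for a denied bit. PERM_EXECUTE, PERM_EXECMOD, struct precomputed and the log format are all hypothetical; a real version would go through the kernel audit framework rather than a FILE *.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical precomputed permission bits. */
#define PERM_EXECUTE 0x1u
#define PERM_EXECMOD 0x2u

/* Kept per enclave range instead of a reference to the source file. */
struct precomputed {
	unsigned int allowed;  /* answers computed (without auditing) at EADD */
	char *src_path;        /* private copy of the source path, for audit */
	time_t when;           /* when the answers were computed */
};

/* At EADD: store the answers plus just enough context to audit later. */
static int precompute(struct precomputed *p, const char *path,
		      unsigned int allowed)
{
	size_t len = strlen(path) + 1;

	p->src_path = malloc(len);
	if (!p->src_path)
		return -1;
	memcpy(p->src_path, path, len);
	p->allowed = allowed;
	p->when = time(NULL);
	return 0;
}

/* At mprotect(): deny and audit from the snapshot only if a denied bit is requested. */
static int check_and_audit(const struct precomputed *p, unsigned int requested,
			   FILE *audit_log)
{
	unsigned int denied = requested & ~p->allowed;

	if (!denied)
		return 0;
	fprintf(audit_log, "denied perms=0x%x source=%s computed_at=%lld\n",
		denied, p->src_path, (long long)p->when);
	return -1;
}

static void precomputed_free(struct precomputed *p)
{
	free(p->src_path);
	p->src_path = NULL;
}
```

The design choice is that nothing is logged for data-only sources at EADD time, so audit2allow never sees execute denials that were never actually requested.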

> 
> >
> >>
> >>> +			prot = 0;
> >>> +		else {
> >>> +			prot = SGX__EXECUTE;
> >>> +			if (source->vm_file &&
> >>> +			    !file_has_perm(current_cred(), source->vm_file,
> >>> +					   FILE__EXECMOD))
> >>> +				prot |= SGX__EXECMOD;
> >>
> >> Similarly, this means that we will always perform a FILE__EXECMOD check
> >> on all executable sources, triggering audit denial messages for any EADD
> >> source that is executable but to which EXECMOD is not allowed, and again
> >> the most common pattern will be that users will add EXECMOD to all
> >> executable sources to avoid this.
> >>
> >>> +		}
> >>> +		return sgxsec_eadd(encl, addr, size, prot);
> >>> +	} else {
> >>> +		/**
> >>> +		  * Adding page from NULL => EAUG request
> >>> +		  */
> >>> +		return sgxsec_eaug(encl, addr, size, prot);
> >>> +	}
> >>> +}
> >>> +
> >>> +static int selinux_enclave_init(struct file *encl,
> >>> +				const struct sgx_sigstruct *sigstruct,
> >>> +				struct vm_area_struct *vma)
> >>> +{
> >>> +	int rc = 0;
> >>> +
> >>> +	if (!vma)
> >>> +		rc = -EINVAL;
> >>
> >> Is it ever valid to call this hook with a NULL vma?  If not, this should
> >> be handled/prevented by the caller.  If so, I'd just return -EINVAL
> >> immediately here.
> >
> > vma shall never be NULL. I'll update it in the next version.
> >
> >>
> >>> +
> >>> +	if (!rc && !(vma->vm_flags & VM_EXEC))
> >>> +		rc = selinux_file_mprotect(vma, VM_EXEC, VM_EXEC);
> >>
> >> I had thought we were trying to avoid overloading FILE__EXECUTE (or
> >> whatever gets checked here, e.g. could be PROCESS__EXECMEM or
> >> FILE__EXECMOD) on the sigstruct file, since the caller isn't truly
> >> executing code from it.
> >
> > Agreed. Another problem with FILE__EXECMOD on the sigstruct file is that
> > user code would then be allowed to modify SIGSTRUCT at will, which
> > effectively wipes out the protection provided by FILE__EXECUTE.
> >
> >>
> >> I'd define new ENCLAVE__* permissions, including an up-front
> >> ENCLAVE__INIT permission that governs whether the sigstruct file can be
> >> used at all irrespective of memory protections.
> >
> > Agreed.
> >
> >>
> >> Then you can also have ENCLAVE__EXECUTE, ENCLAVE__EXECMEM,
> >> ENCLAVE__EXECMOD for the execute-related checks.  Or you can use the
> >> /dev/sgx/enclave inode as the target for the execute checks and just
> >> reuse the file permissions there.
> >
> > Now we've got 2 options: 1) New ENCLAVE__* flags on sigstruct file or
> > 2) FILE__* on /dev/sgx/enclave. Which one do you think makes more sense?
> >
> > ENCLAVE__EXECMEM seems to offer finer granularity (than PROCESS__EXECMEM)
> > but I wonder if it'd have any real use in practice.
> 
> Defining a separate ENCLAVE__EXECUTE and using it here for the sigstruct
> file would avoid any ambiguity with the FILE__EXECUTE check to the
> /dev/sgx/enclave inode that might occur upon mmap() or mprotect().  A
> separate ENCLAVE__EXECMEM would enable allowing WX within the enclave
> while denying it in the host application or vice versa, which could be a
> good thing for security, particularly if SGX2 largely ends up always
> wanting WX.

Agreed. I'll include those new flags in my next version.
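A rough sketch of how the new permissions could be derived from a request, assuming hypothetical ENCLAVE__* bit values and a was_writable input that the caller tracks. It takes PROT_* bits directly, per your earlier point about not conflating them with VM_* flags. Treating simultaneous W+X as ENCLAVE__EXECMEM and a later W->X transition as ENCLAVE__EXECMOD is my assumption about the mapping, not settled policy.

```c
#include <stdbool.h>
#include <sys/mman.h>  /* PROT_READ, PROT_WRITE, PROT_EXEC */

/* Hypothetical permission bits for a new "enclave" SELinux class. */
#define ENCLAVE__INIT     0x1u
#define ENCLAVE__EXECUTE  0x2u
#define ENCLAVE__EXECMOD  0x4u
#define ENCLAVE__EXECMEM  0x8u

/*
 * Compute the permissions to check against the sigstruct file for a
 * requested protection. Uses PROT_* bits, not VM_* flags.
 */
static unsigned int enclave_required_perms(unsigned long prot, bool was_writable)
{
	unsigned int perms = ENCLAVE__INIT;  /* sigstruct must be usable at all */

	if (!(prot & PROT_EXEC))
		return perms;

	perms |= ENCLAVE__EXECUTE;           /* X */
	if (prot & PROT_WRITE)
		perms |= ENCLAVE__EXECMEM;   /* W and X at the same time */
	else if (was_writable)
		perms |= ENCLAVE__EXECMOD;   /* W earlier, X now */
	return perms;
}
```

Keeping ENCLAVE__EXECMEM separate from PROCESS__EXECMEM is what would let policy allow WX inside the enclave while still denying it to the host process.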

> 
> >
> >>> +int sgxsec_mprotect(struct vm_area_struct *vma, size_t prot) {
> >>> +	struct enclave_sec *esec;
> >>> +	int rc;
> >>> +
> >>> +	if (!vma->vm_file || !(esec = __esec(selinux_file(vma->vm_file)))) {
> >>> +		/* Positive return value indicates non-enclave VMA */
> >>> +		return 1;
> >>> +	}
> >>> +
> >>> +	down_read(&esec->sem);
> >>> +	rc = enclave_mprotect(&esec->regions, vma->vm_start, vma->vm_end, prot);
> >>
> >> Why is it safe for this to only use down_read()? enclave_mprotect() can
> >> call enclave_prot_set_cb() which modifies the list?
> >
> > Probably because it was too late at night when I wrote this line :-(
> > Good catch!
> >
> >>
> >> I haven't looked at this code closely, but it feels like a lot of
> >> SGX-specific logic embedded into SELinux that will have to be repeated or
> >> reused for every security module.  Does SGX not track this state itself?
> >
> > I can tell you have looked quite closely, and I truly thank you for your
> > time!
> >
> > You are right that there is SGX-specific stuff. More precisely, SGX
> > enclaves don't have access to anything except memory, so there are only
> > 3 questions that need to be answered for each enclave page: 1) whether X
> > is allowed; 2) whether W->X is allowed; and 3) whether WX is allowed. This
> > proposal tries to cache the answers to those questions upon creation of
> > each enclave page, meaning it involves a) figuring out the answers and b)
> > "remembering" them for every page. #b is generic, mostly captured in
> > intel_sgx.c, and could be shared among all LSM modules, while #a is
> > SELinux-specific. I could move intel_sgx.c up one level in the directory
> > hierarchy if that's what you'd suggest.
> >
> > By "SGX", did you mean the SGX subsystem being upstreamed? It doesn’t
> > track that state. In practice, there's no way for SGX to track it
> > because there's no vm_ops->may_mprotect() callback. It doesn't follow
> > the philosophy of Linux either, as mprotect() doesn't track it for
> > regular memory. And it doesn't have a use without LSM, so I believe it
> > makes more sense to track it inside LSM.
> 
> Yes, the SGX driver/subsystem.  I had the impression from Sean that it
> does track this kind of per-page state already in some manner, but
> possibly he means it does under a given proposal and not in the current
> driver.

Yes, the SGX subsystem does track per-page states. But these page protection flags apply to *ranges*.

In practice, those per-page states are *not* checked at mmap/mprotect. They are used mainly by vm_ops->fault() and the page swapper thread.

That said, merging protection flags into per-page states would require page-by-page checks, which would definitely hurt performance, unless the driver also maintained some range-oriented structures just like what you see here.
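To illustrate the range-oriented alternative: with the three answers cached once per range, an mprotect() request costs one check per overlapping range rather than one per page. A minimal sketch, with hypothetical SGX_ALLOW_* bits and a hypothetical ever_writable flag standing in for the W->X history; none of these names exist in the driver.

```c
#include <stdbool.h>
#include <sys/mman.h>  /* PROT_WRITE, PROT_EXEC */

/* Hypothetical cached answers for one enclave range (the "three questions"). */
#define SGX_ALLOW_X       0x1u  /* may the range be executable at all? */
#define SGX_ALLOW_W_TO_X  0x2u  /* may a previously writable range become X? */
#define SGX_ALLOW_WX      0x4u  /* may the range be W and X at the same time? */

struct sgx_range {
	unsigned long start, end;  /* [start, end) */
	unsigned int answers;      /* SGX_ALLOW_* bits cached at EADD/EAUG */
	bool ever_writable;        /* has this range ever been mapped writable? */
};

/* One constant-time check per range, independent of the number of pages. */
static bool sgx_range_may_mprotect(const struct sgx_range *r, unsigned long prot)
{
	if (!(prot & PROT_EXEC))
		return true;  /* only execute requests need the cached answers */
	if (!(r->answers & SGX_ALLOW_X))
		return false;
	if ((prot & PROT_WRITE) && !(r->answers & SGX_ALLOW_WX))
		return false;
	if (r->ever_writable && !(r->answers & SGX_ALLOW_W_TO_X))
		return false;
	return true;
}

/* Record an approved change so a later W->X request is caught. */
static void sgx_range_commit(struct sgx_range *r, unsigned long prot)
{
	if (prot & PROT_WRITE)
		r->ever_writable = true;
}
```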

> 
> Even the #b remembering might end up being SELinux-specific if we also
> have to remember the original inputs used to compute the answer so that
> we can audit that information when access is denied later upon
> mprotect().  At the least we'd need it to save some opaque data and pass
> it to a callback into SELinux to perform that auditing.

Agreed. What's commonly needed here is a data structure that supports setting and querying values on ranges. It's close to what xarray supports, but xarray doesn't support range querying.


