[RFC PATCH v1 2/3] LSM/x86/sgx: Implement SGX specific hooks in SELinux

Thu Jun 13 18:00:29 UTC 2019

On 6/11/19 6:55 PM, Xing, Cedric wrote:
>> From: linux-sgx-owner at vger.kernel.org [mailto:linux-sgx-
>> owner at vger.kernel.org] On Behalf Of Stephen Smalley
>> Sent: Tuesday, June 11, 2019 6:40 AM
>>
>>>
>>> +#ifdef CONFIG_INTEL_SGX
>>> +	rc = sgxsec_mprotect(vma, prot);
>>> +	if (rc <= 0)
>>> +		return rc;
>>
>> Why are you skipping the file_map_prot_check() call when rc == 0?
>> What would SELinux check if you didn't do so -
>> FILE__READ|FILE__WRITE|FILE__EXECUTE to /dev/sgx/enclave?  Is it a
>> problem to let SELinux proceed with that check?
> 
> We can continue the check. But in practice, FILE__{READ|WRITE|EXECUTE} are all needed for every enclave, so what's the point of checking them? FILE__EXECMOD may be the only flag that has a meaning, but it's kind of redundant because the sigstruct file was already checked against it.

I don't believe FILE__EXECMOD will be checked since it is a shared file 
mapping.  We'll check at least FILE__READ and FILE__WRITE anyway upon 
open(), and possibly FILE__EXECUTE upon mmap() unless that is never 
PROT_EXEC.  We want the policy to accurately reflect the operations of 
the system, even when an operation "must" be allowed, and even here this 
only needs to be allowed to processes authorized as enclave loaders, not 
to all processes.

I don't think there are other examples where we skip a SELinux check 
like this.  If we were to do so here, we would at least need a comment 
explaining that it was intentional and why.  The risk would be that 
future checking added into file_map_prot_check() would be unwittingly 
bypassed for these mappings.  A warning there would also be advisable if 
we skip it for these mappings.
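
To make the fall-through concrete, here is a minimal userspace sketch, not kernel code. All mock_* names are hypothetical stand-ins for sgxsec_mprotect(), file_map_prot_check() and the mprotect hook. The only point is that an rc of 0 from the SGX-specific check continues into the generic check instead of returning early, so that anything added to the generic check later still covers enclave mappings.

```c
#include <errno.h>

/* Counts how often the generic check ran (for illustration only). */
static int generic_checks_run;

/*
 * Hypothetical stand-in for sgxsec_mprotect():
 *   < 0  denied by the SGX-specific logic
 *   == 0 enclave VMA, allowed by the SGX-specific logic
 *   > 0  not an enclave VMA
 */
static int mock_sgxsec_mprotect(int is_enclave, int denied)
{
	if (!is_enclave)
		return 1;
	return denied ? -EACCES : 0;
}

/* Hypothetical stand-in for file_map_prot_check(); always allows. */
static int mock_file_map_prot_check(void)
{
	generic_checks_run++;
	return 0;
}

/* Only a negative rc short-circuits; rc == 0 and rc > 0 both continue. */
static int mock_file_mprotect(int is_enclave, int denied)
{
	int rc = mock_sgxsec_mprotect(is_enclave, denied);

	if (rc < 0)
		return rc;

	return mock_file_map_prot_check();
}
```

With "if (rc <= 0) return rc;" as in the patch, the enclave case would return before the generic check is reached.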

> 
>>> +static int selinux_enclave_load(struct file *encl, unsigned long addr,
>>> +				unsigned long size, unsigned long prot,
>>> +				struct vm_area_struct *source)
>>> +{
>>> +	if (source) {
>>> +		/**
>>> +		 * Adding page from source => EADD request
>>> +		 */
>>> +		int rc = selinux_file_mprotect(source, prot, prot);
>>> +		if (rc)
>>> +			return rc;
>>> +
>>> +		if (!(prot & VM_EXEC) &&
>>> +		    selinux_file_mprotect(source, VM_EXEC, VM_EXEC))
>>
>> I wouldn't conflate VM_EXEC with PROT_EXEC even if they happen to be
>> defined with the same values currently.  Elsewhere the kernel appears to
>> explicitly translate them ala calc_vm_prot_bits().
> 
> Thanks! I'll change them to PROT_EXEC in the next version.
> 
>>
>> Also, this will mean that we will always perform an execute check on all
>> sources, thereby triggering audit denial messages for any EADD sources
>> that are only intended to be data.  Depending on the source, this could
>> trigger PROCESS__EXECMEM or FILE__EXECMOD or FILE__EXECUTE.  In a world
>> where users often just run any denials they see through audit2allow,
>> they'll end up always allowing them all.  How can they tell whether it
>> was needed? It would be preferable if we could only trigger execute
>> checks when there is some probability that execute will be requested in
>> the future.  Alternatives would be to silence the audit of these
>> permission checks always via use of _noaudit() interfaces or to silence
>> audit of these permissions via dontaudit rules in policy, but the latter
>> would hide all denials of the permission by the process, not just those
>> triggered from security_enclave_load().  And if we silence them, then we
>> won't see them even if they were needed.
> 
> *_noaudit() is exactly what I wanted. But I couldn't find selinux_file_mprotect_noaudit()/file_has_perm_noaudit(), and I'm reluctant to duplicate code. Any suggestions?

I would have no objection to adding _noaudit() variants of these, either 
duplicating code (if sufficiently small/simple) or creating a common 
helper with a bool audit flag that gets used for both. But the larger 
issue would be to resolve how to ultimately ensure that a denial is 
audited later if the denied permission is actually requested and blocked 
via sgxsec_mprotect().
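
As a rough illustration of the common-helper option, here is a userspace sketch. The names (mock_has_perm_common() and friends) and the bitmask representation of permissions are assumptions made for the example and do not correspond to existing SELinux functions. Both variants share one decision path, and only the bool decides whether a denial is recorded.

```c
#include <errno.h>
#include <stdbool.h>

/* Number of denials that would have been written to the audit log. */
static int audit_records;

/* Hypothetical common helper: one permission decision, optional audit. */
static int mock_has_perm_common(unsigned int allowed, unsigned int requested,
				bool audit)
{
	if ((allowed & requested) == requested)
		return 0;

	if (audit)
		audit_records++;

	return -EACCES;
}

/* Audited variant, used for checks the caller really needs to pass. */
static int mock_has_perm(unsigned int allowed, unsigned int requested)
{
	return mock_has_perm_common(allowed, requested, true);
}

/* Silent variant, used for speculative checks at EADD time. */
static int mock_has_perm_noaudit(unsigned int allowed, unsigned int requested)
{
	return mock_has_perm_common(allowed, requested, false);
}
```

This does not address the larger issue: a denial found by the silent variant still has to be audited later, at the point where the permission is actually requested.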

>   
>>
>>> +			prot = 0;
>>> +		else {
>>> +			prot = SGX__EXECUTE;
>>> +			if (source->vm_file &&
>>> +			    !file_has_perm(current_cred(), source->vm_file,
>>> +					   FILE__EXECMOD))
>>> +				prot |= SGX__EXECMOD;
>>
>> Similarly, this means that we will always perform a FILE__EXECMOD check
>> on all executable sources, triggering audit denial messages for any EADD
>> source that is executable but to which EXECMOD is not allowed, and again
>> the most common pattern will be that users will add EXECMOD to all
>> executable sources to avoid this.
>>
>>> +		}
>>> +		return sgxsec_eadd(encl, addr, size, prot);
>>> +	} else {
>>> +		/**
>>> +		  * Adding page from NULL => EAUG request
>>> +		  */
>>> +		return sgxsec_eaug(encl, addr, size, prot);
>>> +	}
>>> +}
>>> +
>>> +static int selinux_enclave_init(struct file *encl,
>>> +				const struct sgx_sigstruct *sigstruct,
>>> +				struct vm_area_struct *vma)
>>> +{
>>> +	int rc = 0;
>>> +
>>> +	if (!vma)
>>> +		rc = -EINVAL;
>>
>> Is it ever valid to call this hook with a NULL vma?  If not, this should
>> be handled/prevented by the caller.  If so, I'd just return -EINVAL
>> immediately here.
> 
> vma shall never be NULL. I'll update it in the next version.
> 
>>
>>> +
>>> +	if (!rc && !(vma->vm_flags & VM_EXEC))
>>> +		rc = selinux_file_mprotect(vma, VM_EXEC, VM_EXEC);
>>
>> I had thought we were trying to avoid overloading FILE__EXECUTE (or
>> whatever gets checked here, e.g. could be PROCESS__EXECMEM or
>> FILE__EXECMOD) on the sigstruct file, since the caller isn't truly
>> executing code from it.
> 
> Agreed. Another problem with FILE__EXECMOD on the sigstruct file is that user code would then be allowed to modify SIGSTRUCT at will, which effectively wipes out the protection provided by FILE__EXECUTE.
> 
>>
>> I'd define new ENCLAVE__* permissions, including an up-front
>> ENCLAVE__INIT permission that governs whether the sigstruct file can be
>> used at all irrespective of memory protections.
> 
> Agreed.
> 
>>
>> Then you can also have ENCLAVE__EXECUTE, ENCLAVE__EXECMEM,
>> ENCLAVE__EXECMOD for the execute-related checks.  Or you can use the
>> /dev/sgx/enclave inode as the target for the execute checks and just
>> reuse the file permissions there.
> 
> Now we've got 2 options - 1) New ENCLAVE__* flags on sigstruct file or 2) FILE__* on /dev/sgx/enclave. Which one do you think makes more sense?
> 
> ENCLAVE__EXECMEM seems to offer finer granularity (than PROCESS__EXECMEM) but I wonder if it'd have any real use in practice.

Defining a separate ENCLAVE__EXECUTE and using it here for the sigstruct 
file would avoid any ambiguity with the FILE__EXECUTE check to the 
/dev/sgx/enclave inode that might occur upon mmap() or mprotect().  A 
separate ENCLAVE__EXECMEM would enable allowing WX within the enclave 
while denying it in the host application or vice versa, which could be a 
good thing for security, particularly if SGX2 largely ends up always 
wanting WX.
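
A toy sketch of that separation, assuming the two permissions are simply independent bits in what policy grants to a domain. The MOCK_* constants and mock_wx_allowed() are hypothetical and only show that WX in the enclave and WX in the host process can be decided independently.

```c
#include <stdbool.h>

/* Hypothetical permission bits; not the real SELinux definitions. */
#define MOCK_PROCESS__EXECMEM	0x1u
#define MOCK_ENCLAVE__EXECMEM	0x2u

/* WX inside an enclave consults only the enclave permission, and vice versa. */
static bool mock_wx_allowed(unsigned int granted, bool in_enclave)
{
	unsigned int needed = in_enclave ? MOCK_ENCLAVE__EXECMEM
					 : MOCK_PROCESS__EXECMEM;

	return (granted & needed) != 0;
}
```

A domain granted only the enclave bit could then use WX enclave pages while the host application stays W^X.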

> 
>>> +int sgxsec_mprotect(struct vm_area_struct *vma, size_t prot) {
>>> +	struct enclave_sec *esec;
>>> +	int rc;
>>> +
>>> +	if (!vma->vm_file || !(esec = __esec(selinux_file(vma->vm_file)))) {
>>> +		/* Positive return value indicates non-enclave VMA */
>>> +		return 1;
>>> +	}
>>> +
>>> +	down_read(&esec->sem);
>>> +	rc = enclave_mprotect(&esec->regions, vma->vm_start, vma->vm_end,
>>> +			      prot);
>>
>> Why is it safe for this to only use down_read()? enclave_mprotect() can
>> call enclave_prot_set_cb() which modifies the list?
> 
> Probably because it was too late at night when I wrote this line:-( Good catch!
> 
>>
>> I haven't looked at this code closely, but it feels like a lot of SGX-
>> specific logic embedded into SELinux that will have to be repeated or
>> reused for every security module.  Does SGX not track this state itself?
> 
> I can tell you have looked quite closely, and I truly thank you for your time!
> 
> You are right that there is SGX-specific stuff. More precisely, SGX enclaves don't have access to anything except memory, so there are only 3 questions that need to be answered for each enclave page: 1) whether X is allowed; 2) whether W->X is allowed; and 3) whether WX is allowed. This proposal tries to cache the answers to those questions upon creation of each enclave page, meaning it involves a) figuring out the answers and b) remembering them for every page. #b is generic, mostly captured in intel_sgx.c, and could be shared among all LSM modules; while #a is SELinux specific. I could move intel_sgx.c up one level in the directory hierarchy if that's what you'd suggest.
> 
> By "SGX", did you mean the SGX subsystem being upstreamed? It doesn’t track that state. In practice, there's no way for SGX to track it because there's no vm_ops->may_mprotect() callback. It doesn't follow the philosophy of Linux either, as mprotect() doesn't track it for regular memory. And it doesn't have a use without LSM, so I believe it makes more sense to track it inside LSM.

Yes, the SGX driver/subsystem.  I had the impression from Sean that it 
does track this kind of per-page state already in some manner, but 
possibly he means it does under a given proposal and not in the current 
driver.

Even the #b remembering might end up being SELinux-specific if we also 
have to remember the original inputs used to compute the answer so that 
we can audit that information when access is denied later upon 
mprotect().  At the least we'd need it to save some opaque data and pass 
it to a callback into SELinux to perform that auditing.
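
A minimal userspace sketch of that last idea, under the assumption that the generic per-page state carries an opaque pointer plus an audit callback supplied by the security module. Every name here (mock_page_state, mock_page_mprotect(), mock_selinux_audit(), the MOCK_PROT_* bits) is hypothetical and not part of the patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical protection bits for the example. */
#define MOCK_PROT_READ	0x1u
#define MOCK_PROT_WRITE	0x2u
#define MOCK_PROT_EXEC	0x4u

/* Generic per-page state: cached answer plus opaque module data. */
struct mock_page_state {
	unsigned int allowed_prot;	/* answer cached at page creation */
	void *audit_data;		/* opaque to the generic code */
	void (*audit_cb)(void *audit_data, unsigned int denied_prot);
};

/* Generic code: on denial, hand the opaque data back to the module. */
static int mock_page_mprotect(const struct mock_page_state *ps,
			      unsigned int prot)
{
	unsigned int denied = prot & ~ps->allowed_prot;

	if (!denied)
		return 0;

	if (ps->audit_cb)
		ps->audit_cb(ps->audit_data, denied);

	return -EACCES;
}

/* Module side: the original inputs remembered for later auditing. */
struct mock_selinux_inputs {
	const char *source_name;	/* what the page was loaded from */
	unsigned int denied_seen;	/* last denied bits reported */
	int calls;			/* how many denials were reported */
};

/* Module callback: a real one would emit an audit record here. */
static void mock_selinux_audit(void *audit_data, unsigned int denied_prot)
{
	struct mock_selinux_inputs *in = audit_data;

	in->denied_seen = denied_prot;
	in->calls++;
}
```

The generic code never interprets audit_data, so the remembering stays shareable across security modules while the audit content remains module-specific.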