[RFC PATCH 2/7] x86/sci: add core implementation for system call isolation

Fri Apr 26 15:07:27 UTC 2019

> On Apr 26, 2019, at 7:57 AM, James Bottomley <James.Bottomley at hansenpartnership.com> wrote:
> 
>> On Fri, 2019-04-26 at 07:46 -0700, Dave Hansen wrote:
>>> On 4/25/19 2:45 PM, Mike Rapoport wrote:
>>> After the isolated system call finishes, the mappings created
>>> during its execution are cleared.
>> 
>> Yikes.  I guess that stops someone from calling write() a bunch of
>> times on every filesystem using every block device driver and all the
>> DM code to get a lot of code/data faulted in.  But, it also means not
>> even long-running processes will ever have a chance of behaving
>> anything close to normally.
>> 
>> Is this something you think can be rectified or is there something
>> fundamental that would keep SCI page tables from being cached across
>> different invocations of the same syscall?
> 
> There is some work being done to look at pre-populating the isolated
> address space with the expected execution footprint of the system call,
> yes.  It lessens the ROP gadget protection slightly because you might
> find a gadget in the pre-populated code, but it solves a lot of the
> overhead problem.
> 

I’m not even remotely a ROP expert, but: what stops a ROP payload from using all the “fault-in” gadgets that exist? Any function that can return on an error without doing too much will fault in the whole page containing the function.
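
To make the page-granularity point concrete, here is a rough sketch. It assumes 4 KiB pages, and same_page() is a made-up helper for illustration, not anything in the patch set: once one address on a page has been faulted in, every other address on that page is mapped as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86 configurations. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/*
 * Hypothetical helper: returns true if faulting in the page that holds
 * `faulted` also maps the page that holds `target`, i.e. both addresses
 * fall within the same page.
 */
static bool same_page(uintptr_t faulted, uintptr_t target)
{
	return (faulted & SKETCH_PAGE_MASK) == (target & SKETCH_PAGE_MASK);
}
```

So a cheap early-return function at 0x1000 would bring in anything else that happens to live below 0x2000.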

To improve this, we would want something that checks whether the caller is actually supposed to call the callee, which is more or less the hard part of CFI.  So can’t we just do CFI and call it a day?
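
As a toy illustration of the "is this caller supposed to reach this callee" check, here is a minimal sketch of a forward-edge check against an allow-list. All of the names are hypothetical, and real CFI schemes are usually compiler-generated and keyed on function type rather than a hand-written table, so treat this as the shape of the idea only.

```c
#include <stdbool.h>
#include <stddef.h>

typedef int (*handler_t)(int);

static int op_double(int x) { return 2 * x; }
static int op_negate(int x) { return -x; }
/* Same signature, but never meant to be reached from this call site. */
static int not_a_target(int x) { return x; }

/* Hypothetical allow-list of valid targets for one indirect call site. */
static const handler_t allowed_targets[] = { op_double, op_negate };

static bool target_allowed(handler_t fn)
{
	size_t n = sizeof(allowed_targets) / sizeof(allowed_targets[0]);

	for (size_t i = 0; i < n; i++)
		if (allowed_targets[i] == fn)
			return true;
	return false;
}

/* Returns 0 and stores the result in *out, or -1 if the target is refused. */
static int checked_call(handler_t fn, int arg, int *out)
{
	if (!target_allowed(fn))
		return -1;
	*out = fn(arg);
	return 0;
}
```

With this, checked_call(op_double, 21, &out) succeeds while checked_call(not_a_target, 1, &out) is refused before the call is made.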

On top of that, a robust, maintainable implementation of this seems very complicated. For example, what happens if vfree() gets called?