[PATCH RFC] seccomp: Implement syscall isolation based on memory areas

Mon Jun 1 17:53:11 UTC 2020

Matthew Wilcox <willy at infradead.org> writes:

> On Sun, May 31, 2020 at 03:39:33PM +0300, Paul Gofman wrote:
>> > Paul (cc'ed) is the wine expert, but my understanding is that memory
>> > allocation and initial program load of the emulated binary will go
>> > through wine.  It does the allocation and marks the vma accordingly
>> > before returning the allocated range to the windows application.
>> Yes, exactly. Pretty much any memory that Wine allocates needs its
>> syscalls (if any are ever encountered later, while executing code from
>> those areas) to be trapped by Wine and passed to Wine's implementation
>> of the corresponding Windows API function. Loading of native Linux
>> libraries, and the memory allocations they perform, are outside of
>> Wine's control.
>
> I don't like Gabriel's approach very much.  Could we do something like

Hi Matthew,

I don't oppose your suggestion; as Paul said, it should be enough for
us.  But could you elaborate on the problems you see in the original
approach, even if only for my own education?

> issue a syscall before executing a Windows region and then issue another
> syscall when exiting?  If so, we could switch the syscall entry point (i.e.
> change MSR_LSTAR).  I'm thinking something like a personality() syscall.
> But maybe that would be too high an overhead.
>
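
To make the overhead question concrete, the enter/exit pattern described
above could be modelled in userspace roughly as below.  This is only a
sketch under assumptions: set_dispatch_mode(), DISPATCH_NATIVE and
DISPATCH_WINDOWS are hypothetical names standing in for whatever
personality()-like interface would be added, and the real switch of the
syscall entry point is replaced by a per-thread flag and a counter.

```c
#include <assert.h>

/* Hypothetical dispatch modes; neither name exists in the kernel. */
enum dispatch_mode { DISPATCH_NATIVE, DISPATCH_WINDOWS };

/* Per-thread state standing in for what the kernel would track. */
static _Thread_local enum dispatch_mode current_mode = DISPATCH_NATIVE;
static _Thread_local unsigned long mode_switches;

/* Stand-in for the proposed mode-switching syscall (hypothetical). */
static void set_dispatch_mode(enum dispatch_mode mode)
{
	if (mode != current_mode) {
		current_mode = mode;
		mode_switches++;	/* each switch models one extra syscall */
	}
}

/* Run one piece of Windows code bracketed by the two mode switches. */
static int run_windows_region(int (*fn)(int), int arg)
{
	int ret;

	/* This sketch does not handle nested Windows regions. */
	assert(current_mode == DISPATCH_NATIVE);

	set_dispatch_mode(DISPATCH_WINDOWS);	/* enter: syscalls would trap */
	ret = fn(arg);
	set_dispatch_mode(DISPATCH_NATIVE);	/* exit: normal entry point */
	return ret;
}

/* Example "Windows" code; records the mode it observed while running. */
static enum dispatch_mode seen_mode;

static int windows_code(int x)
{
	seen_mode = current_mode;
	return x + 1;
}
```

Calling run_windows_region(windows_code, 41) runs the callback in the
Windows mode and leaves the thread back in the native mode, with
mode_switches incremented twice.  That pair of switches on every
transition is the cost in question.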

-- 
Gabriel Krisman Bertazi