[PATCH 10/17] prmem: documentation

Tue Oct 30 17:06:51 UTC 2018

> On Oct 30, 2018, at 9:37 AM, Kees Cook <keescook at chromium.org> wrote:
> 
>> On Tue, Oct 30, 2018 at 8:26 AM, Peter Zijlstra <peterz at infradead.org> wrote:
>> I suppose the 'normal' attack goes like:
>> 
>> 1) find buffer-overrun / bound check failure
>> 2) use that to write to 'interesting' location
>> 3) that write results in arbitrary code execution
>> 4) win
>> 
>> Of course, if the store of 2 is to the current cred structure, and
>> simply sets the effective uid to 0, we can skip 3.
> 
> In most cases, yes, gaining root is game over. However, I don't want
> to discount other threat models: some systems have been designed not
> to trust root, so a cred attack doesn't always get an attacker full
> control (e.g. lockdown series, signed modules, encrypted VMs, etc).
> 
>> Which seems to suggest all cred structures should be made r/o like this.
>> But I'm not sure I remember these patches doing that.
> 
> There are things that attempt to protect cred (and other things, like
> page tables) via hypervisors (see Samsung KNOX) or via syscall
> boundary checking (see Linux Kernel Runtime Guard). They're pretty
> interesting, but I'm not sure there is a clear way forward for getting
> them working upstream, which is why I think these discussions are
> useful.
> 
>> Also, there is an inverse situation with all this. If you make
>> everything R/O, then you need this allow-write for everything you do,
>> which then is about to include a case with an overflow / bound check
>> fail, and we're back to square 1.
> 
> Sure -- this is the fine line in trying to build these defenses. The
> point is to narrow the scope of attack. Stupid metaphor follows: right
> now we have only a couple walls; if we add walls we can focus on making
> sure the doors and windows are safe. If we make the relatively
> easy-to-find-in-memory page tables read-only-at-rest then a whole
> class of very powerful exploits that depend on page table attacks goes
> away.
> 
> Part of all of this is the observation that there are two types of
> things clearly worth protecting: that which is updated rarely (no need
> to leave it writable for so much of its lifetime), and that which is
> especially sensitive (page tables, security policy, function pointers,
> etc). Finding a general purpose way to deal with these (like we have
> for other data-lifetime cases like const and __ro_after_init) would be
> very nice. I don't think there is a slippery slope here.
> 
> 

Since I wasn’t cc’d on this series:

I support the addition of a rare-write mechanism to the upstream kernel.  And I think that there is only one sane way to implement it: using an mm_struct. That mm_struct, just like any sane mm_struct, should only differ from init_mm in that it has extra mappings in the *user* region.
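For illustration only, here is a userspace analogy of the alternate-mapping idea, not kernel code and not taken from any posted patch. The data is normally reachable only through a read-only mapping, and a second, writable alias of the same pages exists just for the duration of a rare write. The names rare_region, rare_region_init and rare_write are hypothetical. One thing this sketch cannot model: in a process the temporary alias is visible to every thread, whereas a separate mm_struct would confine the writable alias to the CPU that switched to it.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical handle for data that is read-only at rest. */
struct rare_region {
    int fd;          /* backing object shared by both mappings */
    size_t len;      /* length in bytes, rounded up to whole pages */
    const void *ro;  /* long-lived read-only view used by normal code */
};

/* Create the region and its read-only view. Returns 0 or -1. */
int rare_region_init(struct rare_region *r, size_t len)
{
    char path[] = "/tmp/rare_region_XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    size_t pg;
    void *ro;

    if (page <= 0 || len == 0)
        return -1;
    pg = (size_t)page;
    len = (len + pg - 1) / pg * pg;

    r->fd = mkstemp(path);
    if (r->fd < 0)
        return -1;
    unlink(path);  /* keep the backing object anonymous */
    if (ftruncate(r->fd, (off_t)len) != 0) {
        close(r->fd);
        return -1;
    }

    ro = mmap(NULL, len, PROT_READ, MAP_SHARED, r->fd, 0);
    if (ro == MAP_FAILED) {
        close(r->fd);
        return -1;
    }
    r->len = len;
    r->ro = ro;
    return 0;
}

/*
 * Rare write: map a temporary writable alias of the same pages, copy
 * the new bytes in, and drop the alias again. The long-lived view in
 * r->ro never becomes writable. Returns 0 or -1.
 */
int rare_write(struct rare_region *r, size_t off, const void *src, size_t n)
{
    void *rw;

    if (off > r->len || n > r->len - off)
        return -1;
    rw = mmap(NULL, r->len, PROT_READ | PROT_WRITE, MAP_SHARED, r->fd, 0);
    if (rw == MAP_FAILED)
        return -1;
    memcpy((char *)rw + off, src, n);
    return munmap(rw, r->len);
}
```

A caller would run rare_region_init() once, read through r.ro everywhere, and route every update through rare_write(), so a stray store through the normal pointer faults instead of silently succeeding.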

If anyone wants to use CR0.WP instead, I’ll remind them that they have to fix up the entry code and justify the added complexity. And make performance not suck in a VM (i.e. CR0 reads on entry are probably off the table).  None of these will be easy.

If anyone wants to use kmap_atomic-like tricks, I’ll point out that we already have enough problems with dangling TLB entries due to SMP issues. The last thing we need is more of them. If someone proposes a viable solution that doesn’t involve CR3 fiddling, I’ll be surprised.

Keep in mind that switch_mm() is actually decently fast on modern CPUs.  It’s probably considerably faster than writing CR0, although I haven’t benchmarked it. It’s certainly faster than writing CR4.  It’s also faster than INVPCID, surprisingly, which means that it will be quite hard to get better performance using any sort of trickery.

Nadav’s patch set would be an excellent starting point.

P.S. EFI is sort of grandfathered in as a hackish alternate page table hierarchy. We’re not adding another one.