[PATCH 10/17] prmem: documentation

Tue Dec 4 12:34:31 UTC 2018

Hello,
apologies for the delayed answer.
Please find my reply to both last mails in the thread, below.

On 22/11/2018 22:53, Andy Lutomirski wrote:
> On Thu, Nov 22, 2018 at 12:04 PM Matthew Wilcox <willy at infradead.org> wrote:
>>
>> On Thu, Nov 22, 2018 at 09:27:02PM +0200, Igor Stoppa wrote:
>>> I have studied the code involved with Nadav's patchset.
>>> I am perplexed about these sentences you wrote.
>>>
>>> More to the point (to the best of my understanding):
>>>
>>> poking_init()
>>> -------------
>>>    1. it gets one random poking address and ensures to have at least 2
>>>       consecutive PTEs from the same PMD
>>>    2. it then proceeds to map/unmap an address from the first of the 2
>>>       consecutive PTEs, so that, later on, there will be no need to
>>>       allocate pages, which might fail, if poking from atomic context.
>>>    3. at this point, the page tables are populated, for the address that
>>>       was obtained at point 1, and this is ok, because the address is fixed
>>>
>>> write_rare
>>> ----------
>>>    4. it can happen on any available core / thread at any time, therefore
>>>       each of them needs a different address
>>
>> No?  Each CPU has its own CR3 (eg each CPU might be running a different
>> user task).  If you have _one_ address for each allocation, it may or
>> may not be mapped on other CPUs at the same time -- you simply don't care.

Yes, somehow I lost track of that aspect.

>> The writable address can even be a simple formula to calculate from
>> the read-only address, you don't have to allocate an address in the
>> writable mapping space.
>>
> 
> Agreed.  I suggest the formula:
> 
> writable_address = readable_address - rare_write_offset.  For
> starters, rare_write_offset can just be a constant.  If we want to get
> fancy later on, it can be randomized.

ok, I hope I captured it here [1]

> If we do it like this, then we don't need to modify any pagetables at
> all when we do a rare write.  Instead we can set up the mapping at
> boot or when we allocate the rare write space, and the actual rare
> write code can just switch mms and do the write.

I did it. I have little feeling about the actual amount of data 
involved, but there is a (probably very remote) chance that the remap 
wouldn't work, at least in the current implementation.

It's a bit different from what I had in mind initially, since I was 
thinking to have one single approach to both statically allocated memory 
(is there a better way to describe it) and what is provided from the 
allocator that would come next.

As I wrote, I do not particularly like the way I implemented multiple 
functionality vs remapping, but I couldn't figure out any better way to 
do it, so eventually I kept this one, hoping to get some advice on how 
to improve it.

I did not provide yet an example, yet, but IMA has some flags that are 
probably very suitable, since they depend on policy reloading, which can 
happen multiple times, but could be used to disable it.

[1] https://www.openwall.com/lists/kernel-hardening/2018/12/04/3

--
igor