[PATCH 0/2 v2] remove PF_MEMALLOC_NORECLAIM
Kent Overstreet
kent.overstreet at linux.dev
Mon Sep 2 22:32:33 UTC 2024
On Mon, Sep 02, 2024 at 02:52:52PM GMT, Andrew Morton wrote:
> On Mon, 2 Sep 2024 05:53:59 -0400 Kent Overstreet <kent.overstreet at linux.dev> wrote:
>
> > On Mon, Sep 02, 2024 at 11:51:48AM GMT, Michal Hocko wrote:
> > > The previous version has been posted in [1]. Based on the review feedback
> > > I have sent v2 of patches in the same thread but it seems that the
> > > review has mostly settled on these patches. There is still an open
> > > discussion on whether having a NORECLAIM allocator semantic (compare to
> > > atomic) is worthwhile or how to deal with broken GFP_NOFAIL users but
> > > those are not really relevant to this particular patchset as it 1)
> > > doesn't aim to implement either of the two and 2) it aims at spreading
> > > PF_MEMALLOC_NORECLAIM use while it doesn't have a properly defined
> > > semantic now that it is not widely used and much harder to fix.
> > >
> > > I have collected Reviewed-bys and reposting here. These patches are
> > > touching bcachefs, VFS and core MM so I am not sure which tree to merge
> > > this through but I guess going through Andrew makes the most sense.
> > >
> > > Changes since v1:
> > > - compile fixes
> > > - rather than dropping PF_MEMALLOC_NORECLAIM alone reverted eab0af905bfc
> > > ("mm: introduce PF_MEMALLOC_NORECLAIM, PF_MEMALLOC_NOWARN") suggested
> > > by Matthew.
> >
> > To reiterate:
> >
>
> It would be helpful to summarize your concerns.
>
> What runtime impact do you expect this change will have upon bcachefs?

For bcachefs: I try really hard to minimize tail latency and make
performance robust in extreme scenarios - thrashing. A large part of
that is that btree locks must be held for no longer than necessary.
We definitely don't want to recurse into other parts of the kernel,
taking other locks (i.e. in memory reclaim) while holding btree locks;
that's a great way to stack up (and potentially multiply) latencies.

But gfp flags don't work with vmalloc allocations (and that's unlikely
to change), and we require vmalloc fallbacks for e.g. btree node
allocation. That's the big reason we want PF_MEMALLOC_NORECLAIM.
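
A rough userspace model of that idea, not kernel code: the allocation
context lives in per-task state rather than in a flags argument, so it
still applies in a nested helper that takes no flags. Every name here
(CTX_NORECLAIM, ctx_save(), model_alloc(), model_vmalloc() and so on) is
hypothetical and only stands in for the real mechanism.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-task flag such as PF_MEMALLOC_NORECLAIM. */
#define CTX_NORECLAIM 0x1u

/* Models the per-task flags word; a kernel keeps one per task. */
static unsigned int task_flags;

/* When set, the model allocator behaves as if memory were tight. */
static bool memory_pressure;

/* Counts how often the model would have entered reclaim. */
static unsigned int reclaim_entries;

/* Set flags for a scope; returns the old value so scopes can nest. */
static unsigned int ctx_save(unsigned int flags)
{
	unsigned int old = task_flags;

	task_flags |= flags;
	return old;
}

/* Restore the flags that were in effect before the matching ctx_save(). */
static void ctx_restore(unsigned int old)
{
	task_flags = old;
}

/* Model of an allocator slow path that consults the task context. */
static void *model_alloc(size_t size)
{
	if (memory_pressure) {
		if (task_flags & CTX_NORECLAIM)
			return NULL;	/* fail fast; the caller has an error path */
		reclaim_entries++;	/* would block and take other locks here */
	}
	return malloc(size);
}

/* Deep helper with no flags parameter, like a vmalloc fallback path. */
static void *model_vmalloc(size_t size)
{
	return model_alloc(size);
}

/* Caller that holds a lock: allocate without entering reclaim. */
static void *alloc_under_lock(size_t size)
{
	unsigned int old = ctx_save(CTX_NORECLAIM);
	void *p = model_vmalloc(size);

	ctx_restore(old);
	return p;
}
```

In this model, a NULL return from alloc_under_lock() is the cue to drop
the lock and retry the allocation where blocking is acceptable.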

Besides that, it's just cleaner, memalloc flags are the direction we
want to be moving in, and it's going to be necessary if we ever want to
do a malloc() that doesn't require a gfp flags parameter. That would be
a win for safety and correctness in the kernel, and it's also likely
required for proper Rust support.

And the "GFP_NOFAIL must not fail" argument makes no sense, because
failing a GFP_NOFAIL allocation is the only sane thing to do if the
allocation is buggy (too big, e.g. resulting from an integer overflow
bug, or wrong context). The alternatives are at best never returning
(stuck unkillable process), or a scheduling while atomic bug, or Michal
was even proposing killing the process (handling it like a BUG()!).

But we don't use BUG_ON() for things that we can't prove won't happen in
the wild if we can write an error path.

That is, PF_MEMALLOC_NORECLAIM lets us turn bugs into runtime errors.
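
To illustrate the general pattern in plain userspace C (again a sketch,
not kernel code): an allocation whose size comes from an overflowing or
absurd computation is refused, and the caller turns that into an
ordinary error return instead of hanging or crashing. MODEL_MAX_ALLOC,
model_alloc_array() and load_table() are hypothetical names, and the
1 GiB cap is an arbitrary assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical upper bound on a single allocation (arbitrary: 1 GiB). */
#define MODEL_MAX_ALLOC ((size_t)1 << 30)

/* Allocate n elements; return NULL if the request is clearly bogus. */
static void *model_alloc_array(size_t n, size_t elem_size)
{
	size_t bytes;

	/* n * elem_size would overflow size_t: refuse instead of wrapping. */
	if (elem_size && n > SIZE_MAX / elem_size)
		return NULL;

	bytes = n * elem_size;
	if (bytes > MODEL_MAX_ALLOC)
		return NULL;

	return malloc(bytes ? bytes : 1);
}

/* Caller with an error path: a failed allocation becomes -ENOMEM. */
static int load_table(size_t nr_entries)
{
	uint64_t *table = model_alloc_array(nr_entries, sizeof(*table));
	size_t i;

	if (!table)
		return -ENOMEM;

	for (i = 0; i < nr_entries; i++)
		table[i] = i;

	free(table);
	return 0;
}
```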