[PATCH] Revert "Enable '-Werror' by default for all kernel builds"

Tue Sep 7 20:30:42 UTC 2021

On Tue, Sep 7, 2021 at 12:16 PM Linus Torvalds
<torvalds at linux-foundation.org> wrote:
>
> On Tue, Sep 7, 2021 at 11:39 AM Nick Desaulniers
> <ndesaulniers at google.com> wrote:
> >
> > The above commit seems as though it was merged in response to
> > https://lore.kernel.org/linux-hardening/CAHk-=wj4EG=kCOaqyPEq5VXa97kyUHsBpBn3DWwE91qcnDytOQ@mail.gmail.com/.
>
> No. It was merged in response of _years_ of pain, with the last one
> just being the final drop.

Well, I think it's important to enumerate some of the pain points
explicitly so that we might all better understand what they are, and
come to an agreement together on what the right way to resolve them
is.

Impulse commits without discussion over a holiday weekend are firmly
in the category of "I wish you'd rather not have done that."

Let's look closer at the final drop.  I think there's a bigger question here:
1. did the CI have enough time to run in multiple configurations with
multiple toolchains to even find a warning, let alone report it?
2. what did the developer do with the report (if they even got it, see
1 above)? Ignore it? Proceed with the PR anyways?

it looks to me as though the "final drop" fits into 1 above. I'm not
sure that -Werror is the correct way to resolve issues from that.

> I'm not going to revert that change. I probably will have to limit it
> (by making that WERROR option depend on certain expectations), but
> basically any maintainer who has code that causes warnings should
> expect that they will have to fix those warnings.

I'm not 100% against it; I think it could land in a more useful
variation. But it would be good to discuss that on-list, and give it
time to soak. This is a conversation we should be having with CI
maintainers IMO, and folks that maintain the build infra, at least.
I'm happy to kick off that discussion with this RFC.

> If it's clang that generates bogus warnings, then we'll have to start
> disable clang warnings. The clang people tend to be proud of thir
> fewer false positives, but so far looking at things, I am not
> convinced.

Now you're just attacking a strawman.

> And I'm most definitely not convinced when the "let's finally enable
> -Werror after years of talking about it", people end up going "but but
> but I have thousands of warnings".

The kernel is full of thousands of warnings at the moment.  It might
not be for your limited set of configs, targets, and toolchains, but
the folks running CI are very very well aware that the kernel is far
from enabling -Werror seriously.

Any given maintainer sending you a PR cannot (and should not, IMO)
know under what combination of configs, targets, and toolchains you'll
test under, and -Werror isn't going to help them figure it out.

Not every commit that makes its way to mainline has spent equal
amounts of soak time in -next. (I suspect there are commits going into
mainline that spent zero time in -next, not just from you).  Patches
merged by maintainers shortly before the merge window open have
minimal coverage compared to older commits they've accepted.  So CI
systems can't find diagnostics (warnings/errors) in various
combinations of config/target/toolchain if patches are skipping -next
or spending a short amount of time in -next.

To be clear, you have merged patches into mainline that broke the
build for combinations of configs/targets/toolchains that you are not
testing.  It's not realistic for you or any one person to test all
such combinations either.  Other software projects have solved this by
relying on bots to handle merges. Bots that don't forget to test
combinations of various targets, or enable developers to ignore
warnings (if the warnings are even reported).

> That's the POINT of that commit. That "but but but I have thousands of
> warnings" is not acceptable.
>
> I spent hours yesterday getting rid of some warnings. It shouldn't be
> on me fixing peoples code. It shouldn't be on me noticing that people
> send me crap that warns.

I agree. I disagree that -Werror will solve that.  Developers will get
the same report from 0day bot, just now as an error rather than
warning, and they will continue to ignore the report because "well it
works for me."  (Maybe I'm attacking the strawman now).

> And it really shouldn't be "Linus cares about warnings, so
> configurations that Linus doesn't test can continue for years to have
> them".

I agree, and we have been making progress on this. I'd say the advent
of the various CI systems over the past 5 years is a much welcomed
improvement in at least understanding which combinations are exposing
issues.

> My "no warnings" policy isn't exactly new, and people shouldn't be
> shocked when I then say "time to clean up *YOUR* house too".

We have been cleaning house, for nearly half a decade in my
experience. Fixing warnings in the Linux kernel for all possible
configs, targets, and toolchains while the toolchains continue to add
more diagnostics is more sisyphean than digging a hole in wet sand.
-- 
Thanks,
~Nick Desaulniers