INFO: rcu detected stall in sys_sendfile64

Tue Jan 7 13:02:47 UTC 2020

On Sat, Jan 4, 2020 at 12:09 PM Tetsuo Handa
<penguin-kernel at i-love.sakura.ne.jp> wrote:
>
> On 2018/12/20 3:42, Dmitry Vyukov wrote:
> > On Wed, Dec 19, 2018 at 11:13 AM Tetsuo Handa
> > <penguin-kernel at i-love.sakura.ne.jp> wrote:
> >>
> >> On 2018/12/19 18:27, syzbot wrote:
> >>> HEAD commit:    ddfbab46539f Merge tag 'scsi-fixes' of git://git.kernel.or..
> >>> git tree:       upstream
> >>> console output: https://syzkaller.appspot.com/x/log.txt?x=15b87fa3400000
> >>> kernel config:  https://syzkaller.appspot.com/x/.config?x=861a3573f4e78ba1
> >>> dashboard link: https://syzkaller.appspot.com/bug?extid=bcad772bbc241b4c6147
> >>> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >>> syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=13912ccd400000
> >>> C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=145781db400000
> >>
> >> This is not a LSM problem, for the reproducer is calling
> >> sched_setattr(SCHED_DEADLINE) with very large values.
> >>
> >>   sched_setattr(0, {size=0, sched_policy=0x6 /* SCHED_??? */, sched_flags=0, sched_nice=0, sched_priority=0, sched_runtime=2251799813724439, sched_deadline=4611686018427453437, sched_period=0}, 0) = 0
> >>
> >> I think that this problem is nothing but an insane sched_setattr() parameter.
> >>
> >> #syz invalid
> >
> > Note there was another one with sched_setattr, which turned out to be
> > some serious problem in kernel (sched_setattr should not cause CPU
> > stall for 3 minutes):
> > INFO: rcu detected stall in do_idle
> > https://syzkaller.appspot.com/bug?extid=385468161961cee80c31
> > https://groups.google.com/forum/#!msg/syzkaller-bugs/crrfvusGtwI/IoD_zus4BgAJ
> >
> > Maybe it another incarnation of the same bug, that one is still not fixed.
> >
>
> Can we let syzbot blacklist sched_setattr() for now? There are many stall reports
> doing sched_setattr(SCHED_RR) which makes it difficult to find stall reports not
> using sched_setattr().

Hi Tetsuo,

If we start practice of disabling whole syscalls, I would really like
"for now" to be very well defined. When will it end? How will it
happen? Is the problem on the radar of relevant people? Will it stay
on somebody's radar until it's fixed? Normal practise of project
sheriffing is to file a P1 bug assigned to somebody when something
gets disabled. But I am not sure how we implement this for kernel.
Since the problem is there for a long time and we disable it without
defining any criteria, I afraid we disable it forever (then more bugs
will pile and re-enabling it will be painful). At the very least we
need to acknowledge that we stopping testing schedler for foreseeable
future and schedler maintainers need to be notified about this.
Blacklisting it and un-blacklisting will cause some churn. Was the bug
given at least some attention? Significant number of bugs are
relatively easy to fix and fixing it would solve all of the problems
in a much better way.