Flush the hold queue fall into an infinite loop.

Fri Jan 14 22:35:18 UTC 2022

On Thu, Jan 13, 2022 at 8:22 PM cuigaosheng <cuigaosheng1 at huawei.com> wrote:
>
> I want to stop droping the logs into audit_hold_queue when the auditd is abnormal.it
> seems that this modification goes against the design intent of audit_hold_queue. its
> effect is similar to removing the audit_hold_queue.
>
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 2a38cbaf3ddb..a8091b1a6587 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -748,6 +748,7 @@ static int kauditd_send_queue(struct sock *sk, u32
> portid,
>                                          (*err_hook)(skb);
>                                  if (rc == -EAGAIN)
>                                          rc = 0;
> +                               audit_default = AUDIT_OFF;
>                                  /* continue to drain the queue */
>                                  continue;
>                          } else
> @@ -755,6 +756,7 @@ static int kauditd_send_queue(struct sock *sk, u32
> portid,
>                  } else {
>                          /* skb sent - drop the extra reference and
> continue */
>                          consume_skb(skb);
> +                       audit_default = audit_enabled;
>                          failed = 0;
>                  }
>          }

We can't toggle the audit_default setting like this, that isn't
acceptable upstream.  I believe I have a fix, but I need to finish the
testing before I can post it for further review.

> 在 2022/1/13 23:22, Paul Moore 写道:
> > On Thu, Jan 13, 2022 at 6:57 AM cuigaosheng <cuigaosheng1 at huawei.com> wrote:
> >> When we add "audit=1" to the cmdline, kauditd will take up 100%
> >> cpu resource.As follows:
> >>
> >> configurations:
> >> auditctl -b 64
> >> auditctl --backlog_wait_time 60000
> >> auditctl -r 0
> >> auditctl -w /root/aaa  -p wrx
> >> shell scripts：
> >> #!/bin/bash
> >> i=0
> >> while [ $i -le 66 ]
> >> do
> >>     touch /root/aaa
> >>     let i++
> >> done
> >> mandatory conditions:
> >>
> >> add "audit=1" to the cmdline, and kill -19 pid_number(for /sbin/auditd).
> >>
> >>   As long as we keep the audit_hold_queue non-empty, flush the hold queue will fall into
> >>   an infinite loop.
> >>
> >> 713 static int kauditd_send_queue(struct sock *sk, u32 portid,
> >>   714                               struct sk_buff_head *queue,
> >>   715                               unsigned int retry_limit,
> >>   716                               void (*skb_hook)(struct sk_buff *skb),
> >>   717                               void (*err_hook)(struct sk_buff *skb))
> >>   718 {
> >>   719         int rc = 0;
> >>   720         struct sk_buff *skb;
> >>   721         unsigned int failed = 0;
> >>   722
> >>   723         /* NOTE: kauditd_thread takes care of all our locking, we just use
> >>   724          *       the netlink info passed to us (e.g. sk and portid) */
> >>   725
> >>   726         while ((skb = skb_dequeue(queue))) {
> >>   727                 /* call the skb_hook for each skb we touch */
> >>   728                 if (skb_hook)
> >>   729                         (*skb_hook)(skb);
> >>   730
> >>   731                 /* can we send to anyone via unicast? */
> >>   732                 if (!sk) {
> >>   733                         if (err_hook)
> >>   734                                 (*err_hook)(skb);
> >>   735                         continue;
> >>   736                 }
> >>   737
> >>   738 retry:
> >>   739                 /* grab an extra skb reference in case of error */
> >>   740                 skb_get(skb);
> >>   741                 rc = netlink_unicast(sk, skb, portid, 0);
> >>   742                 if (rc < 0) {
> >>   743                         /* send failed - try a few times unless fatal error */
> >>   744                         if (++failed >= retry_limit ||
> >>   745                             rc == -ECONNREFUSED || rc == -EPERM) {
> >>   746                                 sk = NULL;
> >>   747                                 if (err_hook)
> >>   748                                         (*err_hook)(skb);
> >>   749                                 if (rc == -EAGAIN)
> >>   750                                         rc = 0;
> >>   751                                 /* continue to drain the queue */
> >>   752                                 continue;
> >>   753                         } else
> >>   754                                 goto retry;
> >>   755                 } else {
> >>   756                         /* skb sent - drop the extra reference and continue */
> >>   757                         consume_skb(skb);
> >>   758                         failed = 0;
> >>   759                 }
> >>   760         }
> >>   761
> >>   762         return (rc >= 0 ? 0 : rc);
> >>   763 }
> >>
> >> When kauditd attempt to flush the hold queue, the queue parameter is &audit_hold_queue,
> >> and if netlink_unicast(line 741 ) return -EAGAIN, sk will be NULL(line 746), so err_hook(kauditd_rehold_skb)
> >> will be call. Then continue, skb_dequeue(line 726) and err_hook(kauditd_rehold_skb,line 733) will
> >> fall into an infinite loop.
> >> I don't really understand the value of audit_hold_queue, can we remove it, or stop droping the logs
> >> into kauditd_rehold_skb when the auditd is abnormal?
> > Thanks Gaosheng for the bug report, I'm able to reproduce this and I'm
> > looking into it now.  I'll report back when I have a better idea of
> > the problem and a potential fix.
> >

-- 
paul moore
www.paul-moore.com