Looks like an issue in handling the active_nodes count in the 4.19 kernel

Ravi Kumar Siddojigari rsiddoji at codeaurora.org
Wed Dec 18 05:58:47 UTC 2019


Yes, this is the first time we have run this stress test on the v4.19 kernel;
we had not tested earlier kernel versions. The currently proposed changes seem to really help, and testing is still ongoing.
Per the delta, change 6b6bc620 seems to be missing in earlier kernel versions; not sure if that was the cause.

Br,
Ravi.
-----Original Message-----
From: Stephen Smalley <sds at tycho.nsa.gov> 
Sent: Tuesday, December 17, 2019 9:54 PM
To: Ravi Kumar Siddojigari <rsiddoji at codeaurora.org>; selinux at vger.kernel.org
Cc: paul at paul-moore.com; linux-security-module at vger.kernel.org
Subject: Re: Looks like issue in handling active_nodes count in 4.19 kernel .

On 12/17/19 10:52 AM, Stephen Smalley wrote:
> On 12/17/19 10:40 AM, Ravi Kumar Siddojigari wrote:
>> Yes, indeed this is a stress test on an ARM64 multicore device
>> where most of the cores/tasks are stuck in avc_reclaim_node().
>> We still see this issue even after picking the earlier patch
>> "selinux: ensure we cleanup the internal AVC counters on error in
>> avc_insert()" (commit d8db60cb23e4).
>> During the issue, selinux_state was as below: all the slots
>> were NULL and the count was above the threshold,
>> which seems to make every insert call avc_reclaim_node(); since all
>> the slots are empty, it runs the full for-loop, locking and unlocking
>> each slot, and takes too long.
>> Not sure what could make the slots NULL; it is definitely not due to
>> flush()/reset(). We think we still need to call avc_node_kill()
>> in the update_node function.
>> Can you please review or correct the following patch?
>>
>>
>>    selinux_state = (
>>      disabled = FALSE,
>>      enforcing = TRUE,
>>      checkreqprot = FALSE,
>>      initialized = TRUE,
>>      policycap = (TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
>>      avc = 0xFFFFFF9BEFF1E890 -> (
>>        avc_cache_threshold = 512,  /* <<<<< not configured; at its default */
>>        avc_cache = (
>>          slots = ((first = 0x0), (first = 0x0), (first = 0x0), (first 
>> = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0),
>> (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first   
>> /*<<<< all are NULL */
>>          slots_lock = ((rlock = (raw_lock = (val = (counter = 0), 
>> locked = 0, pending = 0, locked_pending = 0, tail = 0), magic = 
>> 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF, 
>> dep_map = (key = 0xFFFFFF9BEFF298A8, cla
>>          lru_hint = (counter = 616831529),
>>          active_nodes = (counter = 547),   /*<<<<< increased more 
>> than 512*/
>>          latest_notif = 1)),
>>      ss = 0xFFFFFF9BEFF2E578)
>>
>>
>> --
>> In avc_update_node() we don't call avc_node_kill() when
>> avc_xperms_populate() fails, resulting in the
>> avc->avc_cache.active_nodes counter holding a false value. This change
>> was missed in the last patch, so we are correcting it here.
>>
>> Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3
>> Signed-off-by: Jaihind Yadav <jaihindyadav at codeaurora.org>
>> ---
>>   security/selinux/avc.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/security/selinux/avc.c b/security/selinux/avc.c
>> index 91d24c2..3d1cff2 100644
>> --- a/security/selinux/avc.c
>> +++ b/security/selinux/avc.c
>> @@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc *avc,
>>          if (orig->ae.xp_node) {
>>                  rc = avc_xperms_populate(node, orig->ae.xp_node);
>>                  if (rc) {
>> -                       kmem_cache_free(avc_node_cachep, node);
>> +                       avc_node_kill(avc, node);
>>                          goto out_unlock;
>>                  }
>>          }
>> --
> 
> That looks correct to me; I guess that one got missed by the prior fix.
> Still not sure how your AVC got into that state though...
> 
> Acked-by: Stephen Smalley <sds at tycho.nsa.gov>

BTW, have you been running these stress tests on earlier kernels too?
If so, what version(s) are known to pass them? I ask because this code has been present since v4.3, and this is the first such report.



