Looks like issue in handling active_nodes count in 4.19 kernel .

Stephen Smalley sds at tycho.nsa.gov
Wed Dec 18 13:53:38 UTC 2019


On 12/18/19 12:58 AM, Ravi Kumar Siddojigari wrote:
> Yes this is the first time that we are getting this stress tested done on v4.19 kernel .
> We had not tested this prior version of kernel though . Current proposed changes seems to really help and testing is still going on .
> As per the delta it looks  change  6b6bc620  seem to be missing in earlier version of kernel not sure if this was the cause.

6b6bc620 shouldn't have altered any behavior; it was purely an 
encapsulation of the data structures.  Both of the bugs you've 
identified were introduced by the xperms support in fa1aa143ac4a68. 
Maybe they were harder to trigger when the AVC was still using 
GFP_ATOMIC instead of GFP_NOWAIT, but they were bugs nonetheless.

> 
> Br ,
> Ravi.
> -----Original Message-----
> From: Stephen Smalley <sds at tycho.nsa.gov>
> Sent: Tuesday, December 17, 2019 9:54 PM
> To: Ravi Kumar Siddojigari <rsiddoji at codeaurora.org>; selinux at vger.kernel.org
> Cc: paul at paul-moore.com; linux-security-module at vger.kernel.org
> Subject: Re: Looks like issue in handling active_nodes count in 4.19 kernel .
> 
> On 12/17/19 10:52 AM, Stephen Smalley wrote:
>> On 12/17/19 10:40 AM, Ravi Kumar Siddojigari wrote:
>>> Yes  indeed this is a stress test on ARM64 device with multicore
>>> where most of the cores /tasks are stuck  in avc_reclaim_node .
>>> We still see this issue even after picking the earlier patch "
>>> selinux: ensure we cleanup the internal AVC counters on error in
>>> avc_insert() commit: d8db60cb23e4"
>>> Where selinux_state  during issue was as below where all the slots
>>> are  NULL and the count was more than threshold.
>>> Which seem to be calling avc_reclaim_node always and as the all the
>>> slots are empty its going for full for- loop with locks and unlock
>>> and taking too long .
>>> Not sure what could make the  slots null , for sure its not due to
>>> flush() /Reset(). We think that still we need to call  avc_kill_node
>>> in update_node function .
>>> Adding the patch below can you please review or correct the following
>>> patch .
>>>
>>>
>>>     selinux_state = (
>>>       disabled = FALSE,
>>>       enforcing = TRUE,
>>>       checkreqprot = FALSE,
>>>       initialized = TRUE,
>>>       policycap = (TRUE, TRUE, TRUE, FALSE, FALSE, TRUE),
>>>       avc = 0xFFFFFF9BEFF1E890 -> (
>>>         avc_cache_threshold = 512,  /* <<<<<not configured and its
>>> with default*/
>>>         avc_cache = (
>>>           slots = ((first = 0x0), (first = 0x0), (first = 0x0), (first
>>> = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0),
>>> (first = 0x0), (first = 0x0), (first = 0x0), (first = 0x0), (first
>>> /*<<<< all are NULL */
>>>           slots_lock = ((rlock = (raw_lock = (val = (counter = 0),
>>> locked = 0, pending = 0, locked_pending = 0, tail = 0), magic =
>>> 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFFFFFFFFFF,
>>> dep_map = (key = 0xFFFFFF9BEFF298A8, cla
>>>           lru_hint = (counter = 616831529),
>>>           active_nodes = (counter = 547),   /*<<<<< increased more
>>> than 512*/
>>>           latest_notif = 1)),
>>>       ss = 0xFFFFFF9BEFF2E578)
>>>
>>>
>>> --
>>> In AVC update we don't call avc_node_kill() when
>>> avc_xperms_populate() fails, resulting in the
>>> avc->avc_cache.active_nodes counter having a false value.In last patch this changes was missed , so correcting it.
>>>
>>> Change-Id: Ic0298162cc766c0f21be7ab232e259766654dad3
>>> Signed-off-by: Jaihind Yadav<jaihindyadav at codeaurora.org>
>>> ---
>>>    security/selinux/avc.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/security/selinux/avc.c b/security/selinux/avc.c index
>>> 91d24c2..3d1cff2 100644
>>> --- a/security/selinux/avc.c
>>> +++ b/security/selinux/avc.c
>>> @@ -913,7 +913,7 @@ static int avc_update_node(struct selinux_avc
>>> *avc,
>>>           if (orig->ae.xp_node) {
>>>                   rc = avc_xperms_populate(node, orig->ae.xp_node);
>>>                   if (rc) {
>>> -                       kmem_cache_free(avc_node_cachep, node);
>>> +                       avc_node_kill(avc, node);
>>>                           goto out_unlock;
>>>                   }
>>>           }
>>> --
>>
>> That looks correct to me; I guess that one got missed by the prior fix.
>> Still not sure how your AVC got into that state though...
>>
>> Acked-by: Stephen Smalley <sds at tycho.nsa.gov>
> 
> BTW, have you been running these stress tests on earlier kernels too?
> If so, what version(s) are known to pass them?  I ask because this code has been present since v4.3 and this is the first such report.
> 



More information about the Linux-security-module-archive mailing list