Linux 5.18-rc4

Mon Jun 6 19:19:36 UTC 2022

On 6/6/22 11:28, Linus Torvalds wrote:
> On Mon, Jun 6, 2022 at 8:19 AM Eric W. Biederman <ebiederm at xmission.com> wrote:
>> Has anyone looked into these lock ordering issues?
> 
> The deadlock is
> 
>>>> [78140.503821]        CPU0                    CPU1
>>>> [78140.503823]        ----                    ----
>>>> [78140.503824]   lock(&newf->file_lock);
>>>> [78140.503826]                                lock(&p->alloc_lock);
>>>> [78140.503828]                                lock(&newf->file_lock);
>>>> [78140.503830]   lock(&ctx->lock);
> 
> and the alloc_lock -> file_lock on CPU1 is trivial - it's seq_show()
> in fs/proc/fd.c:
> 
>         task_lock(task);
>         files = task->files;
>         if (files) {
>                 unsigned int fd = proc_fd(m->private);
> 
>                 spin_lock(&files->file_lock);
> 
> and that looks all normal.
> 
> But the other chains look painful.
> 
> I do see the IPC code doing ugly things, in particular I detest this code:
> 
>         task_lock(current);
>         list_add(&shp->shm_clist, &current->sysvshm.shm_clist);
>         task_unlock(current);
> 
> where it is using the task lock to protect the shm_clist list. Nasty.
> 
> And it's doing that inside the shm_ids.rwsem lock _and_ inside the
> shp->shm_perm.lock.
> 
> So the IPC code has newseg() doing
> 
>    shmget ->
>     ipcget():
>      down_write(ids->rwsem) ->
>        newseg():
>          ipc_addid gets perm->lock
>          task_lock(current)
> 
> so you have
> 
>   ids->rwsem -> perm->lock -> alloc_lock
> 
> there.
> 
> So now we have that
> 
>    ids->rwsem -> ipcperm->lock -> alloc_lock -> file_lock
> 
> when you put those sequences together.
> 
> But I didn't figure out what the security subsystem angle is and how
> that then apparently mixes things up with execve.
> 
> Yes, newseg() is doing that
> 
>         error = security_shm_alloc(&shp->shm_perm);
> 
> while holding rwsem, but I can't see how that matters. From the
> lockdep output, rwsem doesn't actually seem to be part of the whole
> sequence.
> 
> It *looks* like we have
> 
>    apparmor ctx->lock -->
>       radix_tree_preloads.lock -->
>          ipcperm->lock
> 
> and apparently that's called under the file_lock somewhere, completing
> the circle.
> 
> I guess the execve component is that
> 
>   begin_new_exec ->
>     security_bprm_committing_creds ->
>       apparmor_bprm_committing_creds ->
>         aa_inherit_files ->
>           iterate_fd ->   *takes file_lock*
>             match_file ->
>               aa_file_perm ->
>                 update_file_ctx *takes ctx->lock*
> 
> so that's how you get file_lock -> ctx->lock.
> 
yes

> So you have:
> 
>  SHMGET:
>     ipcperm->lock -> alloc_lock
>  /proc:
>     alloc_lock -> file_lock
>  apparmor_bprm_committing_creds:
>     file_lock -> ctx->lock
> 
> and then all you need is ctx->lock -> ipcperm->lock but I didn't find that part.
> 
yeah that is the part I got stuck on, before being pulled away from this

> I suspect that part is that both Apparmor and IPC use the idr local lock.
> 
bingo,

apparmor moved its secid allocation from a custom radix tree to idr in

  99cc45e48678 apparmor: Use an IDR to allocate apparmor secids

and ipc is using the idr for its id allocation as well

I can easily lift the secid() allocation out of the ctx->lock, but that
would still leave it happening under the file_lock, so it would not fix the
problem. I think the quick solution would be for apparmor to stop using idr,
reverting at least temporarily to the custom radix tree.
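
To make the three options concrete, here is a rough userspace sketch (not
kernel code, and not how lockdep is implemented) that records the lock orders
named in this thread as edges in a small graph and checks for a cycle. The
enum names, struct lock_graph, build_graph() and has_cycle() are all made up
for illustration; only the lock orders themselves come from the analysis above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical labels for the locks discussed in this thread. */
enum lock_id {
	IPCPERM_LOCK,
	ALLOC_LOCK,
	FILE_LOCK,
	CTX_LOCK,
	IDR_LOCAL_LOCK,		/* radix_tree_preloads.lock */
	NR_LOCKS
};

/* Hypothetical labels for where apparmor allocates its secid. */
enum secid_scheme {
	SECID_IDR_UNDER_CTX_LOCK,	/* idr alloc with ctx->lock held */
	SECID_IDR_UNDER_FILE_LOCK,	/* lifted out of ctx->lock only */
	SECID_NO_IDR			/* apparmor does not use idr */
};

struct lock_graph {
	bool edge[NR_LOCKS][NR_LOCKS];	/* edge[a][b]: a held while taking b */
};

static void add_order(struct lock_graph *g, enum lock_id outer,
		      enum lock_id inner)
{
	assert(outer != inner);
	g->edge[outer][inner] = true;
}

/* Depth-first search; state 0 = unvisited, 1 = on stack, 2 = done. */
static bool visit(const struct lock_graph *g, int node, int state[NR_LOCKS])
{
	state[node] = 1;
	for (int next = 0; next < NR_LOCKS; next++) {
		if (!g->edge[node][next])
			continue;
		if (state[next] == 1)
			return true;	/* back edge: lock order cycle */
		if (state[next] == 0 && visit(g, next, state))
			return true;
	}
	state[node] = 2;
	return false;
}

bool has_cycle(const struct lock_graph *g)
{
	int state[NR_LOCKS] = { 0 };

	for (int n = 0; n < NR_LOCKS; n++)
		if (state[n] == 0 && visit(g, n, state))
			return true;
	return false;
}

void build_graph(struct lock_graph *g, enum secid_scheme scheme)
{
	memset(g, 0, sizeof(*g));
	add_order(g, IPCPERM_LOCK, ALLOC_LOCK);		/* SHMGET */
	add_order(g, ALLOC_LOCK, FILE_LOCK);		/* /proc */
	add_order(g, FILE_LOCK, CTX_LOCK);		/* bprm_committing_creds */
	add_order(g, IDR_LOCAL_LOCK, IPCPERM_LOCK);	/* ipc id allocation */

	if (scheme == SECID_IDR_UNDER_CTX_LOCK)
		add_order(g, CTX_LOCK, IDR_LOCAL_LOCK);
	else if (scheme == SECID_IDR_UNDER_FILE_LOCK)
		add_order(g, FILE_LOCK, IDR_LOCAL_LOCK);
	/* SECID_NO_IDR: apparmor adds no edge into the idr local lock */
}
```

Calling build_graph() with each scheme and then has_cycle() on the result
shows the point: in this model the first two schemes still close the loop
through the idr local lock, and only dropping the idr from apparmor breaks it.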