[PATCH v17] exec: Fix dead-lock in de_thread with ptrace_attach

Bernd Edlinger bernd.edlinger at hotmail.de
Tue Nov 11 11:07:39 UTC 2025


On 11/11/25 10:21, Christian Brauner wrote:
> On Wed, Nov 05, 2025 at 03:32:10PM +0100, Oleg Nesterov wrote:
>> I am still thinking about another approach, will write another email.
>> But let me take a closer look at your patch.
>>
>> First of all, can you split it? See below.
>>
>> On 08/21, Bernd Edlinger wrote:
>>>
>>> -static int de_thread(struct task_struct *tsk)
>>> +static int de_thread(struct task_struct *tsk, struct linux_binprm *bprm)
>>>  {
>>>  	struct signal_struct *sig = tsk->signal;
>>>  	struct sighand_struct *oldsighand = tsk->sighand;
>>>  	spinlock_t *lock = &oldsighand->siglock;
>>> +	struct task_struct *t;
>>> +	bool unsafe_execve_in_progress = false;
>>>
>>>  	if (thread_group_empty(tsk))
>>>  		goto no_thread_group;
>>> @@ -932,6 +934,19 @@ static int de_thread(struct task_struct *tsk)
>>>  	if (!thread_group_leader(tsk))
>>>  		sig->notify_count--;
>>>
>>> +	for_other_threads(tsk, t) {
>>> +		if (unlikely(t->ptrace)
>>> +		    && (t != tsk->group_leader || !t->exit_state))
>>> +			unsafe_execve_in_progress = true;
>>
>> you can add "break" into the "if ()" block...
>>
>> But this is minor. Why do we need "bool unsafe_execve_in_progress" ?
>> If this patch is correct, de_thread() can drop/reacquire cred_guard_mutex
>> unconditionally.
>>
>> If you really think it makes sense, please make another patch with the
>> changelog.
>>
>> I'd certainly prefer to avoid this boolean at least for the start. If nothing
>> else to catch the potential problems earlier.
>>
>>> +	if (unlikely(unsafe_execve_in_progress)) {
>>> +		spin_unlock_irq(lock);
>>> +		sig->exec_bprm = bprm;
>>> +		mutex_unlock(&sig->cred_guard_mutex);
>>> +		spin_lock_irq(lock);
>>
>> I don't think spin_unlock_irq() + spin_lock_irq() makes any sense...
>>
>>> @@ -1114,13 +1139,31 @@ int begin_new_exec(struct linux_binprm * bprm)
>>>  	 */
>>>  	trace_sched_prepare_exec(current, bprm);
>>>
>>> +	/* If the binary is not readable then enforce mm->dumpable=0 */
>>> +	would_dump(bprm, bprm->file);
>>> +	if (bprm->have_execfd)
>>> +		would_dump(bprm, bprm->executable);
>>> +
>>> +	/*
>>> +	 * Figure out dumpability. Note that this checking only of current
>>> +	 * is wrong, but userspace depends on it. This should be testing
>>> +	 * bprm->secureexec instead.
>>> +	 */
>>> +	if (bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP ||
>>> +	    is_dumpability_changed(current_cred(), bprm->cred) ||
>>> +	    !(uid_eq(current_euid(), current_uid()) &&
>>> +	      gid_eq(current_egid(), current_gid())))
>>> +		set_dumpable(bprm->mm, suid_dumpable);
>>> +	else
>>> +		set_dumpable(bprm->mm, SUID_DUMP_USER);
>>> +
>>
>> OK, we need to do this before de_thread() drops cred_guard_mutex.
>> But imo this too should be done in a separate patch, the changelog should
>> explain this change.
>>
>>> @@ -1361,6 +1387,11 @@ static int prepare_bprm_creds(struct linux_binprm *bprm)
>>>  	if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
>>>  		return -ERESTARTNOINTR;
>>>
>>> +	if (unlikely(current->signal->exec_bprm)) {
>>> +		mutex_unlock(&current->signal->cred_guard_mutex);
>>> +		return -ERESTARTNOINTR;
>>> +	}
>>
>> OK, if signal->exec_bprm != NULL, then current is already killed. But
>> proc_pid_attr_write() and ptrace_traceme() do the same. So how about
>> something like
>>
>> 	int lock_current_cgm(void)
>> 	{
>> 		if (mutex_lock_interruptible(&current->signal->cred_guard_mutex))
>> 			return -ERESTARTNOINTR;
>>
>> 		if (!current->signal->group_exec_task)
>> 			return 0;
>>
>> 		WARN_ON(!fatal_signal_pending(current));
>> 		mutex_unlock(&current->signal->cred_guard_mutex);
>> 		return -ERESTARTNOINTR;
>> 	}
>>
>> ?
>>
>> Note that it checks ->group_exec_task, not ->exec_bprm. So this change can
>> come in a separate patch too, but I won't insist.
>>
>>> @@ -453,6 +454,28 @@ static int ptrace_attach(struct task_struct *task, long request,
>>>  				return retval;
>>>  		}
>>>
>>> +		if (unlikely(task == task->signal->group_exec_task)) {
>>> +			retval = down_write_killable(&task->signal->exec_update_lock);
>>> +			if (retval)
>>> +				return retval;
>>> +
>>> +			scoped_guard (task_lock, task) {
>>> +				struct linux_binprm *bprm = task->signal->exec_bprm;
>>> +				const struct cred __rcu *old_cred = task->real_cred;
>>> +				struct mm_struct *old_mm = task->mm;
>>> +
>>> +				rcu_assign_pointer(task->real_cred, bprm->cred);
>>> +				task->mm = bprm->mm;
>>> +				retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
>>> +				rcu_assign_pointer(task->real_cred, old_cred);
>>> +				task->mm = old_mm;
>>> +			}
>>
>> This is the most problematic change which I can't review...
>>
>> Firstly, it changes task->mm/real_cred for __ptrace_may_access() and this
>> looks dangerous to me.
> 
> Yeah, that is not ok. This is effectively override_creds for real_cred
> and that is not a pattern I want to see us establish at all! Temporary
> credential overrides for the subjective credentials is already terrible
> but at least we have the explicit split between real_cred and cred
> expressely for that. So no, that's not an acceptable solution.
> 

Well when this is absolutely not acceptable then I would have to change
all security engines to be aware of the current and the new credentials.
That may be as well be possible but would be a rather big change.
Of course that was only meant as a big exception, and somehow safe
as long as it is protected under the right mutexes: cred_guard_mutex,
exec_update_lock and task_lock at least.

>>
>> Say, current_is_single_threaded() called by another CLONE_VM process can
>> miss group_exec_task and falsely return true. Probably not that bad, in
>> this case old_mm should go away soon, but still...
>>
>> And I don't know if this can fool the users of task_cred_xxx/__task_cred
>> somehow.
>>
>> Or. check_unsafe_exec() sets LSM_UNSAFE_PTRACE if ptrace. Is it safe to
>> ptrace the execing task after that? I have no idea what the security hooks
>> can do...
>>
>> Again, can't review this part.
>>

Never mind, your review was really helpful.  At the very least it pointed
out some places where better comments are needed.

Thanks
Bernd.

>> Oleg.
>>




More information about the Linux-security-module-archive mailing list