[PATCH 10/11] fs: fix hungtask due to repeated traversal of inodes list

Wed Dec 4 11:17:49 UTC 2024

On Mon, Nov 18, 2024 at 07:45:07PM +0800, Ye Bin wrote:
> From: Ye Bin <yebin10 at huawei.com>
> 
> There's a issue when remove scsi disk, the invalidate_inodes() function
> cannot exit for a long time, then trigger hungtask:
> INFO: task kworker/56:0:1391396 blocked for more than 122 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Workqueue: events_freezable virtscsi_handle_event [virtio_scsi]
> Call Trace:
>  __schedule+0x33c/0x7f0
>  schedule+0x46/0xb0
>  schedule_preempt_disabled+0xa/0x10
>  __mutex_lock.constprop.0+0x22b/0x490
>  mutex_lock+0x52/0x70
>  scsi_scan_target+0x6d/0xf0
>  virtscsi_handle_event+0x152/0x1a0 [virtio_scsi]
>  process_one_work+0x1b2/0x350
>  worker_thread+0x49/0x310
>  kthread+0xfb/0x140
>  ret_from_fork+0x1f/0x30
> 
> PID: 540499  TASK: ffff9b15e504c080  CPU: 44  COMMAND: "kworker/44:0"
> Call trace:
>  invalidate_inodes at ffffffff8f3b4784
>  __invalidate_device at ffffffff8f3dfea3
>  invalidate_partition at ffffffff8f526b49
>  del_gendisk at ffffffff8f5280fb
>  sd_remove at ffffffffc0186455 [sd_mod]
>  __device_release_driver at ffffffff8f738ab2
>  device_release_driver at ffffffff8f738bc4
>  bus_remove_device at ffffffff8f737f66
>  device_del at ffffffff8f73341b
>  __scsi_remove_device at ffffffff8f780340
>  scsi_remove_device at ffffffff8f7803a2
>  virtscsi_handle_event at ffffffffc017204f [virtio_scsi]
>  process_one_work at ffffffff8f1041f2
>  worker_thread at ffffffff8f104789
>  kthread at ffffffff8f109abb
>  ret_from_fork at ffffffff8f001d6f
> 
> As commit 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> introduces the retry logic. In the problem environment, the 'i_count'
> of millions of files is not zero. As a result, the time slice for each
> traversal to the matching inode process is almost used up, and then the
> traversal is started from scratch. The worst-case scenario is that only
> one inode can be processed after each wakeup. Because this process holds
> a lock, other processes will be stuck for a long time, causing a series
> of problems.
> To solve the problem of repeated traversal from the beginning, each time
> the CPU needs to be freed, a cursor is inserted into the linked list, and
> the traversal continues from the cursor next time.
> 
> Fixes: 04646aebd30b ("fs: avoid softlockups in s_inodes iterators")
> Signed-off-by: Ye Bin <yebin10 at huawei.com>
> ---
>  fs/inode.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index dc966990bda6..b78895af8779 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -857,11 +857,16 @@ static void dispose_list(struct list_head *head)
>  void evict_inodes(struct super_block *sb)
>  {
>  	struct inode *inode, *next;
> +	struct inode cursor;

It seems pretty adventurous to me to just add in a random inode whose
only fiels that is initialized is i_state. That would need a proper
analysis and argument that this is safe to do and won't cause trouble
for any filesystem.

Jan, do you have thoughts on this?

>  	LIST_HEAD(dispose);
>  
> +	cursor.i_state = I_CURSOR;
> +	INIT_LIST_HEAD(&cursor.i_sb_list);
> +	inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
> +
>  again:
>  	spin_lock(&sb->s_inode_list_lock);
> -	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
> +	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
>  		if (atomic_read(&inode->i_count))
>  			continue;
>  
> @@ -886,12 +891,16 @@ void evict_inodes(struct super_block *sb)
>  		 * bit so we don't livelock.
>  		 */
>  		if (need_resched()) {
> +			list_del(&cursor.i_sb_list);
> +			list_add(&cursor.i_sb_list, &inode->i_sb_list);
> +			inode = &cursor;
>  			spin_unlock(&sb->s_inode_list_lock);
>  			cond_resched();
>  			dispose_list(&dispose);
>  			goto again;
>  		}
>  	}
> +	list_del(&cursor.i_sb_list);
>  	spin_unlock(&sb->s_inode_list_lock);
>  
>  	dispose_list(&dispose);
> @@ -907,11 +916,16 @@ EXPORT_SYMBOL_GPL(evict_inodes);
>  void invalidate_inodes(struct super_block *sb)
>  {
>  	struct inode *inode, *next;
> +	struct inode cursor;
>  	LIST_HEAD(dispose);
>  
> +	cursor.i_state = I_CURSOR;
> +	INIT_LIST_HEAD(&cursor.i_sb_list);
> +	inode = list_entry(&sb->s_inodes, typeof(*inode), i_sb_list);
> +
>  again:
>  	spin_lock(&sb->s_inode_list_lock);
> -	sb_for_each_inodes_safe(inode, next, &sb->s_inodes) {
> +	sb_for_each_inodes_continue_safe(inode, next, &sb->s_inodes) {
>  		spin_lock(&inode->i_lock);
>  		if (inode->i_state & (I_NEW | I_FREEING | I_WILL_FREE)) {
>  			spin_unlock(&inode->i_lock);
> @@ -927,12 +941,16 @@ void invalidate_inodes(struct super_block *sb)
>  		spin_unlock(&inode->i_lock);
>  		list_add(&inode->i_lru, &dispose);
>  		if (need_resched()) {
> +			list_del(&cursor.i_sb_list);
> +			list_add(&cursor.i_sb_list, &inode->i_sb_list);
> +			inode = &cursor;
>  			spin_unlock(&sb->s_inode_list_lock);
>  			cond_resched();
>  			dispose_list(&dispose);
>  			goto again;
>  		}
>  	}
> +	list_del(&cursor.i_sb_list);
>  	spin_unlock(&sb->s_inode_list_lock);
>  
>  	dispose_list(&dispose);
> -- 
> 2.34.1
>