btrfs: fix the ordered extent hang after certain delalloc range failure
[BUG]
Even with the recent double ordered extent freeing bugs fixed, there is
still a small chance of leaving hanging ordered extents that can never
finish.

This is especially problematic if the block size (4K) is smaller than the
page size (64K, aarch64), and can cause the following hang:

  BTRFS info (device dm-3): checking UUID tree
  BTRFS error (device dm-3): cow_file_range failed, root=446 inode=274 start=1523712 len=49152: -28
  BTRFS error (device dm-3): run_delalloc_nocow failed, root=446 inode=274 start=1523712 len=49152: -28
  BTRFS error (device dm-3): failed to run delalloc range, root=446 ino=274 folio=1507328 submit_bitmap=4-15 start=1523712 len=49152: -28
  BTRFS error (device dm-3): cow_file_range failed, root=3274 inode=283 start=454656 len=4096: -28
  BTRFS error (device dm-3): run_delalloc_nocow failed, root=3274 inode=283 start=454656 len=4096: -28
  BTRFS error (device dm-3): failed to run delalloc range, root=3274 ino=283 folio=393216 submit_bitmap=15 start=454656 len=4096: -28
  INFO: task kworker/u17:5:2948764 blocked for more than 122 seconds.
        Not tainted 6.13.0-rc1-custom+ #89
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  task:kworker/u17:5 state:D stack:0 pid:2948764 tgid:2948764 ppid:2 flags:0x00000008
  Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
  Call trace:
   __switch_to+0xf4/0x168 (T)
   __schedule+0x32c/0x7a8
   schedule+0x54/0x140
   schedule_timeout+0xe8/0x140
   __wait_for_common+0xf4/0x228
   wait_for_completion+0x28/0x40
   btrfs_wait_ordered_extents+0x2d0/0x400 [btrfs]
   btrfs_wait_ordered_roots+0x148/0x238 [btrfs]
   shrink_delalloc+0x164/0x280 [btrfs]
   flush_space+0x288/0x328 [btrfs]
   btrfs_async_reclaim_metadata_space+0xc8/0x240 [btrfs]
   process_one_work+0x228/0x680
   worker_thread+0x1bc/0x360
   kthread+0x100/0x118
   ret_from_fork+0x10/0x20
  INFO: task fsstress:3706703 blocked for more than 122 seconds.

[CAUSE]
The above dmesg shows a case where run_delalloc_nocow() failed for a range
inside the folio, not covering the whole folio.

When run_delalloc_nocow() fails, we call btrfs_cleanup_ordered_extents()
inside btrfs_run_delalloc_range() to finish the involved ordered extents.

But if btrfs_cleanup_ordered_extents() is passed @locked_folio and the
range is inside that folio, btrfs_cleanup_ordered_extents() will not call
btrfs_mark_ordered_io_finished(), so the involved ordered extents never
finish.

[FIX]
Just do not pass @locked_folio into btrfs_cleanup_ordered_extents().

The @locked_folio parameter is only used to skip
btrfs_mark_ordered_io_finished(), which should never be skipped at all.

Furthermore, @locked_folio for btrfs_mark_ordered_io_finished() only serves
to clear the ordered flags of folios, to cooperate with the out-of-band
dirty folio detection.

In our error handling case, the folios will already have their ordered
flags cleared inside btrfs_cleanup_ordered_extents(), so there is no need
to pass a @locked_folio into it.

Finally, this is a long-standing bug that needs a backport, so this patch
is the minimal fix. A later patch will remove the @locked_folio parameter
completely.

Fixes: d1051d6 ("btrfs: Fix error handling in btrfs_cleanup_ordered_extents")
Signed-off-by: Qu Wenruo <wqu@suse.com>
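
The skip described under [CAUSE] can be illustrated with a small user-space
model. This is only a sketch, not the kernel code: the function
cleanup_finishes_ordered() and its parameters are hypothetical names
introduced here, and the model deliberately ignores how the real function
trims a range that extends past the locked folio. It only answers one
question: would the ordered extents of the failed range get finished?

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the behavior described in [CAUSE].
 *
 * has_locked_folio: false models passing NULL as @locked_folio.
 * folio_start/folio_len: byte range of the locked folio.
 * start/len: byte range whose delalloc run failed.
 *
 * Returns true if the model goes on to finish ordered extents for
 * the failed range, false if it returns early and skips that step.
 */
static bool cleanup_finishes_ordered(bool has_locked_folio,
                                     uint64_t folio_start, uint64_t folio_len,
                                     uint64_t start, uint64_t len)
{
    assert(folio_len > 0 && len > 0);

    if (has_locked_folio) {
        uint64_t folio_end = folio_start + folio_len; /* exclusive */
        uint64_t end = start + len;                   /* exclusive */

        /* Failed range lies fully inside the locked folio: skipped. */
        if (start >= folio_start && end <= folio_end)
            return false;
    }
    /* No locked folio, or the range reaches outside it: finish. */
    return true;
}
```

With the values from the first failure in the log (folio=1507328, 64K folio,
start=1523712, len=49152), the failed range ends exactly at the end of the
folio, so the model returns false when a locked folio is passed and true
when it is not, which is the behavior change this patch relies on.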