
Panic in spa_history_log_sync while running ZFS test suite, at the point of running L2ARC tests. #11051

Closed
prakashsurya opened this issue Oct 12, 2020 · 3 comments

prakashsurya commented Oct 12, 2020

System information

Type                  Version/Name
Distribution Name     Delphix (Ubuntu derivative)
Distribution Version  n/a
Linux Kernel          5.4.0-42-generic
Architecture          x86_64
ZFS Version           61868bb
SPL Version           61868bb

Describe the problem you're observing

Panic message from dmesg:

[ 6125.734831]: VERIFY3(tx->tx_txg <= spa_final_dirty_txg(os->os_spa)) failed (39 <= 38)
[ 6125.736060]: PANIC at dbuf.c:2068:dbuf_dirty()
[ 6125.736754]: Kernel panic - not syncing: VERIFY3(tx->tx_txg <= spa_final_dirty_txg(os->os_spa)) failed (39 <= 38)
[ 6125.738348]: CPU: 1 PID: 24103 Comm: txg_sync Kdump: loaded Tainted: P           OE     5.4.0-42-generic #46~18.04.1-Ubuntu
[ 6125.740067]: Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
[ 6125.741731]: Call Trace:
[ 6125.742138]:  dump_stack+0x6d/0x95
[ 6125.742674]:  panic+0xfe/0x2e4
[ 6125.743170]:  spl_panic+0xce/0xfb [spl]
[ 6125.743832]:  ? gethrtime+0x20/0x50 [zfs]
[ 6125.744496]:  ? dmu_zfetch+0x3dd/0x4a0 [zfs]
[ 6125.745191]:  ? __raw_spin_unlock+0x9/0x10 [zfs]
[ 6125.745968]:  ? atomic_dec_return+0x9/0x10 [zfs]
[ 6125.746746]:  ? zrl_remove+0x12/0x20 [zfs]
[ 6125.747423]:  ? dbuf_read+0x291/0x5a0 [zfs]
[ 6125.748132]:  ? atomic_dec_return+0x9/0x10 [zfs]
[ 6125.748883]:  ? list_is_empty+0xd/0x10 [zfs]
[ 6125.749581]:  dbuf_dirty+0x408/0x6e0 [zfs]
[ 6125.750253]:  dmu_buf_will_dirty_impl+0xaf/0x110 [zfs]
[ 6125.751523]:  dmu_buf_will_dirty+0x16/0x20 [zfs]
[ 6125.752729]:  spa_history_log_sync+0xd5/0x4a0 [zfs]
[ 6125.753934]:  ? percpu_counter_add+0xf/0x20 [spl]
[ 6125.755155]:  ? do_raw_spin_unlock+0x9/0x10 [zfs]
[ 6125.756348]:  dsl_sync_task_sync+0xa9/0xf0 [zfs]
[ 6125.757531]:  dsl_pool_sync+0x340/0x420 [zfs]
[ 6125.758674]:  spa_sync_iterate_to_convergence+0xd5/0x1e0 [zfs]
[ 6125.760035]:  spa_sync+0x32e/0x600 [zfs]
[ 6125.761102]:  txg_sync_thread+0x273/0x300 [zfs]
[ 6125.762259]:  ? txg_dispatch_callbacks+0x100/0x100 [zfs]
[ 6125.763486]:  thread_generic_wrapper+0x74/0x90 [spl]
[ 6125.764656]:  kthread+0x121/0x140
[ 6125.765575]:  ? __thread_exit+0x20/0x20 [spl]
[ 6125.766653]:  ? kthread_park+0x90/0x90
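
For reference, the VERIFY3 that fires is the dbuf_dirty() check that no buffer may be dirtied in a txg later than the pool's final dirty txg. A paraphrased sketch of the logic involved (not verbatim OpenZFS source; TXG_DEFER_SIZE is 2):

    /* spa_misc.c: the latest txg that is still allowed to be dirtied. */
    uint64_t
    spa_final_dirty_txg(spa_t *spa)
    {
            /*
             * spa_final_txg is set when the pool is exported or destroyed;
             * deferred frees may dirty up to TXG_DEFER_SIZE txgs past it.
             */
            return (spa->spa_final_txg + TXG_DEFER_SIZE);
    }

    /* dbuf.c, dbuf_dirty(): the check that panicked here (39 <= 38). */
    VERIFY3U(tx->tx_txg, <=, spa_final_dirty_txg(os->os_spa));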

Last line of output from the ZFS test suite:

21:14:38  Test: /usr/share/zfs/zfs-tests/tests/functional/l2arc/setup (run as root) [00:00] [PASS]

Panicked thread:

sdb> stacks -c panic
TASK_STRUCT        STATE             COUNT
==========================================
0xffff990fc4331740 RUNNING               1
                  __crash_kexec+0x9d
                  panic+0x11a
                  spl_panic+0xce
                  dbuf_dirty+0x408
                  dmu_buf_will_dirty_impl+0xaf
                  dmu_buf_will_dirty+0x16
                  spa_history_log_sync+0xd5
                  dsl_sync_task_sync+0xa9
                  dsl_pool_sync+0x340
                  spa_sync_iterate_to_convergence+0xd5
                  spa_sync+0x32e
                  txg_sync_thread+0x273
                  thread_generic_wrapper+0x74
                  kthread+0x121
                  ret_from_fork+0x1f

Running threads:

sdb> stacks -t running
TASK_STRUCT        STATE             COUNT
==========================================
0xffff9910e6e19740 RUNNING               1
                  __schedule+0x293
                  schedule+0x33
                  schedule_timeout+0x15d
                  rcu_gp_kthread+0x5c8
                  kthread+0x121
                  ret_from_fork+0x1f

0xffff990fc4335d00 RUNNING               1
                  __schedule+0x293
                  schedule+0x33
                  cv_wait_common+0x125
                  __cv_wait_idle+0x4f
                  txg_thread_wait.constprop.0+0x31
                  txg_quiesce_thread+0x146
                  thread_generic_wrapper+0x74
                  kthread+0x121
                  ret_from_fork+0x1f

0xffff990fc4331740 RUNNING               1
                  __crash_kexec+0x9d
                  panic+0x11a
                  spl_panic+0xce
                  dbuf_dirty+0x408
                  dmu_buf_will_dirty_impl+0xaf
                  dmu_buf_will_dirty+0x16
                  spa_history_log_sync+0xd5
                  dsl_sync_task_sync+0xa9
                  dsl_pool_sync+0x340
                  spa_sync_iterate_to_convergence+0xd5
                  spa_sync+0x32e
                  txg_sync_thread+0x273
                  thread_generic_wrapper+0x74
                  kthread+0x121
                  ret_from_fork+0x1f

Uninterruptible threads:

sdb> stacks -t uninterruptible
TASK_STRUCT        STATE             COUNT
==========================================
0xffff9910d916ae80 UNINTERRUPTIBLE       1
                  __schedule+0x293
                  schedule+0x33
                  schedule_preempt_disabled+0xe
                  __mutex_lock.isra.9+0x26d
                  __mutex_lock_slowpath+0x13
                  mutex_lock+0x2f
                  l2arc_dev_get_next+0x1a
                  l2arc_feed_thread+0xd9
                  thread_generic_wrapper+0x74
                  kthread+0x121
                  ret_from_fork+0x1f

0xffff991055628000 UNINTERRUPTIBLE       1
                  __schedule+0x293
                  schedule+0x33
                  io_schedule+0x16
                  cv_wait_common+0xd2
                  __cv_wait_io+0x18
                  txg_wait_synced_impl+0xbc
                  txg_wait_synced+0x10
                  vdev_initialize_stop_all+0x8e
                  spa_unload+0x6a
                  spa_export_common+0x18f
                  spa_export+0x1b
                  zfs_ioc_pool_export+0x34
                  zfsdev_ioctl_common+0x568
                  zfsdev_ioctl+0x54
                  do_vfs_ioctl+0xa9
                  ksys_ioctl+0x75
                  __x64_sys_ioctl+0x1a
                  do_syscall_64+0x57
                  entry_SYSCALL_64+0x7c
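
This export thread is the key interaction: by the time spa_unload() appears on its stack, spa_export_common() has already set the pool's final txg, and the thread is blocked in txg_wait_synced() while the sync thread processes the export's own history-log sync task. A paraphrased fragment of spa_export_common() (spa.c; not verbatim) shows where the limit that tripped the VERIFY comes from:

    /*
     * spa_export_common(): after this point, only TXG_DEFER_SIZE more
     * txgs may be dirtied before the pool unloads.
     */
    spa->spa_final_txg = spa_last_synced_txg(spa) + TXG_DEFER_SIZE + 1;
    ...
    spa_unload(spa);        /* -> vdev_initialize_stop_all()
                             * -> txg_wait_synced(), as in the stack above */

With spa_final_txg computed this way, spa_final_dirty_txg() came out to 38, while the history-log sync task for the export itself ran in txg 39, which would be consistent with a race between setting spa_final_txg and a sync task already queued for a later txg.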

ZFS debug messages:

sdb> zfs_dbgmsg | tail | zfs_dbgmsg
metaslab.c:2432:metaslab_load_impl(): metaslab_load: txg 4, spa testpool, vdev_id 0, ms_id 13, smp_length 0, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 6111513 ms, loading_time 0 ms, ms_max_size 67108864, max size error 67108864, old_weight 680000000000001, new_weight 680000000000001
metaslab.c:2432:metaslab_load_impl(): metaslab_load: txg 4, spa testpool, vdev_id 0, ms_id 14, smp_length 0, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 6111513 ms, loading_time 0 ms, ms_max_size 67108864, max size error 67108864, old_weight 680000000000001, new_weight 680000000000001
spa_history.c:296:spa_history_log_sync(): command: zpool create -f testpool /var/tmp/testdir/disk.l2arc/a cache /var/tmp/testdir/disk.l2arc/b
metaslab.c:2432:metaslab_load_impl(): metaslab_load: txg 2034, spa rpool, vdev_id 0, ms_id 40, smp_length 4192, unflushed_allocs 0, unflushed_frees 0, freed 0, defer 0 + 0, unloaded time 898608 ms, loading_time 53 ms, ms_max_size 445755392, max size error 0, old_weight 100000002d9bf04a, new_weight 100000002d9bf04a
spa.c:8202:spa_async_request(): spa=testpool async request task=2048
spa.c:8202:spa_async_request(): spa=testpool async request task=4096
spa_history.c:296:spa_history_log_sync(): command: zpool offline testpool /var/tmp/testdir/disk.l2arc/b
spa_history.c:296:spa_history_log_sync(): command: zpool online testpool /var/tmp/testdir/disk.l2arc/b
spa_history.c:296:spa_history_log_sync(): command: zpool export testpool
spa_misc.c:411:spa_load_note(): spa_load(testpool, config trusted): UNLOADING

lundman commented Apr 24, 2021

functional/cli_root/zfs_create/zfs_create_013_pos (run as root) [00:29] [PASS]

panic(cpu 0 caller 0xffffff7f96d2e163): VERIFY3(tx->tx_txg <= spa_final_dirty_txg(os->os_spa)) failed (3052 <= 3049)

0xffffff8882b43870 : 0xffffff7f96d2e163 net.lundman.zfs : _spl_panic + 0x73
0xffffff8882b439e0 : 0xffffff7f96aa94bb net.lundman.zfs : _dbuf_dirty + 0x1cb
0xffffff8882b43a60 : 0xffffff7f96b6ec28 net.lundman.zfs : _zap_lockdir_impl + 0x448
0xffffff8882b43b30 : 0xffffff7f96b70996 net.lundman.zfs : _zap_update + 0x76
0xffffff8882b43ba0 : 0xffffff7f96b773ee net.lundman.zfs : _feature_sync + 0x4e
0xffffff8882b43be0 : 0xffffff7f96b3c761 net.lundman.zfs : _space_map_alloc + 0x41
0xffffff8882b43c20 : 0xffffff7f96b334e8 net.lundman.zfs : _spa_generate_syncing_log_sm + 0x138
0xffffff8882b43c70 : 0xffffff7f96b33145 net.lundman.zfs : _spa_flush_metaslabs + 0xa5
0xffffff8882b43cd0 : 0xffffff7f96b2b1d1 net.lundman.zfs : _spa_sync + 0xdb1
0xffffff8882b43f00 : 0xffffff7f96b3d9d3 net.lundman.zfs : _txg_sync_thread + 0x243

Possibly related; it hits the same VERIFY, but via feature_sync()/zap_update() during spa_sync() rather than via spa_history_log_sync(). It happens quite randomly, so it's not easy to replicate here, and it could be macOS-specific given the nearby messages:

zvolRemoveDeviceTerminate
zvolRemoveDeviceTerminate
zvolRemoveDeviceTerminate
testpool[0x1000015c0]::terminate(kIOServiceSynchronous): THREAD_TIMED_OUT. Attempting to auto-resolve your deadlock. PLEASE FIX!
spa_misc.c:413:spa_load_note(): spa_load(testpool, config trusted): UNLOADING
VERIFY3(tx->tx_txg <= spa_final_dirty_txg(os->os_spa)) failed (3052 <= 3049)

stale bot commented Apr 24, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

behlendorf commented

I believe this was resolved by #13048.
