PANIC: blkptr has invalid CHECKSUM 0 #12349

Closed
zrav opened this issue Jul 11, 2021 · 4 comments
Labels
Status: Triage Needed (new issue which needs to be triaged), Type: Defect (incorrect behavior, e.g. crash, hang)

Comments

zrav commented Jul 11, 2021

I have seen a few stale issues with a similar title; however, my stack traces look somewhat different, so I'm opening a new issue. I'm trying to figure out whether I have a hardware problem or whether my pool is actually borked, possibly caused by a bug.

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Ubuntu |
| Distribution Version | 20.04, 21.04 |
| Linux Kernel | 5.8.0-55-generic, 5.11.0-22-generic |
| Architecture | AMD64 |
| ZFS Version | tracking master, 2.0.2-1ubuntu5, 2.1.0 |
| SPL Version | tracking master, 2.0.2-1ubuntu5, 2.1.0 |

Describe the problem you're observing

The system functions purely as a backup target, receiving sends and deleting old snapshots. After 1-10 days of running, the system panics while receiving, with stack trace variations of "blkptr has invalid CHECKSUM 0". The pool then hangs indefinitely, requiring a reboot. Subsequent scrubs come back clean.

Describe how to reproduce the problem

I can reproduce by leaving the system receiving backups for a few days. I don't know how to reproduce on another system.

Include any warning/errors/backtraces from the system logs

  • The first two traces are from Ubuntu 20.04 with zfs compiled from master at the time.
  • The next three are from Ubuntu 21.04 with the provided packages for zfs 2.0.2.
  • The last one is from Ubuntu 21.04 with packages compiled from the 2.1.0 tag, to test whether the changes from "Avoid deadlock when removing L2ARC devices under I/O" (#12054) have any effect.
Jun 20 01:52:37 ubackup kernel: [482903.384554] PANIC: tank: blkptr at 00000000dbbffcf1 has invalid CHECKSUM 0
Jun 20 01:52:37 ubackup kernel: [482903.384572] Showing stack for process 3425
Jun 20 01:52:37 ubackup kernel: [482903.384580] CPU: 1 PID: 3425 Comm: z_wr_iss Tainted: P           OE     5.8.0-55-generic #62~20.04.1-Ubuntu
Jun 20 01:52:37 ubackup kernel: [482903.384582] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS FB 10/13/2011
Jun 20 01:52:37 ubackup kernel: [482903.384584] Call Trace:
Jun 20 01:52:37 ubackup kernel: [482903.384598]  dump_stack+0x74/0x92
Jun 20 01:52:37 ubackup kernel: [482903.384625]  spl_dumpstack+0x29/0x2b [spl]
Jun 20 01:52:37 ubackup kernel: [482903.384645]  vcmn_err.cold+0x60/0x94 [spl]
Jun 20 01:52:37 ubackup kernel: [482903.384654]  ? __blk_mq_sched_dispatch_requests+0x10e/0x170
Jun 20 01:52:37 ubackup kernel: [482903.384662]  ? __blk_mq_run_hw_queue+0x5a/0x110
Jun 20 01:52:37 ubackup kernel: [482903.384669]  ? ptr_to_id+0xbe/0x220
Jun 20 01:52:37 ubackup kernel: [482903.384916]  zfs_panic_recover+0x6f/0x90 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.385130]  zfs_blkptr_verify_log+0x94/0x100 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.385342]  ? vdev_disk_io_start+0x49f/0x8e0 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.385351]  ? _cond_resched+0x19/0x30
Jun 20 01:52:37 ubackup kernel: [482903.385356]  ? mutex_lock+0x13/0x40
Jun 20 01:52:37 ubackup kernel: [482903.385568]  ? zio_wait_for_children+0x8e/0xd0 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.385780]  zfs_blkptr_verify+0x3c9/0x480 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.385993]  zio_free+0x27/0x100 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.386191]  dsl_free+0x11/0x20 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.386380]  dsl_dataset_block_kill+0x4bf/0x4f0 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.386386]  ? down_write+0x13/0x50
Jun 20 01:52:37 ubackup kernel: [482903.386565]  dbuf_write_done+0x1b3/0x200 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.386740]  arc_write_done+0x8f/0x410 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.386953]  zio_done+0x407/0x1050 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.387166]  zio_execute+0x93/0xf0 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.387185]  taskq_thread+0x2fb/0x510 [spl]
Jun 20 01:52:37 ubackup kernel: [482903.387193]  ? wake_up_q+0xa0/0xa0
Jun 20 01:52:37 ubackup kernel: [482903.387406]  ? zio_taskq_member.isra.0.constprop.0+0x60/0x60 [zfs]
Jun 20 01:52:37 ubackup kernel: [482903.387413]  kthread+0x114/0x150
Jun 20 01:52:37 ubackup kernel: [482903.387431]  ? task_done+0xb0/0xb0 [spl]
Jun 20 01:52:37 ubackup kernel: [482903.387434]  ? kthread_park+0x90/0x90
Jun 20 01:52:37 ubackup kernel: [482903.387440]  ret_from_fork+0x22/0x30


Jun 25 01:43:58 ubackup kernel: [367708.395650] PANIC: tank: blkptr at 000000004322d8f5 has invalid CHECKSUM 0
Jun 25 01:43:58 ubackup kernel: [367708.395659] Showing stack for process 3381
Jun 25 01:43:58 ubackup kernel: [367708.395663] CPU: 1 PID: 3381 Comm: dp_sync_taskq Tainted: P           OE     5.8.0-55-generic #62~20.04.1-Ubuntu
Jun 25 01:43:58 ubackup kernel: [367708.395664] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS FB 10/13/2011
Jun 25 01:43:58 ubackup kernel: [367708.395664] Call Trace:
Jun 25 01:43:58 ubackup kernel: [367708.395672]  dump_stack+0x74/0x92
Jun 25 01:43:58 ubackup kernel: [367708.395684]  spl_dumpstack+0x29/0x2b [spl]
Jun 25 01:43:58 ubackup kernel: [367708.395691]  vcmn_err.cold+0x60/0x94 [spl]
Jun 25 01:43:58 ubackup kernel: [367708.395698]  ? spl_kmem_cache_alloc+0xa9/0x7d0 [spl]
Jun 25 01:43:58 ubackup kernel: [367708.395701]  ? ptr_to_id+0xbe/0x220
Jun 25 01:43:58 ubackup kernel: [367708.395826]  zfs_panic_recover+0x6f/0x90 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.395902]  zfs_blkptr_verify_log+0x94/0x100 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.395963]  ? arc_hdr_destroy+0x200/0x200 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.395967]  ? try_to_wake_up+0x66/0x540
Jun 25 01:43:58 ubackup kernel: [367708.396028]  ? dbuf_issue_final_prefetch+0xd0/0xd0 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396031]  ? default_wake_function+0x1a/0x30
Jun 25 01:43:58 ubackup kernel: [367708.396032]  ? __wake_up_common+0x7e/0x140
Jun 25 01:43:58 ubackup kernel: [367708.396034]  ? __wake_up_common_lock+0x8a/0xc0
Jun 25 01:43:58 ubackup kernel: [367708.396108]  zfs_blkptr_verify+0x3c9/0x480 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396182]  zio_free+0x27/0x100 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396251]  dsl_free+0x11/0x20 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396317]  dsl_dataset_block_kill+0x4bf/0x4f0 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396382]  free_blocks+0xea/0x1d0 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396456]  ? zio_nowait+0xc1/0x150 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396521]  dnode_sync_free_range+0x23c/0x270 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396583]  ? dbuf_sync_leaf+0x23b/0x500 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396649]  ? free_children+0x3d0/0x3d0 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396718]  range_tree_walk+0x118/0x1f0 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396784]  dnode_sync+0x2e8/0xa90 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396787]  ? __switch_to+0x157/0x450
Jun 25 01:43:58 ubackup kernel: [367708.396789]  ? _cond_resched+0x19/0x30
Jun 25 01:43:58 ubackup kernel: [367708.396791]  ? mutex_lock+0x13/0x40
Jun 25 01:43:58 ubackup kernel: [367708.396855]  sync_dnodes_task+0x79/0xb0 [zfs]
Jun 25 01:43:58 ubackup kernel: [367708.396861]  taskq_thread+0x2fb/0x510 [spl]
Jun 25 01:43:58 ubackup kernel: [367708.396864]  ? wake_up_q+0xa0/0xa0
Jun 25 01:43:58 ubackup kernel: [367708.396866]  kthread+0x114/0x150
Jun 25 01:43:58 ubackup kernel: [367708.396873]  ? task_done+0xb0/0xb0 [spl]
Jun 25 01:43:58 ubackup kernel: [367708.396874]  ? kthread_park+0x90/0x90
Jun 25 01:43:58 ubackup kernel: [367708.396876]  ret_from_fork+0x22/0x30


Jul  1 02:01:26 ubackup kernel: [443020.371564] PANIC: tank: blkptr at 000000002c351e10 has invalid CHECKSUM 0
Jul  1 02:01:26 ubackup kernel: [443020.371590] Showing stack for process 3346
Jul  1 02:01:26 ubackup kernel: [443020.371597] CPU: 1 PID: 3346 Comm: z_wr_iss Tainted: P           OE     5.11.0-22-generic #23~20.04.1-Ubuntu
Jul  1 02:01:26 ubackup kernel: [443020.371608] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS FB 10/13/2011
Jul  1 02:01:26 ubackup kernel: [443020.371613] Call Trace:
Jul  1 02:01:26 ubackup kernel: [443020.371622]  dump_stack+0x74/0x92
Jul  1 02:01:26 ubackup kernel: [443020.371640]  spl_dumpstack+0x29/0x2b [spl]
Jul  1 02:01:26 ubackup kernel: [443020.371683]  vcmn_err.cold+0x60/0x94 [spl]
Jul  1 02:01:26 ubackup kernel: [443020.371722]  ? ptr_to_id+0xbe/0x220
Jul  1 02:01:26 ubackup kernel: [443020.371739]  zfs_panic_recover+0x6f/0x90 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.372186]  zfs_blkptr_verify_log+0x94/0x100 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.372613]  ? vdev_disk_io_start+0x4a1/0x8e0 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.373033]  ? _cond_resched+0x19/0x30
Jul  1 02:01:26 ubackup kernel: [443020.373045]  ? mutex_lock+0x13/0x40
Jul  1 02:01:26 ubackup kernel: [443020.373052]  ? zio_wait_for_children+0x8e/0xd0 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.373476]  zfs_blkptr_verify+0x3c9/0x480 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.373902]  zio_free+0x27/0x100 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.374326]  dsl_free+0x11/0x20 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.374719]  dsl_dataset_block_kill+0x4bf/0x4f0 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.375097]  ? down_write+0x13/0x50
Jul  1 02:01:26 ubackup kernel: [443020.375106]  dbuf_write_done+0x1b3/0x200 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.375463]  arc_write_done+0x8f/0x410 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.375812]  zio_done+0x407/0x1050 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.376237]  zio_execute+0x93/0xf0 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.376661]  taskq_thread+0x2fb/0x510 [spl]
Jul  1 02:01:26 ubackup kernel: [443020.376698]  ? wake_up_q+0xa0/0xa0
Jul  1 02:01:26 ubackup kernel: [443020.376710]  ? zio_taskq_member.isra.0.constprop.0+0x60/0x60 [zfs]
Jul  1 02:01:26 ubackup kernel: [443020.377136]  kthread+0x114/0x150
Jul  1 02:01:26 ubackup kernel: [443020.377146]  ? task_done+0xb0/0xb0 [spl]
Jul  1 02:01:26 ubackup kernel: [443020.377181]  ? kthread_park+0x90/0x90
Jul  1 02:01:26 ubackup kernel: [443020.377190]  ret_from_fork+0x22/0x30


Jul  2 12:41:32 ubackup kernel: [ 7228.720883] PANIC: tank: blkptr at 0000000084e788bd has invalid CHECKSUM 0
Jul  2 12:41:32 ubackup kernel: [ 7228.720901] Showing stack for process 4218
Jul  2 12:41:32 ubackup kernel: [ 7228.720904] CPU: 1 PID: 4218 Comm: z_wr_iss Tainted: P           O      5.11.0-22-generic #23-Ubuntu
Jul  2 12:41:32 ubackup kernel: [ 7228.720907] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS FB 10/13/2011
Jul  2 12:41:32 ubackup kernel: [ 7228.720909] Call Trace:
Jul  2 12:41:32 ubackup kernel: [ 7228.720913]  show_stack+0x52/0x58
Jul  2 12:41:32 ubackup kernel: [ 7228.720918]  dump_stack+0x70/0x8b
Jul  2 12:41:32 ubackup kernel: [ 7228.720922]  spl_dumpstack+0x29/0x2b [spl]
Jul  2 12:41:32 ubackup kernel: [ 7228.720938]  vcmn_err.cold+0x60/0x94 [spl]
Jul  2 12:41:32 ubackup kernel: [ 7228.720951]  ? ptr_to_id+0xbd/0x270
Jul  2 12:41:32 ubackup kernel: [ 7228.720955]  ? pointer+0x19b/0x4d0
Jul  2 12:41:32 ubackup kernel: [ 7228.720958]  zfs_panic_recover+0x6d/0x90 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.721155]  zfs_blkptr_verify_log+0x94/0x100 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.721297]  ? spl_kmem_cache_alloc+0x3b/0x100 [spl]
Jul  2 12:41:32 ubackup kernel: [ 7228.721309]  ? _cond_resched+0x1a/0x50
Jul  2 12:41:32 ubackup kernel: [ 7228.721312]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.721452]  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.721592]  ? txg_all_lists_empty+0x62/0xb0 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.721731]  ? do_softirq_own_stack+0x3d/0x50
Jul  2 12:41:32 ubackup kernel: [ 7228.721735]  ? irq_exit_rcu+0x42/0xd0
Jul  2 12:41:32 ubackup kernel: [ 7228.721738]  ? common_interrupt+0x88/0x140
Jul  2 12:41:32 ubackup kernel: [ 7228.721740]  ? vdev_queue_max_async_writes+0x42/0xe0 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.721881]  ? spl_kmem_cache_alloc+0x3b/0x100 [spl]
Jul  2 12:41:32 ubackup kernel: [ 7228.721892]  zfs_blkptr_verify+0x359/0x470 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722034]  zio_free+0x27/0x100 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722175]  dsl_free+0x11/0x20 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722309]  dsl_dataset_block_kill+0x45e/0x490 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722439]  ? _cond_resched+0x1a/0x50
Jul  2 12:41:32 ubackup kernel: [ 7228.722442]  dbuf_write_done+0x19a/0x1c0 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722566]  arc_write_done+0x25e/0x420 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722688]  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722830]  zio_done+0x39d/0xdc0 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.722972]  zio_execute+0x92/0xe0 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.723113]  taskq_thread+0x236/0x420 [spl]
Jul  2 12:41:32 ubackup kernel: [ 7228.723125]  ? wake_up_q+0xa0/0xa0
Jul  2 12:41:32 ubackup kernel: [ 7228.723128]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
Jul  2 12:41:32 ubackup kernel: [ 7228.723271]  kthread+0x12f/0x150
Jul  2 12:41:32 ubackup kernel: [ 7228.723274]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
Jul  2 12:41:32 ubackup kernel: [ 7228.723285]  ? __kthread_bind_mask+0x70/0x70
Jul  2 12:41:32 ubackup kernel: [ 7228.723288]  ret_from_fork+0x22/0x30


Jul  8 01:32:19 ubackup kernel: [463029.048674] PANIC: tank: blkptr at 000000008e4479a1 has invalid CHECKSUM 0
Jul  8 01:32:19 ubackup kernel: [463029.048684] Showing stack for process 3084
Jul  8 01:32:19 ubackup kernel: [463029.048686] CPU: 0 PID: 3084 Comm: z_wr_iss Tainted: P           O      5.11.0-22-generic #23-Ubuntu
Jul  8 01:32:19 ubackup kernel: [463029.048689] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS FB 10/13/2011
Jul  8 01:32:19 ubackup kernel: [463029.048691] Call Trace:
Jul  8 01:32:19 ubackup kernel: [463029.048695]  show_stack+0x52/0x58
Jul  8 01:32:19 ubackup kernel: [463029.048700]  dump_stack+0x70/0x8b
Jul  8 01:32:19 ubackup kernel: [463029.048704]  spl_dumpstack+0x29/0x2b [spl]
Jul  8 01:32:19 ubackup kernel: [463029.048720]  vcmn_err.cold+0x60/0x94 [spl]
Jul  8 01:32:19 ubackup kernel: [463029.048733]  ? ptr_to_id+0xbd/0x270
Jul  8 01:32:19 ubackup kernel: [463029.048737]  ? pointer+0x19b/0x4d0
Jul  8 01:32:19 ubackup kernel: [463029.048740]  zfs_panic_recover+0x6d/0x90 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.048932]  zfs_blkptr_verify_log+0x94/0x100 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.049075]  ? spl_kmem_cache_alloc+0x3b/0x100 [spl]
Jul  8 01:32:19 ubackup kernel: [463029.049087]  ? _cond_resched+0x1a/0x50
Jul  8 01:32:19 ubackup kernel: [463029.049090]  ? do_raw_spin_unlock+0x9/0x10 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.049229]  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.049369]  ? txg_all_lists_empty+0x62/0xb0 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.049509]  ? spa_has_pending_synctask+0x46/0x60 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.049648]  ? vdev_queue_max_async_writes+0x42/0xe0 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.049788]  ? vdev_queue_class_to_issue+0xf7/0x120 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.049928]  zfs_blkptr_verify+0x359/0x470 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.050070]  ? _cond_resched+0x1a/0x50
Jul  8 01:32:19 ubackup kernel: [463029.050072]  zio_free+0x27/0x100 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.050213]  dsl_free+0x11/0x20 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.050347]  dsl_dataset_block_kill+0x45e/0x490 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.050477]  ? _cond_resched+0x1a/0x50
Jul  8 01:32:19 ubackup kernel: [463029.050480]  dbuf_write_done+0x19a/0x1c0 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.050604]  arc_write_done+0x25e/0x420 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.050726]  ? __raw_spin_unlock+0x9/0x10 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.050868]  zio_done+0x39d/0xdc0 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.051010]  zio_execute+0x92/0xe0 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.051151]  taskq_thread+0x236/0x420 [spl]
Jul  8 01:32:19 ubackup kernel: [463029.051163]  ? wake_up_q+0xa0/0xa0
Jul  8 01:32:19 ubackup kernel: [463029.051167]  ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
Jul  8 01:32:19 ubackup kernel: [463029.051309]  kthread+0x12f/0x150
Jul  8 01:32:19 ubackup kernel: [463029.051312]  ? param_set_taskq_kick+0xf0/0xf0 [spl]
Jul  8 01:32:19 ubackup kernel: [463029.051323]  ? __kthread_bind_mask+0x70/0x70
Jul  8 01:32:19 ubackup kernel: [463029.051326]  ret_from_fork+0x22/0x30


Jul 10 01:56:12 ubackup kernel: [47704.696345] PANIC: tank: blkptr at 00000000fa306804 has invalid CHECKSUM 0
Jul 10 01:56:12 ubackup kernel: [47704.696355] Showing stack for process 3313
Jul 10 01:56:12 ubackup kernel: [47704.696358] CPU: 0 PID: 3313 Comm: z_wr_int Tainted: P           OE     5.11.0-22-generic #23-Ubuntu
Jul 10 01:56:12 ubackup kernel: [47704.696361] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS FB 10/13/2011
Jul 10 01:56:12 ubackup kernel: [47704.696363] Call Trace:
Jul 10 01:56:12 ubackup kernel: [47704.696367]  show_stack+0x52/0x58
Jul 10 01:56:12 ubackup kernel: [47704.696372]  dump_stack+0x70/0x8b
Jul 10 01:56:12 ubackup kernel: [47704.696376]  spl_dumpstack+0x29/0x2b [spl]
Jul 10 01:56:12 ubackup kernel: [47704.696390]  vcmn_err.cold+0x60/0x94 [spl]
Jul 10 01:56:12 ubackup kernel: [47704.696400]  ? zfs_btree_insert_into_leaf+0x24a/0x2c0 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.696545]  ? ptr_to_id+0xbd/0x270
Jul 10 01:56:12 ubackup kernel: [47704.696549]  ? pointer+0x19b/0x4d0
Jul 10 01:56:12 ubackup kernel: [47704.696552]  zfs_panic_recover+0x6d/0x90 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.696666]  zfs_blkptr_verify_log+0x94/0x100 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.696782]  ? range_tree_add+0x11/0x20 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.696893]  ? metaslab_free_concrete+0x10b/0x260 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697004]  ? metaslab_free_impl+0xaf/0xe0 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697114]  ? _cond_resched+0x1a/0x50
Jul 10 01:56:12 ubackup kernel: [47704.697119]  zfs_blkptr_verify+0x359/0x470 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697234]  zio_free+0x27/0x100 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697349]  dsl_free+0x11/0x20 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697458]  dsl_dataset_block_kill+0x4b6/0x4f0 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697564]  dbuf_write_done+0x1ad/0x1f0 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697664]  arc_write_done+0x8f/0x420 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697763]  zio_done+0x405/0x11b0 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697879]  zio_execute+0x8b/0x130 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.697994]  taskq_thread+0x2b7/0x500 [spl]
Jul 10 01:56:12 ubackup kernel: [47704.698003]  ? wake_up_q+0xa0/0xa0
Jul 10 01:56:12 ubackup kernel: [47704.698007]  ? zio_gang_tree_free+0x70/0x70 [zfs]
Jul 10 01:56:12 ubackup kernel: [47704.698123]  kthread+0x12f/0x150
Jul 10 01:56:12 ubackup kernel: [47704.698126]  ? taskq_thread_spawn+0x60/0x60 [spl]
Jul 10 01:56:12 ubackup kernel: [47704.698135]  ? __kthread_bind_mask+0x70/0x70
Jul 10 01:56:12 ubackup kernel: [47704.698138]  ret_from_fork+0x22/0x30
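
To compare the traces above at a glance, a small script can pull out each PANIC line together with the task name from the "Comm:" line that follows it. This is only a sketch: the function names `summarize_panics` and `count_by_task` are made up for illustration, and the regular expressions assume exactly the log format pasted in this issue.

```python
import re
from collections import Counter

SECONDS_PER_DAY = 86400

# Assumed format, taken from the log lines in this issue:
# "[<uptime>] PANIC: <pool>: blkptr at <addr> has invalid <FIELD> <value>"
PANIC_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\] PANIC: (?P<pool>\S+): "
    r"blkptr at (?P<addr>[0-9a-f]+) has invalid (?P<field>\w+) (?P<value>\d+)"
)
# The "CPU: ... PID: <pid> Comm: <task> ..." line printed after each panic.
COMM_RE = re.compile(r"PID: \d+ Comm: (?P<comm>\S+)")


def summarize_panics(lines):
    """Return one dict per PANIC line, paired with the task name that follows it."""
    events = []
    for line in lines:
        match = PANIC_RE.search(line)
        if match:
            event = match.groupdict()
            # Kernel timestamps are seconds since boot; convert to days.
            event["uptime_days"] = float(event.pop("uptime")) / SECONDS_PER_DAY
            event["comm"] = None
            events.append(event)
            continue
        match = COMM_RE.search(line)
        # Only attach the first Comm line seen after a PANIC line.
        if match and events and events[-1]["comm"] is None:
            events[-1]["comm"] = match.group("comm")
    return events


def count_by_task(events):
    """Count how many panics were hit by each kernel task name."""
    return Counter(event["comm"] for event in events)
```

Passing the saved kernel log to `summarize_panics` (for example, an open file object) yields one entry per panic with the pool, the invalid field, the uptime in days, and the task that hit it, which makes it easier to see whether the panics cluster in one code path.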
zrav added the "Status: Triage Needed" and "Type: Defect" labels Jul 11, 2021
Contributor

rincebrain commented Jul 11, 2021

Immediate question: maybe 958826b makes your life better? I don't think it's a listed failure mode, but I'd give it a shot.

Other thought: are any of the datasets you're receiving encrypted? Because there are a couple of outstanding issues with invalid data winding up in the pipeline with encrypted send/recv...

(Amused aside: hey, I used to own that exact motherboard.)

Author

zrav commented Jul 11, 2021

I'll give current master a spin to see if 958826b helps and report back when/if it panics again.

Of the sent datasets, a single one is encrypted (sent raw), but IIRC the panics/hangs happened on the other datasets.

Author

zrav commented Jul 11, 2021

That was quick:

Jul 11 17:27:09 ubackup kernel: [ 1561.161467] BUG: kernel NULL pointer dereference, address: 0000000000000000
Jul 11 17:27:09 ubackup kernel: [ 1561.161480] #PF: supervisor read access in kernel mode
Jul 11 17:27:09 ubackup kernel: [ 1561.161483] #PF: error_code(0x0000) - not-present page
Jul 11 17:27:09 ubackup kernel: [ 1561.161486] PGD 0 P4D 0
Jul 11 17:27:09 ubackup kernel: [ 1561.161491] Oops: 0000 [#1] SMP NOPTI
Jul 11 17:27:09 ubackup kernel: [ 1561.161495] CPU: 0 PID: 3574 Comm: txg_sync Tainted: P           OE     5.11.0-22-generic #23-Ubuntu
Jul 11 17:27:09 ubackup kernel: [ 1561.161501] Hardware name: Gigabyte Technology Co., Ltd. GA-990FXA-UD3/GA-990FXA-UD3, BIOS FB 10/13/2011
Jul 11 17:27:09 ubackup kernel: [ 1561.161504] RIP: 0010:arc_release+0x1d/0x770 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.161638] Code: 09 df e2 c7 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 4c 8d 6f 10 41 54 53 48 83 ec 50 <4c> 8b 3f 4c 89 ef e8 48 8c e3 c7 49 8d 46 30 48 89 45 d0 48 8b 05
Jul 11 17:27:09 ubackup kernel: [ 1561.161643] RSP: 0018:ffffa62202ef79d0 EFLAGS: 00010282
Jul 11 17:27:09 ubackup kernel: [ 1561.161647] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000001
Jul 11 17:27:09 ubackup kernel: [ 1561.161650] RDX: dead000000000100 RSI: ffff9a951e16b080 RDI: 0000000000000000
Jul 11 17:27:09 ubackup kernel: [ 1561.161653] RBP: ffffa62202ef7a48 R08: 0000000000000200 R09: ffff9a95201b6c00
Jul 11 17:27:09 ubackup kernel: [ 1561.161655] R10: 0000000000000000 R11: 00000001fffffe00 R12: ffff9a9531fb2e20
Jul 11 17:27:09 ubackup kernel: [ 1561.161658] R13: 0000000000000010 R14: 0000000000000000 R15: ffff9a951e16b080
Jul 11 17:27:09 ubackup kernel: [ 1561.161661] FS:  0000000000000000(0000) GS:ffff9a9723c00000(0000) knlGS:0000000000000000
Jul 11 17:27:09 ubackup kernel: [ 1561.161665] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 11 17:27:09 ubackup kernel: [ 1561.161667] CR2: 0000000000000000 CR3: 000000010c6d4000 CR4: 00000000000006f0
Jul 11 17:27:09 ubackup kernel: [ 1561.161671] Call Trace:
Jul 11 17:27:09 ubackup kernel: [ 1561.161676]  ? spl_kvmalloc+0x82/0xb0 [spl]
Jul 11 17:27:09 ubackup kernel: [ 1561.161688]  ? spl_kmem_alloc_impl+0xfe/0x120 [spl]
Jul 11 17:27:09 ubackup kernel: [ 1561.161699]  dbuf_dirty+0x739/0x8f0 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.161801]  ? dbuf_read+0x2af/0x5c0 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.161904]  dmu_buf_will_dirty_impl+0xc5/0x180 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162006]  dmu_buf_will_dirty+0x16/0x20 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162108]  dmu_write_impl+0x42/0xd0 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162211]  dmu_write.part.0+0x9e/0x130 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162315]  dmu_write+0x14/0x20 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162418]  space_map_write+0x151/0x8b0 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162533]  ? _cond_resched+0x1a/0x50
Jul 11 17:27:09 ubackup kernel: [ 1561.162538]  metaslab_sync+0x1d2/0x940 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162651]  vdev_sync+0x7f/0x4e0 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162766]  ? mutex_lock+0x13/0x40
Jul 11 17:27:09 ubackup kernel: [ 1561.162769]  spa_sync+0x602/0x1000 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162883]  ? spa_txg_history_init_io+0x106/0x110 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.162999]  txg_sync_thread+0x278/0x3f0 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.163114]  ? txg_init+0x260/0x260 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.163229]  thread_generic_wrapper+0x79/0x90 [spl]
Jul 11 17:27:09 ubackup kernel: [ 1561.163240]  kthread+0x12f/0x150
Jul 11 17:27:09 ubackup kernel: [ 1561.163245]  ? __thread_exit+0x20/0x20 [spl]
Jul 11 17:27:09 ubackup kernel: [ 1561.163255]  ? __kthread_bind_mask+0x70/0x70
Jul 11 17:27:09 ubackup kernel: [ 1561.163259]  ret_from_fork+0x22/0x30
Jul 11 17:27:09 ubackup kernel: [ 1561.163265] Modules linked in: zfs(POE) zunicode(POE) zzstd(O) zlua(OE) zavl(POE) icp(POE) zcommon(POE) znvpair(POE) edac_mce_amd spl(OE) kvm_amd ccp kvm snd_hda_codec_hdmi snd_hda_intel serio_raw snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence snd_hda_codec snd_hda_core wmi_bmof snd_hwdep k10temp soundwire_bus nouveau snd_soc_core snd_compress mxm_wmi video ac97_bus snd_pcm_dmaengine drm_ttm_helper snd_pcm ttm drm_kms_helper snd_timer cec snd rc_core i2c_algo_bit soundcore fb_sys_fops syscopyarea sysfillrect sysimgblt mac_hid sch_fq_codel it87 hwmon_vid lp parport msr drm ip_tables x_tables autofs4 pata_acpi mpt3sas r8169 raid_class ahci xhci_pci realtek i2c_piix4 libahci scsi_transport_sas pata_jmicron xhci_pci_renesas wmi
Jul 11 17:27:09 ubackup kernel: [ 1561.163328] CR2: 0000000000000000
Jul 11 17:27:09 ubackup kernel: [ 1561.163331] ---[ end trace 11e8be3f617e89f8 ]---
Jul 11 17:27:09 ubackup kernel: [ 1561.163334] RIP: 0010:arc_release+0x1d/0x770 [zfs]
Jul 11 17:27:09 ubackup kernel: [ 1561.163435] Code: 09 df e2 c7 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 49 89 fe 41 55 4c 8d 6f 10 41 54 53 48 83 ec 50 <4c> 8b 3f 4c 89 ef e8 48 8c e3 c7 49 8d 46 30 48 89 45 d0 48 8b 05
Jul 11 17:27:09 ubackup kernel: [ 1561.163441] RSP: 0018:ffffa62202ef79d0 EFLAGS: 00010282
Jul 11 17:27:09 ubackup kernel: [ 1561.163444] RAX: dead000000000122 RBX: 0000000000000000 RCX: 0000000000000001
Jul 11 17:27:09 ubackup kernel: [ 1561.163447] RDX: dead000000000100 RSI: ffff9a951e16b080 RDI: 0000000000000000
Jul 11 17:27:09 ubackup kernel: [ 1561.163450] RBP: ffffa62202ef7a48 R08: 0000000000000200 R09: ffff9a95201b6c00
Jul 11 17:27:09 ubackup kernel: [ 1561.163453] R10: 0000000000000000 R11: 00000001fffffe00 R12: ffff9a9531fb2e20
Jul 11 17:27:09 ubackup kernel: [ 1561.163456] R13: 0000000000000010 R14: 0000000000000000 R15: ffff9a951e16b080
Jul 11 17:27:09 ubackup kernel: [ 1561.163459] FS:  0000000000000000(0000) GS:ffff9a9723c00000(0000) knlGS:0000000000000000
Jul 11 17:27:09 ubackup kernel: [ 1561.163463] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 11 17:27:09 ubackup kernel: [ 1561.163466] CR2: 0000000000000000 CR3: 000000010c6d4000 CR4: 00000000000006f0

Author

zrav commented Sep 2, 2021

I've recreated the pool, so there's nothing to be done here anymore except wonder what the original cause of the problem was...

zrav closed this as completed Sep 2, 2021