
100% arc_prune during zfs send/recv #10119

Closed
devZer0 opened this issue Mar 11, 2020 · 4 comments

devZer0 commented Mar 11, 2020

Distribution Name: CentOS
Distribution Version: 7
Linux Kernel: Linux zfsonlinux-zstd 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
ZFS version: 0.8.0-1
SPL version: 0.8.0-1
KVM virtual machine (Proxmox)

After transferring a larger dataset with syncoid to this zstd-enabled pool (#9735), I got this hang.

Not sure if this is related to zstd.

I cannot post the zpool layout because the zpool command now hangs.

The zpool consists of 6x 5TB virtual disks plus a 90GB special vdev for metadata allocation (not mirrored, because I'm only testing).
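
For context, the transfer was a syncoid-driven zfs send/receive; a minimal sketch of such an invocation (host and dataset names below are placeholders, not the actual ones used here) would be:

# syncoid --recursive root@sourcehost:tank/data zfszstdpool/data

syncoid then runs zfs send on the source and pipes it into zfs receive on this pool.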

top - 09:52:14 up 4 days, 14:39,  1 user,  load average: 9,29, 9,35, 9,33
Tasks: 294 total,   8 running, 286 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0,1 us, 99,8 sy,  0,0 ni,  0,0 id,  0,0 wa,  0,0 hi,  0,1 si,  0,0 st
KiB Mem :  8008688 total,   355044 free,  6634012 used,  1019632 buff/cache
KiB Swap:  3354620 total,  2298676 free,  1055944 used.  1119068 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23247 root      20   0       0      0      0 R  99,7  0,0 335:24.01 arc_prune
  937 root      20   0       0      0      0 R  99,3  0,0 723:45.58 arc_prune
31854 root      20   0       0      0      0 R  99,0  0,0 335:02.11 arc_prune
26475 root      20   0       0      0      0 R  98,4  0,0 335:26.29 arc_prune
  938 root      39  19       0      0      0 R   1,3  0,0  43:35.15 zthr_procedure
    9 root      20   0       0      0      0 R   0,3  0,0  23:46.39 rcu_sched
 [378358.156439] INFO: task txg_sync:2923 blocked for more than 120 seconds.
 [378358.156492] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [378358.156703] txg_sync        D ffff9a24e9275230     0  2923      2 0x00000080
 [378358.156708] Call Trace:
 [378358.156736]  [<ffffffff81f80a29>] schedule+0x29/0x70
 [378358.156785]  [<ffffffffc0455135>] taskq_wait+0x85/0xf0 [spl]
 [378358.156792]  [<ffffffff818c72e0>] ? wake_up_atomic_t+0x30/0x30
 [378358.156982]  [<ffffffffc086ad1b>] dmu_objset_sync+0x39b/0x510 [zfs]
 [378358.157036]  [<ffffffffc0867be0>] ? checksum_changed_cb+0x30/0x30 [zfs]
 [378358.157066]  [<ffffffffc08688f0>] ? sync_dnodes_task+0x150/0x150 [zfs]
 [378358.157120]  [<ffffffffc0892285>] dsl_dataset_sync+0x75/0x270 [zfs]
 [378358.157174]  [<ffffffffc08a7992>] dsl_pool_sync+0xc2/0x520 [zfs]
 [378358.157230]  [<ffffffffc08d5e53>] spa_sync+0x5e3/0x1090 [zfs]
 [378358.157325]  [<ffffffffc09543cb>] ? spa_txg_history_init_io+0x10b/0x120 [zfs]
 [378358.157373]  [<ffffffffc08f029f>] txg_sync_thread+0x2cf/0x4d0 [zfs]
 [378358.157426]  [<ffffffffc08effd0>] ? txg_fini+0x2a0/0x2a0 [zfs]
 [378358.157434]  [<ffffffffc0455e23>] thread_generic_wrapper+0x73/0x80 [spl]
 [378358.157441]  [<ffffffffc0455db0>] ? __thread_exit+0x20/0x20 [spl]
 [378358.157445]  [<ffffffff818c61f1>] kthread+0xd1/0xe0
 [378358.157448]  [<ffffffff818c6120>] ? insert_kthread_work+0x40/0x40
 [378358.157452]  [<ffffffff81f8dd37>] ret_from_fork_nospec_begin+0x21/0x21
 [378358.157455]  [<ffffffff818c6120>] ? insert_kthread_work+0x40/0x40
 [378478.155943] INFO: task txg_sync:2923 blocked for more than 120 seconds.
 [378478.156004] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
 [378478.156209] txg_sync        D ffff9a24e9275230     0  2923      2 0x00000080
 [378478.156214] Call Trace:
 [378478.156224]  [<ffffffff81f80a29>] schedule+0x29/0x70
 [378478.156242]  [<ffffffffc0455135>] taskq_wait+0x85/0xf0 [spl]
 [378478.156248]  [<ffffffff818c72e0>] ? wake_up_atomic_t+0x30/0x30
 [378478.156293]  [<ffffffffc086ad1b>] dmu_objset_sync+0x39b/0x510 [zfs]
 [378478.156323]  [<ffffffffc0867be0>] ? checksum_changed_cb+0x30/0x30 [zfs]
 [378478.156353]  [<ffffffffc08688f0>] ? sync_dnodes_task+0x150/0x150 [zfs]
 [378478.156409]  [<ffffffffc0892285>] dsl_dataset_sync+0x75/0x270 [zfs]
 [378478.156452]  [<ffffffffc08a7992>] dsl_pool_sync+0xc2/0x520 [zfs]
 [378478.156498]  [<ffffffffc08d5e53>] spa_sync+0x5e3/0x1090 [zfs]
 [378478.156548]  [<ffffffffc09543cb>] ? spa_txg_history_init_io+0x10b/0x120 [zfs]
 [378478.156600]  [<ffffffffc08f029f>] txg_sync_thread+0x2cf/0x4d0 [zfs]
 [378478.156671]  [<ffffffffc08effd0>] ? txg_fini+0x2a0/0x2a0 [zfs]
 [378478.156678]  [<ffffffffc0455e23>] thread_generic_wrapper+0x73/0x80 [spl]
 [378478.156685]  [<ffffffffc0455db0>] ? __thread_exit+0x20/0x20 [spl]
 [378478.156688]  [<ffffffff818c61f1>] kthread+0xd1/0xe0
 [378478.156692]  [<ffffffff818c6120>] ? insert_kthread_work+0x40/0x40
 [378478.156696]  [<ffffffff81f8dd37>] ret_from_fork_nospec_begin+0x21/0x21
 [378478.156698]  [<ffffffff818c6120>] ? insert_kthread_work+0x40/0x40
 [378598.154523] INFO: task txg_sync:2923 blocked for more than 120 seconds.	 
[root@zfsonlinux-zstd ~]# zpool list -v
NAME                                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
zfszstdpool                               30,1T  17,3T  12,7T        -         -     0%    57%  1.00x    ONLINE  -
  raidz2                                  30,0T  17,3T  12,7T        -         -     0%  57,6%      -  ONLINE
    scsi-0QEMU_QEMU_HARDDISK_drive-scsi6      -      -      -        -         -      -      -      -  ONLINE
    scsi-0QEMU_QEMU_HARDDISK_drive-scsi5      -      -      -        -         -      -      -      -  ONLINE
    scsi-0QEMU_QEMU_HARDDISK_drive-scsi4      -      -      -        -         -      -      -      -  ONLINE
    scsi-0QEMU_QEMU_HARDDISK_drive-scsi3      -      -      -        -         -      -      -      -  ONLINE
    scsi-0QEMU_QEMU_HARDDISK_drive-scsi2      -      -      -        -         -      -      -      -  ONLINE
    scsi-0QEMU_QEMU_HARDDISK_drive-scsi1      -      -      -        -         -      -      -      -  ONLINE
special                                       -      -      -        -         -      -      -      -  -
  scsi-0QEMU_QEMU_HARDDISK_drive-scsi10   89,5G  62,5G  27,0G        -         -    69%  69,8%      -  ONLINE

devZer0 commented Mar 11, 2020

# cat /proc/spl/kstat/zfs/arcstats
12 1 0x01 98 26656 4665194876 399536785689570
name                            type data
hits                            4    6767890965
misses                          4    166864893
demand_data_hits                4    18445
demand_data_misses              4    669
demand_metadata_hits            4    6093312669
demand_metadata_misses          4    97460790
prefetch_data_hits              4    259
prefetch_data_misses            4    486
prefetch_metadata_hits          4    674559592
prefetch_metadata_misses        4    69402948
mru_hits                        4    1384902051
mru_ghost_hits                  4    11815366
mfu_hits                        4    4765961879
mfu_ghost_hits                  4    36978895
deleted                         4    316220690
mutex_miss                      4    128854
access_skip                     4    494
evict_skip                      4    397795048
evict_not_enough                4    1094762
evict_l2_cached                 4    0
evict_l2_eligible               4    21551592579072
evict_l2_ineligible             4    356552675328
evict_l2_skip                   4    0
hash_elements                   4    32081
hash_elements_max               4    1165805
hash_collisions                 4    57932517
hash_chains                     4    499
hash_chain_max                  4    9
p                               4    2282767042
c                               4    4100448256
c_min                           4    256278016
c_max                           4    4100448256
size                            4    4197620672
compressed_size                 4    685075456
uncompressed_size               4    2346935808
overhead_size                   4    3189757440
hdr_size                        4    16252224
data_size                       4    1043528192
metadata_size                   4    2831304704
dbuf_size                       4    60057600
dnode_size                      4    209778112
bonus_size                      4    36699840
anon_size                       4    1127100416
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    1586360832
mru_evictable_data              4    0
mru_evictable_metadata          4    0
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    1161371648
mfu_evictable_data              4    0
mfu_evictable_metadata          4    0
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
memory_throttle_count           4    0
memory_direct_count             4    5267
memory_indirect_count           4    17463930
memory_all_bytes                4    8200896512
memory_free_bytes               4    2181275648
memory_available_bytes          3    2053140480
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    909302869
arc_meta_used                   4    3154092480
arc_meta_limit                  4    3075336192
arc_dnode_limit                 4    307533619
arc_meta_max                    4    4098473146
arc_meta_min                    4    16777216
async_upgrade_sync              4    137396
demand_hit_predictive_prefetch  4    4302131
demand_hit_prescient_prefetch   4    3280612
arc_need_free                   4    0
arc_sys_free                    4    128139008
arc_raw_size                    4    0
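
Side note: in the dump above, arc_meta_used (~3.15 GB) is already over arc_meta_limit (~3.08 GB), which lines up with the arc_prune threads spinning. The relevant counters can be pulled out of arcstats directly, e.g.:

# grep -E '^(arc_prune|arc_meta_used|arc_meta_limit|arc_dnode_limit) ' /proc/spl/kstat/zfs/arcstats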


devZer0 commented Mar 11, 2020

This helped: #9966 (comment)

I saw that there was an updatedb process active on the datasets, and since I had set snapdir to visible, I assume it was crawling the entire dataset tree including all snapshots, which is a lot of files.

Anyhow, that should not crash/overload ZFS.
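
As a workaround (assuming the updatedb crawl of the .zfs snapshot directories really is the trigger), snapdir can be set back to hidden and ZFS filesystems can be excluded from updatedb, roughly like this:

# zfs set snapdir=hidden zfszstdpool
# grep PRUNEFS /etc/updatedb.conf

snapdir is inherited by child datasets unless overridden locally, and adding "zfs" to the PRUNEFS line in /etc/updatedb.conf keeps updatedb off ZFS mounts entirely.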

PrivatePuffin (Contributor) commented Mar 11, 2020

You can't/shouldn't file issues if you run custom builds (such as zstd); it needlessly confuses people.

That being said: this is a known issue for which multiple(!) issues have already been filed. Please close this one and join the club ;)


devZer0 commented Mar 11, 2020

OK, sorry, I didn't know.

devZer0 closed this as completed Mar 11, 2020