
[arm64] scrub uses all machine memory and locks up the machine #12150

Closed
omarkilani opened this issue May 28, 2021 · 33 comments
Labels: Status: Triage Needed, Type: Defect

Comments

@omarkilani

System information

Type Version/Name
Distribution Name RHEL
Distribution Version 8.4
Linux Kernel 4.18.0-305.el8.aarch64
Architecture aarch64
ZFS Version 2.1.0-rc5
SPL Version 2.1.0-rc5

Describe the problem you're observing

I was doing a stress test on a ZFS pool running Postgres. I left it running overnight and came back to a locked-up VM. There was nothing on the console from the lockup that I could see, but I suspect ZFS was behind it.

When I rebooted the VM, I ran a scrub on the pool. The machine ran out of memory in about 5 seconds, the OOM killer kicked in, and eventually the machine rebooted.

If I import the pool again, the scrub resumes automatically and the machine runs out of memory again.

Will try 2.0.4 soon.

Describe how to reproduce the problem

zpool scrub tank

Include any warning/errors/backtraces from the system logs

[  119.225278] free invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[  119.227275] CPU: 12 PID: 12804 Comm: free Kdump: loaded Tainted: P           OE    --------- -t - 4.18.0-305.el8.aarch64 #1
[  119.228849] Hardware name: QEMU KVM Virtual Machine, BIOS 1.4.1 12/03/2020
[  119.229796] Call trace:
[  119.230130]  dump_backtrace+0x0/0x188
[  119.230610]  show_stack+0x24/0x30
[  119.231110]  dump_stack+0x9c/0xbc
[  119.231553]  dump_header+0x48/0x1dc
[  119.232044]  oom_kill_process+0x188/0x190
[  119.232567]  out_of_memory+0x178/0x4e8
[  119.233610]  __alloc_pages_nodemask+0xd40/0xdc8
[  119.234368]  alloc_pages_vma+0x90/0x1f8
[  119.234967]  __read_swap_cache_async+0xfc/0x290
[  119.235641]  read_swap_cache_async+0x5c/0xa0
[  119.236228]  swap_cluster_readahead+0x28c/0x2e8
[  119.236886]  swapin_readahead+0x2a0/0x3c0
[  119.237436]  do_swap_page+0x554/0x878
[  119.237932]  __handle_mm_fault+0x4b4/0x578
[  119.238480]  handle_mm_fault+0xd8/0x170
[  119.238979]  do_page_fault+0x15c/0x478
[  119.239422]  do_translation_fault+0x9c/0xac
[  119.239934]  do_mem_abort+0x50/0xa8
[  119.240332]  el0_da+0x24/0x28
[  119.240707] Mem-Info:
[  119.240994] active_anon:13 inactive_anon:62 isolated_anon:3
[  119.240994]  active_file:3 inactive_file:0 isolated_file:0
[  119.240994]  unevictable:485 dirty:0 writeback:32
[  119.240994]  slab_reclaimable:675 slab_unreclaimable:14673
[  119.240994]  mapped:302 shmem:48 pagetables:176 bounce:0
[  119.240994]  free:83658 free_pcp:0 free_cma:0
[  119.244652] Node 0 active_anon:832kB inactive_anon:3968kB active_file:192kB inactive_file:0kB unevictable:31040kB isolated(anon):192kB isolated(file):0kB mapped:19328kB dirty:0kB writeback:2048kB shmem:3072kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB all_unreclaimable? no
[  119.247743] Node 0 DMA32 free:506624kB min:128448kB low:160512kB high:192576kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3145728kB managed:2581184kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  119.251008] lowmem_reserve[]: 0 5932 5932
[  119.251543] Node 0 Normal free:4847488kB min:4848256kB low:6060288kB high:7272320kB active_anon:832kB inactive_anon:3968kB active_file:192kB inactive_file:0kB unevictable:31040kB writepending:2048kB present:97386496kB managed:97258048kB mlocked:27968kB kernel_stack:41088kB pagetables:11264kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  119.255039] lowmem_reserve[]: 0 0 0
[  119.255458] Node 0 DMA32: 4*64kB (UM) 2*128kB (M) 3*256kB (M) 3*512kB (M) 2*1024kB (M) 3*2048kB (M) 1*4096kB (U) 0*8192kB 2*16384kB (UM) 0*32768kB 1*65536kB (M) 1*131072kB (M) 1*262144kB (M) 0*524288kB = 506624kB
[  119.257844] Node 0 Normal: 221*64kB (UME) 247*128kB (UME) 200*256kB (ME) 153*512kB (M) 114*1024kB (M) 71*2048kB (UM) 26*4096kB (UM) 8*8192kB (M) 3*16384kB (ME) 0*32768kB 2*65536kB (UE) 3*131072kB (UME) 4*262144kB (UME) 5*524288kB (M) = 4852928kB
[  119.260573] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
[  119.261721] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
[  119.262836] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  119.263933] 416 total pagecache pages
[  119.264397] 1 pages in swap cache
[  119.264815] Swap cache stats: add 4246, delete 4245, find 31/64
[  119.265559] Free swap  = 8099456kB
[  119.265987] Total swap = 8388480kB
[  119.266422] 1570816 pages RAM
[  119.266815] 0 pages HighMem/MovableOnly
[  119.267303] 10828 pages reserved
[  119.267726] 0 pages hwpoisoned
[  119.268117] Unreclaimable slab info:
[  119.268545] Name                      Used          Total
[  119.269292] zfs_znode_cache           63KB         63KB
[  119.269938] nf_conntrack             892KB        892KB
[  119.270653] virtio-gpu-vbufs         255KB        255KB
[  119.271351] xfs_buf                 1149KB       1149KB
[  119.272000] xfs_ili                 1023KB       1023KB
[  119.272649] xfs_efi_item             763KB        763KB
[  119.273296] xfs_efd_item             763KB        763KB
[  119.273997] xfs_buf_item             956KB        956KB
[  119.274689] xf_trans                1022KB       1022KB
[  119.275337] xfs_ifork               1023KB       1023KB
[  119.276025] xfs_da_state            1020KB       1020KB
[  119.276701] xfs_btree_cur            958KB        958KB
[  119.277370] xfs_bmap_free_item        768KB        768KB
[  119.278077] xfs_log_ticket          1023KB       1023KB
[  119.278744] bio-4                    127KB        127KB
[  119.279447] bio-3                   1912KB       1912KB
[  119.280136] scsi_sense_cache        1152KB       1152KB
[  119.280821] virtio_scsi_cmd           63KB         63KB
[  119.281517] rpc_buffers               64KB         64KB
[  119.282191] rpc_tasks                 64KB         64KB
[  119.282928] fib6_nodes               256KB        256KB
[  119.283829] ip6_dst_cache            382KB        382KB
[  119.284501] RAWv6                    189KB        189KB
[  119.285183] UDPv6                    695KB        695KB
[  119.285874] TCPv6                    369KB        369KB
[  119.286563] sd_ext_cdb                64KB         64KB
[  119.287248] sgpool-128               256KB        256KB
[  119.287866] sgpool-64                 64KB         64KB
[  119.288470] sgpool-32                704KB        704KB
[  119.289032] sgpool-16                960KB        960KB
[  119.289580] sgpool-8                1024KB       1024KB
[  119.291086] mqueue_inode_cache         64KB         64KB
[  119.291637] kioctx                   383KB        383KB
[  119.292302] aio_kiocb                511KB        511KB
[  119.293045] dnotify_mark              63KB         63KB
[  119.293611] dnotify_struct            64KB         64KB
[  119.294234] bio-2                    512KB        512KB
[  119.294833] fasync_cache              63KB         63KB
[  119.295452] posix_timers_cache        191KB        191KB
[  119.296083] UNIX                    1008KB       1008KB
[  119.296591] tcp_bind_bucket          832KB        832KB
[  119.297045] inet_peer_cache          447KB        447KB
[  119.297504] ip_fib_trie              319KB        319KB
[  119.297978] ip_fib_alias             383KB        383KB
[  119.298419] ip_dst_cache            1024KB       1024KB
[  119.298868] RAW                      126KB        126KB
[  119.299467] UDP                      828KB        828KB
[  119.300016] tw_sock_TCP              255KB        255KB
[  119.300602] request_sock_TCP          63KB         63KB
[  119.301172] TCP                      823KB        823KB
[  119.301750] hugetlbfs_inode_cache        127KB        127KB
[  119.302350] bio-1                   1083KB       1083KB
[  119.302866] eventpoll_pwq           1023KB       1023KB
[  119.303363] eventpoll_epi           1024KB       1024KB
[  119.303809] inotify_inode_mark       1023KB       1023KB
[  119.304279] request_queue            880KB        880KB
[  119.304766] blkdev_ioc              1023KB       1023KB
[  119.305260] bio-0                   1600KB       1600KB
[  119.305719] biovec-max              4096KB       4096KB
[  119.306189] biovec-64                768KB        768KB
[  119.307545] biovec-16               1024KB       1024KB
[  119.308005] bio_integrity_payload         64KB         64KB
[  119.308569] uid_cache                575KB        575KB
[  119.309074] dmaengine-unmap-256         63KB         63KB
[  119.309595] dmaengine-unmap-128         63KB         63KB
[  119.310177] dmaengine-unmap-16         63KB         63KB
[  119.310739] dmaengine-unmap-2         64KB         64KB
[  119.311329] audit_buffer             959KB        959KB
[  119.315623] skbuff_fclone_cache       1017KB       1017KB
[  119.316334] skbuff_head_cache       1216KB       1216KB
[  119.317043] file_lock_cache         1022KB       1022KB
[  119.317775] file_lock_ctx            959KB        959KB
[  119.318512] fsnotify_mark_connector       1023KB       1023KB
[  119.319281] net_namespace            250KB        250KB
[  119.319950] task_delay_info         1023KB       1023KB
[  119.320617] taskstats                765KB        765KB
[  119.321303] proc_dir_entry           896KB        896KB
[  119.321994] pde_opener              1023KB       1023KB
[  119.322683] seq_file                1024KB       1024KB
[  119.323421] sigqueue                1023KB       1023KB
[  119.324133] shmem_inode_cache       2358KB       2358KB
[  119.324877] kernfs_iattrs_cache        703KB        703KB
[  119.325632] kernfs_node_cache       5110KB       5110KB
[  119.326361] mnt_cache                956KB        956KB
[  119.327106] filp                    8704KB       8704KB
[  119.327836] names_cache             2048KB       2048KB
[  119.328567] avc_node                 639KB        639KB
[  119.329298] selinux_file_security       1024KB       1024KB
[  119.330060] selinux_inode_security       1599KB       1599KB
[  119.330849] key_jar                 1024KB       1024KB
[  119.331604] nsproxy                  447KB        447KB
[  119.332340] vm_area_struct          6900KB       6900KB
[  119.333077] mm_struct               1024KB       1024KB
[  119.333820] fs_cache                1024KB       1024KB
[  119.334561] files_cache             1023KB       1023KB
[  119.335299] signal_cache            3209KB       3209KB
[  119.336028] sighand_cache           2429KB       2429KB
[  119.336742] task_struct            10748KB      11347KB
[  119.337458] cred_jar                2429KB       2429KB
[  119.338120] anon_vma_chain          2112KB       2112KB
[  119.338754] anon_vma                1534KB       1534KB
[  119.339428] pid                     1024KB       1024KB
[  119.340098] Acpi-Operand             511KB        511KB
[  119.340748] Acpi-ParseExt            127KB        127KB
[  119.341431] Acpi-Parse                63KB         63KB
[  119.342155] Acpi-State               191KB        191KB
[  119.342908] Acpi-Namespace            63KB         63KB
[  119.343609] numa_policy               63KB         63KB
[  119.344281] trace_event_file         447KB        447KB
[  119.344970] ftrace_event_field        447KB        447KB
[  119.345666] pool_workqueue           768KB        768KB
[  119.346353] task_group              1022KB       1022KB
[  119.347050] pgd_cache               4352KB       4352KB
[  119.347723] vmap_area              88768KB      88768KB
[  119.348312] kmalloc-128k            4608KB       4608KB
[  119.348903] kmalloc-64k             9216KB       9216KB
[  119.349465] kmalloc-32k             8192KB       8192KB
[  119.350009] kmalloc-16k            15808KB      19968KB
[  119.351182] kmalloc-8k              5120KB       5120KB
[  119.351739] kmalloc-4k             10180KB      11136KB
[  119.352315] kmalloc-2k              8576KB       8576KB
[  119.352969] kmalloc-1k             24282KB      24320KB
[  119.353925] kmalloc-512             3840KB       3840KB
[  119.354622] kmalloc-256             4352KB       4352KB
[  119.355330] kmalloc-128           365120KB     365120KB
[  119.356019] kmem_cache_node          704KB        704KB
[  119.356707] kmem_cache               704KB        704KB
[  119.357401] Tasks state (memory values in pages):
[  119.358008] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  119.359182] [   1307]     0  1307      528       55   393216       94             0 systemd-journal
[  119.360512] [   1338]     0  1338      803       60   393216      115         -1000 systemd-udevd
[  119.361683] [   1708]    32  1708      239       62   393216       59             0 rpcbind
[  119.362743] [   1710]     0  1710     1716       13   393216       54         -1000 auditd
[  119.363772] [   1712]     0  1712      163       24   393216       47             0 sedispatch
[  119.364752] [   1735]     0  1735      202       18   393216       51             0 smartd
[  119.365610] [   1739]     0  1739     4578       65   458752      154             0 sssd
[  119.376035] [   1742]   991  1742      222       23   393216       21             0 chronyd
[  119.377148] [   1743]     0  1743     1378       22   458752       51             0 irqbalance
[  119.378328] [   1751]   997  1751    30608       45   720896      193             0 polkitd
[  119.379513] [   1752]   996  1752      192       23   393216       14             0 lsmd
[  119.380632] [   1753]    81  1753      191       51   393216       56          -900 dbus-daemon
[  119.381852] [   1758]   990  1758     4927       25   458752       79             0 rngd
[  119.382925] [   1768]     0  1768     4596       56   458752      179             0 sssd_be
[  119.383977] [   1772]     0  1772     4793      215   458752      146             0 sssd_nss
[  119.385060] [   1792]     0  1792      500       56   393216       81             0 systemd-logind
[  119.386229] [   1793]     0  1793     5685       77   393216      510             0 firewalld
[  119.387364] [   1844]     0  1844     7539       81   458752      163             0 NetworkManager
[  119.388696] [   1849]     0  1849      443      438   393216        0           -17 iscsid
[  119.389787] [   1851]     0  1851     7696       72   458752      358             0 tuned
[  119.390848] [   1856]     0  1856     4160       23   393216       57             0 gssproxy
[  119.392044] [   2201]     0  2201      323       73   393216       90             0 dhclient
[  119.393225] [   2300]   989  2300     1812        0   393216      139             0 agent
[  119.394372] [   2301]     0  2301     2625       57   458752      105             0 rsyslogd
[  119.395515] [   2313]   988  2313     1774        0   393216       58             0 updater
[  119.396468] [   2360]     0  2360      512       77   393216       67         -1000 sshd
[  119.397349] [   2488]   989  2488     1838        0   393216      150             0 gomon
[  119.399176] [   4474]     0  4474      333       29   393216       31             0 atd
[  119.400031] [   4475]     0  4475     3474       18   327680       11             0 agetty
[  119.401120] [   4478]     0  4478     3559       32   458752       39             0 crond
[  119.402290] [   4485]     0  4485     3481       14   393216       11             0 agetty
[  119.403429] [   5446]     0  5446      807       70   393216      112             0 sshd
[  119.404535] [   5449]  1000  5449      393       55   393216       96             0 systemd
[  119.405722] [   5450]  1000  5450     6501        0   458752      175             0 (sd-pam)
[  119.406906] [   5458]  1000  5458      810       38   393216      125             0 sshd
[  119.408022] [   5459]  1000  5459     3556       23   458752       38             0 bash
[  119.409153] [   5507]     0  5507     3982       41   458752      105             0 sudo
[  119.410285] [   5508]     0  5508     3561       26   458752       41             0 bash
[  119.411443] [  12804]     0 12804       20        2   327680       10             0 free
[  119.412591] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/firewalld.service,task=firewalld,pid=1793,uid=0
[  119.414736] Out of memory: Killed process 1793 (firewalld) total-vm:363840kB, anon-rss:0kB, file-rss:4928kB, shmem-rss:0kB, UID:0 pgtables:384kB oom_score_adj:0
[  119.419619] oom_reaper: reaped process 1793 (firewalld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[  119.474897] gmain invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
[  119.475852] CPU: 13 PID: 1845 Comm: gmain Kdump: loaded Tainted: P           OE    --------- -t - 4.18.0-305.el8.aarch64 #1
[  119.477095] Hardware name: QEMU KVM Virtual Machine, BIOS 1.4.1 12/03/2020
[  119.477803] Call trace:
[  119.478056]  dump_backtrace+0x0/0x188
[  119.478449]  show_stack+0x24/0x30
[  119.478817]  dump_stack+0x9c/0xbc
[  119.479192]  dump_header+0x48/0x1dc
[  119.479576]  oom_kill_process+0x188/0x190
[  119.480006]  out_of_memory+0x178/0x4e8
[  119.480419]  __alloc_pages_nodemask+0xd40/0xdc8
[  119.480887]  alloc_pages_vma+0x90/0x1f8
[  119.481308]  __read_swap_cache_async+0xfc/0x290
[  119.481834]  swap_cluster_readahead+0x17c/0x2e8
[  119.482308]  swapin_readahead+0x2a0/0x3c0
[  119.482785]  do_swap_page+0x554/0x878
[  119.483210]  __handle_mm_fault+0x4b4/0x578
[  119.483647]  handle_mm_fault+0xd8/0x170
[  119.484101]  do_page_fault+0x15c/0x478
[  119.484547]  do_translation_fault+0x9c/0xac
[  119.484996]  do_mem_abort+0x50/0xa8
[  119.485327]  el0_da+0x24/0x28
[  119.485602] Mem-Info:
[  119.485810] active_anon:13 inactive_anon:62 isolated_anon:3
[  119.485810]  active_file:3 inactive_file:0 isolated_file:0
[  119.485810]  unevictable:485 dirty:0 writeback:0
[  119.485810]  slab_reclaimable:675 slab_unreclaimable:14673
[  119.485810]  mapped:302 shmem:48 pagetables:176 bounce:0
[  119.485810]  free:83655 free_pcp:9 free_cma:0
[  119.488667] Node 0 active_anon:832kB inactive_anon:3968kB active_file:192kB inactive_file:0kB unevictable:31040kB isolated(anon):192kB isolated(file):0kB mapped:19328kB dirty:0kB writeback:0kB shmem:3072kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB all_unreclaimable? no
[  119.490917] Node 0 DMA32 free:506624kB min:128448kB low:160512kB high:192576kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3145728kB managed:2581184kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[  119.493726] lowmem_reserve[]: 0 5932 5932
[  119.494205] Node 0 Normal free:4847296kB min:4848256kB low:6060288kB high:7272320kB active_anon:832kB inactive_anon:3968kB active_file:192kB inactive_file:0kB unevictable:31040kB writepending:0kB present:97386496kB managed:97258048kB mlocked:27968kB kernel_stack:41152kB pagetables:11264kB bounce:0kB free_pcp:576kB local_pcp:0kB free_cma:0kB
[  119.497439] lowmem_reserve[]: 0 0 0
[  119.497854] Node 0 DMA32: 4*64kB (UM) 2*128kB (M) 3*256kB (M) 3*512kB (M) 2*1024kB (M) 3*2048kB (M) 1*4096kB (U) 0*8192kB 2*16384kB (UM) 0*32768kB 1*65536kB (M) 1*131072kB (M) 1*262144kB (M) 0*524288kB = 506624kB
[  119.500070] Node 0 Normal: 210*64kB (ME) 245*128kB (ME) 199*256kB (ME) 152*512kB (M) 113*1024kB (M) 70*2048kB (UM) 27*4096kB (UM) 8*8192kB (M) 3*16384kB (ME) 0*32768kB 2*65536kB (UE) 3*131072kB (UME) 4*262144kB (UME) 5*524288kB (M) = 4852224kB
[  119.502633] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=16777216kB
[  119.503667] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=524288kB
[  119.504631] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[  119.505583] 417 total pagecache pages
[  119.505982] 1 pages in swap cache
[  119.506301] Swap cache stats: add 4251, delete 4250, find 31/70
[  119.506925] Free swap  = 8128128kB
[  119.507306] Total swap = 8388480kB
[  119.507670] 1570816 pages RAM
[  119.507999] 0 pages HighMem/MovableOnly
[  119.508446] 10828 pages reserved
[  119.508773] 0 pages hwpoisoned
[  119.509100] Unreclaimable slab info:
[  119.509503] Name                      Used          Total
[  119.510059] zfs_znode_cache           63KB         63KB
[  119.510607] nf_conntrack             892KB        892KB
[  119.511194] virtio-gpu-vbufs         255KB        255KB
[  119.511769] xfs_buf                 1149KB       1149KB
[  119.512358] xfs_ili                 1023KB       1023KB
[  119.512981] xfs_efi_item             763KB        763KB
[  119.513550] xfs_efd_item             763KB        763KB
[  119.514144] xfs_buf_item             956KB        956KB
[  119.514656] xf_trans                1022KB       1022KB
[  119.515230] xfs_ifork               1023KB       1023KB
[  119.515823] xfs_da_state            1020KB       1020KB
[  119.516422] xfs_btree_cur            958KB        958KB
[  119.517038] xfs_bmap_free_item        768KB        768KB
[  119.517558] xfs_log_ticket          1023KB       1023KB
[  119.518048] bio-4                    127KB        127KB
[  119.518596] bio-3                   1912KB       1912KB
[  119.519209] scsi_sense_cache        1152KB       1152KB
[  119.519785] virtio_scsi_cmd           63KB         63KB
[  119.520358] rpc_buffers               64KB         64KB
[  119.520962] rpc_tasks                 64KB         64KB
[  119.521559] fib6_nodes               256KB        256KB
[  119.522153] ip6_dst_cache            382KB        382KB
[  119.522723] RAWv6                    189KB        189KB
[  119.523284] UDPv6                    695KB        695KB
[  119.523812] TCPv6                    369KB        369KB
[  119.524357] sd_ext_cdb                64KB         64KB
[  119.524922] sgpool-128               256KB        256KB
[  119.525491] sgpool-64                 64KB         64KB
[  119.526039] sgpool-32                704KB        704KB
[  119.526544] sgpool-16                960KB        960KB
[  119.527455] sgpool-8                1024KB       1024KB
[  119.527923] mqueue_inode_cache         64KB         64KB
[  119.528529] kioctx                   383KB        383KB
[  119.529119] aio_kiocb                511KB        511KB
[  119.529687] dnotify_mark              63KB         63KB
[  119.530268] dnotify_struct            64KB         64KB
[  119.530854] bio-2                    512KB        512KB
[  119.531487] fasync_cache              63KB         63KB
[  119.532107] posix_timers_cache        191KB        191KB
[  119.532738] UNIX                    1008KB       1008KB
[  119.533355] tcp_bind_bucket          832KB        832KB
[  119.533972] inet_peer_cache          447KB        447KB
[  119.534582] ip_fib_trie              319KB        319KB
[  119.535221] ip_fib_alias             383KB        383KB
[  119.535830] ip_dst_cache            1024KB       1024KB
[  119.536444] RAW                      126KB        126KB
[  119.537061] UDP                      828KB        828KB
[  119.537686] tw_sock_TCP              255KB        255KB
[  119.538317] request_sock_TCP          63KB         63KB
[  119.538969] TCP                      823KB        823KB
[  119.539590] hugetlbfs_inode_cache        127KB        127KB
[  119.540238] bio-1                   1083KB       1083KB
[  119.540833] eventpoll_pwq           1023KB       1023KB
[  119.541431] eventpoll_epi           1024KB       1024KB
[  119.542057] inotify_inode_mark       1023KB       1023KB
[  119.542644] request_queue            880KB        880KB
[  119.543258] blkdev_ioc              1023KB       1023KB
[  119.543849] bio-0                   1600KB       1600KB
[  119.544478] biovec-max              4096KB       4096KB
[  119.545100] biovec-64                768KB        768KB
[  119.545719] biovec-16               1024KB       1024KB
[  119.546328] bio_integrity_payload         64KB         64KB
[  119.547008] uid_cache                575KB        575KB
[  119.547642] dmaengine-unmap-256         63KB         63KB
[  119.548258] dmaengine-unmap-128         63KB         63KB
[  119.548899] dmaengine-unmap-16         63KB         63KB
[  119.549542] dmaengine-unmap-2         64KB         64KB
[  119.550154] audit_buffer             959KB        959KB
[  119.550755] skbuff_fclone_cache       1017KB       1017KB
[  119.551386] skbuff_head_cache       1216KB       1216KB
[  119.552003] file_lock_cache         1022KB       1022KB
[  119.552585] file_lock_ctx            959KB        959KB
[  119.553122] fsnotify_mark_connector       1023KB       1023KB
[  119.553751] net_namespace            250KB        250KB
[  119.554277] task_delay_info         1023KB       1023KB
[  119.554797] taskstats                829KB        829KB
[  119.556227] proc_dir_entry           896KB        896KB
[  119.556836] pde_opener              1023KB       1023KB
[  119.557464] seq_file                1024KB       1024KB
[  119.558069] sigqueue                1023KB       1023KB
[  119.558691] shmem_inode_cache       2358KB       2358KB
[  119.559357] kernfs_iattrs_cache        703KB        703KB
[  119.559998] kernfs_node_cache       5110KB       5110KB
[  119.560612] mnt_cache                956KB        956KB
[  119.561230] filp                    8704KB       8704KB
[  119.561857] names_cache             2048KB       2048KB
[  119.562495] avc_node                 639KB        639KB
[  119.563173] selinux_file_security       1024KB       1024KB
[  119.563834] selinux_inode_security       1599KB       1599KB
[  119.564513] key_jar                 1024KB       1024KB
[  119.565078] nsproxy                  447KB        447KB
[  119.565643] vm_area_struct          6900KB       6900KB
[  119.566136] mm_struct               1024KB       1024KB
[  119.566634] fs_cache                1024KB       1024KB
[  119.567674] files_cache             1023KB       1023KB
[  119.568244] signal_cache            3209KB       3209KB
[  119.568842] sighand_cache           2429KB       2429KB
[  119.569442] task_struct            10748KB      11347KB
[  119.570055] cred_jar                2429KB       2429KB
[  119.570661] anon_vma_chain          2112KB       2112KB
[  119.571284] anon_vma                1534KB       1534KB
[  119.571892] pid                     1024KB       1024KB
[  119.572518] Acpi-Operand             511KB        511KB
[  119.573129] Acpi-ParseExt            127KB        127KB
[  119.573741] Acpi-Parse                63KB         63KB
[  119.574356] Acpi-State               191KB        191KB
[  119.575000] Acpi-Namespace            63KB         63KB
[  119.575607] numa_policy               63KB         63KB
[  119.576214] trace_event_file         447KB        447KB
[  119.576809] ftrace_event_field        447KB        447KB
[  119.577442] pool_workqueue           768KB        768KB
[  119.578063] task_group              1022KB       1022KB
[  119.578674] pgd_cache               4352KB       4352KB
[  119.579295] vmap_area              88768KB      88768KB
[  119.579908] kmalloc-128k            4608KB       4608KB
[  119.580528] kmalloc-64k             9216KB       9216KB
[  119.581134] kmalloc-32k             8192KB       8192KB
[  119.581742] kmalloc-16k            15808KB      19968KB
[  119.582366] kmalloc-8k              5120KB       5120KB
[  119.583032] kmalloc-4k             10180KB      11136KB
[  119.583651] kmalloc-2k              8576KB       8576KB
[  119.584294] kmalloc-1k             24280KB      24320KB
[  119.584876] kmalloc-512             3840KB       3840KB
[  119.585486] kmalloc-256             4352KB       4352KB
[  119.586068] kmalloc-128           365120KB     365120KB
[  119.586685] kmem_cache_node          704KB        704KB
[  119.587343] kmem_cache               704KB        704KB
[  119.587913] Tasks state (memory values in pages):
[  119.588502] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
[  119.589571] [   1307]     0  1307      528       55   393216       94             0 systemd-journal
[  119.590667] [   1338]     0  1338      803       60   393216      115         -1000 systemd-udevd
[  119.591776] [   1708]    32  1708      239       62   393216       59             0 rpcbind
[  119.592793] [   1710]     0  1710     1716       13   393216       54         -1000 auditd
[  119.593778] [   1712]     0  1712      163       24   393216       47             0 sedispatch
[  119.594807] [   1735]     0  1735      202       18   393216       51             0 smartd
[  119.595789] [   1739]     0  1739     4578       65   458752      154             0 sssd
[  119.596760] [   1742]   991  1742      222       23   393216       21             0 chronyd
[  119.597731] [   1743]     0  1743     1378       22   458752       51             0 irqbalance
[  119.598881] [   1751]   997  1751    30608       45   720896      193             0 polkitd
[  119.599902] [   1752]   996  1752      192       23   393216       14             0 lsmd
[  119.600864] [   1753]    81  1753      191       51   393216       56          -900 dbus-daemon
[  119.601868] [   1758]   990  1758     4927       25   458752       79             0 rngd
[  119.602816] [   1768]     0  1768     4596       56   458752      179             0 sssd_be
[  119.603789] [   1772]     0  1772     4793      215   458752      146             0 sssd_nss
[  119.604733] [   1792]     0  1792      500       56   393216       81             0 systemd-logind
[  119.605748] [   1844]     0  1844     7539       81   458752      163             0 NetworkManager
[  119.606804] [   1849]     0  1849      443      438   393216        0           -17 iscsid
[  119.607784] [   1851]     0  1851     7696       72   458752      358             0 tuned
[  119.608754] [   1856]     0  1856     4160       23   393216       57             0 gssproxy
[  119.609755] [   2201]     0  2201      323       73   393216       90             0 dhclient
[  119.610770] [   2300]   989  2300     1812        0   393216      139             0 agent
[  119.611743] [   2301]     0  2301     2625       57   458752      105             0 rsyslogd
[  119.612748] [   2313]   988  2313     1774        0   393216       58             0 updater
[  119.613742] [   2360]     0  2360      512       77   393216       67         -1000 sshd
[  119.614720] [   2488]   989  2488     1838        0   393216      150             0 gomon
[  119.615679] [   4474]     0  4474      333       29   393216       31             0 atd
[  119.616629] [   4475]     0  4475     3474       18   327680       11             0 agetty
[  119.617601] [   4478]     0  4478     3559       32   458752       39             0 crond
[  119.618566] [   4485]     0  4485     3481       14   393216       11             0 agetty
[  119.619544] [   5446]     0  5446      807       70   393216      112             0 sshd
[  119.620517] [   5449]  1000  5449      393       55   393216       96             0 systemd
[  119.621508] [   5450]  1000  5450     6501        0   458752      175             0 (sd-pam)
[  119.622426] [   5458]  1000  5458      810       38   393216      125             0 sshd
[  119.623431] [   5459]  1000  5459     3556       23   458752       38             0 bash
[  119.624385] [   5507]     0  5507     3982       41   458752      105             0 sudo
[  119.625360] [   5508]     0  5508     3561       26   458752       41             0 bash
[  119.626348] [  12804]     0 12804       20        2   327680       10             0 free
[  119.627355] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/tuned.service,task=tuned,pid=1851,uid=0
[  119.629140] Out of memory: Killed process 1851 (tuned) total-vm:492544kB, anon-rss:0kB, file-rss:4608kB, shmem-rss:0kB, UID:0 pgtables:448kB oom_score_adj:0
omarkilani added the Status: Triage Needed and Type: Defect labels on May 28, 2021
@rincebrain
Contributor

rincebrain commented May 28, 2021

How large is the pool (used/total)?
How much RAM is the ARC configured to use, out of how much on the VM in total?
How fast is the storage? OOMing that fast implies either almost no RAM or reasonably quick storage (or both), assuming it's the ARC filling that's exerting all that memory pressure.
If you want to get the VM running again, the things I'd try would be:

  • simplest would be importing the pool on another system and issuing a zpool scrub -p or -s as desired.
  • I think you could reach into the VM and create an /etc/modprobe.d/zfs.conf with something like options zfs zfs_no_scrub_io=1 to make scrub not actually scrub (though if just the metadata crawl is enough to exhaust things, that won't save you)
  • You could also try something hackish like sticking while true; do zpool scrub -p [thatpool] && break; done; in /etc/rc.local, though be sure to remove it before rebooting, as zpool scrub -p [pool] on a pool without a scrub running is also an error. (I just put the while true in because I'm not sure what guarantees there are about when rc.local gets run relative to the pool import; if you know it runs strictly later, you can drop it.) A sketch of this and the modprobe.d approach follows below.
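
For reference, a minimal sketch of those two workarounds, assuming the pool is named tank as in this report (zfs_no_scrub_io is an existing module parameter; the retry loop is only there because rc.local's ordering relative to the pool import isn't guaranteed):

# /etc/modprobe.d/zfs.conf -- load-time option so scrub skips the actual data reads
options zfs zfs_no_scrub_io=1

# /etc/rc.local -- pause the auto-resumed scrub shortly after boot;
# "zpool scrub -p" fails while no scrub is running, so keep retrying until
# the import has happened, and remove this again once the pool is stable
while true; do
    zpool scrub -p tank && break
    sleep 5
done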

@omarkilani
Author

Hmmm... so I tried this with 2.0.4, and a similar thing happens:

[root@instance-20210526-1929 ~]# zpool import tank
[root@instance-20210526-1929 ~]# zpool status
  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 15:01:04 2021
	21.5G scanned at 1.79G/s, 620K issued at 51.7K/s, 118G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0

errors: No known data errors
[root@instance-20210526-1929 ~]# zpool status
  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 15:01:04 2021
	26.3G scanned at 1.76G/s, 752K issued at 50.1K/s, 118G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0

errors: No known data errors
[root@instance-20210526-1929 ~]# zpool status
  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 15:01:04 2021
	29.2G scanned at 1.83G/s, 752K issued at 47K/s, 118G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0

errors: No known data errors
[root@instance-20210526-1929 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       56831       40247          27         419       32774
Swap:          8191           0        8191
[root@instance-20210526-1929 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       60724       36354          27         419       28882
Swap:          8191           0        8191
[root@instance-20210526-1929 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       65738       31340          27         420       23867
Swap:          8191           0        8191
[root@instance-20210526-1929 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       69242       27836          27         420       20364
Swap:          8191           0        8191
[root@instance-20210526-1929 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       73679       23399          27         420       15926
Swap:          8191           0        8191
[root@instance-20210526-1929 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       78099       18979          27         420       11507
Swap:          8191           0        8191
[root@instance-20210526-1929 ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       82966       14112          27         420        6639
Swap:          8191           0        8191
[root@instance-20210526-1929 ~]# free -mclient_loop: send disconnect: Broken pipe

@omarkilani
Author

Hey @rincebrain,

I was able to just zpool import tank && zpool scrub -s tank and then I recreated the tank. All good.

We're just running some tests on aarch64 so it's fine if the VM dies or locks up.

So, the pool looks like:

[root@instance-20210526-1929 ~]# zpool list -v
NAME                                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
tank                                      5.81T  4.07G  5.81T        -         -     0%     0%  1.00x    ONLINE  -
  scsi-360f39ea51229408cb368509d91495fb9   496G   351M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-3603528d43ade4b31b70186f9a041601e   496G   347M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-36007099c456f4ec780fdc03b14976f19   496G   354M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-360d5b4cb98a44fabbcc67b1a55808124   496G   353M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-3603ff370fa044673a5c09353568c6757   496G   349M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-360ba05ab3eab4897bcf042fdfc3da1eb   496G   347M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-360087adf642b4f6586326dada6c8eb41   496G   341M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-3603a47cd86dd484bba1b05bab36c1257   496G   339M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-3600bf0330c6e4139829ad72c816b8c06   496G   343M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-3605635d6a27b4c189c0af523ddc262de   496G   345M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-36013bd4eeaab4b4a9e88beb0474a2439   496G   346M   496G        -         -     0%  0.06%      -  ONLINE  
  scsi-360f024a7d8b64521b7e7d671d9397ab5   496G   349M   496G        -         -     0%  0.06%      -  ONLINE  

It has an aggregate read bandwidth of about 3.6 GB/s. The machine has 96 GiB of RAM. I haven't done any tuning or modified any settings at all, just loaded the module.

I reran scrub on the newly created pool and it also ran out of memory, so I'll see what I can tune.
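
For reference, a minimal sketch of the kind of ARC tuning being discussed, capping zfs_arc_max at 24 GiB (the value tried later in this thread); the sysfs write takes effect immediately, while the modprobe.d line makes it persist across reboots:

# cap the ARC at 24 GiB for the running module (25769803776 = 24 * 1024^3)
echo 25769803776 > /sys/module/zfs/parameters/zfs_arc_max

# make the cap persistent across module reloads and reboots
echo "options zfs zfs_arc_max=25769803776" >> /etc/modprobe.d/zfs.conf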

@omarkilani
Author

omarkilani commented May 28, 2021

[ deleted a post because the issue wasn't solved, it just looked like it due to a smaller allocated size on the new pool ]

@rincebrain
Contributor

I'm pretty sure you shouldn't actually close this - IMO "scrub triggers the OOM killer even immediately at boot unless you tweak zfs_arc_max" sounds like a bug even if it's got a mitigation, as it's supposed to default to 50% of total system RAM on Linux, which should be well clear of that threshold on a system with 96 GiB.
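
A quick way to check which limit the ARC is actually honoring on Linux (a zfs_arc_max of 0 means the built-in default, roughly half of system RAM, applies):

# module parameter as configured (0 = use the built-in default)
cat /sys/module/zfs/parameters/zfs_arc_max
# effective ARC ceiling in bytes, as reported by the kernel module
awk '$1 == "c_max" {print $3}' /proc/spl/kstat/zfs/arcstats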

@omarkilani
Author

@rincebrain yeah, I mean that's what I personally think. Maybe it's just the speed of the newer devices or maybe something isn't optimal on aarch64 or 🤷‍♂️.

I'll leave it open then.

omarkilani reopened this on May 28, 2021
@rincebrain
Contributor

I'm running this on Linux/aarch64 on my RPi4 with 8GiB of RAM and it's been up for a month, including a scrub (admittedly only one spinning disk, though, so if it's a race between pruning the ARC and filling it, I wouldn't be hitting it).

Is your VM being run on an aarch64 machine in turn, or some x86_64 or other arch? (I'm wondering about the feasibility of triggering this without having some beefy aarch64 hardware available, though I suppose at least one cloud provider will sell you aarch64 VMs...)

@omarkilani
Author

@rincebrain it's running on an Ampere Altra host machine, which you can test out for free:

https://www.servethehome.com/oracle-cloud-giving-away-ampere-arm-a1-instances-always-free/

The way to get the better quota (16 cores and 96GiB of RAM) is to sign up for the Arm Accelerator:

https://go.oracle.com/armaccelerator

Which is made for OSS developers etc.

Note that I'm running it with RHEL but Oracle Linux is equivalent and you can install the same kernel on there.

I just had it die with zfs_arc_max set to 24 GiB, so let me paste that in a new reply.

@omarkilani
Author

Alright, so it seems like this only happens when the allocated size of the zpool exceeds the available RAM, even with zfs_arc_max set low and nothing else running on the machine.
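
A quick sanity check of that condition, comparing the pool's allocated bytes with total system RAM (both as raw byte counts):

# pool-wide allocated space, in parseable bytes
zpool list -Hp -o allocated tank
# total physical memory in bytes
free -b | awk '/^Mem:/ {print $2}'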

[root@instance-20210526-1929 ~]# cat /sys/module/zfs/parameters/zfs_arc_max 
25769803776
[root@instance-20210526-1929 ~]# zpool scrub tank
zpool statu[root@instance-20210526-1929 ~]# zpool status 1
  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	4.49G scanned at 4.49G/s, 312K issued at 312K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	8.08G scanned at 4.04G/s, 312K issued at 156K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	11.2G scanned at 3.72G/s, 312K issued at 104K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	14.1G scanned at 3.53G/s, 312K issued at 78K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	16.8G scanned at 3.36G/s, 312K issued at 62.4K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	19.0G scanned at 3.16G/s, 336K issued at 56K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	21.5G scanned at 2.69G/s, 336K issued at 42K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	24.3G scanned at 2.70G/s, 336K issued at 37.3K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	27.2G scanned at 2.72G/s, 336K issued at 33.6K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	30.0G scanned at 2.73G/s, 336K issued at 30.5K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	33.5G scanned at 2.79G/s, 336K issued at 28K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	36.9G scanned at 2.84G/s, 336K issued at 25.8K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	41.2G scanned at 2.94G/s, 336K issued at 24K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	45.1G scanned at 3.01G/s, 336K issued at 22.4K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	49.1G scanned at 3.07G/s, 336K issued at 21K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	52.6G scanned at 3.09G/s, 348K issued at 20.5K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	56.0G scanned at 3.11G/s, 348K issued at 19.3K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	59.8G scanned at 3.15G/s, 348K issued at 18.3K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	63.8G scanned at 3.19G/s, 348K issued at 17.4K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	67.3G scanned at 3.20G/s, 348K issued at 16.6K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	70.9G scanned at 3.22G/s, 372K issued at 16.9K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:12:47 2021
	74.8G scanned at 3.25G/s, 372K issued at 16.2K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       85643       11618          14         236        4062
Swap:          8191         209        7982
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       89486        7772          14         240         218
Swap:          8191         209        7982
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       88614        8666          14         217        1100
Swap:          8191         225        7966
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       86123       11161          14         214        3594
Swap:          8191         225        7966
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       85272       12012          14         214        4445
Swap:          8191         225        7966
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       89053        8232          14         214         664
Swap:          8191         225        7966
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       88280        9019          14         199        1447
Swap:          8191         237        7954
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       87278       10014          14         205        2443
Swap:          8191         237        7954
root@instance-20210526-1929 ~/fio # free -m
              total        used        free      shared  buff/cache   available
Mem:          97499       90669        6623          14         206         637
Swap:          8191         237        7954
root@instance-20210526-1929 ~/fio # client_loop: send disconnect: Broken pipe

It looked like it was attempting to keep the memory usage under control at least, but I think the scan is just too fast, or something.

What I did was use fio to create a file big enough to trigger the issue:

[root@instance-20210526-1929 fio]# cat seqread_64.fio 
[global]
bs=64K
iodepth=64
direct=1
ioengine=libaio
group_reporting
time_based
runtime=120
numjobs=4
name=raw-read
rw=read
							
[job1]
filename=/tank/db/f.fio
size=128G

And the zpool/zfs parameters:

[root@instance-20210526-1929 ~]# zpool get all
NAME  PROPERTY                       VALUE                          SOURCE
tank  size                           5.81T                          -
tank  capacity                       2%                             -
tank  altroot                        -                              default
tank  health                         ONLINE                         -
tank  guid                           7232894876644065540            -
tank  version                        -                              default
tank  bootfs                         -                              default
tank  delegation                     on                             default
tank  autoreplace                    off                            default
tank  cachefile                      -                              default
tank  failmode                       wait                           default
tank  listsnapshots                  off                            default
tank  autoexpand                     off                            default
tank  dedupratio                     1.00x                          -
tank  free                           5.68T                          -
tank  allocated                      136G                           -
tank  readonly                       off                            -
tank  ashift                         12                             local
tank  comment                        -                              default
tank  expandsize                     -                              -
tank  freeing                        0                              -
tank  fragmentation                  0%                             -
tank  leaked                         0                              -
tank  multihost                      off                            default
tank  checkpoint                     -                              -
tank  load_guid                      12888677109862668849           -
tank  autotrim                       off                            default
tank  feature@async_destroy          enabled                        local
tank  feature@empty_bpobj            active                         local
tank  feature@lz4_compress           active                         local
tank  feature@multi_vdev_crash_dump  enabled                        local
tank  feature@spacemap_histogram     active                         local
tank  feature@enabled_txg            active                         local
tank  feature@hole_birth             active                         local
tank  feature@extensible_dataset     active                         local
tank  feature@embedded_data          active                         local
tank  feature@bookmarks              enabled                        local
tank  feature@filesystem_limits      enabled                        local
tank  feature@large_blocks           enabled                        local
tank  feature@large_dnode            enabled                        local
tank  feature@sha512                 enabled                        local
tank  feature@skein                  enabled                        local
tank  feature@edonr                  enabled                        local
tank  feature@userobj_accounting     active                         local
tank  feature@encryption             enabled                        local
tank  feature@project_quota          active                         local
tank  feature@device_removal         enabled                        local
tank  feature@obsolete_counts        enabled                        local
tank  feature@zpool_checkpoint       enabled                        local
tank  feature@spacemap_v2            active                         local
tank  feature@allocation_classes     enabled                        local
tank  feature@resilver_defer         enabled                        local
tank  feature@bookmark_v2            enabled                        local
tank  feature@redaction_bookmarks    enabled                        local
tank  feature@redacted_datasets      enabled                        local
tank  feature@bookmark_written       enabled                        local
tank  feature@log_spacemap           active                         local
tank  feature@livelist               enabled                        local
tank  feature@device_rebuild         enabled                        local
tank  feature@zstd_compress          enabled                        local
[root@instance-20210526-1929 ~]# zfs get all
NAME     PROPERTY              VALUE                  SOURCE
tank     type                  filesystem             -
tank     creation              Fri May 28 16:07 2021  -
tank     used                  136G                   -
tank     available             5.50T                  -
tank     referenced            96K                    -
tank     compressratio         1.03x                  -
tank     mounted               yes                    -
tank     quota                 none                   default
tank     reservation           none                   default
tank     recordsize            128K                   default
tank     mountpoint            /tank                  default
tank     sharenfs              off                    default
tank     checksum              on                     default
tank     compression           off                    default
tank     atime                 on                     default
tank     devices               on                     default
tank     exec                  on                     default
tank     setuid                on                     default
tank     readonly              off                    default
tank     zoned                 off                    default
tank     snapdir               hidden                 default
tank     aclmode               discard                default
tank     aclinherit            restricted             default
tank     createtxg             1                      -
tank     canmount              on                     default
tank     xattr                 on                     default
tank     copies                1                      default
tank     version               5                      -
tank     utf8only              off                    -
tank     normalization         none                   -
tank     casesensitivity       sensitive              -
tank     vscan                 off                    default
tank     nbmand                off                    default
tank     sharesmb              off                    default
tank     refquota              none                   default
tank     refreservation        none                   default
tank     guid                  9019197745549167109    -
tank     primarycache          all                    default
tank     secondarycache        all                    default
tank     usedbysnapshots       0B                     -
tank     usedbydataset         96K                    -
tank     usedbychildren        136G                   -
tank     usedbyrefreservation  0B                     -
tank     logbias               latency                default
tank     objsetid              54                     -
tank     dedup                 off                    default
tank     mlslabel              none                   default
tank     sync                  standard               default
tank     dnodesize             legacy                 default
tank     refcompressratio      1.00x                  -
tank     written               96K                    -
tank     logicalused           140G                   -
tank     logicalreferenced     42K                    -
tank     volmode               default                default
tank     filesystem_limit      none                   default
tank     snapshot_limit        none                   default
tank     filesystem_count      none                   default
tank     snapshot_count        none                   default
tank     snapdev               hidden                 default
tank     acltype               off                    default
tank     context               none                   default
tank     fscontext             none                   default
tank     defcontext            none                   default
tank     rootcontext           none                   default
tank     relatime              off                    default
tank     redundant_metadata    all                    default
tank     overlay               on                     default
tank     encryption            off                    default
tank     keylocation           none                   default
tank     keyformat             none                   default
tank     pbkdf2iters           0                      default
tank     special_small_blocks  0                      default
tank/db  type                  filesystem             -
tank/db  creation              Fri May 28 16:08 2021  -
tank/db  used                  136G                   -
tank/db  available             5.50T                  -
tank/db  referenced            136G                   -
tank/db  compressratio         1.03x                  -
tank/db  mounted               yes                    -
tank/db  quota                 none                   default
tank/db  reservation           none                   default
tank/db  recordsize            8K                     local
tank/db  mountpoint            /tank/db               default
tank/db  sharenfs              off                    default
tank/db  checksum              on                     default
tank/db  compression           lz4                    local
tank/db  atime                 off                    local
tank/db  devices               on                     default
tank/db  exec                  on                     default
tank/db  setuid                on                     default
tank/db  readonly              off                    default
tank/db  zoned                 off                    default
tank/db  snapdir               hidden                 default
tank/db  aclmode               discard                default
tank/db  aclinherit            restricted             default
tank/db  createtxg             19                     -
tank/db  canmount              on                     default
tank/db  xattr                 sa                     local
tank/db  copies                1                      default
tank/db  version               5                      -
tank/db  utf8only              off                    -
tank/db  normalization         none                   -
tank/db  casesensitivity       sensitive              -
tank/db  vscan                 off                    default
tank/db  nbmand                off                    default
tank/db  sharesmb              off                    default
tank/db  refquota              none                   default
tank/db  refreservation        none                   default
tank/db  guid                  3231454792195716646    -
tank/db  primarycache          all                    default
tank/db  secondarycache        all                    default
tank/db  usedbysnapshots       0B                     -
tank/db  usedbydataset         136G                   -
tank/db  usedbychildren        0B                     -
tank/db  usedbyrefreservation  0B                     -
tank/db  logbias               throughput             local
tank/db  objsetid              899                    -
tank/db  dedup                 off                    default
tank/db  mlslabel              none                   default
tank/db  sync                  standard               default
tank/db  dnodesize             legacy                 default
tank/db  refcompressratio      1.03x                  -
tank/db  written               136G                   -
tank/db  logicalused           140G                   -
tank/db  logicalreferenced     140G                   -
tank/db  volmode               default                default
tank/db  filesystem_limit      none                   default
tank/db  snapshot_limit        none                   default
tank/db  filesystem_count      none                   default
tank/db  snapshot_count        none                   default
tank/db  snapdev               hidden                 default
tank/db  acltype               off                    default
tank/db  context               none                   default
tank/db  fscontext             none                   default
tank/db  defcontext            none                   default
tank/db  rootcontext           none                   default
tank/db  relatime              off                    default
tank/db  redundant_metadata    all                    default
tank/db  overlay               on                     default
tank/db  encryption            off                    default
tank/db  keylocation           none                   default
tank/db  keyformat             none                   default
tank/db  pbkdf2iters           0                      default
tank/db  special_small_blocks  0                      default

@omarkilani (Author)

I tried setting zfs_no_scrub_prefetch to 1 but it just slowed down the scrub to 2.28Gb/s with the same issue.
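
For anyone reproducing this: the tunable can be flipped at runtime through the usual module parameter interface, e.g.

echo 1 > /sys/module/zfs/parameters/zfs_no_scrub_prefetch

(just a sketch of the setting referred to above).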

The thing is... the 'used' output of 'free -m' matches the progress output of zpool status. So just before it dies:

	85.5G scanned at 2.25G/s, 320K issued at 8.42K/s, 136G total

And the last 3 iterations of a while [ 1 ]; do free -m; sleep 1; done loop:

              total        used        free      shared  buff/cache   available
Mem:          97499       85499       11580          27         420        4108
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          97499       87691        9387          27         420        1915
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          97499       90111        6967          27         420        1080
Swap:          8191           0        8191
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.

@behlendorf (Contributor)

Could you check the contents of /proc/spl/kmem/slab? It should show how the memory is being used. It sounds like for some reason the scrub scan logic is not respecting its memory limits.
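
For example, something along these lines will keep the largest caches at the top (any equivalent works, this is just a quick way to watch it):

watch -n 1 'sort -k3 -rn /proc/spl/kmem/slab | head -20'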

@omarkilani (Author)

@behlendorf yup.

Okay, so...

zpool import tank && zpool status 1

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 18:24:44 2021
	83.3G scanned at 3.33G/s, 156K issued at 6.24K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time

...

errors: No known data errors
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.

And I wrote a little script that does a diff -u on the contents of kmem/slab every second:

--- slab.last.txt	2021-05-28 18:40:29.020378409 +0000
+++ slab.new.txt	2021-05-28 18:40:30.030382972 +0000
@@ -341,8 +341,8 @@
 dmu_buf_impl_t                        0x00080    587520    519168     3456      384    170   169   169   1360  1352  1352      0     0     0
 zil_lwb_cache                         0x00080         0         0     3392      376      0     0     0      0     0     0      0     0     0
 zil_zcw_cache                         0x00080         0         0     1600      152      0     0     0      0     0     0      0     0     0
-sio_cache_0                           0x08080 1647101760 1217423040     1472      136  1118955 1118955 1118955  8951640 8951640 8951640      0     0     0
-sio_cache_1                           0x00080   3558400   2703168     1600      152   2224  2223  2223  17792 17784 17784      0     0     0
+sio_cache_0                           0x08080 1733533184 1281307136     1472      136  1177672 1177672 1177672  9421376 9421376 9421376      0     0     0
+sio_cache_1                           0x00080   3742400   2843008     1600      152   2339  2338  2338  18712 18704 18704      0     0     0
 sio_cache_2                           0x00080    307584    237888     1728      168    178   177   177   1424  1416  1416      0     0     0
 zfs_znode_cache                       0x00100         -      5720        -     1144      -     -     -      -     5     -      -     -     -
 zfs_znode_hold_cache                  0x00080      2176       704     1088       88      2     1     1     16     8     8      0     0     0
--- slab.last.txt	2021-05-28 18:40:30.030382972 +0000
+++ slab.new.txt	2021-05-28 18:40:31.030387489 +0000
@@ -341,8 +341,8 @@
 dmu_buf_impl_t                        0x00080    587520    519168     3456      384    170   169   169   1360  1352  1352      0     0     0
 zil_lwb_cache                         0x00080         0         0     3392      376      0     0     0      0     0     0      0     0     0
 zil_zcw_cache                         0x00080         0         0     1600      152      0     0     0      0     0     0      0     0     0
-sio_cache_0                           0x08080 1733698048 1281428992     1472      136  1177784 1177784 1177784  9422272 9422272 9422272      0     0     0
-sio_cache_1                           0x00080   3742400   2843008     1600      152   2339  2338  2338  18712 18704 18704      0     0     0
+sio_cache_0                           0x08080 1819677568 1344979072     1472      136  1236194 1236194 1236194  9889552 9889552 9889552      0     0     0
+sio_cache_1                           0x00080   3924800   2981632     1600      152   2453  2452  2452  19624 19616 19616      0     0     0
 sio_cache_2                           0x00080    307584    237888     1728      168    178   177   177   1424  1416  1416      0     0     0
 zfs_znode_cache                       0x00100         -      5720        -     1144      -     -     -      -     5     -      -     -     -
 zfs_znode_hold_cache                  0x00080      2176       704     1088       88      2     1     1     16     8     8      0     0     0
--- slab.last.txt	2021-05-28 18:40:31.030387489 +0000
+++ slab.new.txt	2021-05-28 18:40:32.030392007 +0000
@@ -5,11 +5,11 @@
 kcf_sreq_cache                        0x00080         0         0     2112      160      0     0     0      0     0     0      0     0     0
 kcf_areq_cache                        0x00080         0         0     4672      464      0     0     0      0     0     0      0     0     0
 kcf_context_cache                     0x00080         0         0     2112      152      0     0     0      0     0     0      0     0     0
-zfs_btree_leaf_cache                  0x00080   4939648   4849664    33152     4096    149   148   148   1192  1184  1184      0     0     0
+zfs_btree_leaf_cache                  0x00080   5039104   4947968    33152     4096    152   151   151   1216  1208  1208      0     0     0
 ddt_cache                             0x00080    996160    795392   199232    24856      5     4     4     40    32    32      0     0     0
 ddt_entry_cache                       0x00080         0         0     3968      448      0     0     0      0     0     0      0     0     0
-zio_cache                             0x00080   7319936   3000320    10624     1280    689   689   757   5512  2344  6056      0     0     0
-zio_link_cache                        0x00080    521472    112128      768       48    679   679   772   5432  2336  6176      0     0     0
+zio_cache                             0x00080   7171200   2836480    10624     1280    675   675   757   5400  2216  6056      0     0     0
+zio_link_cache                        0x00080    513024    105984      768       48    668   668   772   5344  2208  6176      0     0     0
 zio_buf_512                           0x00082    147968     65536     8704      512     17    16    16    136   128   128      0     0     0
 zio_data_buf_512                      0x00082         0         0     8704      512      0     0     0      0     0     0      0     0     0
 zio_buf_1024                          0x00082         0         0    12800     1024      0     0     0      0     0     0      0     0     0
@@ -331,18 +331,18 @@
 zio_buf_16777216                      0x00082         0         0 16908288 16777216      0     0     0      0     0     0      0     0     0
 zio_data_buf_16777216                 0x00082         0         0 16908288 16777216      0     0     0      0     0     0      0     0     0
 lz4_cache                             0x00080   2234752   2097152   131456    16384     17    16    16    136   128   128      0     0     0
-abd_t                                 0x00080   3466176   2474880     1344      120   2579  2578  2578  20632 20624 20624      0     0     0
+abd_t                                 0x00080   3468864   2476800     1344      120   2581  2580  2580  20648 20640 20640      0     0     0
 sa_cache                              0x00080      5248      2240     2624      280      2     1     1     16     8     8      0     0     0
 dnode_t                               0x00080   1089792   1031232     8256      984    132   131   131   1056  1048  1048      0     0     0
 arc_buf_hdr_t_full                    0x00080   7577152   6607232     3008      328   2519  2518  2518  20152 20144 20144      0     0     0
 arc_buf_hdr_t_full_crypt              0x00080         0         0     3520      392      0     0     0      0     0     0      0     0     0
 arc_buf_hdr_t_l2only                  0x00080         0         0     1152       96      0     0     0      0     0     0      0     0     0
 arc_buf_t                             0x00080     63488     39040     1024       80     62    61    61    496   488   488      0     0     0
-dmu_buf_impl_t                        0x00080    587520    519168     3456      384    170   169   169   1360  1352  1352      0     0     0
+dmu_buf_impl_t                        0x00080    590976    522240     3456      384    171   170   170   1368  1360  1360      0     0     0
 zil_lwb_cache                         0x00080         0         0     3392      376      0     0     0      0     0     0      0     0     0
 zil_zcw_cache                         0x00080         0         0     1600      152      0     0     0      0     0     0      0     0     0
-sio_cache_0                           0x00080 1819843904 1345100928     1472      136  1236307 1236306 1236306  9890456 9890448 9890448      0     0     0
-sio_cache_1                           0x00080   3926400   2982848     1600      152   2454  2453  2453  19632 19624 19624      0     0     0
+sio_cache_0                           0x00080 1903981952 1407289920     1472      136  1293466 1293465 1293465  10347728 10347720 10347720      0     0     0
+sio_cache_1                           0x00080   4105600   3119040     1600      152   2566  2565  2565  20528 20520 20520      0     0     0
 sio_cache_2                           0x00080    307584    237888     1728      168    178   177   177   1424  1416  1416      0     0     0
 zfs_znode_cache                       0x00100         -      5720        -     1144      -     -     -      -     5     -      -     -     -
 zfs_znode_hold_cache                  0x00080      2176       704     1088       88      2     1     1     16     8     8      0     0     0
--- slab.last.txt	2021-05-28 18:40:32.040392052 +0000
+++ slab.new.txt	2021-05-28 18:40:33.040396570 +0000
@@ -341,8 +341,8 @@
 dmu_buf_impl_t                        0x00080    590976    522240     3456      384    171   170   170   1368  1360  1360      0     0     0
 zil_lwb_cache                         0x00080         0         0     3392      376      0     0     0      0     0     0      0     0     0
 zil_zcw_cache                         0x00080         0         0     1600      152      0     0     0      0     0     0      0     0     0
-sio_cache_0                           0x00080 1904341120 1407555392     1472      136  1293710 1293709 1293709  10349680 10349672 10349672      0     0     0
-sio_cache_1                           0x00080   4105600   3119040     1600      152   2566  2565  2565  20528 20520 20520      0     0     0
+sio_cache_0                           0x00080 1988549824 1469796608     1472      136  1350917 1350916 1350916  10807336 10807328 10807328      0     0     0
+sio_cache_1                           0x00080   4284800   3255232     1600      152   2678  2677  2677  21424 21416 21416      0     0     0
 sio_cache_2                           0x00080    307584    237888     1728      168    178   177   177   1424  1416  1416      0     0     0
 zfs_znode_cache                       0x00100         -      5720        -     1144      -     -     -      -     5     -      -     -     -
 zfs_znode_hold_cache                  0x00080      2176       704     1088       88      2     1     1     16     8     8      0     0     0
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
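
A minimal sketch of that kind of diff loop, assuming only the slab.last.txt / slab.new.txt names visible in the diff headers above:

#!/bin/sh
# Snapshot /proc/spl/kmem/slab once a second and diff consecutive snapshots.
cat /proc/spl/kmem/slab > slab.last.txt
while true; do
    sleep 1
    cat /proc/spl/kmem/slab > slab.new.txt
    diff -u slab.last.txt slab.new.txt
    mv slab.new.txt slab.last.txt
done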

@omarkilani (Author)

I'm running the same test on an AWS r6g.4xlarge instance with 12x500GB gp2 EBS volumes just to make sure it's not some weird Ampere Altra thing. I'm pretty sure they're both Neoverse N1 based:

OCI A1 instance

processor	: 15
BogoMIPS	: 50.00
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0c
CPU revision	: 1
AWS r6g.4xlarge

processor	: 15
BogoMIPS	: 243.75
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0c
CPU revision	: 1

Then I'll try on x86_64.

@behlendorf (Contributor)

Well, that seems about right. The sio_cache* members are the cache used for the scrub scanning, and they're limited to 5% of system memory. So 1.9G isn't unreasonable for a 96G system. I didn't see any other very large caches, so it's a bit unclear what's using the memory exactly. Trying x86_64 would be a good sanity check, since this certainly is a pretty common case. One other thing to check would be the system's page size; there may still be some issues lurking with non-4K page systems.
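
For example:

getconf PAGESIZE

4096 vs 65536 is the interesting distinction here; if I remember correctly, the RHEL aarch64 kernels default to 64K pages.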

@omarkilani (Author)

Before I move to x86_64, testing on the AWS Graviton 2 shows the same issue:

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 20:25:16 2021
	103G scanned at 3.82G/s, 124K issued at 4.61K/s, 129G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME        STATE     READ WRITE CKSUM
	tank        ONLINE       0     0     0
	  nvme1n1   ONLINE       0     0     0
	  nvme2n1   ONLINE       0     0     0
	  nvme3n1   ONLINE       0     0     0
	  nvme4n1   ONLINE       0     0     0
	  nvme5n1   ONLINE       0     0     0
	  nvme6n1   ONLINE       0     0     0
	  nvme7n1   ONLINE       0     0     0
	  nvme8n1   ONLINE       0     0     0
	  nvme9n1   ONLINE       0     0     0
	  nvme10n1  ONLINE       0     0     0
	  nvme11n1  ONLINE       0     0     0
	  nvme12n1  ONLINE       0     0     0

errors: No known data errorsConnection to 54.226.181.X closed by remote host.
Connection to 54.226.181.X closed.
--- slab.last.txt       2021-05-28 20:25:40.748680196 +0000
+++ slab.new.txt        2021-05-28 20:25:41.758681597 +0000
@@ -182,8 +182,8 @@
 dmu_buf_impl_t                        0x00080  44634240   7197696     3456      384  12915 12915 67012  103320 18744 536096      0     0     0
 zil_lwb_cache                         0x00080      6784      3008     3392      376      2     1     1     16     8     8      0     0     0
 zil_zcw_cache                         0x00080      3200      1216     1600      152      2     1     1     16     8     8      0     0     0
-sio_cache_0                           0x00080 2277653568 1683481984     1472      136  1547319 1547318 1547318  12378552 12378544 12378544      0     0     0
-sio_cache_1                           0x00080   4859200   3691776     1600      152   3037  3036  3036  24296 24288 24288      0     0     0
+sio_cache_0                           0x08080 2375485632 1755793728     1472      136  1613781 1613781 1613781  12910248 12910248 12910248      0     0     0
+sio_cache_1                           0x00080   5067200   3849856     1600      152   3167  3166  3166  25336 25328 25328      0     0     0
 sio_cache_2                           0x00080    119232     91392     1728      168     69    68    68    552   544   544      0     0     0
 zfs_znode_cache                       0x00100         -      6864        -     1144      -     -     -      -     6     -      -     -     -
 zfs_znode_hold_cache                  0x00080      5440      2816     1088       88      5     4     4     40    32    32      0     0     0
Connection to 54.226.181.X closed by remote host.
Connection to 54.226.181.X closed.

This instance type has 128GB of RAM instead of the 96GB on OCI, but it runs out of memory the same way.

All I did on this new instance was boot it up, install ZFS, create the pool and filesystem, run the fio script, then run scrub, using the official RHEL 8.4 AMI:

RHEL-8.4.0_HVM-20210504-arm64-2-Hourly2-GP2
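
For completeness, the setup roughly corresponds to something like the following, going by the zpool/zfs properties posted earlier (device names are placeholders, not the exact commands used):

zpool create -o ashift=12 tank \
    nvme1n1 nvme2n1 nvme3n1 nvme4n1 nvme5n1 nvme6n1 \
    nvme7n1 nvme8n1 nvme9n1 nvme10n1 nvme11n1 nvme12n1
zfs create -o recordsize=8K -o compression=lz4 -o atime=off \
    -o xattr=sa -o logbias=throughput tank/db
fio seqread_64.fio                 # lays out the 128G /tank/db/f.fio file before reading
zpool scrub tank && zpool status 1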

Happy to provide you guys with an Oracle or AWS arm64 instance to play around with if you'd like.

You can create a new ssh key pair and send me the pub key and I can set it up.

@omarkilani (Author)

Alright, so on an r5.4xlarge instance with 16 of these:

model name	: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz

And 128GB of RAM... the scrub completes successfully and ZFS never uses more than 3.5GB of RAM. I even imported the pool from the arm64 instance just to test with the exact same on-disk data.

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:43:59 with 0 errors on Fri May 28 21:09:15 2021
config:

	NAME                                                    STATE     READ WRITE CKSUM
	tank                                                    ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol06c3bb1b1fcec5212  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0044be91588faf04d  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0c352803eca664f2d  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol04c4af5c8f2e08693  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol03fa95fc4af36924f  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol01efe0ee629742e4d  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol06c8eee0acee3193f  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0c98855ea8d5de600  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol02f4eaa6644236712  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0dfd26938bc9e4e9c  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0dcfff35a07a4735f  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0377673811f99d751  ONLINE       0     0     0

errors: No known data errors
[root@ip-172-30-0-87 ~]# while [ 1 ]; do free -m; sleep 1; done
              total        used        free      shared  buff/cache   available
Mem:         127189        3545      121378          16        2265      122544
Swap:             0           0           0

...

              total        used        free      shared  buff/cache   available
Mem:         127189        2912      122011          16        2265      123176
Swap:             0           0           0

...

              total        used        free      shared  buff/cache   available
Mem:         127189        1512      123411          16        2265      124576
Swap:             0           0           0

So... that's fun. :)

@omarkilani (Author)

I'll test with rc6 on arm64 just in case of magic.

@omarkilani omarkilani changed the title scrub uses all machine memory and locks up the machine [arm64] scrub uses all machine memory and locks up the machine May 28, 2021
@omarkilani (Author)

The other potential culprit is building from the ./configure script with no CFLAGS vs rebuilding the RPMs with the normal system optimisations. I'll play around with that as well.
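
In concrete terms that's comparing a plain source build,

./configure && make -j$(nproc)

against rebuilding the RPMs (roughly ./configure && make rpm-utils rpm-kmod, if I have the target names right) and installing those instead.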

@omarkilani (Author)

Okay, so... this is with 2.1.0-rc6, on a new r6g.4xlarge instance with 16 Graviton 2 cores and 128GB of RAM. It died.

I installed all the deps:

dnf install vim-enhanced tmux wget fio gcc make kernel-devel libuuid-devel libattr-devel libaio-devel openssl-devel elfutils-libelf-devel libudev-devel libblkid-devel libtirpc-devel zlib-devel pam-devel

Ran configure with no flags:

mkdir src && cd src
wget https://github.com/openzfs/zfs/releases/download/zfs-2.1.0-rc6/zfs-2.1.0-rc6.tar.gz
tar xf zfs-2.1.0-rc6.tar.gz
cd zfs-2.1.0
./configure

Ran make, and in another terminal ran ps to check which flags were getting passed to gcc:

root      165449  0.0  0.0   5824  1600 pts/0    S+   21:32   0:00 gcc -Wp,-MD,/root/src/zfs-2.1.0/module/icp/api/.kcf_miscapi.o.d -nostdinc -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -I./arch/arm64/include -I./arch/arm64/include/generated -I./include/drm-backport -I./include -I./arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -mlittle-endian -DKASAN_SHADOW_SCALE_SHIFT=3 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -DCC_HAVE_ASM_GOTO -mgeneral-regs-only -DCONFIG_AS_LSE=1 -fno-asynchronous-unwind-tables -mabi=lp64 -fno-dwarf2-cfi-asm -DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -Wno-int-in-bool-context -O2 --param=allow-store-data-races=0 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -gdwarf-4 -pg -fno-inline-functions-called-once -Wdeclaration-after-statement -Wno-pointer-sign -Wno-stringop-truncation -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -fmacro-prefix-map=./= -Wno-packed-not-aligned -std=gnu99 -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -include /root/src/zfs-2.1.0/zfs_config.h -I/root/src/zfs-2.1.0/include -I/root/src/zfs-2.1.0/include/os/linux/kernel -I/root/src/zfs-2.1.0/include/os/linux/spl -I/root/src/zfs-2.1.0/include/os/linux/zfs -I/root/src/zfs-2.1.0/include -D_KERNEL -UDEBUG -DNDEBUG -I/root/src/zfs-2.1.0/module/icp/include -DMODULE -DKBUILD_BASENAME="kcf_miscapi" -DKBUILD_MODNAME="icp" -c -o /root/src/zfs-2.1.0/module/icp/api/.tmp_kcf_miscapi.o /root/src/zfs-2.1.0/module/icp/api/kcf_miscapi.c
root      165450  0.0  0.0  67136 41024 pts/0    R+   21:32   0:00 /usr/libexec/gcc/aarch64-redhat-linux/8/cc1 -quiet -nostdinc -I ./arch/arm64/include -I ./arch/arm64/include/generated -I ./include/drm-backport -I ./include -I ./arch/arm64/include/uapi -I ./arch/arm64/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi -I /root/src/zfs-2.1.0/include -I /root/src/zfs-2.1.0/include/os/linux/kernel -I /root/src/zfs-2.1.0/include/os/linux/spl -I /root/src/zfs-2.1.0/include/os/linux/zfs -I /root/src/zfs-2.1.0/include -I /root/src/zfs-2.1.0/module/icp/include -D __KERNEL__ -D KASAN_SHADOW_SCALE_SHIFT=3 -D CC_HAVE_ASM_GOTO -D CONFIG_AS_LSE=1 -D KASAN_SHADOW_SCALE_SHIFT=3 -D _KERNEL -U DEBUG -D NDEBUG -D MODULE -D KBUILD_BASENAME="kcf_miscapi" -D KBUILD_MODNAME="icp" -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -include /root/src/zfs-2.1.0/zfs_config.h -MD /root/src/zfs-2.1.0/module/icp/api/.kcf_miscapi.o.d /root/src/zfs-2.1.0/module/icp/api/kcf_miscapi.c -quiet -dumpbase kcf_miscapi.c -mlittle-endian -mgeneral-regs-only -mabi=lp64 -auxbase-strip /root/src/zfs-2.1.0/module/icp/api/.tmp_kcf_miscapi.o -g -gdwarf-4 -O2 -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -Werror=implicit-function-declaration -Wno-format-security -Wno-frame-address -Wformat-truncation=0 -Wformat-overflow=0 -Wno-int-in-bool-context -Wframe-larger-than=2048 -Wno-unused-but-set-variable -Wunused-const-variable=0 -Wno-pointer-sign -Wno-stringop-truncation -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -std=gnu90 -std=gnu99 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -fno-asynchronous-unwind-tables -fno-dwarf2-cfi-asm -fno-delete-null-pointer-checks -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-inline-functions-called-once -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fmacro-prefix-map=./= --param allow-store-data-races=0 -o /tmp/ccJSdz4H.s

It's just using the same flags the kernel was built with by RH:

[root@ip-172-30-0-248 zfs-2.1.0]# cd /usr/src/kernels/4.18.0-305.el8.aarch64/
[root@ip-172-30-0-248 4.18.0-305.el8.aarch64]# fgrep -r lp64 .
./arch/arm64/Makefile:KBUILD_CFLAGS	+= $(call cc-option,-mabi=lp64)
./arch/arm64/Makefile:KBUILD_AFLAGS	+= $(call cc-option,-mabi=lp64)
./arch/riscv/Makefile:	KBUILD_CFLAGS += -mabi=lp64
./arch/riscv/Makefile:	KBUILD_AFLAGS += -mabi=lp64
./scripts/mod/devicetable-offsets.s:// -mabi=lp64 -auxbase-strip scripts/mod/devicetable-offsets.s -g -gdwarf-4
./scripts/mod/devicetable-offsets.s:	.ascii	"eneral-regs-only -mabi=lp64 -g -gdwarf-4 -O2 -std=gnu90 -p -"

This time, instead of watching the slab info, I watched /proc/meminfo.

I ran zpool scrub tank && zpool status 1:

  pool: tank
 state: ONLINE
  scan: scrub in progress since Fri May 28 21:37:44 2021
	114G scanned at 3.91G/s, 118K issued at 4.09K/s, 129G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                                    STATE     READ WRITE CKSUM
	tank                                                    ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol06c3bb1b1fcec5212  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0044be91588faf04d  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0c352803eca664f2d  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol04c4af5c8f2e08693  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol03fa95fc4af36924f  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol01efe0ee629742e4d  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol06c8eee0acee3193f  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0c98855ea8d5de600  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol02f4eaa6644236712  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0dfd26938bc9e4e9c  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0dcfff35a07a4735f  ONLINE       0     0     0
	  nvme-Amazon_Elastic_Block_Store_vol0377673811f99d751  ONLINE       0     0     0

errors: No known data errors

Connection to 54.91.207.X closed by remote host.
Connection to 54.91.207.X closed.

And watching meminfo:
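
(The exact capture method isn't shown in the thread; a minimal sketch along the following lines, polling /proc/meminfo once a second and printing deltas for a couple of fields, would produce numbers like the diffs below. The program and the choice of fields are illustrative only, not the tooling actually used.)

/* meminfo_watch.c -- illustrative sketch only, not the original tooling.
 * Polls /proc/meminfo once a second and prints the per-second change in
 * MemFree and SUnreclaim, the two fields the diffs below track. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static long read_kb(const char *field)
{
	FILE *f = fopen("/proc/meminfo", "r");
	char line[256];
	long kb = -1;

	if (f == NULL)
		return (-1);
	while (fgets(line, sizeof (line), f) != NULL) {
		if (strncmp(line, field, strlen(field)) == 0) {
			sscanf(line + strlen(field), " %ld", &kb);
			break;
		}
	}
	fclose(f);
	return (kb);
}

int main(void)
{
	long prev_free = read_kb("MemFree:");
	long prev_sunr = read_kb("SUnreclaim:");

	for (int i = 0; i < 60; i++) {
		sleep(1);
		long free_kb = read_kb("MemFree:");
		long sunr_kb = read_kb("SUnreclaim:");
		printf("MemFree %+ld kB, SUnreclaim %+ld kB\n",
		    free_kb - prev_free, sunr_kb - prev_sunr);
		prev_free = free_kb;
		prev_sunr = sunr_kb;
	}
	return (0);
}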

--- meminfo.last.txt	2021-05-28 21:38:03.886107192 +0000
+++ meminfo.new.txt	2021-05-28 21:38:04.886098667 +0000
@@ -1,28 +1,28 @@
 MemTotal:       132164992 kB
-MemFree:        45023424 kB
-MemAvailable:   36895104 kB
+MemFree:        41001216 kB
+MemAvailable:   32872896 kB
 Buffers:            8384 kB
 Cached:          4217728 kB
 SwapCached:            0 kB
 Active:          2140416 kB
-Inactive:        2277952 kB
+Inactive:        2278336 kB
 Active(anon):       8640 kB
-Inactive(anon):   210048 kB
+Inactive(anon):   210432 kB
 Active(file):    2131776 kB
 Inactive(file):  2067904 kB
 Unevictable:           0 kB
 Mlocked:               0 kB
 SwapTotal:             0 kB
 SwapFree:              0 kB
-Dirty:               512 kB
+Dirty:                 0 kB
 Writeback:             0 kB
-AnonPages:        193088 kB
+AnonPages:        193152 kB
 Mapped:            88512 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1122240 kB
+Slab:            1138112 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1005312 kB
+SUnreclaim:      1021184 kB
 KernelStack:       29248 kB
 PageTables:         9408 kB
 NFS_Unstable:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:04.886098667 +0000
+++ meminfo.new.txt	2021-05-28 21:38:05.906089973 +0000
@@ -1,13 +1,13 @@
 MemTotal:       132164992 kB
-MemFree:        40994112 kB
-MemAvailable:   32865792 kB
+MemFree:        36908672 kB
+MemAvailable:   28780352 kB
 Buffers:            8384 kB
 Cached:          4217728 kB
 SwapCached:            0 kB
-Active:          2140416 kB
-Inactive:        2278336 kB
-Active(anon):       8640 kB
-Inactive(anon):   210432 kB
+Active:          2140352 kB
+Inactive:        2277696 kB
+Active(anon):       8576 kB
+Inactive(anon):   209792 kB
 Active(file):    2131776 kB
 Inactive(file):  2067904 kB
 Unevictable:           0 kB
@@ -16,15 +16,15 @@
 SwapFree:              0 kB
 Dirty:                 0 kB
 Writeback:             0 kB
-AnonPages:        193152 kB
-Mapped:            88512 kB
+AnonPages:        192576 kB
+Mapped:            88448 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1138112 kB
+Slab:            1152320 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1021184 kB
-KernelStack:       29248 kB
-PageTables:         9408 kB
+SUnreclaim:      1035392 kB
+KernelStack:       28992 kB
+PageTables:         9152 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:05.906089973 +0000
+++ meminfo.new.txt	2021-05-28 21:38:06.916081364 +0000
@@ -1,13 +1,13 @@
 MemTotal:       132164992 kB
-MemFree:        36901568 kB
-MemAvailable:   28773248 kB
+MemFree:        32887872 kB
+MemAvailable:   24759552 kB
 Buffers:            8384 kB
 Cached:          4217728 kB
 SwapCached:            0 kB
-Active:          2140352 kB
-Inactive:        2277696 kB
-Active(anon):       8576 kB
-Inactive(anon):   209792 kB
+Active:          2140416 kB
+Inactive:        2278080 kB
+Active(anon):       8640 kB
+Inactive(anon):   210176 kB
 Active(file):    2131776 kB
 Inactive(file):  2067904 kB
 Unevictable:           0 kB
@@ -16,20 +16,20 @@
 SwapFree:              0 kB
 Dirty:                 0 kB
 Writeback:             0 kB
-AnonPages:        192576 kB
-Mapped:            88448 kB
+AnonPages:        193152 kB
+Mapped:            88512 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1152320 kB
+Slab:            1177152 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1035392 kB
-KernelStack:       28992 kB
-PageTables:         9152 kB
+SUnreclaim:      1060224 kB
+KernelStack:       29056 kB
+PageTables:         9408 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:    66082496 kB
-Committed_AS:     470976 kB
+Committed_AS:     503296 kB
 VmallocTotal:   133009506240 kB
 VmallocUsed:           0 kB
 VmallocChunk:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:06.916081364 +0000
+++ meminfo.new.txt	2021-05-28 21:38:07.916072841 +0000
@@ -1,13 +1,13 @@
 MemTotal:       132164992 kB
-MemFree:        32880768 kB
-MemAvailable:   24752448 kB
+MemFree:        28966848 kB
+MemAvailable:   20838528 kB
 Buffers:            8384 kB
 Cached:          4217728 kB
 SwapCached:            0 kB
 Active:          2140416 kB
-Inactive:        2278080 kB
+Inactive:        2277952 kB
 Active(anon):       8640 kB
-Inactive(anon):   210176 kB
+Inactive(anon):   210048 kB
 Active(file):    2131776 kB
 Inactive(file):  2067904 kB
 Unevictable:           0 kB
@@ -16,20 +16,20 @@
 SwapFree:              0 kB
 Dirty:                 0 kB
 Writeback:             0 kB
-AnonPages:        193152 kB
+AnonPages:        193088 kB
 Mapped:            88512 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1177152 kB
+Slab:            1200192 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1060224 kB
-KernelStack:       29056 kB
+SUnreclaim:      1083264 kB
+KernelStack:       28992 kB
 PageTables:         9408 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
 CommitLimit:    66082496 kB
-Committed_AS:     503296 kB
+Committed_AS:     470656 kB
 VmallocTotal:   133009506240 kB
 VmallocUsed:           0 kB
 VmallocChunk:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:07.916072841 +0000
+++ meminfo.new.txt	2021-05-28 21:38:08.916064317 +0000
@@ -1,6 +1,6 @@
 MemTotal:       132164992 kB
-MemFree:        28960704 kB
-MemAvailable:   20832384 kB
+MemFree:        25025664 kB
+MemAvailable:   16897152 kB
 Buffers:            8384 kB
 Cached:          4217728 kB
 SwapCached:            0 kB
@@ -20,11 +20,11 @@
 Mapped:            88512 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1200256 kB
+Slab:            1219392 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1083328 kB
+SUnreclaim:      1102464 kB
 KernelStack:       28992 kB
-PageTables:         9408 kB
+PageTables:         9344 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:08.916064317 +0000
+++ meminfo.new.txt	2021-05-28 21:38:09.916055793 +0000
@@ -1,30 +1,30 @@
 MemTotal:       132164992 kB
-MemFree:        25019904 kB
-MemAvailable:   16891584 kB
+MemFree:        20993280 kB
+MemAvailable:   12865024 kB
 Buffers:            8384 kB
-Cached:          4217728 kB
+Cached:          4217856 kB
 SwapCached:            0 kB
-Active:          2140416 kB
-Inactive:        2277952 kB
+Active:          2140480 kB
+Inactive:        2278336 kB
 Active(anon):       8640 kB
-Inactive(anon):   210048 kB
-Active(file):    2131776 kB
-Inactive(file):  2067904 kB
+Inactive(anon):   210368 kB
+Active(file):    2131840 kB
+Inactive(file):  2067968 kB
 Unevictable:           0 kB
 Mlocked:               0 kB
 SwapTotal:             0 kB
 SwapFree:              0 kB
 Dirty:                 0 kB
 Writeback:             0 kB
-AnonPages:        193088 kB
+AnonPages:        192704 kB
 Mapped:            88512 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1219392 kB
+Slab:            1234688 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1102464 kB
-KernelStack:       28992 kB
-PageTables:         9344 kB
+SUnreclaim:      1117760 kB
+KernelStack:       28928 kB
+PageTables:         9152 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:09.926055708 +0000
+++ meminfo.new.txt	2021-05-28 21:38:10.926047184 +0000
@@ -1,13 +1,13 @@
 MemTotal:       132164992 kB
-MemFree:        20985600 kB
-MemAvailable:   12857344 kB
+MemFree:        16963264 kB
+MemAvailable:    8835008 kB
 Buffers:            8384 kB
 Cached:          4217856 kB
 SwapCached:            0 kB
-Active:          2140480 kB
-Inactive:        2278016 kB
-Active(anon):       8640 kB
-Inactive(anon):   210048 kB
+Active:          2140416 kB
+Inactive:        2274240 kB
+Active(anon):       8576 kB
+Inactive(anon):   206272 kB
 Active(file):    2131840 kB
 Inactive(file):  2067968 kB
 Unevictable:           0 kB
@@ -16,15 +16,15 @@
 SwapFree:              0 kB
 Dirty:                 0 kB
 Writeback:             0 kB
-AnonPages:        192128 kB
-Mapped:            88512 kB
+AnonPages:        188864 kB
+Mapped:            87296 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1234688 kB
+Slab:            1259072 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1117760 kB
-KernelStack:       28928 kB
-PageTables:         8640 kB
+SUnreclaim:      1142144 kB
+KernelStack:       29120 kB
+PageTables:         9152 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:10.926047184 +0000
+++ meminfo.new.txt	2021-05-28 21:38:11.926038661 +0000
@@ -1,13 +1,13 @@
 MemTotal:       132164992 kB
-MemFree:        16957120 kB
-MemAvailable:    8828864 kB
+MemFree:        12873280 kB
+MemAvailable:    4745024 kB
 Buffers:            8384 kB
 Cached:          4217856 kB
 SwapCached:            0 kB
 Active:          2140416 kB
-Inactive:        2274240 kB
+Inactive:        2274176 kB
 Active(anon):       8576 kB
-Inactive(anon):   206272 kB
+Inactive(anon):   206208 kB
 Active(file):    2131840 kB
 Inactive(file):  2067968 kB
 Unevictable:           0 kB
@@ -15,14 +15,14 @@
 SwapTotal:             0 kB
 SwapFree:              0 kB
 Dirty:                 0 kB
-Writeback:             0 kB
-AnonPages:        188864 kB
+Writeback:           128 kB
+AnonPages:        188928 kB
 Mapped:            87296 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1259072 kB
+Slab:            1278912 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1142144 kB
+SUnreclaim:      1161984 kB
 KernelStack:       29120 kB
 PageTables:         9152 kB
 NFS_Unstable:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:11.926038661 +0000
+++ meminfo.new.txt	2021-05-28 21:38:12.926030137 +0000
@@ -1,13 +1,13 @@
 MemTotal:       132164992 kB
-MemFree:        12867136 kB
-MemAvailable:    4738880 kB
+MemFree:         8824384 kB
+MemAvailable:     696128 kB
 Buffers:            8384 kB
 Cached:          4217856 kB
 SwapCached:            0 kB
 Active:          2140416 kB
-Inactive:        2274176 kB
+Inactive:        2274304 kB
 Active(anon):       8576 kB
-Inactive(anon):   206208 kB
+Inactive(anon):   206336 kB
 Active(file):    2131840 kB
 Inactive(file):  2067968 kB
 Unevictable:           0 kB
@@ -15,16 +15,16 @@
 SwapTotal:             0 kB
 SwapFree:              0 kB
 Dirty:                 0 kB
-Writeback:           128 kB
-AnonPages:        188928 kB
+Writeback:             0 kB
+AnonPages:        188864 kB
 Mapped:            87296 kB
 Shmem:             26432 kB
 KReclaimable:     116928 kB
-Slab:            1278976 kB
+Slab:            1293056 kB
 SReclaimable:     116928 kB
-SUnreclaim:      1162048 kB
-KernelStack:       29120 kB
-PageTables:         9152 kB
+SUnreclaim:      1176128 kB
+KernelStack:       28928 kB
+PageTables:         9088 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
--- meminfo.last.txt	2021-05-28 21:38:12.926030137 +0000
+++ meminfo.new.txt	2021-05-28 21:38:13.946021443 +0000
@@ -1,30 +1,30 @@
 MemTotal:       132164992 kB
-MemFree:         8817280 kB
-MemAvailable:     689024 kB
-Buffers:            8384 kB
-Cached:          4217856 kB
+MemFree:         9113600 kB
+MemAvailable:          0 kB
+Buffers:             128 kB
+Cached:            55488 kB
 SwapCached:            0 kB
-Active:          2140416 kB
-Inactive:        2274304 kB
+Active:            35648 kB
+Inactive:         211456 kB
 Active(anon):       8576 kB
-Inactive(anon):   206336 kB
-Active(file):    2131840 kB
-Inactive(file):  2067968 kB
+Inactive(anon):   210240 kB
+Active(file):      27072 kB
+Inactive(file):     1216 kB
 Unevictable:           0 kB
 Mlocked:               0 kB
 SwapTotal:             0 kB
 SwapFree:              0 kB
 Dirty:                 0 kB
 Writeback:             0 kB
-AnonPages:        188864 kB
-Mapped:            87296 kB
+AnonPages:        193472 kB
+Mapped:            33920 kB
 Shmem:             26432 kB
-KReclaimable:     116928 kB
-Slab:            1293056 kB
-SReclaimable:     116928 kB
-SUnreclaim:      1176128 kB
-KernelStack:       28928 kB
-PageTables:         9088 kB
+KReclaimable:      51840 kB
+Slab:            1178752 kB
+SReclaimable:      51840 kB
+SUnreclaim:      1126912 kB
+KernelStack:       28864 kB
+PageTables:         9920 kB
 NFS_Unstable:          0 kB
 Bounce:                0 kB
 WritebackTmp:          0 kB
Connection to 54.91.207.X closed by remote host.
Connection to 54.91.207.X closed.

Is there a "debugging ZFS on weird architectures" document somewhere? :)

@rincebrain
Copy link
Contributor

rincebrain commented May 28, 2021

I don't think AArch64 qualifies as that odd, personally.

FYI, you can use make V=1 [other args] to convince make to tell you what it's doing. (This will, necessarily, be a lot of text.)

I think OpenZFS specifies almost no per-arch flags; I believe it gets (nearly?) all of them from the compile flags in the kernel Makefile. So if you want to experiment with the flags the modules get built with, I think there's only so much you can do without nontrivial work. (If distros vary the flags used to build their kernels significantly, which I don't know, never having had occasion to look, you could try another distro and see if the behavior varies.)

@omarkilani
Copy link
Author

Yeah, I mean, I just wanted to show I wasn't doing anything weird. I trust RH knows what they're doing since the rest of the system works.

I feel like most people wouldn't run anything apart from the official RH kernel on production workloads, so... I'm not sure what the next step is here. I haven't done kernel development since like 2005, so I need to reactivate that part of my brain. lol.

I can try the Oracle 5.4.x kernel since that's pretty easy to test on RHEL. Might as well.

@rincebrain
Copy link
Contributor

I didn't mean to suggest RH was doing something wrong; just that, since I didn't see anything obviously special-casing arm64, I was wondering about flag-induced behavior.

@behlendorf above wondered about the system page size. I have never had to look at this before, so I just looked it up; it looks like getconf PAGESIZE will answer that.

As an experiment, I'll try booting up an AArch64 VM and see if I can easily repro this...

@omarkilani
Copy link
Author

omarkilani commented May 28, 2021

No worries. :)

I just tested it on:

Linux instance-20210526-1929 5.4.17-2102.201.3.el8uek.aarch64 #2 SMP Fri Apr 23 09:42:46 PDT 2021 aarch64 aarch64 aarch64 GNU/Linux

Which is the latest Oracle-for-RHEL kernel. It died the same way.

[root@instance-20210526-1929 ~]# /usr/local/sbin/zpool version
zfs-2.1.0-rc6
zfs-kmod-2.1.0-rc6
[root@instance-20210526-1929 ~]# /usr/local/sbin/zpool import tank && /usr/local/sbin/zpool status 1
  pool: tank
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
	The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub in progress since Fri May 28 18:24:44 2021
	22.2G scanned at 3.17G/s, 84K issued at 12K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors

...

  pool: tank
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
	The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
	the pool may no longer be accessible by software that does not support
	the features. See zpool-features(5) for details.
  scan: scrub in progress since Fri May 28 18:24:44 2021
	84.1G scanned at 3.23G/s, 168K issued at 6.46K/s, 136G total
	0B repaired, 0.00% done, no estimated completion time
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  scsi-360f39ea51229408cb368509d91495fb9  ONLINE       0     0     0
	  scsi-3603528d43ade4b31b70186f9a041601e  ONLINE       0     0     0
	  scsi-36007099c456f4ec780fdc03b14976f19  ONLINE       0     0     0
	  scsi-360d5b4cb98a44fabbcc67b1a55808124  ONLINE       0     0     0
	  scsi-3603ff370fa044673a5c09353568c6757  ONLINE       0     0     0
	  scsi-360ba05ab3eab4897bcf042fdfc3da1eb  ONLINE       0     0     0
	  scsi-360087adf642b4f6586326dada6c8eb41  ONLINE       0     0     0
	  scsi-3603a47cd86dd484bba1b05bab36c1257  ONLINE       0     0     0
	  scsi-3600bf0330c6e4139829ad72c816b8c06  ONLINE       0     0     0
	  scsi-3605635d6a27b4c189c0af523ddc262de  ONLINE       0     0     0
	  scsi-36013bd4eeaab4b4a9e88beb0474a2439  ONLINE       0     0     0
	  scsi-360f024a7d8b64521b7e7d671d9397ab5  ONLINE       0     0     0

errors: No known data errors
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.
[root@instance-20210526-1929 ~]# while [ 1 ]; do free -m; sleep 1; done
              total        used        free      shared  buff/cache   available
Mem:          96706         933       93280          26        2492       86533
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706         933       93280          26        2492       86533
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706         954       93259          26        2492       86512
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706        2754       91454          26        2496       84710
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706        6223       87985          26        2496       81241
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706        9670       84539          26        2496       77794
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       13096       81112          26        2496       74368
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       16512       77696          26        2496       70952
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       19913       74296          26        2496       67551
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       23329       70879          26        2496       64135
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       26735       67474          26        2497       60729
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       30121       64087          26        2497       57343
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       33473       60735          26        2497       53991
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       36830       57378          26        2497       50633
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       40174       54034          26        2497       47290
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       43448       50761          26        2497       44016
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       46780       47428          26        2497       40684
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       50080       44128          26        2497       37384
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       53382       40826          26        2497       34082
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       56681       37528          26        2497       30783
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       59925       34283          26        2497       27539
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       63229       30979          26        2497       24235
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       66530       27678          26        2497       20934
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       69834       24374          26        2497       17630
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       73119       21089          26        2497       14345
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       76424       17784          26        2497       11039
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       79786       14422          26        2497        7677
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       83129       11079          26        2497        4335
Swap:          8191           0        8191
              total        used        free      shared  buff/cache   available
Mem:          96706       86468        7740          26        2497         995
Swap:          8191           0        8191
Connection to 150.136.46.X closed by remote host.
Connection to 150.136.46.X closed.

Oracle uses slightly different CFLAGS to build their kernels, but it doesn't seem to matter:

root      113811  0.0  0.0 217984  1664 pts/0    S+   22:19   0:00 gcc -Wp,-MD,/root/src/zfs-2.1.0/module/zcommon/.zfs_fletcher_superscalar.o.d -nostdinc -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -I./arch/arm64/include -I./arch/arm64/include/generated -I./include -I./arch/arm64/include/uapi -I./arch/arm64/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -D__KERNEL__ -mlittle-endian -DKASAN_SHADOW_SCALE_SHIFT=3 -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -std=gnu89 -mgeneral-regs-only -DCONFIG_AS_LSE=1 -DCONFIG_CC_HAS_K_CONSTRAINT=1 -fno-asynchronous-unwind-tables -Wno-psabi -mabi=lp64 -mindirect-branch=thunk-extern -DRETPOLINE -DKASAN_SHADOW_SCALE_SHIFT=3 -fno-delete-null-pointer-checks -Wno-frame-address -Wno-format-truncation -Wno-format-overflow -O2 -gt --param=allow-store-data-races=0 -Werror=frame-larger-than=2048 -Wframe-larger-than=2048 -fstack-protector-strong -Wno-unused-but-set-variable -Wimplicit-fallthrough -Wno-unused-const-variable -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -g -pg -fno-inline-functions-called-once -ffunction-sections -fdata-sections -Wdeclaration-after-statement -Wvla -Wno-pointer-sign -Wno-stringop-truncation -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fno-stack-check -fconserve-stack -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -fmacro-prefix-map=./= -fcf-protection=none -Wno-packed-not-aligned -std=gnu99 -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -include /root/src/zfs-2.1.0/zfs_config.h -I/root/src/zfs-2.1.0/include -I/root/src/zfs-2.1.0/include/os/linux/kernel -I/root/src/zfs-2.1.0/include/os/linux/spl -I/root/src/zfs-2.1.0/include/os/linux/zfs -I/root/src/zfs-2.1.0/include -D_KERNEL -UDEBUG -DNDEBUG -DMODULE -DKBUILD_BASENAME="zfs_fletcher_superscalar" -DKBUILD_MODNAME="zcommon" -c -o /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.o /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.c
root      113812  0.0  0.0 277632 32640 pts/0    R+   22:19   0:00 /usr/libexec/gcc/aarch64-redhat-linux/8/cc1 -quiet -nostdinc -I ./arch/arm64/include -I ./arch/arm64/include/generated -I ./include -I ./arch/arm64/include/uapi -I ./arch/arm64/include/generated/uapi -I ./include/uapi -I ./include/generated/uapi -I /root/src/zfs-2.1.0/include -I /root/src/zfs-2.1.0/include/os/linux/kernel -I /root/src/zfs-2.1.0/include/os/linux/spl -I /root/src/zfs-2.1.0/include/os/linux/zfs -I /root/src/zfs-2.1.0/include -D __KERNEL__ -D KASAN_SHADOW_SCALE_SHIFT=3 -D CONFIG_AS_LSE=1 -D CONFIG_CC_HAS_K_CONSTRAINT=1 -D RETPOLINE -D KASAN_SHADOW_SCALE_SHIFT=3 -D _KERNEL -U DEBUG -D NDEBUG -D MODULE -D KBUILD_BASENAME="zfs_fletcher_superscalar" -D KBUILD_MODNAME="zcommon" -isystem /usr/lib/gcc/aarch64-redhat-linux/8/include -include ./include/linux/kconfig.h -include ./include/linux/compiler_types.h -include /root/src/zfs-2.1.0/zfs_config.h -MD /root/src/zfs-2.1.0/module/zcommon/.zfs_fletcher_superscalar.o.d /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.c -quiet -dumpbase zfs_fletcher_superscalar.c -mlittle-endian -mgeneral-regs-only -mabi=lp64 -mindirect-branch=thunk-extern -auxbase-strip /root/src/zfs-2.1.0/module/zcommon/zfs_fletcher_superscalar.o -gt -g -O2 -Wall -Wundef -Werror=strict-prototypes -Wno-trigraphs -Werror=implicit-function-declaration -Werror=implicit-int -Wno-format-security -Wno-psabi -Wno-frame-address -Wformat-truncation=0 -Wformat-overflow=0 -Werror=frame-larger-than=2048 -Wframe-larger-than=2048 -Wno-unused-but-set-variable -Wimplicit-fallthrough=3 -Wunused-const-variable=0 -Wvla -Wno-pointer-sign -Wno-stringop-truncation -Werror=date-time -Werror=incompatible-pointer-types -Werror=designated-init -Wno-packed-not-aligned -Wno-declaration-after-statement -Wmissing-prototypes -Wno-format-zero-length -std=gnu90 -std=gnu99 -p -fno-strict-aliasing -fno-common -fshort-wchar -fno-PIE -fno-asynchronous-unwind-tables -fno-delete-null-pointer-checks -fstack-protector-strong -fno-omit-frame-pointer -fno-optimize-sibling-calls -fno-var-tracking-assignments -fno-inline-functions-called-once -ffunction-sections -fdata-sections -fno-strict-overflow -fno-merge-all-constants -fmerge-constants -fstack-check=no -fconserve-stack -fmacro-prefix-map=./= -fcf-protection=none --param allow-store-data-races=0 -o /tmp/ccirKvo4.s

The pagesize on OCI A1:

[root@instance-20210526-1929 ~]# getconf PAGESIZE
65536

The pagesize on the AWS Graviton 2:

[root@ip-172-30-0-199 ec2-user]#  getconf PAGESIZE
65536

The Neoverse N1 tech manual (https://documentation-service.arm.com/static/5f561d50235b3560a01e03b5?token=) says:

The instruction fetch unit includes:
• A 64KB, 4-way, set associative L1 instruction cache with 64-byte cache lines and parity protection.
• A fully associative L1 instruction TLB with native support for 4KB, 16KB, 64KB, 2MB, and 32MB page sizes.
• A dynamic branch predictor.
• Configurable support for instruction cache hardware coherency

https://www.kernel.org/doc/html/latest/arm64/memory.html has a good rundown of the page sizes on AArch64/Linux.

It seems like Debian may use a 4K page size on AArch64, while RHEL uses a 64K page size.
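
(For reference, the value getconf reports can also be read at runtime; a minimal sketch, nothing ZFS-specific:)

/* pagesize.c -- minimal sketch; prints the kernel's runtime page size,
 * the same value `getconf PAGESIZE` reports (4096 on most distro kernels,
 * 65536 on these RHEL/UEK arm64 builds). */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	printf("page size: %ld bytes\n", sysconf(_SC_PAGESIZE));
	return (0);
}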

@behlendorf
Copy link
Contributor

Found it. Issue #11574 describes this same issue with zpool scrub, except on ppc64le. The workaround there was to set spl_kmem_cache_slab_limit=16384, and the issue appears to be related to the page size (also 64k), which is what prompted me to ask. I haven't looked yet, but it sounds like an issue with how our slab implementation handles larger page sizes.

@omarkilani
Copy link
Author

omarkilani commented May 28, 2021

Thanks @behlendorf.

Now I know what to focus on I can take a look at the code.

For whatever reason when it locks up on AWS the instance becomes completely unresponsive and is unsalvageable. The only option is to terminate the entire instance.

On OCI it's at least rebootable. And with the RH 4.18.0-305 kernel it even reboots itself, which is nice.

@behlendorf
Copy link
Contributor

We may want to fine-tune this a bit, but here's what I'm currently thinking would be a reasonable fix. Basically, if the page size was anything other than 4k we'd always fall back to using the SPL's kmem implementation, which requires page alignment and was causing the memory inflation. We were effectively wasting the majority of every page we allocated. If you can verify this resolves the issue I'll open a PR and we can go from there.

diff --git a/module/os/linux/spl/spl-kmem-cache.c b/module/os/linux/spl/spl-kme>
index 6b3d559ff..4b7867b7e 100644
--- a/module/os/linux/spl/spl-kmem-cache.c
+++ b/module/os/linux/spl/spl-kmem-cache.c
@@ -100,12 +100,13 @@ MODULE_PARM_DESC(spl_kmem_cache_max_size, "Maximum size o>
  * For small objects the Linux slab allocator should be used to make the most
  * efficient use of the memory.  However, large objects are not supported by
  * the Linux slab and therefore the SPL implementation is preferred.  A cutoff
- * of 16K was determined to be optimal for architectures using 4K pages.
+ * of 16K was determined to be optimal for architectures using 4K pages. For
+ * larger page sizes set the cutoff at a single page.
  */
-#if PAGE_SIZE == 4096
+#if PAGE_SIZE <= 16384
 unsigned int spl_kmem_cache_slab_limit = 16384;
 #else
-unsigned int spl_kmem_cache_slab_limit = 0;
+unsigned int spl_kmem_cache_slab_limit = PAGE_SIZE;
 #endif
 module_param(spl_kmem_cache_slab_limit, uint, 0644);
 MODULE_PARM_DESC(spl_kmem_cache_slab_limit,
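
To put rough numbers on the "wasting the majority of every page" point, here's a back-of-the-envelope calculation (illustrative only; the 512-byte object size is hypothetical and this is not the actual SPL slab math):

/* slab_waste.c -- illustrative only.  If a small cache object ends up
 * consuming a whole page-aligned allocation, most of a 64K page is wasted,
 * whereas on a 4K page the overhead is comparatively modest. */
#include <stdio.h>

int main(void)
{
	const long pages[] = { 4096, 65536 };
	const long obj = 512;	/* hypothetical small object size */

	for (int i = 0; i < 2; i++) {
		double wasted = 100.0 * (double)(pages[i] - obj) / pages[i];
		printf("%5ld-byte page: up to %.1f%% of the allocation wasted "
		    "per %ld-byte object\n", pages[i], wasted, obj);
	}
	return (0);
}

That prints roughly 87.5% for 4K pages and 99.2% for 64K pages, which is roughly why SUnreclaim balloons only on the 64K-page kernels.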

behlendorf added a commit to behlendorf/zfs that referenced this issue May 29, 2021
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take into account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to PAGE_SIZE on systems using larger pages. Since 16,384 bytes
was experimentally determined to yield the best performance on
4K page systems this is used as the cutoff. This means on 4K
page systems there is no functional change.

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#11429
Closes openzfs#11574
Closes openzfs#12150
@behlendorf
Copy link
Contributor

I've opened PR #12152 with the patch above and an explanation of the issue. I haven't actually tested it, however, so it'd be great if you could confirm it really does resolve the problem.

@omarkilani
Copy link
Author

omarkilani commented May 29, 2021

@behlendorf I'm testing it now.

Okay, I did this by modifying the modprobe.d file for spl:

[root@instance-20210526-1929 ~]# cat /etc/modprobe.d/zfs.conf 
options zfs zfs_arc_max=25769803776
options spl spl_kmem_cache_slab_limit=65536
[root@instance-20210526-1929 ~]# cat /sys/module/spl/parameters/spl_kmem_cache_slab_limit 
65536

Just echo'ing into that sysfs file didn't work at first. I had to rmmod all the zfs modules and modprobe again.

With that change it doesn't run out of RAM. I'll try a couple of other values just to make sure that's the best one. But at the very least... it no longer crashes. Awesome. :)

@behlendorf
Copy link
Contributor

That's right. You just need to make sure it's set before importing the pool.

@omarkilani
Copy link
Author

@behlendorf Cool.

So, I did some testing of various values of spl_kmem_cache_slab_limit.

16k
24k
32k
40k
48k
56k
64k
128k
256k

Every value finished in the same time / at the same speed:

  pool: tank
 state: ONLINE
  scan: scrub in progress since Sat May 29 01:50:35 2021
	136G scanned at 1.20G/s, 136G issued at 1.20G/s, 136G total
	0B repaired, 99.80% done, 00:00:00 to go

...

  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 00:01:54 with 0 errors on Sat May 29 01:52:29 2021

I ran 'vmstat 1' alongside each scrub and stopped it as soon as the scrub was complete. I wrote a little thing to aggregate the values across the run time for each limit I tested. I've put the output here:

https://gist.github.com/omarkilani/346fb6ac8406fc0a51d0c267c3a31fa3

On the whole, I don't think it makes any difference which value is chosen. 16k seems to have a lower system time, but it's within the margin of error, so I wouldn't put any stock in it.

I think the PR is good to go.

@omarkilani
Copy link
Author

omarkilani commented May 29, 2021

I ran some Postgres benchmarks at the various limit levels, with 64k on the 64k page size kernel providing the best performance:

16k: avg latency = 2.404 ms, avg tps = 13315.121336

latency average = 2.352 ms
tps = 13603.681323 (including connections establishing)
tps = 13604.826099 (excluding connections establishing)

latency average = 2.389 ms
tps = 13394.444262 (including connections establishing)
tps = 13395.613079 (excluding connections establishing)

latency average = 2.472 ms
tps = 12943.765913 (including connections establishing)
tps = 12944.924831 (excluding connections establishing)

---

64k: avg latency = 2.313 ms, avg tps = 13838.339199

latency average = 2.233 ms
tps = 14329.728653 (including connections establishing)
tps = 14332.726826 (excluding connections establishing)

latency average = 2.271 ms
tps = 14090.842201 (including connections establishing)
tps = 14092.230062 (excluding connections establishing)

latency average = 2.445 ms
tps = 13088.930065 (including connections establishing)
tps = 13090.060708 (excluding connections establishing)

---

128k: avg latency = 2.366 ms, avg tps = 13527.519451

latency average = 2.370 ms
tps = 13504.669294 (including connections establishing)
tps = 13505.974290 (excluding connections establishing)

latency average = 2.347 ms
tps = 13634.011648 (including connections establishing)
tps = 13635.310885 (excluding connections establishing)

latency average = 2.381 ms
tps = 13440.105691 (including connections establishing)
tps = 13441.273178 (excluding connections establishing)

---

256k: avg latency = 2.423 ms, avg tps = 13218.833702

latency average = 2.513 ms
tps = 12732.348960 (including connections establishing)
tps = 12733.493379 (excluding connections establishing)

latency average = 2.392 ms
tps = 13379.154862 (including connections establishing)
tps = 13380.268778 (excluding connections establishing)

latency average = 2.363 ms
tps = 13541.525764 (including connections establishing)
tps = 13542.738950 (excluding connections establishing)

@omarkilani
Copy link
Author

One final test: an fio run with the following config:

[global]
bs=64K
iodepth=64
direct=1
ioengine=libaio
group_reporting
time_based
runtime=60
numjobs=4
name=raw-read
rw=read
							
[job1]
filename=/tank/db/f.fio
size=128G

At 16k/64k/128k/256k.

Outputs here:

https://gist.github.com/omarkilani/dc8f6d167493e9b94fae7402de841ec4

64k and 16k look alright on the 64k page size kernel.

Thanks for all your help @rincebrain and @behlendorf. Glad there was a solution in the end. :)

@omarkilani
Copy link
Author

omarkilani commented May 29, 2021

Ran a pgbench stress test on zfs with spl_kmem_cache_slab_limit=65536 for 12 hours, and the machine survived. It also survived a scrub of the resulting on-disk data. 👍

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 612557228
latency average = 2.257 ms
tps = 14179.551402 (including connections establishing)
tps = 14179.553286 (excluding connections establishing)

behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 2, 2021
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 3, 2021
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take into account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures. 

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#12152
Closes openzfs#11429
Closes openzfs#11574
Closes openzfs#12150
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Jun 4, 2021
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 8, 2021
behlendorf added a commit to behlendorf/zfs that referenced this issue Jun 9, 2021
tonyhutter pushed a commit that referenced this issue Jun 23, 2021
For small objects the kernel's slab implementation is very fast and
space efficient. However, as the allocation size increases to
require multiple pages performance suffers. The SPL kmem cache
allocator was designed to better handle these large allocation
sizes. Therefore, on Linux the kmem_cache_* compatibility wrappers
prefer to use the kernel's slab allocator for small objects and
the custom SPL kmem cache allocator for larger objects.

This logic was effectively disabled for all architectures using
a non-4K page size which caused all kmem caches to only use the
SPL implementation. Functionally this is fine, but the SPL code
which calculates the target number of objects per-slab does not
take into account that __vmalloc() always returns page-aligned
memory. This can result in a massive amount of wasted space when
allocating tiny objects on a platform using large pages (64k).

To resolve this issue we set the spl_kmem_cache_slab_limit cutoff
to 16K for all architectures. 

This particular change does not attempt to update the logic used
to calculate the optimal number of pages per slab. This remains
an issue which should be addressed in a future change.

Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Tony Nguyen <tony.nguyen@delphix.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #12152
Closes #11429
Closes #11574
Closes #12150