
rm / remove / delete a large file causes high load and unresponsiveness #4227

Closed
michaelxwang opened this issue Jan 15, 2016 · 56 comments
Labels
Type: Performance (performance improvement or performance problem)

Comments

@michaelxwang

Here is a demonstration of the problem:

# ls -ls `pwd`/junk
101496218 -rw-r--r-- 1 root root 373720875008 Jan 14 21:52 /srv/mysqldata/data/backup/junk

# time rm junk

real    7m27.012s
user    0m0.000s
sys     0m4.348s

# while true; do uptime; sleep 5; done
...
 21:58:27 up  1:31,  4 users,  load average: 48.76, 24.15, 10.50
 21:58:32 up  1:31,  4 users,  load average: 48.94, 24.60, 10.72
 21:58:37 up  1:31,  4 users,  load average: 49.10, 25.03, 10.93
 21:58:42 up  1:31,  4 users,  load average: 49.26, 25.47, 11.15
 21:58:47 up  1:31,  4 users,  load average: 49.40, 25.89, 11.36
 21:58:52 up  1:32,  4 users,  load average: 49.52, 26.31, 11.58
 21:58:57 up  1:32,  4 users,  load average: 49.64, 26.72, 11.79
 21:59:02 up  1:32,  4 users,  load average: 46.15, 26.37, 11.76
...
 22:06:32 up  1:39,  4 users,  load average: 0.98, 8.03, 8.56

The top processes are things like:

top - 19:53:47 up 90 days,  1:05,  4 users,  load average: 26.10, 6.45, 2.69
Tasks: 546 total,   2 running, 543 sleeping,   1 stopped,   0 zombie
%Cpu(s):  0.0 us,  0.0 sy,  0.0 ni, 98.0 id,  1.7 wa,  0.0 hi,  0.0 si,  0.2 st
KiB Mem:  25189913+total, 14417196+used, 10772716+free,   425504 buffers
KiB Swap:        0 total,        0 used,        0 free.  3173036 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                    
 46003 root      20   0       0      0      0 R  59.7  0.0 324:05.24 txg_sync                                                                                                                                   
 45987 root       0 -20       0      0      0 S   7.6  0.0   5:07.18 z_fr_iss_1                                                                                                                                 
 45988 root       0 -20       0      0      0 S   7.3  0.0   5:04.00 z_fr_iss_2                                                                                                                                 
 45990 root       0 -20       0      0      0 S   7.3  0.0   5:12.88 z_fr_iss_4                                                                                                                                 
 45991 root       0 -20       0      0      0 S   7.3  0.0   5:10.03 z_fr_iss_5                                                                                                                                 
 45986 root       0 -20       0      0      0 S   7.0  0.0   5:05.78 z_fr_iss_0                                                                                                                                 
 45989 root       0 -20       0      0      0 S   7.0  0.0   5:11.09 z_fr_iss_3                                                                                                                                 
 45993 root       0 -20       0      0      0 S   7.0  0.0   5:12.80 z_fr_iss_7                                                                                                                                 
 45992 root       0 -20       0      0      0 S   5.6  0.0   5:06.98 z_fr_iss_6   

The zfs version I have is 0.6.5.4 (rebooted after installing the package):

# dpkg -l | grep zfs

ii  libzfs2                              0.6.5.4-1~trusty                    amd64        Native OpenZFS filesystem library for Linux
ii  mountall                             2.53-zfs1                           amd64        filesystem mounting tool
ii  ubuntu-zfs                           8~trusty                            amd64        Native ZFS filesystem metapackage for Ubuntu.
ii  zfs-dkms                             0.6.5.4-1~trusty                    amd64        Native OpenZFS filesystem kernel modules for Linux
ii  zfs-doc                              0.6.5.4-1~trusty                    amd64        Native OpenZFS filesystem documentation and examples.
ii  zfsutils                             0.6.5.4-1~trusty                    amd64        Native OpenZFS management utilities for Linux

The "junk" file is copied from a MySQL database file. I tried creating a large file with dd:

# dd if=/dev/zero of=junk2 bs=1024000 count=400000
# time rm junk2

real    0m0.002s
user    0m0.000s
sys     0m0.000s

but as seen, it is removed without issue. Obviously a file created with dd from /dev/zero and a file copied with cp behave differently.

I originally encountered the problem when dropping a large (300GB) table in a MySQL database, and found it was caused by the removal of the underlying database file, so removing a large file is the more fundamental issue.
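
(A sketch of a reproduction with data that cannot be compressed away -- the file name and size are arbitrary, and this assumes the zero-filled file only deletes instantly because compression reduces it to almost no allocated blocks:)

# dd if=/dev/urandom of=junk3 bs=1M count=400000    # ~400 GB of incompressible data (slow to generate)
# time rm junk3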

@kernelOfTruth
Contributor

@michaelxwang do you have lz4 compression enabled?

If so, dd if=/dev/zero shouldn't be an issue at all, since the zeros compress away to almost nothing.

The "performance" for the other files with non-zero content, however, is concerning.

referencing: #3976 High CPU usage by "z_fr_iss" after deleting large files

and judging from the comments in that issue, lock contention (held locks) could also be the underlying issue - thus referencing:

#4106 ZFS 0.6.5.3 servers hang trying to get mutexes

@michaelxwang
Author

Yes, I have lz4 enabled:

# zpool get all | grep lz4
tmp0  feature@lz4_compress        active                      local
vol1  feature@lz4_compress        active                      local

@kernelOfTruth
Contributor

@michaelxwang please post output of

zcat /proc/config.gz | grep -i preempt

to see which preemption model the underlying kernel uses.

How is the system holding up during that delete operation? Still responsive?

Also, could you do a quick inspection of what iotop shows? Consistent, low I/O (not much fluctuation)?

Any LVM, device-mapper, etc.?

How many snapshots exist on that pool? How many on the system in general?

Is deduplication enabled?

Hardware specs? (RAM and pool capacity are crucial.)

What is the output of /proc/spl/kstat/zfs/arcstats?
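
(Most of this can be collected in one pass; a minimal sketch using standard commands, with <pool> as a placeholder:)

# zcat /proc/config.gz | grep -i preempt      # preemption model, if the kernel exposes its config
# zfs list -t snapshot | wc -l                # snapshot count
# zfs get -r dedup,compression <pool>         # deduplication / compression settings
# zpool list && zpool status <pool>           # capacity, fragmentation, vdev layout
# cat /proc/spl/kstat/zfs/arcstats            # ARC statistics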

@kernelOfTruth
Contributor

Also, what ashift was used during pool creation?

What hard drives, in what configuration, are used?

@michaelxwang
Author

(01) Which preemption model does the underlying kernel use (zcat /proc/config.gz | grep -i preempt)?

I do not have a /proc/config.gz file; my kernel is "3.19.0-43-generic #49~14.04.1-Ubuntu SMP".
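
(On a stock Ubuntu kernel the build configuration is normally installed under /boot instead, so the same check can be done with the following -- an assumption, not something verified in this thread:)

# grep -i preempt /boot/config-$(uname -r)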

(02) How is the system holding up during that delete operation?

The system did not crash, but it was unresponsive for some commands. For example, "vi README.txt"
hung for some time, and so did ":q".

(03) What iotop shows?

iotop shows very little activity.

# iotop

Total DISK READ :       0.00 B/s | Total DISK WRITE :      18.18 K/s
Actual DISK READ:       0.00 B/s | Actual DISK WRITE:      18.18 K/s
   TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND                                                                                                                                          
  4012 be/4 root        0.00 B/s    0.00 B/s  0.00 % 75.50 % [txg_sync]
  8414 be/4 root        0.00 B/s   18.18 K/s  0.00 %  0.00 % java -XX:+UseParallelGC -server -Xms64m -Xmx128m -Djava.library.path=./19~ com.sumologic.scala.collector.Collector -b installerSources/selected.json
     1 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % init
     2 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kthreadd]
     3 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
     4 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0]
     5 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
     6 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/u256:0]
     8 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_sched]
     9 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcu_bh]
    10 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuos/0]
    11 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuob/0]
    12 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/0]
    13 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/0]
    14 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/1]
    15 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/1]
    16 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/1]
    18 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/1:0H]
    19 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuos/1]
    20 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuob/1]
    21 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/2]
    22 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/2]
    23 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/2]
    24 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:0]
    25 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/2:0H]
    26 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuos/2]
    27 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuob/2]
    28 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/3]
    29 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/3]
    30 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/3]
    31 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0]
    32 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/3:0H]
    33 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuos/3]
    34 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuob/3]
    35 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/4]
    36 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/4]
    37 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/4]
    38 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/4:0]
    39 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/4:0H]
    40 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuos/4]
    41 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuob/4]
    42 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [watchdog/5]
    43 rt/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [migration/5]
    44 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [ksoftirqd/5]
    45 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/5:0]
    46 be/0 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [kworker/5:0H]
    47 be/4 root        0.00 B/s    0.00 B/s  0.00 %  0.00 % [rcuos/5]

(04) Any lvm, device-mapper, etc?

I did not use lvm, device-mapper, etc.

(05) How many snapshots exist on that pool? How many on the system in general?

Zero and zero.

# zfs list -t snapshot
no datasets available

(06) Hardware specs? (RAM and pool capacity are crucial.)

# more /proc/meminfo 
MemTotal:       251899128 kB
MemFree:        197170396 kB
MemAvailable:   197709332 kB
Buffers:          302580 kB
Cached:           397824 kB
SwapCached:            0 kB
Active:         48563304 kB
Inactive:         148912 kB
Active(anon):   48011932 kB
Inactive(anon):      336 kB
Active(file):     551372 kB
Inactive(file):   148576 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:                84 kB
Writeback:             0 kB
AnonPages:      48011976 kB
Mapped:            50360 kB
Shmem:               460 kB
Slab:            4332696 kB
SReclaimable:     128468 kB
SUnreclaim:      4204228 kB
KernelStack:       33184 kB
PageTables:       111640 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    125949564 kB
Committed_AS:   121141024 kB
VmallocTotal:   34359738367 kB
VmallocUsed:     1514552 kB
VmallocChunk:   34230011000 kB
HardwareCorrupted:     0 kB
AnonHugePages:  19623936 kB
CmaTotal:              0 kB
CmaFree:               0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      124928 kB
DirectMap2M:    256006144 kB

# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tmp0   568G   318M   568G         -     0%     0%  1.00x  ONLINE  -
vol1  3.88T   623G  3.27T         -    15%    15%  1.00x  ONLINE  -

(07) What is output of /proc/spl/kstat/zfs/arcstats?

# time cat /proc/spl/kstat/zfs/arcstats
6 1 0x01 91 4368 4251897849 146903901931781
name                            type data
hits                            4    16704542
misses                          4    42214291
demand_data_hits                4    384
demand_data_misses              4    41252505
demand_metadata_hits            4    16704158
demand_metadata_misses          4    961679
prefetch_data_hits              4    0
prefetch_data_misses            4    0
prefetch_metadata_hits          4    0
prefetch_metadata_misses        4    107
mru_hits                        4    7618911
mru_ghost_hits                  4    0
mfu_hits                        4    9085631
mfu_ghost_hits                  4    0
deleted                         4    43
mutex_miss                      4    0
evict_skip                      4    44
evict_not_enough                4    0
evict_l2_cached                 4    0
evict_l2_eligible               4    389120
evict_l2_ineligible             4    0
evict_l2_skip                   4    0
hash_elements                   4    278555
hash_elements_max               4    406891
hash_collisions                 4    740321
hash_chains                     4    1132
hash_chain_max                  4    3
p                               4    51588942233
c                               4    103177884467
c_min                           4    33554432
c_max                           4    103177884467
size                            4    5521386152
hdr_size                        4    137379392
data_size                       4    65019904
metadata_size                   4    5221504512
other_size                      4    97482344
anon_size                       4    749496320
anon_evictable_data             4    0
anon_evictable_metadata         4    0
mru_size                        4    1225226240
mru_evictable_data              4    59506688
mru_evictable_metadata          4    1112735744
mru_ghost_size                  4    0
mru_ghost_evictable_data        4    0
mru_ghost_evictable_metadata    4    0
mfu_size                        4    3311801856
mfu_evictable_data              4    0
mfu_evictable_metadata          4    2975798784
mfu_ghost_size                  4    0
mfu_ghost_evictable_data        4    0
mfu_ghost_evictable_metadata    4    0
l2_hits                         4    0
l2_misses                       4    0
l2_feeds                        4    0
l2_rw_clash                     4    0
l2_read_bytes                   4    0
l2_write_bytes                  4    0
l2_writes_sent                  4    0
l2_writes_done                  4    0
l2_writes_error                 4    0
l2_writes_lock_retry            4    0
l2_evict_lock_retry             4    0
l2_evict_reading                4    0
l2_evict_l1cached               4    0
l2_free_on_write                4    0
l2_cdata_free_on_write          4    0
l2_abort_lowmem                 4    0
l2_cksum_bad                    4    0
l2_io_error                     4    0
l2_size                         4    0
l2_asize                        4    0
l2_hdr_size                     4    0
l2_compress_successes           4    0
l2_compress_zeros               4    0
l2_compress_failures            4    0
memory_throttle_count           4    0
duplicate_buffers               4    0
duplicate_buffers_size          4    0
duplicate_reads                 4    0
memory_direct_count             4    0
memory_indirect_count           4    0
arc_no_grow                     4    0
arc_tempreserve                 4    0
arc_loaned_bytes                4    0
arc_prune                       4    0
arc_meta_used                   4    5456366248
arc_meta_limit                  4    77383413350
arc_meta_max                    4    6762980000
arc_meta_min                    4    16777216
arc_need_free                   4    0
arc_sys_free                    4    4030382080

real        0m0.001s
user        0m0.000s
sys 0m0.000s

(08) What ashift was used during pool creation?

# zpool get all | grep -i ashift
tmp0  ashift                      0                           default
vol1  ashift                      0                           default

(09) What hard drives, in what configuration, are used?

In this test, it is Amazon AWS EBS, but the problem happened with an ephemeral drive as well.

(10) Other info:

[2016-01-16T13:08:19 root@mdb-01-r-01:/srv/mysqldata/data/backup]
# time rm junk

real    7m40.305s
user    0m0.000s
sys     0m4.340s
[2016-01-16T13:16:09 root@mdb-01-r-01:/srv/mysqldata/data/backup]

# top

top - 13:10:01 up 1 day, 16:43,  4 users,  load average: 3.21, 1.20, 0.55
Tasks: 1555 total,   2 running, 1553 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.3 us,  0.1 sy,  0.0 ni, 97.6 id,  1.7 wa,  0.0 hi,  0.0 si,  0.3 st
KiB Mem:  25189913+total, 57213072 used, 19468604+free,   297196 buffers
KiB Swap:        0 total,        0 used,        0 free.   394288 cached Mem

   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  4012 root      20   0       0      0      0 R  40.9  0.0  19:42.98 txg_sync
 64983 root      20   0   56996  24352   7084 S  10.6  0.0  80:27.69 iotop
 93681 root      20   0   16692   4036   2248 R   1.0  0.0   0:00.33 top
  3744 root       0 -20       0      0      0 S   0.7  0.0   0:03.82 z_fr_iss_0
  3753 root       0 -20       0      0      0 S   0.7  0.0   0:03.86 z_fr_iss_1
  3755 root       0 -20       0      0      0 S   0.7  0.0   0:03.84 z_fr_iss_1
  3757 root       0 -20       0      0      0 S   0.7  0.0   0:03.78 z_fr_iss_1
  3762 root       0 -20       0      0      0 S   0.7  0.0   0:03.85 z_fr_iss_1
  3764 root       0 -20       0      0      0 S   0.7  0.0   0:04.14 z_fr_iss_1
  3766 root       0 -20       0      0      0 S   0.7  0.0   0:03.82 z_fr_iss_2
  3767 root       0 -20       0      0      0 S   0.7  0.0   0:03.85 z_fr_iss_2
  3772 root       0 -20       0      0      0 S   0.7  0.0   0:03.82 z_fr_iss_2
  3779 root       0 -20       0      0      0 S   0.7  0.0   0:04.06 z_fr_iss_3
  3785 root       0 -20       0      0      0 S   0.7  0.0   0:04.10 z_fr_iss_3
  3789 root       0 -20       0      0      0 S   0.7  0.0   0:03.80 z_fr_iss_4
  3794 root       0 -20       0      0      0 S   0.7  0.0   0:03.96 z_fr_iss_4
  3795 root       0 -20       0      0      0 S   0.7  0.0   0:03.82 z_fr_iss_4
  3797 root       0 -20       0      0      0 S   0.7  0.0   0:03.80 z_fr_iss_4
  3806 root       0 -20       0      0      0 S   0.7  0.0   0:03.92 z_fr_iss_5
  3818 root       0 -20       0      0      0 S   0.7  0.0   0:03.80 z_fr_iss_6
  3819 root       0 -20       0      0      0 S   0.7  0.0   0:03.84 z_fr_iss_6
  3820 root       0 -20       0      0      0 S   0.7  0.0   0:03.90 z_fr_iss_6
  3822 root       0 -20       0      0      0 S   0.7  0.0   0:04.13 z_fr_iss_6
  3831 root       0 -20       0      0      0 S   0.7  0.0   0:03.90 z_fr_iss_7
  3832 root       0 -20       0      0      0 S   0.7  0.0   0:03.82 z_fr_iss_7
  3833 root       0 -20       0      0      0 S   0.7  0.0   0:04.04 z_fr_iss_7
  3741 root       0 -20       0      0      0 S   0.3  0.0   0:03.97 z_fr_iss_0
  3742 root       0 -20       0      0      0 S   0.3  0.0   0:03.86 z_fr_iss_0
  3743 root       0 -20       0      0      0 S   0.3  0.0   0:03.80 z_fr_iss_0
  3745 root       0 -20       0      0      0 S   0.3  0.0   0:03.81 z_fr_iss_0
  3746 root       0 -20       0      0      0 S   0.3  0.0   0:03.90 z_fr_iss_0
  3747 root       0 -20       0      0      0 S   0.3  0.0   0:03.93 z_fr_iss_0
  3748 root       0 -20       0      0      0 S   0.3  0.0   0:03.97 z_fr_iss_0
  3749 root       0 -20       0      0      0 S   0.3  0.0   0:03.87 z_fr_iss_0
  3750 root       0 -20       0      0      0 S   0.3  0.0   0:04.04 z_fr_iss_0
  3751 root       0 -20       0      0      0 S   0.3  0.0   0:03.90 z_fr_iss_0
  3752 root       0 -20       0      0      0 S   0.3  0.0   0:03.85 z_fr_iss_0
  3754 root       0 -20       0      0      0 S   0.3  0.0   0:03.86 z_fr_iss_1
  3756 root       0 -20       0      0      0 S   0.3  0.0   0:03.78 z_fr_iss_1
  3758 root       0 -20       0      0      0 S   0.3  0.0   0:03.83 z_fr_iss_1
  3760 root       0 -20       0      0      0 S   0.3  0.0   0:03.86 z_fr_iss_1
  3761 root       0 -20       0      0      0 S   0.3  0.0   0:04.06 z_fr_iss_1
  3763 root       0 -20       0      0      0 S   0.3  0.0   0:03.92 z_fr_iss_1
  3765 root       0 -20       0      0      0 S   0.3  0.0   0:03.79 z_fr_iss_2

@kernelOfTruth
Contributor

okay - so memory pressure can't be too high; lots of memory appears to still be free

please post output of

dmesg | grep -i preempt

@michaelxwang
Author

dmesg does not have the info:

# dmesg | grep -i empt | wc
      0       0       0

@lnxbil

lnxbil commented Jan 16, 2016

I do not know if this is related or not, but I encountered the same problem, yielding stuck processes on delete. Could you please check whether you have messages similar to these in your dmesg:

[39002.408919] INFO: task rm:5711 blocked for more than 120 seconds.
[39002.412114]       Tainted: P          IO    4.2.6-1-pve #1
[39002.415308] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39002.418569] rm              D ffff88040fa96a00     0  5711   4788 0x00000002
[39002.421839]  ffff8804e526bae8 0000000000000082 ffff88040da4a580 ffff880801551900
[39002.425163]  0000000000000202 ffff8804e526c000 ffff88040d5cc150 ffff88040d5cc178
[39002.428517]  ffff88040d5cc190 0000000000000000 ffff8804e526bb08 ffffffff81804227
[39002.431872] Call Trace:
[39002.435203]  [<ffffffff81804227>] schedule+0x37/0x80
[39002.438562]  [<ffffffffc0369e39>] cv_wait_common+0x109/0x140 [spl]
[39002.441937]  [<ffffffff810bd790>] ? wait_woken+0x90/0x90
[39002.445350]  [<ffffffffc0369e85>] __cv_wait+0x15/0x20 [spl]
[39002.448738]  [<ffffffffc101636b>] dmu_tx_wait+0xab/0x3a0 [zfs]
[39002.452143]  [<ffffffffc10166fb>] dmu_tx_assign+0x9b/0x4f0 [zfs]
[39002.455514]  [<ffffffffc100721c>] dmu_free_long_range+0x19c/0x270 [zfs]
[39002.458908]  [<ffffffffc107c81d>] zfs_rmnode+0x28d/0x350 [zfs]
[39002.462310]  [<ffffffffc109d7bd>] zfs_zinactive+0x18d/0x1a0 [zfs]
[39002.465676]  [<ffffffffc10983fb>] zfs_inactive+0x6b/0x260 [zfs]
[39002.469033]  [<ffffffff81193dae>] ? truncate_pagecache+0x5e/0x70
[39002.472421]  [<ffffffff810bd7d0>] ? autoremove_wake_function+0x40/0x40
[39002.475887]  [<ffffffffc10ae5d8>] zpl_evict_inode+0x48/0x70 [zfs]
[39002.479304]  [<ffffffff81218c4a>] evict+0xba/0x180
[39002.482678]  [<ffffffff812195ca>] iput+0x17a/0x1e0
[39002.485981]  [<ffffffff8120d87e>] do_unlinkat+0x16e/0x2d0
[39002.489266]  [<ffffffff8120e42b>] SyS_unlinkat+0x1b/0x30
[39002.492464]  [<ffffffff81808332>] entry_SYSCALL_64_fastpath+0x16/0x75

I'm on 0.6.5.4 too.

@dweeezil
Contributor

@michaelxwang I'm most interested in the CPU being used by the sync task. I'd suggest running perf top to get a handle on where the CPU time is being spent.

@michaelxwang
Author

@dweeezil Here is the output of "perf top":

Samples: 11K of event 'cpu-clock', Event count (approx.): 1714950826
Overhead  Shared Object        Symbol
  31.72%  [kernel]             [k] _raw_spin_unlock_irqrestore
  11.86%  [kernel]             [k] finish_task_switch
   3.44%  [kernel]             [k] tick_nohz_idle_enter
   2.87%  [kernel]             [k] mutex_lock
   2.85%  [kernel]             [k] tick_nohz_idle_exit
   1.52%  [kernel]             [k] mutex_unlock
   1.50%  [kernel]             [k] range_tree_seg_compare
   1.49%  [kernel]             [k] _raw_spin_lock
   1.30%  [kernel]             [k] _raw_spin_unlock
   1.05%  libc-2.19.so         [.] __memcpy_sse2_unaligned
   1.04%  libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
   0.95%  [kernel]             [k] buf_hash_find
   0.89%  [kernel]             [k] cv_wait_common
   0.81%  [kernel]             [k] zio_create
   0.78%  [kernel]             [k] zrl_add
   0.67%  [kernel]             [k] free_blocks
   0.67%  [kernel]             [k] dsl_dataset_block_kill
   0.66%  [kernel]             [k] zio_execute
   0.54%  [kernel]             [k] pvclock_clocksource_read
   0.47%  [kernel]             [k] zrl_remove
   0.40%  [kernel]             [k] __schedule
   0.39%  [kernel]             [k] dbuf_dirty
   0.38%  [kernel]             [k] io_schedule
   0.36%  [kernel]             [k] dsl_dir_diduse_space
   0.34%  [kernel]             [k] vdev_lookup_top
   0.34%  [kernel]             [k] kmem_cache_free
   0.33%  [kernel]             [k] avl_find
   0.31%  [kernel]             [k] zio_wait
   0.31%  [kernel]             [k] dbuf_read
   0.28%  [kernel]             [k] zio_free_bp_init
   0.28%  [kernel]             [k] schedule_preempt_disabled
   0.27%  [kernel]             [k] range_tree_space
   0.27%  [kernel]             [k] xen_hypercall_event_channel_op
   0.26%  [kernel]             [k] __cv_broadcast
   0.26%  [kernel]             [k] range_tree_add
   0.26%  [kernel]             [k] spa_taskq_dispatch_ent
   0.25%  [kernel]             [k] zio_destroy
   0.25%  [kernel]             [k] spl_kmem_cache_free
   0.24%  [kernel]             [k] __cv_destroy
   0.23%  [kernel]             [k] vdev_stat_update
   0.23%  [kernel]             [k] _raw_spin_lock_irqsave
   0.23%  libpython2.7.so.1.0  [.] lookdict_string
   0.22%  [kernel]             [k] zio_done
   0.22%  [kernel]             [k] __do_page_fault
   0.21%  [kernel]             [k] spl_kmem_cache_alloc
   0.21%  mysqld               [.] 0x0000000000168635
   0.20%  mysqld               [.] 0x0000000000168a65
   0.20%  mysqld               [.] 0x000000000057e41f

@lnxbil I do not have any messages related to this in either dmesg or syslog.

@dweeezil
Contributor

@michaelxwang I think a flame graph would be helpful here. If you're not familiar with the steps, here's how:

  • Get the software: git clone https://github.com/brendangregg/FlameGraph
  • In the FlameGraph directory, while the CPU hogging is happening, run: perf record -ag -F 999
  • After about a minute interrupt it with ^c
  • Generate the graph: perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > fg.svg

And then post fg.svg to a paste site. Hopefully it will shed some light as to where all the calls to _raw_spin_unlock_irqrestore are coming from.

@michaelxwang
Author

@dweeezil Here is the flame graph:

http://wikisend.com/download/745082/fg.svg

Thanks!

@dweeezil
Contributor

@michaelxwang That's not showing terribly excessive CPU usage by the sync task at all. The sync task is only 1.53% of the total time. Looking back at your original issue report, I see now that there likely wasn't very high CPU load but instead more likely a lot of blocked zio processes. My guess would be the z_fr_iss tasks. How many hardware threads does your system have? The current configuration allows up to 96 z_fr_iss threads and they definitely can oversubscribe the system and cause thrashing. There was another issue (forgot which one) in which this same thing happened.
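
(Both counts can be checked quickly on a running system; a sketch, with the z_fr_iss_* thread names taken from the top output above:)

# nproc                           # hardware threads visible to the kernel
# ps -e | grep -c 'z_fr_iss'      # number of z_fr_iss taskq threads currently spawned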

@dweeezil
Contributor

I think a reference to #3976 might be in order.

@michaelxwang
Author

@dweeezil It is an Amazon memory-optimized EC2 instance, "r3.8xlarge":

Model       vCPU    Mem (GiB)   SSD Storage (GB)
r3.8xlarge  32      244         2 x 320

with details in:

https://aws.amazon.com/ec2/instance-types/

There are 32 vCPUs. They are all identical; the last one is:

processor   : 31
vendor_id   : GenuineIntel
cpu family  : 6
model       : 62
model name  : Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
stepping    : 4
microcode   : 0x428
cpu MHz     : 2494.066
cache size  : 25600 KB
physical id : 1
siblings    : 16
core id     : 7
cpu cores   : 8
apicid      : 47
initial apicid  : 47
fpu     : yes
fpu_exception   : yes
cpuid level : 13
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 cx16 pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm fsgsbase smep erms xsaveopt
bugs        :
bogomips    : 5052.49
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

So I guess it has 32 threads at least.

@ryao
Contributor

ryao commented Jan 18, 2016

@michaelxwang The kernel hangcheck timer backtrace of the rm kernel stack suggests that the unlink operation has exhausted the global limit on dirty data in the open transaction group, causing all write operations to block in dmu_tx_assign until this state clears. Whenever a transaction group commit completes and frees space for more dirty data, the operation unblocks and gobbles up the rest of it.

Data from /proc/spl/kstat/zfs/dmu_tx at the start of the operation, some points during the operation and at the end of the operation should confirm this.

There might be a bug in the code that keeps us from honoring zfs_dirty_data_sync, or maybe we are just overwhelming our ability to sync new transaction groups. We could tell if someone with a reproducer runs /path/to/funclatency -f spa_sync 120 while the rm operation is blocked, kills it when rm unblocks, and posts the output.
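
(The dirty-data tunables involved can be read straight from the module parameters; a sketch, assuming the parameter names used by ZFS on Linux of this vintage:)

# grep . /sys/module/zfs/parameters/zfs_dirty_data_max \
         /sys/module/zfs/parameters/zfs_dirty_data_sync \
         /sys/module/zfs/parameters/zfs_dirty_data_max_percent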

That funclatency tool is from brendangregg/perf-tools#43. Here is a direct link:

https://github.com/ryao/perf-tools/blob/funclatency/kernel/funclatency

@dweeezil
Contributor

I'd be interested to know what dmesg | grep smpboot shows. I've seen EC2 instances with crazy large numbers of hotplug CPUs configured (depending on the AMI used). That will cause tx_cpu to be very large and can cause extreme overhead in the txg system creating and destroying various objects such as mutexes, cvs, etc.

@dweeezil
Contributor

If the number of hotplug CPUs shows up as some very large number, you can use the possible_cpus=X kernel parameter to set it appropriately (32 in this case).
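
(On Ubuntu that is typically done via GRUB; a sketch, not something this thread walks through:)

# In /etc/default/grub, append the parameter:
#   GRUB_CMDLINE_LINUX_DEFAULT="... possible_cpus=32"
# then regenerate the boot configuration and reboot:
# update-grub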

@ryao
Contributor

ryao commented Jan 18, 2016

@michaelxwang In addition to my previous comment, what is the output of zpool status for this pool and what drive models is it using? I am particularly interested in trying to get an idea of the IOPS ability of the pool's vdevs. Either we are not syncing often enough or our ability to sync is overwhelmed. The latter could be due to hardware having trouble keeping up with the dirty data generated.

As for the other contributors posting here, here is some food for thought. If we have the problem of too much dirty data, is there any reason why we could not put the file in the unlinked set and then free the file contents in the background asynchronously at a more manageable rate?

@ryao
Contributor

ryao commented Jan 18, 2016

@dweeezil Why would extreme overhead in creating and destroying various objects cause everything to stop until the rm operation has finished?

@michaelxwang
Author

@dweeezil Here is the output on smpboot:

# dmesg | grep smpboot
[    0.000000] smpboot: Allowing 128 CPUs, 96 hotplug CPUs
[    0.266110] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz (fam: 06, model: 3e, stepping: 04)
[    0.938384] smpboot: Total of 32 processors activated (160649.95 BogoMIPS)

@dweeezil
Contributor

@ryao My thinking is that it might slow txg commits sufficiently to cause the problem you've described above.

As to the unlinked set, why would this operation be bypassing it? The rm task is calling zfs_rmnode, which does use it (the unlinked set).

@michaelxwang That's not as much overshoot as I expected to see but it's still excessively high. Even if it doesn't help this problem, you probably want to be booting with possible_cpus=32.

@dweeezil
Contributor

@michaelxwang Since you mentioned MySQL, has the recordsize been tuned on this filesystem? I'm wondering how many blocks might be involved. If the recordsize is very small, there's a whole lot more to free.

@michaelxwang
Author

@dweeezil I believe by recordsize you are referring to the innodb_page_size? It is 16k:

| innodb_page_size                                       | 16384                |

@dweeezil
Contributor

@michaelxwang I was referring to the zfs filesystem recordsize which you can retrieve with zfs get recordsize <pool>/<dataset>. Database users often set it when not using zfs compression. If you've not changed it, the recordsize will be 128K.
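
(For reference, tuning it is a one-line property change that only affects blocks written after the change; a sketch with placeholder names:)

# zfs set recordsize=16K <pool>/<dataset>
# zfs get recordsize <pool>/<dataset>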

@michaelxwang
Author

@dweeezil The data file that I am deleting via drop table inside MySQL, and the copy of the data file created by the cp command that I am deleting via rm, both reside in the data filesystem, which has a recordsize of 16K:

# zfs get recordsize  | sed "s:^:    :"
NAME                      PROPERTY    VALUE    SOURCE
tmp0                      recordsize  128K     default
tmp0/mysqllog             recordsize  128K     default
tmp0/mysqllog/log         recordsize  128K     default
tmp0/mysqltemp            recordsize  128K     default
tmp0/mysqltemp/tmp        recordsize  128K     default
vol1                      recordsize  128K     default
vol1/mysqlbin             recordsize  128K     default
vol1/mysqlbin/binlog      recordsize  128K     local
vol1/mysqlbin/relaylog    recordsize  128K     local
vol1/mysqldata            recordsize  128K     default
vol1/mysqldata/binlog     recordsize  128K     local
vol1/mysqldata/data       recordsize  16K      local
vol1/mysqldata/innodblog  recordsize  4K       local
vol1/mysqldata/relaylog   recordsize  128K     local

@dweeezil
Contributor

@michaelxwang The 16K recordsize will certainly amplify the number of blocks which need to be freed.
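
(A rough back-of-the-envelope with the 373,720,875,008-byte file from the original report illustrates the amplification:)

373720875008 / 16384  = 22,810,112 L0 (data) blocks at a 16K recordsize
373720875008 / 131072 =  2,851,264 L0 (data) blocks at the default 128K recordsize

That is roughly 8x as many block pointers to free, before counting indirect blocks.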

@dweeezil
Contributor

I've finally got a decent test scenario set up. The short story is that deleting a giant file with a 16K recordsize takes a long time.

If, however, the file is currently open by another process, the unlinked set comes into play and the removal is instantaneous. It seems that, as @ryao suggested, there ought to be a way to always use the unlinked set. Maybe only for very large objects? I suspect the reason it's not used that way is that most people expect the space to be immediately available for re-use once a delete is "finished".

I'm not sure how these operations could be accelerated. My test used a 1.7TB file with a 16K recordsize. In addition to the 103M L0 (data) blocks, there are many, many indirect blocks which need to be freed, and to make matters worse on a low-IOPS pool, they're all dittoed, which doubles the IO load. Furthermore, if there are any snapshots, deleting a file of this size requires writing a couple of gigabytes to a deadlist.

@michaelxwang
Author

@ryao This is to respond to your post on dmu_tx. I ran a loop to capture the rm process, uptime, and the contents of dmu_tx every 15 seconds before the rm started, during the operation (while the rm process was visible), and after the rm process finished. The load reached as high as 49, and it started to drop before the rm process finished. Among my repeated tests, sometimes the load reached this high a number; sometimes it went up but remained in single digits. I noticed that during the rm operation, dmu_tx_assigned hardly increased. Is this test conclusive? Can you interpret the result further?

# while true; do ps -eaf | grep " r[m]"; uptime; cat /proc/spl/kstat/zfs/dmu_tx; sleep 15; done
 13:40:52 up 4 days, 17:14,  3 users,  load average: 0.11, 0.24, 0.14
5 1 0x01 11 528 4185196809 407636480808736
name                            type data
dmu_tx_assigned                 4    353539985
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
 13:41:07 up 4 days, 17:14,  3 users,  load average: 0.09, 0.22, 0.14
5 1 0x01 11 528 4185196809 407651538980506
name                            type data
dmu_tx_assigned                 4    353544783
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
 13:41:22 up 4 days, 17:14,  3 users,  load average: 0.07, 0.21, 0.13
5 1 0x01 11 528 4185196809 407666597060145
name                            type data
dmu_tx_assigned                 4    353549826
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
 13:41:37 up 4 days, 17:14,  3 users,  load average: 0.13, 0.22, 0.14
5 1 0x01 11 528 4185196809 407681654662923
name                            type data
dmu_tx_assigned                 4    353558893
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
 13:41:52 up 4 days, 17:15,  3 users,  load average: 0.10, 0.21, 0.14
5 1 0x01 11 528 4185196809 407696711861983
name                            type data
dmu_tx_assigned                 4    353567859
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
 13:42:07 up 4 days, 17:15,  3 users,  load average: 0.08, 0.20, 0.13
5 1 0x01 11 528 4185196809 407711769255868
name                            type data
dmu_tx_assigned                 4    353574459
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
 13:42:22 up 4 days, 17:15,  3 users,  load average: 0.14, 0.21, 0.14
5 1 0x01 11 528 4185196809 407726824894645
name                            type data
dmu_tx_assigned                 4    353584124
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
 13:42:38 up 4 days, 17:15,  3 users,  load average: 0.11, 0.20, 0.14
5 1 0x01 11 528 4185196809 407741882239476
name                            type data
dmu_tx_assigned                 4    353592756
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850213
dmu_tx_dirty_over_max           4    288
dmu_tx_quota                    4    0
root     119026   9094 81 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:42:53 up 4 days, 17:16,  3 users,  load average: 0.24, 0.22, 0.14
5 1 0x01 11 528 4185196809 407756939672915
name                            type data
dmu_tx_assigned                 4    353598057
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850314
dmu_tx_dirty_over_max           4    290
dmu_tx_quota                    4    0
root     119026   9094  9 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:43:08 up 4 days, 17:16,  3 users,  load average: 15.03, 3.49, 1.22
5 1 0x01 11 528 4185196809 407771998027779
name                            type data
dmu_tx_assigned                 4    353598126
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850378
dmu_tx_dirty_over_max           4    354
dmu_tx_quota                    4    0
root     119026   9094  5 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:43:23 up 4 days, 17:16,  3 users,  load average: 23.00, 5.82, 2.01
5 1 0x01 11 528 4185196809 407787055351904
name                            type data
dmu_tx_assigned                 4    353598322
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  3 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:43:38 up 4 days, 17:16,  3 users,  load average: 29.20, 8.03, 2.80
5 1 0x01 11 528 4185196809 407802114923806
name                            type data
dmu_tx_assigned                 4    353598323
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  2 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:43:53 up 4 days, 17:17,  3 users,  load average: 34.03, 10.14, 3.57
5 1 0x01 11 528 4185196809 407817170991484
name                            type data
dmu_tx_assigned                 4    353598325
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  2 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:44:08 up 4 days, 17:17,  3 users,  load average: 37.79, 12.14, 4.33
5 1 0x01 11 528 4185196809 407832229283083
name                            type data
dmu_tx_assigned                 4    353598326
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:44:23 up 4 days, 17:17,  3 users,  load average: 40.72, 14.04, 5.08
5 1 0x01 11 528 4185196809 407847285355945
name                            type data
dmu_tx_assigned                 4    353598327
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:44:38 up 4 days, 17:17,  3 users,  load average: 42.99, 15.85, 5.82
5 1 0x01 11 528 4185196809 407862346713233
name                            type data
dmu_tx_assigned                 4    353598327
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:44:53 up 4 days, 17:18,  3 users,  load average: 44.77, 17.57, 6.54
5 1 0x01 11 528 4185196809 407877403222566
name                            type data
dmu_tx_assigned                 4    353598329
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:45:08 up 4 days, 17:18,  3 users,  load average: 46.15, 19.21, 7.25
5 1 0x01 11 528 4185196809 407892461692671
name                            type data
dmu_tx_assigned                 4    353598331
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:45:23 up 4 days, 17:18,  3 users,  load average: 47.22, 20.77, 7.95
5 1 0x01 11 528 4185196809 407907518619205
name                            type data
dmu_tx_assigned                 4    353598333
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:45:38 up 4 days, 17:18,  3 users,  load average: 48.06, 22.25, 8.64
5 1 0x01 11 528 4185196809 407922577539206
name                            type data
dmu_tx_assigned                 4    353598334
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:45:53 up 4 days, 17:19,  3 users,  load average: 48.71, 23.66, 9.32
5 1 0x01 11 528 4185196809 407937633794578
name                            type data
dmu_tx_assigned                 4    353598336
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:46:08 up 4 days, 17:19,  3 users,  load average: 49.29, 25.01, 10.00
5 1 0x01 11 528 4185196809 407952694728387
name                            type data
dmu_tx_assigned                 4    353598336
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:01 rm /srv/mysqldata/data/backup/junk
 13:46:23 up 4 days, 17:19,  3 users,  load average: 49.67, 26.29, 10.65
5 1 0x01 11 528 4185196809 407967752617535
name                            type data
dmu_tx_assigned                 4    353598533
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850536
dmu_tx_dirty_over_max           4    404
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:46:38 up 4 days, 17:19,  3 users,  load average: 39.72, 25.23, 10.56
5 1 0x01 11 528 4185196809 407982811789366
name                            type data
dmu_tx_assigned                 4    353624769
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850728
dmu_tx_dirty_over_max           4    409
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:46:54 up 4 days, 17:20,  3 users,  load average: 32.25, 24.29, 10.48
5 1 0x01 11 528 4185196809 407997867985186
name                            type data
dmu_tx_assigned                 4    353624820
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850728
dmu_tx_dirty_over_max           4    409
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:47:09 up 4 days, 17:20,  3 users,  load average: 26.15, 23.32, 10.39
5 1 0x01 11 528 4185196809 408012925813095
name                            type data
dmu_tx_assigned                 4    353624906
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:47:24 up 4 days, 17:20,  3 users,  load average: 21.46, 22.43, 10.30
5 1 0x01 11 528 4185196809 408027983449080
name                            type data
dmu_tx_assigned                 4    353624976
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:47:39 up 4 days, 17:20,  3 users,  load average: 17.82, 21.57, 10.22
5 1 0x01 11 528 4185196809 408043041447075
name                            type data
dmu_tx_assigned                 4    353624996
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:47:54 up 4 days, 17:21,  3 users,  load average: 14.98, 20.76, 10.14
5 1 0x01 11 528 4185196809 408058098064067
name                            type data
dmu_tx_assigned                 4    353625046
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:48:09 up 4 days, 17:21,  3 users,  load average: 12.77, 19.99, 10.05
5 1 0x01 11 528 4185196809 408073155741166
name                            type data
dmu_tx_assigned                 4    353625046
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  1 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:48:24 up 4 days, 17:21,  3 users,  load average: 11.05, 19.25, 9.97
5 1 0x01 11 528 4185196809 408088217084424
name                            type data
dmu_tx_assigned                 4    353625047
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:48:39 up 4 days, 17:21,  3 users,  load average: 9.71, 18.56, 9.89
5 1 0x01 11 528 4185196809 408103275683195
name                            type data
dmu_tx_assigned                 4    353625047
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:48:54 up 4 days, 17:22,  3 users,  load average: 8.67, 17.89, 9.81
5 1 0x01 11 528 4185196809 408118333908628
name                            type data
dmu_tx_assigned                 4    353625048
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:49:09 up 4 days, 17:22,  3 users,  load average: 7.85, 17.26, 9.74
5 1 0x01 11 528 4185196809 408133395043679
name                            type data
dmu_tx_assigned                 4    353625051
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:49:24 up 4 days, 17:22,  3 users,  load average: 7.22, 16.66, 9.66
5 1 0x01 11 528 4185196809 408148453606398
name                            type data
dmu_tx_assigned                 4    353625053
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:49:39 up 4 days, 17:22,  3 users,  load average: 6.73, 16.09, 9.59
5 1 0x01 11 528 4185196809 408163511689430
name                            type data
dmu_tx_assigned                 4    353625055
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
root     119026   9094  0 13:42 pts/0    00:00:03 rm /srv/mysqldata/data/backup/junk
 13:49:54 up 4 days, 17:23,  3 users,  load average: 6.35, 15.55, 9.51
5 1 0x01 11 528 4185196809 408178573471646
name                            type data
dmu_tx_assigned                 4    353625056
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    850793
dmu_tx_dirty_over_max           4    413
dmu_tx_quota                    4    0
 13:50:09 up 4 days, 17:23,  3 users,  load average: 6.05, 15.03, 9.44
5 1 0x01 11 528 4185196809 408193632124395
name                            type data
dmu_tx_assigned                 4    353669495
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    856707
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:50:24 up 4 days, 17:23,  3 users,  load average: 5.15, 14.39, 9.32
5 1 0x01 11 528 4185196809 408208690282061
name                            type data
dmu_tx_assigned                 4    353721988
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    909125
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:50:39 up 4 days, 17:23,  3 users,  load average: 4.53, 13.80, 9.21
5 1 0x01 11 528 4185196809 408223751541430
name                            type data
dmu_tx_assigned                 4    353818390
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    948736
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:50:54 up 4 days, 17:24,  3 users,  load average: 3.89, 13.21, 9.09
5 1 0x01 11 528 4185196809 408238806418776
name                            type data
dmu_tx_assigned                 4    353851297
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    956600
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:51:10 up 4 days, 17:24,  3 users,  load average: 3.31, 12.62, 8.96
5 1 0x01 11 528 4185196809 408253862486532
name                            type data
dmu_tx_assigned                 4    353866795
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    972058
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:51:25 up 4 days, 17:24,  3 users,  load average: 3.04, 12.10, 8.85
5 1 0x01 11 528 4185196809 408268917867465
name                            type data
dmu_tx_assigned                 4    353882110
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    987234
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:51:40 up 4 days, 17:24,  3 users,  load average: 2.66, 11.58, 8.73
5 1 0x01 11 528 4185196809 408283976770226
name                            type data
dmu_tx_assigned                 4    353901469
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1006547
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:51:55 up 4 days, 17:25,  3 users,  load average: 2.37, 11.07, 8.61
5 1 0x01 11 528 4185196809 408299035063038
name                            type data
dmu_tx_assigned                 4    353918192
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1023238
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:52:10 up 4 days, 17:25,  3 users,  load average: 2.07, 10.58, 8.49
5 1 0x01 11 528 4185196809 408314088601443
name                            type data
dmu_tx_assigned                 4    353931541
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1036470
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:52:25 up 4 days, 17:25,  3 users,  load average: 1.91, 10.13, 8.38
5 1 0x01 11 528 4185196809 408329146392486
name                            type data
dmu_tx_assigned                 4    353946623
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1051490
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:52:40 up 4 days, 17:25,  3 users,  load average: 1.79, 9.70, 8.26
5 1 0x01 11 528 4185196809 408344206609347
name                            type data
dmu_tx_assigned                 4    353949833
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054648
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:52:55 up 4 days, 17:26,  3 users,  load average: 1.46, 9.24, 8.14
5 1 0x01 11 528 4185196809 408359264654694
name                            type data
dmu_tx_assigned                 4    353953276
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:53:10 up 4 days, 17:26,  3 users,  load average: 1.48, 8.87, 8.03
5 1 0x01 11 528 4185196809 408374317446813
name                            type data
dmu_tx_assigned                 4    353962077
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:53:25 up 4 days, 17:26,  3 users,  load average: 1.15, 8.43, 7.90
5 1 0x01 11 528 4185196809 408389375326823
name                            type data
dmu_tx_assigned                 4    353965799
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:53:40 up 4 days, 17:26,  3 users,  load average: 0.89, 8.02, 7.78
5 1 0x01 11 528 4185196809 408404432465523
name                            type data
dmu_tx_assigned                 4    353966564
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:53:55 up 4 days, 17:27,  3 users,  load average: 0.70, 7.63, 7.65
5 1 0x01 11 528 4185196809 408419490354769
name                            type data
dmu_tx_assigned                 4    353967280
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:54:10 up 4 days, 17:27,  3 users,  load average: 0.54, 7.25, 7.53
5 1 0x01 11 528 4185196809 408434548514514
name                            type data
dmu_tx_assigned                 4    353975078
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:54:25 up 4 days, 17:27,  3 users,  load average: 0.42, 6.90, 7.41
5 1 0x01 11 528 4185196809 408449607092162
name                            type data
dmu_tx_assigned                 4    353982849
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:54:40 up 4 days, 17:27,  3 users,  load average: 0.33, 6.56, 7.29
5 1 0x01 11 528 4185196809 408464663815867
name                            type data
dmu_tx_assigned                 4    353984276
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:54:55 up 4 days, 17:28,  3 users,  load average: 0.26, 6.24, 7.17
5 1 0x01 11 528 4185196809 408479723099894
name                            type data
dmu_tx_assigned                 4    353985096
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:55:10 up 4 days, 17:28,  3 users,  load average: 0.20, 5.93, 7.06
5 1 0x01 11 528 4185196809 408494785741898
name                            type data
dmu_tx_assigned                 4    353991855
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:55:26 up 4 days, 17:28,  3 users,  load average: 0.15, 5.64, 6.95
5 1 0x01 11 528 4185196809 408509843564576
name                            type data
dmu_tx_assigned                 4    353996372
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:55:41 up 4 days, 17:28,  3 users,  load average: 0.12, 5.37, 6.83
5 1 0x01 11 528 4185196809 408524902360005
name                            type data
dmu_tx_assigned                 4    354002809
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:55:56 up 4 days, 17:29,  3 users,  load average: 0.09, 5.10, 6.72
5 1 0x01 11 528 4185196809 408539959684468
name                            type data
dmu_tx_assigned                 4    354005751
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:56:11 up 4 days, 17:29,  3 users,  load average: 0.07, 4.85, 6.62
5 1 0x01 11 528 4185196809 408555017376424
name                            type data
dmu_tx_assigned                 4    354012879
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:56:26 up 4 days, 17:29,  3 users,  load average: 0.06, 4.61, 6.51
5 1 0x01 11 528 4185196809 408570075028865
name                            type data
dmu_tx_assigned                 4    354015577
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:56:41 up 4 days, 17:29,  3 users,  load average: 0.04, 4.39, 6.41
5 1 0x01 11 528 4185196809 408585133088338
name                            type data
dmu_tx_assigned                 4    354025307
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:56:56 up 4 days, 17:30,  3 users,  load average: 0.03, 4.17, 6.30
5 1 0x01 11 528 4185196809 408600191288682
name                            type data
dmu_tx_assigned                 4    354037905
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:57:11 up 4 days, 17:30,  3 users,  load average: 0.03, 3.97, 6.20
5 1 0x01 11 528 4185196809 408615248594178
name                            type data
dmu_tx_assigned                 4    354048242
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:57:26 up 4 days, 17:30,  3 users,  load average: 0.02, 3.77, 6.10
5 1 0x01 11 528 4185196809 408630307228432
name                            type data
dmu_tx_assigned                 4    354057458
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:57:41 up 4 days, 17:30,  3 users,  load average: 0.02, 3.59, 6.01
5 1 0x01 11 528 4185196809 408645365761476
name                            type data
dmu_tx_assigned                 4    354061363
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:57:56 up 4 days, 17:31,  3 users,  load average: 0.01, 3.41, 5.91
5 1 0x01 11 528 4185196809 408660423165043
name                            type data
dmu_tx_assigned                 4    354071167
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:58:11 up 4 days, 17:31,  3 users,  load average: 0.01, 3.25, 5.81
5 1 0x01 11 528 4185196809 408675479930347
name                            type data
dmu_tx_assigned                 4    354071876
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0
 13:58:26 up 4 days, 17:31,  3 users,  load average: 0.01, 3.09, 5.72
5 1 0x01 11 528 4185196809 408690537913429
name                            type data
dmu_tx_assigned                 4    354076468
dmu_tx_delay                    4    0
dmu_tx_error                    4    0
dmu_tx_suspended                4    0
dmu_tx_group                    4    1743
dmu_tx_memory_reserve           4    0
dmu_tx_memory_reclaim           4    0
dmu_tx_dirty_throttle           4    0
dmu_tx_dirty_delay              4    1054684
dmu_tx_dirty_over_max           4    415
dmu_tx_quota                    4    0

@dweeezil
Contributor

@michaelxwang Did you post the output of zpool status anywhere? How many IOPS can this pool support?

@michaelxwang
Author

@dweeezil and @ryao I have not posted the zpool status but here it is:

# zpool status | sed "s:^:    :"
  pool: tmp0
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    tmp0        ONLINE       0     0     0
      xvdb2     ONLINE       0     0     0
      xvdc2     ONLINE       0     0     0

errors: No known data errors

  pool: vol1
 state: ONLINE
  scan: none requested
config:

    NAME        STATE     READ WRITE CKSUM
    vol1        ONLINE       0     0     0
      xvdz      ONLINE       0     0     0

errors: No known data errors

The data resides on vol1 for the test I am doing. It is huge:

# zpool list
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tmp0   568G   330M   568G         -     0%     0%  1.00x  ONLINE  -
vol1  3.88T   695G  3.20T         -    17%    17%  1.00x  ONLINE  -

vol1 is an Amazon io1 EBS volume with Provisioned IOPS (SSD)

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html

The doc says "Consistently performs at provisioned level, up to 20,000 IOPS maximum", and we did provision it at the maximum of 20,000 IOPS.

@behlendorf
Contributor

@dweeezil there was some additional logic in zfs_remove(), similar to what you suggest, which was dropped in 3558fd7 during the initial ZFS port to Linux due to some VFS integration complexity. Since then the Linux VFS and the ZoL code have moved along significantly, and it looks relatively straightforward to re-add this missing functionality. Take a look at the may_delete_now and delete_now variables in the Illumos zfs_remove() implementation.

@dweeezil
Contributor

@behlendorf Very interesting. I actually did poke around a bit in Illumos to see if things were different but I wasn't looking in the right place.

@alek-p
Contributor

alek-p commented Jan 20, 2016

We've spent some time on the large-file delete problem at @Nexenta, and this sounds like it could be related. The core issue I've found is that there is no throttle for how many deletes get assigned to one TXG. As a result, when deleting large files we end up filling consecutive TXGs with deletes/frees and then write-throttling other (more important) ops.

There is an easy test case for this problem: try deleting several large files while you do write ops on the same pool. What we've seen is that the performance of those write ops (call it sideload I/O) drops to zero.

More specifically, the problem is that dmu_free_long_range_impl() can and will fill up all of the dirty data in the pool "instantly", before many of the sideload ops can get in, so sideload performance is impacted until all the files are freed.
The solution we have tested (with positive results) adds a relatively simple throttle for how many "free" ops we let into one TXG. If the above looks like the same issue ZoL is experiencing, we are prepared to try upstreaming our approach to @openzfs ahead of our normal upstreaming schedule.
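
For anyone who wants to reproduce the sideload effect described above, here is a rough shell sketch; the dataset mount point, file size, and timings are placeholders, so adjust them for your pool:

    # Build a large file to delete later; use urandom so the blocks are really
    # allocated even with compression enabled.
    dd if=/dev/urandom of=/tank/test/bigfile bs=1M count=200000

    # Start a small synchronous writer (the "sideload" I/O) and log how long
    # each 1 MiB write takes.
    ( while true; do
        /usr/bin/time -f "%e s per 1 MiB sync write" \
            dd if=/dev/zero of=/tank/test/sideload bs=1M count=1 oflag=sync
        sleep 1
      done ) &

    # Delete the large file; the per-write times above should climb while the
    # queued frees drain.
    time rm /tank/test/bigfile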

@dweeezil
Contributor

@alek-p I'm not totally sure what the original problem in this issue is. Is it that the unlink blocks for a long time, or is it the impact on other writes? I've been assuming the former, but the problem you're describing sounds like the latter, for which your solution sounds good.

Large deletes also cause stress because they require writes to deadlists if there are any snapshots.

@alek-p
Contributor

alek-p commented Jan 20, 2016

@dweeezil I should have mentioned that we have a separate patch for the unlink blocking problem. It calls zfs_inactive() asynchronously for files bigger than 64 GB.
There is also a third patch that will interrupt file freeing if you happen to "zpool export -f" the pool while (large) deletes are ongoing. This becomes even more relevant with the delete throttle patch installed, since deletes then take longer.
I'm out of the office for part of this week but will put the patches out for review when I get back.

@michaelxwang
Author

@ryao Here is the funclatency output, with my comments added in /* ... */.

# unbuffer ./funclatency -f spa_sync 60 2>&1 | tee -a funclatency.txt
Tracing spa_sync latencies. Output every 60 seconds. Ctrl-C to end.

/* No rm process. */

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 1        |####                                  |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 11       |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 2        |#######                               |
      64 -> 128     : 10       |###################################   |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 0        |                                      |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 12       |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 0        |                                      |
      64 -> 128     : 12       |######################################|

/* The rm process starts. */

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 1        |####                                  |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 11       |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 0        |                                      |
      64 -> 128     : 6        |#####################                 |
     128 -> 256     : 0        |                                      |
     256 -> 512     : 0        |                                      |
     512 -> 1024    : 0        |                                      |
    1024 -> 2048    : 0        |                                      |
    2048 -> 4096    : 0        |                                      |
    4096 -> 8192    : 0        |                                      |
    8192 -> 16384   : 0        |                                      |
   16384 -> 32768   : 1        |####                                  |

/* A long timing followed by a few short timings */

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 5        |############################          |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 7        |######################################|

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 6        |######################################|
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 6        |######################################|

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 7        |######################################|
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 5        |############################          |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 5        |############################          |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 7        |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 1        |######                                |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 7        |######################################|
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 5        |############################          |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 9        |######################################|
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 3        |#############                         |

/* The rm process has finished.
 *
 * # ls -ls `pwd`/junk
 * 102757678 -rw-r--r-- 1 root root 378384941056 Jan 19 23:05 /srv/mysqldata/data/backup/junk
 *
 * # time rm `pwd`/junk
 * real    7m2.562s
 * user    0m0.000s
 * sys     0m4.308s
 *
 */

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 5        |############################          |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 7        |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 1        |######                                |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 1        |####                                  |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 10       |######################################|
       8 -> 16      : 1        |####                                  |
      16 -> 32      : 1        |####                                  |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 0        |                                      |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 12       |######################################|

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 0        |                                      |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 12       |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 1        |####                                  |
      64 -> 128     : 8        |##########################            |
     128 -> 256     : 2        |#######                               |
     256 -> 512     : 0        |                                      |
     512 -> 1024    : 1        |####                                  |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 1        |####                                  |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 11       |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 2        |#######                               |
      64 -> 128     : 8        |############################          |
     128 -> 256     : 2        |#######                               |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 3        |#############                         |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 9        |######################################|
       8 -> 16      : 0        |                                      |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 4        |#################                     |
      64 -> 128     : 8        |##################################    |

  >=(ms) .. <(ms)   : Count    |Distribution                          |
       0 -> 1       : 3        |###############                       |
       1 -> 2       : 0        |                                      |
       2 -> 4       : 0        |                                      |
       4 -> 8       : 8        |######################################|
       8 -> 16      : 1        |#####                                 |
      16 -> 32      : 0        |                                      |
      32 -> 64      : 5        |########################              |
      64 -> 128     : 7        |##################################    |

@michaelxwang
Author

This post is to confirm a fact that @dweeezil mentioned. In one session I ran tail -f on a large file, and in another session I issued rm on the file. The deletion was immediate and there was no load or blocking issue. However, when I pressed control-c in the tail -f session, it "hung" there, and the usual load and blocking associated with removing a large file that is not held open appeared.

@dweeezil In your test scenario, have you reproduced both problems: (a) removing a large file takes a long time; and (b) other processes are blocked while the deletion is in progress? To me, they are the same issue. While deleting a large file, I cannot vi a new file on the same filesystem, but I can vi a file on another filesystem. This tells me that this is not a global CPU or memory issue but a filesystem I/O issue. Thanks for looking into this problem.
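
For reference, the open-file behaviour described above can be reproduced with something like the sketch below (paths are taken from the earlier example; substitute any large file on the affected dataset):

    # Hold the file open with tail, then unlink it: rm returns almost immediately
    # because the inode still has an open handle, so its blocks cannot be freed yet.
    tail -f /srv/mysqldata/data/backup/junk &
    TAILPID=$!
    time rm /srv/mysqldata/data/backup/junk

    # Closing the last handle is what triggers the actual free, so this is where
    # the txg_sync load and the blocking show up instead.
    kill $TAILPID
    wait $TAILPID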

@kernelOfTruth
Contributor

Key

NEX-3890

Description

Users performing several parallel deletions of very large files built with small recordsize can see performance degradation during the delete.

Workaround

  1. Serialize delete operations
  2. Use a larger recordsize (for example, 128k)
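
For workaround 2, recordsize is a per-dataset property; note that it only applies to blocks written after the change, so existing files keep their small records until they are rewritten. A hypothetical example (the dataset name is a placeholder):

    # Check, then raise, the recordsize on the affected dataset; the new value
    # only applies to blocks written after the change.
    zfs get recordsize vol1/mysqldata
    zfs set recordsize=128k vol1/mysqldata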

@kernelOfTruth
Contributor

referencing: #3725 (comment)

@alek-p
Contributor

alek-p commented Jan 20, 2016

I was able to get to this last night, see openzfs/openzfs#61

@michaelxwang
Author

@kernelOfTruth Users performing even ONE deletion of a very large file built with small recordsize can see performance degradation during the delete.

@dweeezil
Contributor

I took a look at @alek-p's patch and am going to try porting it to ZoL (atop @ryao's port of illumos 4950 on which it sort of depends).

@kernelOfTruth
Contributor

referencing: #4259 Illumos zfs_remove() patches

@behlendorf
Contributor

Thanks @kernelOfTruth, I meant to link this. The patches in #4259 are lightly tested, but they should help in part with this issue by making the unlinks of very large files asynchronous even if no other process has the file open.

We should definitely work on porting the dmu bits of @alek-p's patch as well. The whole thing isn't completely applicable because of significant VFS differences, but throttling the number of deletes in a txg is definitely a good idea.

@alek-p
Contributor

alek-p commented Jan 27, 2016

The biggest issue I see with the VFS differences is that there seem to be no VFS flags (particularly VFS_UNMOUNTED). Is there an existing way to tell that a fs is unmounting/unmounted?

I can port the delete throttle patch by itself to start with, but that exposes the issues I've mentioned in the pull request.

@behlendorf
Contributor

@alek-p right, this is one of those small but critical differences in the Linux VFS model. Most notably, there isn't a 1:1 mapping between the mounted namespace and the filesystem super block (zfsvfs_t). There is a single super block for the filesystem, but there can be multiple mounted namespaces for that super block. That means there isn't a simple mapping back to a single mounted namespace.

This is managed in the super block with the usual sort of reference-counting scheme, with the namespaces each taking a reference. Once they all drop their references the super block can be unmounted/destroyed. Additionally, file handles hold references on dentries, which hold references on inodes, which hold references on the super block. All these references need to be dropped before the filesystem can be unmounted. I need to spend some time looking at your patch, but it may be that this case just can't happen under Linux.

@ryao
Contributor

ryao commented Jan 28, 2016

@michaelxwang I was tied down chasing bugs at work last week. As a belated reply, we probably could use data from /proc/spl/kstat/zfs/vol1/dmu_tx_assign while such an unlink is running, but everything you have posted so far is not inconsistent with what I would expect to see. I say "not inconsistent" rather than "consistent" because I am uncertain whether the law of excluded middle holds in this situation.

@alek-p To elaborate on what @behlendorf said, the tree of mounted filesystems is a per-process thing on Linux that is inherited across fork() and allows filesystems to be mounted in multiple trees at once. Therefore, being mounted/unmounted is not a boolean property on Linux. The closest thing to VFS_UNMOUNTED would be ->s_count == 0 in the struct super_block, although some attention likely would need to be paid to when this is decremented as part of the umount process, and it would probably be better to hook zfs_preumount().
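
In case it helps with collecting that, a small loop like the one below could sample the kstat while an rm is in flight; the exact kstat filename is taken from the comment above, so verify what actually exists under /proc/spl/kstat/zfs/ on your build:

    # Sample the per-pool tx assign kstat every 15 seconds while an rm is running.
    KSTAT=/proc/spl/kstat/zfs/vol1/dmu_tx_assign
    while pgrep -x rm >/dev/null; do
        date
        cat "$KSTAT"
        sleep 15
    done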

@jonathanvaughn

I'm running into an apparently similar problem with txg_sync etc. when I rm -rf a large tree of files. If I don't pause the rm process every few seconds, txg_sync and friends start taking some CPU (but doing little actual writing) and the load starts to skyrocket, even though overall CPU usage is still rather low (90%+ idle). If I do nothing, deleting 50k or so files/directories may lead to a total system hang that it never recovers from. Sometimes the hang comes some time after rm has actually completed.

I seem to have 24 of each of z_fr_iss_N, z_rd_int_N, and z_wr_int_N, where N is 0..7 (as it is an 8-thread CPU). In my case dedup is enabled; I'm not sure if this is part of the problem (I assume the DDT must be searched and refcounts reduced, and then particular block hashes removed when the refcount hits zero, etc.). In total there are 655 z_* processes.
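
The "pause every few seconds" workaround can be scripted; here is a rough sketch that deletes in batches so each txg has a chance to sync its queued frees before more arrive (the batch size, sleep, and path are arbitrary placeholders):

    # Remove files in batches of 500 with a pause between batches, then remove
    # the (now empty) directory tree.
    TARGET=/tank/old-backups
    find "$TARGET" -type f -print0 | xargs -0 -n 500 sh -c 'rm -f "$@"; sleep 5' _
    rm -rf "$TARGET"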

@nwf
Contributor

nwf commented May 21, 2016

@jonathanvaughn: I suspect, given that you have dedup turned on, you are running into #3725, wherein I noted "dmu_tx_count_free does not appear to have any estimation of the amount of DDT work that will be required per transaction, so a whole lot of predicted-to-be-small transactions can get dmu_tx_assign-ed to a txg and then they all have to work on the DDT". If, during your userland I/O stalls or hangs, you see your disks frantically seeking and anon_size very slowly inching upwards, well past its limits, that is probably what's going on. I'd be curious to know whether my hack to limit the number of range-free calls per unlink per txg helps you.

@michaelxwang
Author

@nwf and @jonathanvaughn and @ryao and @behlendorf and @alek-p and @kernelOfTruth and @dweeezil and @lnxbil

As you can see from the output below, I had the problem on ZFS 0.6.5.4 (rebooted after installing that version) and it is resolved on ZFS 0.6.5.7 (after that version was installed and the machine rebooted). The problem may already have been fixed in 0.6.5.5 or 0.6.5.6, but I do not know, as I did not reboot and test after those versions; a new ZFS version only takes effect after a reboot. Can you tell from the release notes which fix resolved the problem?

As this issue is still not resolved (not marked resolved as of this post), it must be a different problem. I have always had dedup disabled, from the beginning.

Thank you for your attention to the problem.

Tested on [R-06]

Installed ZFS 0.6.5.4 and rebooted
Installed ZFS 0.6.5.5 without reboot
Installed ZFS 0.6.5.6 without reboot
Installed ZFS 0.6.5.7 without reboot

# cp -i ../*old*.ibd junk
# date; ls -ls `pwd`/junk; time rm `pwd`/junk; date

Tue Jun 21 00:32:11 UTC 2016
102591455 -rw-r--r-- 1 root root 300425412608 Jun 16 11:24 /srv/mysqldata/data/assistly/junk/junk

real    5m10.683s
user    0m0.000s
sys     0m3.412s
Tue Jun 21 00:37:21 UTC 2016

# mysqladmin shutdown
# reboot

# cp -i ../*old*.ibd junk
# date; ls -ls `pwd`/junk; time rm `pwd`/junk; date

Tue Jun 21 01:52:38 UTC 2016
102578209 -rw-r--r-- 1 root root 300425412608 Jun 21 01:45 /srv/mysqldata/data/assistly/junk/junk

real    0m3.396s
user    0m0.000s
sys     0m3.392s
Tue Jun 21 01:52:42 UTC 2016

@therealdano

In reference to @behlendorf's comment earlier, I had been experiencing the issue where deleting large files caused a performance problem that left the NFS server for the ZFS filesystem unresponsive. This was a very repeatable issue for me. I rsync files from several servers as backups, so there are multiple copies of essentially the same files grouped by date. I would delete the oldest set and see "NFS server not responding" messages every time. I applied the fixes from commit 1a04bab and have been running batch deletes all week without the NFS server becoming unresponsive once. Although not a rigorous test, I thought the information might be helpful.

@behlendorf behlendorf added the Type: Performance Performance improvement or performance problem label Nov 8, 2016
@behlendorf
Contributor

Closing. This issue was resolved by the patches referenced above for the latest 0.6.5.x versions and 0.7.

@davidblewett

@behlendorf We are experiencing this exact issue right now, against ZFS pools on single EBS volumes. In our case, I was cleaning up PostgreSQL WAL archives on a few machines (in one case, 255 directories with 255 files inside). This caused all I/O on the EBS volume to stop for an extended period (several minutes). I had to cancel the operation and instead destroy the dedicated ZFS filesystem and re-create it. We are using 0.6.5.11-1~trusty (packaged from https://launchpad.net/~zfs-native/+archive/ubuntu/stable ).

@alek-p
Contributor

alek-p commented Sep 12, 2017

@davidblewett you may have gotten bitten by the large-file delete problem; see #5706.
That issue is fixed in zfs-0.7.0-rc4 and later.
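
If I remember correctly, the delete throttle from that work is exposed as a module parameter in 0.7.x; the name below is from memory, so treat it as an assumption and check modinfo zfs on your build:

    # Show, and optionally lower, the percentage of a txg's dirty limit that
    # frees are allowed to consume.
    cat /sys/module/zfs/parameters/zfs_per_txg_dirty_frees_percent
    echo 5 > /sys/module/zfs/parameters/zfs_per_txg_dirty_frees_percent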

@dghodgson

dghodgson commented Nov 26, 2017

What constitutes a large file? I'm deleting folders with about 12 or so 800 MB files, and I'm also seeing very high I/O in iotop from txg_sync. Deleting the files takes a while (about one second per folder).

This is on zfs-0.7.3-1.el7_4 on CentOS 7.4. Is this related?
