Repeatably hang user processes, using memory-mapped I/O and fsync. #8751
Comments
The system is still up and seems perfectly stable apart from the hung processes mentioned above. Even the stuck dataset is available for reading, creating, and deleting files.
I note there are a few open "everything hangs" issues where the problem has been difficult to replicate, e.g. #8556, #8469, #7038, #7484, #8321, #7924. I have also had this machine hang under high I/O, and so far I have never found a cause; high I/O seems to be related. Notably, #7484 mentions fsync as a trigger.
Hoping to have found a smoking gun, I repeated the benchmarks and got the same result. First, I created an empty dataset ssd/benchmark2 and changed directory to it. I ran the first fio benchmark 10 times in a row with no issues. I then changed to mmap, and it died at about the 50% mark on the first attempt. The numbers all dropped to zero, and the fio output just says e.g.:
This is an example of what the output looks like once it has stalled:
This is the output of the final successful test:
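For reference, the sequence above amounts to roughly the following (a sketch; the dataset name is the one mentioned above and the fio options are those from the issue description below):

# Fresh dataset on the SSD pool, then 10 passes of the libaio+fsync job,
# then one pass with the mmap engine, which is the run that hangs.
zfs create ssd/benchmark2
cd /ssd/benchmark2    # assumes the pool is mounted at /ssd

for i in $(seq 10); do
    fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randwrite \
        --size=512m --io_size=10g --blocksize=8k --ioengine=libaio --fsync=1 \
        --iodepth=1 --direct=1 --numjobs=8 --runtime=60 --group_reporting
done

# Identical job, mmap engine: this is the run that stalls and leaves fio in state "Ds".
fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randwrite \
    --size=512m --io_size=10g --blocksize=8k --ioengine=mmap --fsync=1 \
    --iodepth=1 --direct=1 --numjobs=8 --runtime=60 --group_reporting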
Can confirm the hung I/O has been around for a long, long time. Had it happen on Thursday; the only "fix" is to reboot, which is maddening. What I still fail to understand is why things like the deadman timer do not kick in even after the entire volume hangs.
I'm curious to know, can anyone else reproduce the issue by running the fio benchmarks that I gave above?
From my experiences doing a zfs send from a server with 10 ssd's in raidz2
to a backup server that has 4 sata HDD's in raidz over a 10G network with
jumbo frames will crash the backup server unless you nice the sshd on the
backup server and ionice the zfs recv. I would imagine using mbuffer
instead of ssh would "lighten the load" also. The backup server has no
chance if it is running scrub at the same time as a recv. This is with
ubuntu 18.x through 19.04. I cannot get 19.04 to even complete a recv with
the 5.x kernel, had to drop to the 4.x.oem kernel.
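A sketch of the kind of tuning described, with hypothetical pool and dataset names (not the exact commands used):

# Deprioritize the receive side so it cannot starve the backup server.
# (The comment above nices sshd on the backup server; here the whole
#  receive pipeline is run under nice/ionice, which has a similar effect.)
zfs send -R tank/data@snap | \
    ssh backup 'nice -n 19 ionice -c3 zfs recv -F backup/data'

# Replacing ssh with mbuffer as the transport is the suggested way to
# lighten the load further.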
@darrenfreeman testing on Ubuntu 18.04 with the final 0.8.0 release, I wasn't able to reproduce the issue using your reproducer. Would it be possible for you to update to zfs-0.8.0 and rerun the test in your environment to confirm it is still an issue? Based on your stack traces it looks like a deadlock, so I may just not have been able to hit it in my environment.
Had to do a hard reset after lots of error messages about being unable to unmount the pool. Rebooted into zfs-0.8.0 (compiled from the git tagged release). Also a slightly newer kernel, 4.9.0-9-amd64. Can still reproduce the issue: 10x successful benchmarks, change to mmap, died about halfway in. Including some fresh kernel stack traces just in case there's anything new:
I am able to reproduce this problem 100% reliably in a matter of seconds using master from today and a pool consisting of a single 20GiB file on a tmpfs file system. I've not looked into it too deeply yet, but so far I've eliminated the recently added direct I/O support as a cause, both by removing the "direct=1" fio flag and by removing the direct I/O support from the ZFS module entirely, since the kernel's writeback mechanism will use it as a standard course of its operation if it exists.
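For anyone wanting to replicate that setup, something like the following should be equivalent (pool name and paths are illustrative, not the exact script used):

# 20 GiB sparse file on tmpfs as the only vdev of a throwaway pool.
mount -t tmpfs -o size=21g tmpfs /mnt/ram
truncate -s 20G /mnt/ram/vdev0
zpool create testpool /mnt/ram/vdev0
zfs create testpool/fio
cd /testpool/fio
# ...then run the fio jobs from the issue description (libaio pass, then mmap).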
I had a similar issue with my ZFS pool across multiple versions of the software on Ubuntu 18.04 LTS and 19.04. I was able to mount my volume to capture the data out of it using the following command.
Your mileage may vary, but I wanted to share what got mine working around this issue.
Updating this before I forget. The case I run into more often is ext4 zvols whose journals are corrupt or which are otherwise unavailable (device changes, for example). The mount will hang. Stack trace:
[<0>] io_schedule+0x16/0x40
[<0>] wait_on_page_bit_common+0x109/0x1c0
[<0>] __filemap_fdatawait_range+0x10a/0x180
[<0>] file_write_and_wait_range+0x6d/0xa0
[<0>] ext4_sync_file+0x114/0x370 [ext4]
[<0>] vfs_fsync_range+0x3f/0x80
[<0>] vfs_fsync+0x1c/0x20
[<0>] vn_fsync+0x17/0x20 [spl]
[<0>] vdev_file_io_start+0xae/0x120 [zfs]
[<0>] zio_vdev_io_start+0xc7/0x350 [zfs]
[<0>] zio_nowait+0xbf/0x150 [zfs]
[<0>] zio_flush+0x33/0x40 [zfs]
[<0>] zil_commit_writer+0x67b/0x760 [zfs]
[<0>] zil_commit.part.13+0x93/0x100 [zfs]
[<0>] zil_commit+0x17/0x20 [zfs]
[<0>] zvol_write+0x570/0x620 [zfs]
[<0>] zvol_request+0x22d/0x350 [zfs]
[<0>] generic_make_request+0x19a/0x3d0
[<0>] submit_bio+0x75/0x140
[<0>] submit_bh_wbc+0x16f/0x1a0
[<0>] __sync_dirty_buffer+0x70/0xd0
[<0>] ext4_commit_super+0x1df/0x2c0 [ext4]
[<0>] ext4_setup_super+0x158/0x1c0 [ext4]
[<0>] ext4_fill_super+0x2116/0x3c50 [ext4]
[<0>] mount_bdev+0x187/0x1c0
[<0>] ext4_mount+0x15/0x20 [ext4]
[<0>] mount_fs+0x3e/0x150
[<0>] vfs_kern_mount+0x67/0x130
[<0>] do_mount+0x1f0/0xca0
[<0>] ksys_mount+0x83/0xd0
[<0>] __x64_sys_mount+0x25/0x30
[<0>] do_syscall_64+0x60/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff
@darrenfreeman What is the last version of ZoL on which this test did not deadlock? After reviewing the stack traces I've been able to get and a short list of potential commits which may have caused this problem, I wasn't able to see any obvious problem, so I've done a bit of bisection and have been able to easily reproduce this problem as far back as 0f69f42, committed on Jul 27, 2017. The caveat is that I'm still using a tmpfs-backed single-vdev pool, which might actually be contributing to this problem.
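A rough sketch of how such a bisection can be driven (the good tag and the reproducer wrapper are assumptions, not the exact commands used):

# Bisect between current master (bad) and an assumed-good older tag,
# rebuilding the tree at each step; a hypothetical wrapper script runs the
# fio mmap+fsync job under a timeout so a deadlock shows up as a failure.
git bisect start master zfs-0.7.0
git bisect run sh -c '
    sh autogen.sh && ./configure && make -j"$(nproc)" || exit 125   # 125 = skip unbuildable commits
    timeout 180 ./reproducer.sh    # hypothetical script; a timeout (exit 124) marks the commit bad
'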
I don't know how far back it goes; I had been having my whole system hang on high SSD zpool I/O until a fairly recent commit. But after updating to the git version in my original post, I found the situation much improved, so I added the Optane NVMe SLOG and pushed it further.
After further investigation and research, I've discovered that reverting both 21a96fb and d958324 causes this test case to work. Here's the reproducer I've been using. I've also been using a similar script that uses a loopback device backed by an ext4 sparse file. Both typically deadlock within seconds of starting the random writes. With the two commits listed above reverted, the deadlock goes away.
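Not the original script, but the loopback variant described amounts to roughly the following (device, pool, and path names are hypothetical):

# Sparse file on ext4, attached as a loop device and used as the sole vdev.
truncate -s 20G /var/tmp/zfs-vdev.img
LOOPDEV=$(losetup -f --show /var/tmp/zfs-vdev.img)
zpool create testpool "$LOOPDEV"
zfs create testpool/fio
cd /testpool/fio
# Random-write fio job with mmap + fsync (as in the issue description);
# on an unpatched module this typically deadlocks within seconds.
# With 21a96fb and d958324 reverted (git revert 21a96fb d958324, then
# rebuild/reload the module), the same job completes.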
Pinging @behlendorf and @tuxoko to get their ideas on this.
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I don't believe this issue has been resolved, although I haven't repeated my tests against the latest version.
I have to rule this out as a hardware issue, unless there's something I am missing. I have 12 drives of various manufacturers and models. I swapped out the cables, backplane, and HBA. I also tried the motherboard's onboard SATA connections. The txg_sync() timeout remains, and each is preceded by a deadman event. The drive that triggers the deadman/txg_sync() hang is different each time, and it has occurred enough times that all 12 drives have triggered it at least once.
Edit (11/19/2020): a few days after this, I was able to update ZFS on this system, and it has not had a hang since (so over 2 months at this point). Previously, it would hang every few days, usually during a backup or scrub. I'm cautiously optimistic.
Edit (04/02/2021): still no hangs since September. Hardware, load, and general server usage have not changed.
I have the same issue. I can reproduce it using fio with fsync; it consumes all the available memory and all the swap space.
PS: I am running Ubuntu 20.04.2 LTS with zfs 0.8.3.
Potential fixes: #12284
This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.
I am not able to test on my original hardware setup, and the Optane has since failed. But it's serious enough to not close until it's proven fixed. @dweeezil has a reproducer above.
System information
Describe the problem you're observing
Complete failure to perform further FS I/O on the same dataset after running a benchmarking test. The benchmarking processes are defunct, and kill -9 has no effect on them.
Describe how to reproduce the problem
Create an unused dataset on a pool with 2x Samsung SSD 850 EVO 500GB configured as independent vdevs, plus a 32 GB Optane NVMe module as SLOG.
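For context, the pool layout is roughly equivalent to the following (device names are placeholders, not the actual ones used):

# Two SSDs as independent (striped, non-redundant) top-level vdevs, plus the
# Optane NVMe module as a separate log device (SLOG).
zpool create ssd /dev/sdX /dev/sdY log /dev/nvme0n1
zfs create ssd/benchmark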
I am attempting to simulate a worst-case workload from PostgreSQL 11. I believe it uses mmap for file I/O, but I may be mistaken. The page size is 8 KiB.
Run the following benchmark:
fio --name TEST --eta-newline=5s --filename=fio-tempfile.dat --rw=randwrite --size=512m --io_size=10g --blocksize=8k --ioengine=libaio --fsync=1 --iodepth=1 --direct=1 --numjobs=8 --runtime=60 --group_reporting
This will succeed, with the SLOG getting up to a couple of GB used and nice high IOPS.
Now run it again with --ioengine=mmap. There is an initial burst of write I/O to the pool, then the numbers all drop to zero, and it seems very much stalled. Occasionally there is a burst of I/O, then nothing. (I think this is happening to another dataset on the same pool.) The fio job can't be killed with Ctrl-C, and the processes don't die with kill -9, but they can be made to stop outputting to the shell with kill -9. ps aux reveals a status of "Ds".
Now the previously successful fio test will also suffer the same fate if run again. However, it can be run on a different pool which also uses the Optane as SLOG, and it succeeds. I'm not game to try mmap again.
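In this state, the kernel stacks of the stuck fio workers can be inspected directly, e.g. (a sketch; requires root):

# Show the state and in-kernel stack of each stuck fio worker.
ps -o pid,stat,wchan:32,cmd -C fio
for pid in $(pgrep -x fio); do
    echo "=== $pid ==="
    cat /proc/$pid/stack
done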
The PostgreSQL DB on the SSD pool seems to still be running; it can generate reasonable-looking I/O numbers. It is in a different dataset from the one used for benchmarking. I think the throw-away benchmarking dataset could be hosed, though.
Since the host appears to still be up and providing services, I will leave it like this for a day or two, in case you have some tests you'd like me to perform in this state.
Include any warning/errors/backtraces from the system logs
Here are the first few kernel stack traces from journalctl; they are not identical even across the different fio workers.