Discard operations on empty zvols are "slow" #7951

Open · ryao opened this issue Sep 24, 2018 · 10 comments

Labels: Bot: Not Stale, Component: ZVOL, Status: Understood, Type: Performance

Comments

ryao (Contributor) commented Sep 24, 2018

I was trying to produce a counterexample for #7937, but I ended up discovering a "deficiency" in our zvol code. I ran two commands:

zfs create -V 1E -s rpool/example
mkfs.xfs /dev/zvol/rpool/example

The first created a 1EB sparse zvol. The second was to make an XFS filesystem on it. Unfortunately, mkfs.xfs does a discard before its format operation by default. This blocked on the discard operation:

[<0>] ___preempt_schedule+0x16/0x18
[<0>] taskq_dispatch+0x81/0x3d0 [spl]
[<0>] zvol_request+0x2b0/0x3f0 [zfs]
[<0>] generic_make_request+0x1d8/0x3c0
[<0>] submit_bio+0x73/0x140
[<0>] next_bio+0x38/0x40
[<0>] __blkdev_issue_discard+0x16f/0x220
[<0>] blkdev_issue_discard+0x6c/0xd0
[<0>] blk_ioctl_discard+0xc7/0x110
[<0>] blkdev_ioctl+0x8db/0x950
[<0>] block_ioctl+0x3d/0x50
[<0>] do_vfs_ioctl+0xa8/0x620
[<0>] ksys_ioctl+0x75/0x80
[<0>] __x64_sys_ioctl+0x1a/0x20
[<0>] do_syscall_64+0x5f/0x120
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

iostat claims that we are writing to the zvol at 82354484213.86 kB/sec, which is ~82 TB/sec. On a 1EB zvol, this will take 3 to 4 hours to complete. mkfs.xfs also cannot be killed until the discard operation has completed.

Anyway, there are a few things wrong here:

  1. We are not making the operation a no-op on an empty zvol.
  2. Userspace cannot interrupt this.
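
As a workaround for the mkfs case (it sidesteps the problem rather than fixing it), mkfs.xfs can be told to skip the discard entirely:

# -K tells mkfs.xfs not to discard blocks before formatting
mkfs.xfs -K /dev/zvol/rpool/example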
ryao changed the title from “Discard operations on empty zvols are slow” to “Discard operations on empty zvols are "slow"” Sep 24, 2018
trisk (Contributor) commented Sep 24, 2018

Seems possible this is blocked by some other operation using the zvol_taskq. I wonder if the queue is full at the time?

ryao (Contributor, Author) commented Sep 24, 2018

@trisk This is the only zvol on this system and nothing else is touching it. The IO queue is definitely full though:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
zd0               0.00     0.00    0.00 639943.00     0.00 83878608896.00 262144.00    31.50    0.05    0.00    0.05   0.00 100.40

The zvol threads are also taking up all CPU time.
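
As a sanity check on those numbers (my arithmetic; iostat reports avgrq-sz in 512-byte sectors), w/s times avgrq-sz times 512 bytes reproduces the wkB/s column exactly, and 262144 sectors per request works out to 128 MiB, matching the discard chunk size discussed below:

# 639943 req/s * 262144 sectors/req * 512 bytes/sector / 1024 = kB/s
echo '639943 * 262144 * 512 / 1024' | bc
# => 83878608896, the wkB/s figure above (~84 TB/sec)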

bunder2015 added the Component: ZVOL and Type: Performance labels Sep 24, 2018
richardelling (Contributor) commented:
perhaps the default value of zvol_max_discard_blocks is too high?
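
For anyone experimenting, it is exposed as a runtime module parameter on Linux (the default of 16384 blocks is from memory of zvol.c, so treat it as an assumption):

# current limit, in units of volblocksize blocks
cat /sys/module/zfs/parameters/zvol_max_discard_blocks

# try a lower value; this may only apply to zvols created afterwards,
# since the block queue's discard limit is set when the zvol is set up
echo 8192 > /sys/module/zfs/parameters/zvol_max_discard_blocks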

ryao (Contributor, Author) commented Sep 25, 2018

@richardelling The issue is twofold. One is that there is no way for the code to respond to a signal from userspace and exit early. The other is that the code doesn't check whether the zvol is already empty and turn the discard into a no-op.

ryao (Contributor, Author) commented Sep 25, 2018

@richardelling Actually, I see what you are saying. It is likely the reverse: __blkdev_issue_discard will break discard operations into max_discard_sectors-sized chunks, which prevents us from optimizing this use case.
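
Concretely (assuming the default zvol_max_discard_blocks of 16384 and the default 8K volblocksize, both worth double-checking against the module), each chunk the kernel hands to zvol_request spans:

# per-bio discard span = zvol_max_discard_blocks * volblocksize
echo '16384 * 8192' | bc
# => 134217728 bytes, i.e. 128 MiB per call into zvol_request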

richardelling (Contributor) commented:
Two things:

  1. the default for zvol_max_discard_blocks is likely a SWAG; dunno if it should be changed, but a lower value likely makes sense
  2. we could be smarter about this for empty zvols, if that helps (see the illustration below)
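
For illustration of the empty-zvol special case (a rough userspace proxy only, not the in-kernel check this would actually need; metadata means the value won't be exactly zero):

# near-zero for a freshly created sparse zvol; grows once data is written
zfs get -Hpo value logicalreferenced rpool/example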

alek-p (Contributor) commented Sep 25, 2018

Perhaps this is a dup of #6728? @ryao you could try setting zfs_per_txg_dirty_frees_percent=0 and see if that helps.
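
For reference, that tunable can be flipped at runtime on Linux; a value of 0 disables the per-txg dirty-frees throttle that #6728 concerns:

# disable the dirty-frees throttle
echo 0 > /sys/module/zfs/parameters/zfs_per_txg_dirty_frees_percent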

ryao (Contributor, Author) commented Sep 25, 2018

In the scenario that I described, where a discard is done on the entirety of an empty 1EB zvol, assuming the default volblocksize, zvol_request will be called approximately 7.6 billion times. That is the cause of the slowdown. The DMU is great at making sure that we don’t actually write anything, taking less than 2 microseconds per operation on average, but we still run through the motions of trying to free data billions of times in a non-interruptible context. I do not believe that any amount of tweaking that does not involve getting the kernel to pass the entire discard to zvol_request will make this quick.
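
The arithmetic behind that count (taking 1E as 10^18 bytes and the 128 MiB per-call chunk size from above):

# total span / per-call span = number of zvol_request calls
echo '10^18 / (16384 * 8192)' | bc
# => 7450580596, roughly the 7.6 billion figure above; at ~2 microseconds
# of DMU work per call, that is around 4 hours of uninterruptible grinding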

In that respect, I think that this is different from #6728. Here the zvol is empty, which is a special case. In the other issue, the state of the zvol is unclear; it could include zvols that have data. Formatting a zvol right after creating it is a fairly common operation, and apparently the discard as part of the format is common too, so we probably should change this. The fix for that won’t necessarily help a zvol that has data.

The stale bot marked this issue as stale on Aug 25, 2020 (label removed Aug 29, 2020) and again on Aug 29, 2021.
behlendorf added the Bot: Not Stale and Status: Understood labels and removed the Status: Stale label Sep 2, 2021