Discard operations on empty zvols are "slow" #7951

Open · ryao opened this issue Sep 24, 2018 · 10 comments

Labels: Bot: Not Stale, Component: ZVOL, Status: Understood, Type: Performance

Comments

ryao (Contributor) commented Sep 24, 2018

I was trying to produce a counterexample for #7937, but I ended up discovering a "deficiency" in our zvol code. I ran two commands:

zfs create -V 1E -s rpool/example
mkfs.xfs /dev/zvol/rpool/example

The first created a 1EB sparse zvol. The second was to make an XFS filesystem on it. Unfortunately, mkfs.xfs does a discard before its format operation by default. This blocked on the discard operation:

[<0>] ___preempt_schedule+0x16/0x18
[<0>] taskq_dispatch+0x81/0x3d0 [spl]
[<0>] zvol_request+0x2b0/0x3f0 [zfs]
[<0>] generic_make_request+0x1d8/0x3c0
[<0>] submit_bio+0x73/0x140
[<0>] next_bio+0x38/0x40
[<0>] __blkdev_issue_discard+0x16f/0x220
[<0>] blkdev_issue_discard+0x6c/0xd0
[<0>] blk_ioctl_discard+0xc7/0x110
[<0>] blkdev_ioctl+0x8db/0x950
[<0>] block_ioctl+0x3d/0x50
[<0>] do_vfs_ioctl+0xa8/0x620
[<0>] ksys_ioctl+0x75/0x80
[<0>] __x64_sys_ioctl+0x1a/0x20
[<0>] do_syscall_64+0x5f/0x120
[<0>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[<0>] 0xffffffffffffffff

iostat claims that we are writing to the zvol at 82354484213.86 kB/sec, which is ~82 TB/sec. On a 1EB zvol, this will take 3 to 4 hours to complete. mkfs.xfs also cannot be killed until the discard operation has completed.

Anyway, there are a few things wrong here:

  1. We are not making the operation a no-op on an empty zvol.
  2. Userspace cannot interrupt this.
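
As a workaround for the mkfs case (it sidesteps the problem rather than fixing it), mkfs.xfs can be told to skip the discard entirely:

# -K tells mkfs.xfs not to discard blocks before formatting
mkfs.xfs -K /dev/zvol/rpool/example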
ryao changed the title from “Discard operations on empty zvols are slow” to “Discard operations on empty zvols are "slow"” Sep 24, 2018
trisk (Contributor) commented Sep 24, 2018

Seems possible this is blocked by some other operation using the zvol_taskq. I wonder if the queue is full at the time?

ryao (Contributor, Author) commented Sep 24, 2018

@trisk This is the only zvol on this system and nothing else is touching it. The IO queue is definitely full though:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
zd0               0.00     0.00    0.00 639943.00     0.00 83878608896.00 262144.00    31.50    0.05    0.00    0.05   0.00 100.40

The zvol threads are also taking up all CPU time.
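
As a sanity check on those numbers (my arithmetic; iostat reports avgrq-sz in 512-byte sectors), w/s times avgrq-sz times 512 bytes reproduces the wkB/s column exactly, and 262144 sectors per request works out to 128 MiB, matching the discard chunk size discussed below:

# 639943 req/s * 262144 sectors/req * 512 bytes/sector / 1024 = kB/s
echo '639943 * 262144 * 512 / 1024' | bc
# => 83878608896, the wkB/s figure above (~84 TB/sec)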

bunder2015 added the Component: ZVOL and Type: Performance labels Sep 24, 2018
richardelling (Contributor) commented:
perhaps the default value of zvol_max_discard_blocks is too high?
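
For anyone experimenting, it is exposed as a runtime module parameter on Linux (the default of 16384 blocks is from memory of zvol.c, so treat it as an assumption):

# current limit, in units of volblocksize blocks
cat /sys/module/zfs/parameters/zvol_max_discard_blocks

# try a lower value; this may only apply to zvols created afterwards,
# since the block queue's discard limit is set when the zvol is set up
echo 8192 > /sys/module/zfs/parameters/zvol_max_discard_blocks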

ryao (Contributor, Author) commented Sep 25, 2018

@richardelling The issue is twofold. One is that there is no way for the code to respond to a signal from userspace and exit early. The other is that the code doesn't check whether the zvol is already empty and turn the discard into a no-op.

ryao (Contributor, Author) commented Sep 25, 2018

@richardelling Actually, I see what you are saying. It is likely the reverse: __blkdev_issue_discard will break discard operations into max_discard_sectors-sized chunks, which prevents us from optimizing this use case.
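
Concretely (assuming the default zvol_max_discard_blocks of 16384 and the default 8K volblocksize, both worth double-checking against the module), each chunk the kernel hands to zvol_request spans:

# per-bio discard span = zvol_max_discard_blocks * volblocksize
echo '16384 * 8192' | bc
# => 134217728 bytes, i.e. 128 MiB per call into zvol_request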

richardelling (Contributor) commented:
Two things:

  1. the default for zvol_max_discard_blocks is likely a SWAG; dunno if it should be changed, but a lower value likely makes sense
  2. we could be smarter about this for empty zvols, if that helps (see the illustration below)
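
For illustration of the empty-zvol special case (a rough userspace proxy only, not the in-kernel check this would actually need; metadata means the value won't be exactly zero):

# near-zero for a freshly created sparse zvol; grows once data is written
zfs get -Hpo value logicalreferenced rpool/example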

alek-p (Contributor) commented Sep 25, 2018

Perhaps this is a dup of #6728? @ryao you could try setting zfs_per_txg_dirty_frees_percent=0 and see if that helps.
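
For reference, that tunable can be flipped at runtime on Linux; a value of 0 disables the per-txg dirty-frees throttle that #6728 concerns:

# disable the dirty-frees throttle
echo 0 > /sys/module/zfs/parameters/zfs_per_txg_dirty_frees_percent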

ryao (Contributor, Author) commented Sep 25, 2018

In the scenario that I described, where a discard is done on the entirety of an empty 1EB zvol, assuming the default volblocksize, zvol_request will be called approximately 7.6 billion times. That is the cause of the slowdown. The DMU is great at making sure that we don’t actually write anything, taking less than 2 microseconds per operation on average, but we still run through the motions of trying to free data billions of times in a non-interruptible context. I do not believe that any amount of tweaking that does not involve getting the kernel to pass the entire discard to zvol_request will make this quick.
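
The arithmetic behind that count (taking 1E as 10^18 bytes and the 128 MiB per-call chunk size from above):

# total span / per-call span = number of zvol_request calls
echo '10^18 / (16384 * 8192)' | bc
# => 7450580596, roughly the 7.6 billion figure above; at ~2 microseconds
# of DMU work per call, that is around 4 hours of uninterruptible grinding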

In that respect, I think that this is different from #6728. Here the zvol is empty, which is a special case. In the other issue, the state of the zvol is unclear; it could include zvols that have data. Formatting a zvol right after creating it is a fairly common operation, and apparently the discard as part of the format is common too, so we probably should change this. The fix for that won’t necessarily help a zvol that has data.

The stale bot marked this issue as stale on Aug 25, 2020 (label removed Aug 29, 2020) and again on Aug 29, 2021.
behlendorf added the Bot: Not Stale and Status: Understood labels and removed the Status: Stale label Sep 2, 2021