Improve ZVOL queue behavior #554

dechamps · 2012-02-03T13:07:45Z

The Linux block device queue subsystem exposes a number of configurable settings described in Linux block/blk-settings.c. The defaults for these settings are tuned for hard drives and are not optimized for ZVOLs. Configuring these options properly would allow upper layers (such as the I/O scheduler) to make better decisions about write merging and ordering.

Detailed rationale:

- max_hw_sectors is set to unlimited (UINT_MAX). zvol_write() is able to handle writes of any size, so there's no reason to impose a limit. Let the upper layer decide.
- max_segments and max_segment_size are set to unlimited. zvol_write() will copy the requests' contents into a dbuf anyway, so the number and size of the segments are irrelevant. Let the upper layer decide.
- physical_block_size and io_opt are set to the ZVOL's block size. This has the potential to somewhat alleviate issue #361 ("Sequential file rewrite outside of block boundaries is dead slow") for ZVOLs, by warning the upper layers that writes smaller than the volume's block size will be slow.
- The NONROT flag is set to indicate this isn't a rotational device. Although the backing zpool might be composed of rotational devices, the resulting ZVOL often doesn't exhibit the same behavior due to the COW mechanisms used by ZFS. Setting this flag will prevent upper layers from making useless decisions (such as reordering writes) based on incorrect assumptions about the behavior of the ZVOL.
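
To make the list above concrete, here is a minimal userspace sketch of the intended settings. It is not the patch itself: `struct zvol_queue_model` and `zvol_tune_queue_model()` are hypothetical stand-ins for the kernel's request queue and its setters, and using `USHRT_MAX` as the "unlimited" segment count is an assumption about the width of that field.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the queue settings discussed above.
 * Field names follow the setting names in the description; this is not
 * the kernel's struct request_queue. */
struct zvol_queue_model {
    unsigned int   max_hw_sectors;      /* largest request, in sectors */
    unsigned short max_segments;        /* scatter/gather segment count */
    unsigned int   max_segment_size;    /* bytes per segment */
    unsigned int   physical_block_size; /* smallest efficient write, bytes */
    unsigned int   io_opt;              /* preferred I/O size, bytes */
    bool           nonrot;              /* true: not a rotational device */
};

/* Apply the tunings from the list: no size or segment limits, block
 * size hints equal to the volume block size, non-rotational flag set. */
static void zvol_tune_queue_model(struct zvol_queue_model *q,
                                  unsigned int volblocksize)
{
    assert(q != NULL);
    q->max_hw_sectors = UINT_MAX;
    q->max_segments = USHRT_MAX;        /* assumed "unlimited" value */
    q->max_segment_size = UINT_MAX;
    q->physical_block_size = volblocksize;
    q->io_opt = volblocksize;
    q->nonrot = true;
}
```

For a volume with an 8 KiB block size, calling `zvol_tune_queue_model(&q, 8192)` leaves both `physical_block_size` and `io_opt` at 8192, which is the hint that tells upper layers smaller writes will be slow.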

Note: #389 was the original pull request for this, but I made a new one since I seriously messed up the branch by doing unwanted merges. I'll try to be more careful this time.


behlendorf · 2012-02-07T18:25:18Z

@etienne-dechamps-o Please review this slightly reworked commit, behlendorf/zfs@ebb42f9. It basically adds the needed autoconf checks and wrappers to map your tunings to the legacy versions of those functions if they exist. If the concept being tuned doesn't exist in the older kernel, that tuning is simply skipped.
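
The wrapper idea described above can be sketched in userspace roughly as follows. Everything here is hypothetical: the `HAVE_MODEL_*` macros stand in for whatever the autoconf checks define, and `struct compat_queue_model` stands in for the kernel's request queue. Neither macro is defined in this sketch, so it takes the "older kernel" path.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's request queue. */
struct compat_queue_model {
    unsigned int max_sectors; /* written by the legacy setter */
    unsigned int io_opt;      /* stays 0 when the concept is missing */
};

/* Legacy setter that the modeled older kernel still provides. */
static void legacy_set_max_sectors(struct compat_queue_model *q,
                                   unsigned int n)
{
    q->max_sectors = n;
}

/* A configure check would define HAVE_MODEL_MAX_HW_SECTORS when the
 * modern setter exists. Without it, map the call to the legacy one. */
#ifndef HAVE_MODEL_MAX_HW_SECTORS
static inline void compat_set_max_hw_sectors(struct compat_queue_model *q,
                                             unsigned int n)
{
    legacy_set_max_sectors(q, n);
}
#endif

/* No legacy equivalent exists in this model, so the tuning is skipped. */
#ifndef HAVE_MODEL_IO_OPT
static inline void compat_set_io_opt(struct compat_queue_model *q,
                                     unsigned int n)
{
    (void)q;
    (void)n;
}
#endif

/* Caller code stays identical regardless of the detected kernel. */
static void compat_tune_queue(struct compat_queue_model *q,
                              unsigned int volblocksize)
{
    assert(q != NULL);
    compat_set_max_hw_sectors(q, UINT_MAX);
    compat_set_io_opt(q, volblocksize);
}
```

The point of this shape is that the tuning code calls one name per setting, and the compatibility header decides whether that name maps to a modern call, a legacy call, or nothing.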

If you're happy with this slightly reworked stack of zvol changes, I'll merge them into master. They pass all my tests for kernels 2.6.26 through 3.2. It'll be nice to get these changes merged; sorry it took so long.

Also, wasn't there originally another tweak to change the default number of zvol threads? I didn't see that as a pull request. If you have another change for that which improves things, I'm happy to take it. I'm sure you've done the most tuning for zvols so far.

behlendorf · 2012-02-08T00:27:49Z

Merged into master as commit 34037af.

dechamps · 2012-02-08T08:34:31Z

@behlendorf

> Also, wasn't there originally another tweak to change the default number of zvol threads? I didn't see that as a pull request. If you have another change for that which improves things, I'm happy to take it. I'm sure you've done the most tuning for zvols so far.

You're talking about #392. I could write a patch to set zvol_threads to 32 instead of the number of CPUs. It has a net positive effect on all systems I tested.

behlendorf · 2012-02-08T19:55:55Z

Yes, that's the one. Sorry, I lost track of it. If you could open a new pull request with the change to 32 and a nice explanation of why (such as what's in #392) I'll get it in the tree. As you say in the bug, I don't see a downside to doing this even if 32 isn't perfectly optimal for all systems.

dechamps · 2012-02-08T21:48:39Z

@behlendorf Done in #567.

The current rw_tryupgrade does rw_exit and then rw_tryenter(RW_WRITER), and then does rw_enter(RW_READER) if it fails. This violates the assumption that rw_tryupgrade should be atomic and could cause extra contention or even lock inversion.

This patch implements a proper rw_tryupgrade. For rwsem-spinlock, we take the spinlock to check rwsem->count and rwsem->wait_list. For normal rwsem, we use cmpxchg on rwsem->count to change the value from single reader to single writer.

Signed-off-by: Chunwei Chen <david.chen@osnexus.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Closes openzfs#4692
Closes openzfs#554
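
The compare-and-exchange approach in that commit message can be illustrated with C11 atomics. This is only a sketch under assumptions: `struct rw_model` and the two count encodings below are hypothetical and do not reproduce the kernel's rwsem layout or bias constants.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical count encodings for this model only. */
#define RW_MODEL_SINGLE_READER 1L
#define RW_MODEL_SINGLE_WRITER (-1L)

/* Hypothetical lock model: one atomic counter, no wait list. */
struct rw_model {
    atomic_long count;
};

/* Try to upgrade atomically: succeed only if the caller is the sole
 * reader. On failure the count is left untouched, so the caller still
 * holds its read lock and never drops it in between. */
static bool rw_model_tryupgrade(struct rw_model *rw)
{
    long expected = RW_MODEL_SINGLE_READER;

    assert(rw != NULL);
    return atomic_compare_exchange_strong(&rw->count, &expected,
                                          RW_MODEL_SINGLE_WRITER);
}
```

Because the check and the state change happen in one atomic step, there is no window in which another thread can take the lock between releasing the read side and acquiring the write side.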


dechamps mentioned this pull request Feb 3, 2012

Improve ZVOL queue behavior #389

Closed

behlendorf closed this Feb 8, 2012

pcd1193182 pushed a commit to pcd1193182/zfs that referenced this pull request Sep 26, 2023

Merge pull request openzfs#554 from delphix/projects/merge-upstream/m…

afcce54

…aster Merge remote-tracking branch '6.0/stage' into 'master'
