vdev_id: Support daisy-chained JBODs in multipath mode #10736

Closed
wants to merge 3 commits

Conversation

AeonJJohnson

Motivation and Context

The current version of vdev_id cannot differentiate between devices in multiple, daisy-chained JBOD enclosures.

Description

Within the sas_handler() function, calls to userspace commands like /usr/sbin/multipath have been replaced with device details sourced directly from sysfs, which eliminates a significant amount of overhead and processing time. Multiple JBOD enclosures and their order are discovered from the enclosure driver (/sys/class/enclosure) to isolate chassis top-level expanders, which are then dynamically indexed based on the host channel of the multipath subordinate disk member device being processed. Additionally, a "mixed" slot-identification mode was added for environments where a ZFS server system contains SAS disk slots with no expander (direct-connected to the HBA) while an attached external JBOD with an expander uses a different slot-identification method.
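
For a flavor of the sysfs-based approach, a sketch along these lines (illustrative only, not the literal PR code; sda is an example device) walks the same data the script now reads:

# Enumerate SES enclosures known to the kernel, with their slot counts.
for enc in /sys/class/enclosure/*; do
    [ -f "$enc/components" ] || continue
    echo "$enc holds $(cat "$enc/components") slots"
done

# A disk bound to an enclosure slot carries an enclosure_device:<slot>
# symlink in sysfs, so its bay can be resolved without forking
# /usr/sbin/multipath or lsscsi:
ls -d /sys/block/sda/device/enclosure_device:* 2>/dev/null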

How Has This Been Tested?

Testing was performed on an AMD EPYC-based dual-server high-availability multipath environment with multiple HBAs per ZFS server and four SAS JBODs. The two primary JBODs were multipath/cross-connected between the two ZFS-HA servers. The secondary JBODs were daisy-chained off of the primary JBODs using aligned SAS expander channels (JBOD-0 expanderA--->JBOD-1 expanderA, JBOD-0 expanderB--->JBOD-1 expanderB, etc.). Pools were created, exported, and re-imported globally with zpool import -a -d /dev/disk/by-vdev. Low-level udev debug output was traced to isolate and resolve errors.

Initial testing of a previous version of this change showed how the cost of relying on userspace utilities like /usr/sbin/multipath and /usr/bin/lsscsi grows with the number of disks and JBODs. With four 60-disk SAS JBODs (240 disks), processing a udevadm trigger took 3 minutes 30 seconds, during which nearly all CPU cores were above 80% utilization. By switching from those userspace utilities to sysfs, this version reduced the udevadm trigger processing time to 12.2 seconds with negligible CPU load.
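
Those timings can be reproduced with stock udev tooling; a minimal measurement sketch (counts and durations will vary by system):

# Re-trigger block-device events and wait for the udev queue -- and
# therefore every vdev_id invocation -- to drain.
time sh -c 'udevadm trigger --subsystem-match=block; udevadm settle'

# Optional: raise udev verbosity while tracing rule failures.
udevadm control --log-priority=debug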

/dev/disk/by-vdev
# ls /dev/disk/by-vdev/
A-0-0   A-0-14  A-0-2   A-0-25  A-0-30  A-0-36  A-0-41  A-0-47  A-0-52  A-0-58  A-1-0   A-1-14  A-1-2   A-1-25  A-1-30  A-1-36  A-1-41  A-1-47  A-1-52  A-1-58  B-0-0   B-0-14  B-0-2   B-0-25  B-0-30  B-0-36  B-0-41  B-0-47  B-0-52  B-0-58  B-1-0   B-1-14  B-1-2   B-1-25  B-1-30  B-1-36  B-1-41  B-1-47  B-1-52  B-1-58
A-0-1   A-0-15  A-0-20  A-0-26  A-0-31  A-0-37  A-0-42  A-0-48  A-0-53  A-0-59  A-1-1   A-1-15  A-1-20  A-1-26  A-1-31  A-1-37  A-1-42  A-1-48  A-1-53  A-1-59  B-0-1   B-0-15  B-0-20  B-0-26  B-0-31  B-0-37  B-0-42  B-0-48  B-0-53  B-0-59  B-1-1   B-1-15  B-1-20  B-1-26  B-1-31  B-1-37  B-1-42  B-1-48  B-1-53  B-1-59
A-0-10  A-0-16  A-0-21  A-0-27  A-0-32  A-0-38  A-0-43  A-0-49  A-0-54  A-0-6   A-1-10  A-1-16  A-1-21  A-1-27  A-1-32  A-1-38  A-1-43  A-1-49  A-1-54  A-1-6   B-0-10  B-0-16  B-0-21  B-0-27  B-0-32  B-0-38  B-0-43  B-0-49  B-0-54  B-0-6   B-1-10  B-1-16  B-1-21  B-1-27  B-1-32  B-1-38  B-1-43  B-1-49  B-1-54  B-1-6
A-0-11  A-0-17  A-0-22  A-0-28  A-0-33  A-0-39  A-0-44  A-0-5   A-0-55  A-0-7   A-1-11  A-1-17  A-1-22  A-1-28  A-1-33  A-1-39  A-1-44  A-1-5   A-1-55  A-1-7   B-0-11  B-0-17  B-0-22  B-0-28  B-0-33  B-0-39  B-0-44  B-0-5   B-0-55  B-0-7   B-1-11  B-1-17  B-1-22  B-1-28  B-1-33  B-1-39  B-1-44  B-1-5   B-1-55  B-1-7
A-0-12  A-0-18  A-0-23  A-0-29  A-0-34  A-0-4   A-0-45  A-0-50  A-0-56  A-0-8   A-1-12  A-1-18  A-1-23  A-1-29  A-1-34  A-1-4   A-1-45  A-1-50  A-1-56  A-1-8   B-0-12  B-0-18  B-0-23  B-0-29  B-0-34  B-0-4   B-0-45  B-0-50  B-0-56  B-0-8   B-1-12  B-1-18  B-1-23  B-1-29  B-1-34  B-1-4   B-1-45  B-1-50  B-1-56  B-1-8
A-0-13  A-0-19  A-0-24  A-0-3   A-0-35  A-0-40  A-0-46  A-0-51  A-0-57  A-0-9   A-1-13  A-1-19  A-1-24  A-1-3   A-1-35  A-1-40  A-1-46  A-1-51  A-1-57  A-1-9   B-0-13  B-0-19  B-0-24  B-0-3   B-0-35  B-0-40  B-0-46  B-0-51  B-0-57  B-0-9   B-1-13  B-1-19  B-1-24  B-1-3   B-1-35  B-1-40  B-1-46  B-1-51  B-1-57  B-1-9
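
For reference, names of the form A-0-0 encode channel, enclosure index, and slot. A vdev_id.conf roughly like the following (PCI slots and ports are hypothetical; the middle enclosure digit comes from the new indexing logic, not from the config) would yield the A and B channels above:

# Hypothetical configuration -- adjust PCI_SLOT/PORT to the local HBAs.
multipath       yes
topology        sas_direct
phys_per_port   4
slot            bay

#       PCI_SLOT  PORT  CHANNEL
channel 85:00.0   1     A
channel 86:00.0   1     B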
zpool status
# zpool status
  pool: jb0-pool0
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb0-pool0   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-0-0   ONLINE       0     0     0
	    A-0-1   ONLINE       0     0     0
	    A-0-2   ONLINE       0     0     0
	    A-0-3   ONLINE       0     0     0
	    A-0-4   ONLINE       0     0     0
	    A-0-5   ONLINE       0     0     0
	    A-0-6   ONLINE       0     0     0
	    A-0-7   ONLINE       0     0     0
	    A-0-8   ONLINE       0     0     0
	    A-0-9   ONLINE       0     0     0
	    A-0-10  ONLINE       0     0     0
	    A-0-11  ONLINE       0     0     0
	    A-0-12  ONLINE       0     0     0
	    A-0-13  ONLINE       0     0     0
	    A-0-14  ONLINE       0     0     0

errors: No known data errors

  pool: jb0-pool1
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb0-pool1   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-0-15  ONLINE       0     0     0
	    A-0-16  ONLINE       0     0     0
	    A-0-17  ONLINE       0     0     0
	    A-0-18  ONLINE       0     0     0
	    A-0-19  ONLINE       0     0     0
	    A-0-20  ONLINE       0     0     0
	    A-0-21  ONLINE       0     0     0
	    A-0-22  ONLINE       0     0     0
	    A-0-23  ONLINE       0     0     0
	    A-0-24  ONLINE       0     0     0
	    A-0-25  ONLINE       0     0     0
	    A-0-26  ONLINE       0     0     0
	    A-0-27  ONLINE       0     0     0
	    A-0-28  ONLINE       0     0     0
	    A-0-29  ONLINE       0     0     0

errors: No known data errors

  pool: jb0-pool2
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb0-pool2   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-0-30  ONLINE       0     0     0
	    A-0-31  ONLINE       0     0     0
	    A-0-32  ONLINE       0     0     0
	    A-0-33  ONLINE       0     0     0
	    A-0-34  ONLINE       0     0     0
	    A-0-35  ONLINE       0     0     0
	    A-0-36  ONLINE       0     0     0
	    A-0-37  ONLINE       0     0     0
	    A-0-38  ONLINE       0     0     0
	    A-0-39  ONLINE       0     0     0
	    A-0-40  ONLINE       0     0     0
	    A-0-41  ONLINE       0     0     0
	    A-0-42  ONLINE       0     0     0
	    A-0-43  ONLINE       0     0     0
	    A-0-44  ONLINE       0     0     0

errors: No known data errors

  pool: jb0-pool3
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb0-pool3   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-0-45  ONLINE       0     0     0
	    A-0-46  ONLINE       0     0     0
	    A-0-47  ONLINE       0     0     0
	    A-0-48  ONLINE       0     0     0
	    A-0-49  ONLINE       0     0     0
	    A-0-50  ONLINE       0     0     0
	    A-0-51  ONLINE       0     0     0
	    A-0-52  ONLINE       0     0     0
	    A-0-53  ONLINE       0     0     0
	    A-0-54  ONLINE       0     0     0
	    A-0-55  ONLINE       0     0     0
	    A-0-56  ONLINE       0     0     0
	    A-0-57  ONLINE       0     0     0
	    A-0-58  ONLINE       0     0     0
	    A-0-59  ONLINE       0     0     0

errors: No known data errors

  pool: jb1-pool0
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb1-pool0   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-1-0   ONLINE       0     0     0
	    A-1-1   ONLINE       0     0     0
	    A-1-2   ONLINE       0     0     0
	    A-1-3   ONLINE       0     0     0
	    A-1-4   ONLINE       0     0     0
	    A-1-5   ONLINE       0     0     0
	    A-1-6   ONLINE       0     0     0
	    A-1-7   ONLINE       0     0     0
	    A-1-8   ONLINE       0     0     0
	    A-1-9   ONLINE       0     0     0
	    A-1-10  ONLINE       0     0     0
	    A-1-11  ONLINE       0     0     0
	    A-1-12  ONLINE       0     0     0
	    A-1-13  ONLINE       0     0     0
	    A-1-14  ONLINE       0     0     0

errors: No known data errors

  pool: jb1-pool1
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb1-pool1   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-1-15  ONLINE       0     0     0
	    A-1-16  ONLINE       0     0     0
	    A-1-17  ONLINE       0     0     0
	    A-1-18  ONLINE       0     0     0
	    A-1-19  ONLINE       0     0     0
	    A-1-20  ONLINE       0     0     0
	    A-1-21  ONLINE       0     0     0
	    A-1-22  ONLINE       0     0     0
	    A-1-23  ONLINE       0     0     0
	    A-1-24  ONLINE       0     0     0
	    A-1-25  ONLINE       0     0     0
	    A-1-26  ONLINE       0     0     0
	    A-1-27  ONLINE       0     0     0
	    A-1-28  ONLINE       0     0     0
	    A-1-29  ONLINE       0     0     0

errors: No known data errors

  pool: jb1-pool2
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb1-pool2   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-1-30  ONLINE       0     0     0
	    A-1-31  ONLINE       0     0     0
	    A-1-32  ONLINE       0     0     0
	    A-1-33  ONLINE       0     0     0
	    A-1-34  ONLINE       0     0     0
	    A-1-35  ONLINE       0     0     0
	    A-1-36  ONLINE       0     0     0
	    A-1-37  ONLINE       0     0     0
	    A-1-38  ONLINE       0     0     0
	    A-1-39  ONLINE       0     0     0
	    A-1-40  ONLINE       0     0     0
	    A-1-41  ONLINE       0     0     0
	    A-1-42  ONLINE       0     0     0
	    A-1-43  ONLINE       0     0     0
	    A-1-44  ONLINE       0     0     0

errors: No known data errors

  pool: jb1-pool3
 state: ONLINE
  scan: none requested
config:

	NAME        STATE     READ WRITE CKSUM
	jb1-pool3   ONLINE       0     0     0
	  raidz2-0  ONLINE       0     0     0
	    A-1-45  ONLINE       0     0     0
	    A-1-46  ONLINE       0     0     0
	    A-1-47  ONLINE       0     0     0
	    A-1-48  ONLINE       0     0     0
	    A-1-49  ONLINE       0     0     0
	    A-1-50  ONLINE       0     0     0
	    A-1-51  ONLINE       0     0     0
	    A-1-52  ONLINE       0     0     0
	    A-1-53  ONLINE       0     0     0
	    A-1-54  ONLINE       0     0     0
	    A-1-55  ONLINE       0     0     0
	    A-1-56  ONLINE       0     0     0
	    A-1-57  ONLINE       0     0     0
	    A-1-58  ONLINE       0     0     0
	    A-1-59  ONLINE       0     0     0

errors: No known data errors

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

  • My code follows the ZFS on Linux [code style requirements]
  • I have updated the documentation accordingly.
  • I have read the [contributing document]
  • I have added tests to cover my changes.
  • I have run the ZFS Test Suite with this change applied.
  • All commit messages are properly formatted and contain [Signed-off-by]

Replaced reliance on userspace commands with sysfs data. Four JBOD,
240 SAS drive udev processing speedup from 3m30s to 12s.

Signed-off-by: Contributor <jeff.johnson@aeoncomputing.com>
@AeonJJohnson
Author

Probably should have cleaned up my previous commits into one, sorry for the confusion. 1b225f9 is the one that matters.

@codecov

codecov bot commented Aug 19, 2020

Codecov Report

Merging #10736 into master will decrease coverage by 16.22%.
The diff coverage is n/a.


@@             Coverage Diff             @@
##           master   #10736       +/-   ##
===========================================
- Coverage   79.24%   63.02%   -16.23%     
===========================================
  Files         400      296      -104     
  Lines      122012   102099    -19913     
===========================================
- Hits        96687    64343    -32344     
- Misses      25325    37756    +12431     
Flag      Coverage Δ
#kernel   ?
#user     63.02% <ø> (-4.35%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
module/zfs/objlist.c 0.00% <0.00%> (-100.00%) ⬇️
module/zfs/pathname.c 0.00% <0.00%> (-100.00%) ⬇️
include/sys/dmu_redact.h 0.00% <0.00%> (-100.00%) ⬇️
include/sys/dmu_traverse.h 0.00% <0.00%> (-100.00%) ⬇️
module/zfs/zfs_rlock.c 0.00% <0.00%> (-95.96%) ⬇️
module/lua/ltablib.c 2.34% <0.00%> (-95.32%) ⬇️
module/zfs/bqueue.c 0.00% <0.00%> (-94.45%) ⬇️
module/zcommon/zfs_deleg.c 0.00% <0.00%> (-92.46%) ⬇️
module/zfs/dmu_diff.c 0.00% <0.00%> (-87.88%) ⬇️
module/zfs/zcp_iter.c 5.49% <0.00%> (-86.26%) ⬇️
... and 291 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e6203d2...1b225f9.

@codecov

codecov bot commented Aug 19, 2020

Codecov Report

Merging #10736 into master will decrease coverage by 6.94%.
The diff coverage is n/a.


@@            Coverage Diff             @@
##           master   #10736      +/-   ##
==========================================
- Coverage   79.24%   72.29%   -6.95%     
==========================================
  Files         400      371      -29     
  Lines      122012   119441    -2571     
==========================================
- Hits        96687    86355   -10332     
- Misses      25325    33086    +7761     
Flag      Coverage Δ
#kernel   72.20% <ø> (-7.59%) ⬇️
#user     63.02% <ø> (-4.35%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
include/sys/dmu_redact.h 0.00% <0.00%> (-100.00%) ⬇️
include/sys/zfs_project.h 0.00% <0.00%> (-100.00%) ⬇️
module/zcommon/zfs_deleg.c 0.00% <0.00%> (-92.46%) ⬇️
module/zfs/dmu_redact.c 0.00% <0.00%> (-80.88%) ⬇️
module/zfs/dsl_deleg.c 10.66% <0.00%> (-79.42%) ⬇️
cmd/zfs/zfs_project.c 0.00% <0.00%> (-77.20%) ⬇️
cmd/zed/agents/fmd_serd.c 8.91% <0.00%> (-69.31%) ⬇️
module/zfs/spa_checkpoint.c 32.27% <0.00%> (-65.83%) ⬇️
cmd/zpool/zpool_iter.c 26.61% <0.00%> (-60.08%) ⬇️
include/sys/trace_dmu.h 50.00% <0.00%> (-50.00%) ⬇️
... and 170 more


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e6203d2...1b225f9.

@gdevenyi
Contributor

Can you please run shellcheck and fix any issues it reports? Thanks!

@AeonJJohnson
Author

I can close this, run shellcheck, and open a new PR since I didn't rebase beforehand. Let me know.

@behlendorf
Contributor

@AeonJJohnson thanks! What I'd suggest is fixing up the shellcheck warnings, squashing everything into a single commit, then force-updating this PR. That'll kick off a new CI run.
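
For anyone following along, the usual mechanics for that (branch name hypothetical) are:

# Squash the fixup commits into one, then force-update the PR branch.
git rebase -i master                  # mark extra commits as "squash"
git push --force-with-lease origin vdev-id-multijbod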

@AeonJJohnson
Author

@behlendorf @gdevenyi how far do you want me to take the shellcheck cleanup? Only my additions, or the entire script, where functions use local variables along with all the other complaints shellcheck reports, even on the master vdev_id?

@gdevenyi
Contributor

@AeonJJohnson as you're currently probably the most well-versed in this code, I would love it if you could tackle the whole file. I have it on my todos at #10512, but I would've had to learn exactly how vdev_id works, whereas you've already put in that work.

I would suggest, however, that you squash the fixes to your own code into your existing commits, while the fixes for the rest of the code are broken out separately.

@AeonJJohnson
Author

@gdevenyi I'll tackle the whole thing. Does it have to be 100% POSIX-sh compliant? Retiring backticks I get, but not using arrays kinda drops a wrench into my multi-JBOD approach.
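
For what it's worth, the usual POSIX-sh stand-in for a small array is the positional-parameter list; a sketch (variable names hypothetical):

# Reuse "$@" as a poor man's array of expander addresses.
set -- encl0_sas encl1_sas encl2_sas
i=0
for addr in "$@"; do
    echo "enclosure $i -> $addr"
    i=$((i + 1))
done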

@gdevenyi
Contributor

@AeonJJohnson good question, but that's above my pay grade. @behlendorf can you comment on upgrading this script to bash-only?

@behlendorf
Contributor

As it happens I opened #10512 yesterday which takes care of the current warnings I saw in vdev_id when running checkbashisms. @gdevenyi @AeonJJohnson if you could take a quick look at that PR we can get it merged and then tackle these new changes.

As for whether it needs to be POSIX-sh compliant, that's debatable. While bash is available basically everywhere, there are a surprising number of distributions where it's not the default shell. If we want this to just work everywhere, POSIX sh is the way to go. We could change the shebang to #!/bin/bash, but that could end up breaking some environments. For example, a minimal Ubuntu install I believe only provides dash, and it wouldn't be too strange for a minimal initramfs not to have bash installed. At a minimum I think the script needs to work in bash, dash, and ash.
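
A quick way to vet the script against those shells (repo path assumed; requires the tools to be installed):

# Static lint plus parse-only passes under each target shell.
shellcheck --shell=sh cmd/vdev_id/vdev_id
checkbashisms cmd/vdev_id/vdev_id
bash -n cmd/vdev_id/vdev_id
dash -n cmd/vdev_id/vdev_id
busybox ash -n cmd/vdev_id/vdev_id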

@gdevenyi
Contributor

As it happens I opened #10512 yesterday

** Correction: #10755

@gdevenyi
Contributor

gdevenyi commented Oct 2, 2020

@gdevenyi I'll tackle the whole thing. Does it have to be 100% POSIX-sh compliant? Retiring backticks I get, but not using arrays kinda drops a wrench into my multi-JBOD approach.

So I think this PR has gotten stuck on the question of whether we need to stick to POSIX sh or whether we can use bashisms.

Can we get an official ruling from the devs on whether this script can be upgraded to bash?

@behlendorf
Contributor

I believe @AeonJJohnson sorted out the bashisms and was just working on a few more improvements. But I'd love to see this updated so we can get it included.

@gdevenyi
Contributor

While testing the current state of this PR to see if it can handle my JBOD (see #11095), I found that this version of the script won't run properly under sh:

sh -x vdev_id.new -d dm-0
+ PATH=/bin:/sbin:/usr/bin:/usr/sbin
+ CONFIG=/etc/zfs/vdev_id.conf
+ PHYS_PER_PORT=
+ DEV=
+ MULTIPATH=
+ TOPOLOGY=
+ BAY=
vdev_id.new: 191: Syntax error: "(" unexpected (expecting "fi")
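
That failure mode is characteristic of dash hitting bash-only syntax such as an array assignment; an illustrative reproduction (not the actual line 191 of the script):

$ echo 'arr=(a b c)' | dash
dash: 1: Syntax error: "(" unexpected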

@arshad512
Contributor

The updated version is under #11520 - it is almost the same as Jeff's code, mostly rebased onto the latest master, with the 3 patches squashed into a single commit and minor POSIX-compliance fixes.

My intention was to update this PR, but somehow I could not (maybe a permissions issue?). Once I figure out what is going wrong I will update this PR, since it has lots of history and reviewer comments.

@behlendorf
Contributor

@arshad512 thanks! Migrating to a new PR would be fine if that's easier for you. We can always link back to this original PR so those comments and feedback won't be lost.

@behlendorf
Contributor

Closing, replaced by #11526

@behlendorf behlendorf closed this Jan 26, 2021
Labels: Status: Code Review Needed (Ready for review and testing)