vdev_id: Support daisy-chained JBODs in multipath mode. #11526
Conversation
Hi Brian, Gabriel - Please use Jeff's latest PR for your reviews of the vdev_id improvement. Jeff's old PRs #10736 and #11520 need not be reviewed further, as all of their changes have been brought into this latest one. (I wanted to update #10736, but kept ending up with multiple merges; it was easier for me to create a new PR - sorry for the multiple versions.) A few comments on your last review...
This is fixed in the latest update. Thanks for pointing this out.
Initially, running shellcheck locally gave us 169 warnings. We have brought that down to 127 with the latest PR. We would like to go with Brian's suggestion: let this patch stay focused on the functionality and improvements Jeff has coded, and make the follow-up patch purely a shellcheck fix. That would be easier. Please let me know your thoughts. shellcheck run on master
shellcheck run on latest code (PR)
I'm okay with leaving some shellcheck warnings unfixed; however, I'd just like to highlight that any quoting-related shellcheck warnings need to be handled carefully now. Here's a dump of one of my JBOD's enclosure naming:
Yes, those slot names all have multiple spaces in them.
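To illustrate the concern, here is a minimal sketch (the variable name and slot label below are hypothetical, not taken from vdev_id):
# Hypothetical slot label with multiple internal spaces, as reported by an enclosure
SLOT_NAME="Drive Slot    07"
# Unquoted expansion lets the shell word-split and collapse the whitespace
echo $SLOT_NAME       # prints: Drive Slot 07
# Quoted expansion (what SC2086 asks for) preserves the label exactly
echo "$SLOT_NAME"     # prints: Drive Slot    07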
I agree, we should be very careful about the quoting. I also think it would be preferable to split this into two commits: the first can add the new functionality, and the second can at least make a first pass at addressing the bashisms.
An easy way to do this with git would be to reset the changes and then add back in individual hunks. Something like this should work.
git reset HEAD^
# Add in all of the new hunks for the new feature
git add -p
git commit
# Add in the remaining chunks which are style / portability fixes.
git add -p
git commit
I will take care of handling the quoting in my next push, along with the other comments.
Force-pushed from caff865 to 49711ed
Resolved all "SC2086: Double quote to prevent globbing and word splitting" warnings in the vdev_id code.
Thanks, the code looks good to me, though I haven't yet had a chance to test it out on a test system.
Hi Brian - Thanks for the review. In that case, could you hold the merge until we have done a few more test runs?
Edit: got it sorted.
This update doesn't solve the logic problems underlying my JBOD described in #11095. Only 39/90 disks in the enclosure get a slot number after trying the various "slot" options.
I will have a look. I am cleaning up a few more shellcheck warnings which I think should be done. Could you please paste your latest output from the updated script (run under '-x' (debug)) here?
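For reference, one way to capture such a trace, assuming the script is installed at the usual udev helper path (adjust the path and device name for your system):
# Trace a single device lookup and save the shell trace to a file
sh -x /usr/lib/udev/vdev_id -d sdb 2> vdev_id_trace.log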
I was a bit too quick. It seems this config will successfully find/ID all my slots, and it also finds them with the old code; I'm not sure how I missed this config on the first try.
Edit: of course, A0 to A91 means there are more missing items; A52 is also missing.
Hi Gabriel - Thanks for the log. The script is still being improved (other bugs being resolved) - once we get past that I will have a look at your issue.
Force-pushed from 49711ed to 9bfa0ca
Hi Gabriel - I am now looking into your issue. While I work on a local reproducer, could you please confirm for me that the issue is seen only on the latest version and that the old vdev_id works fine?
This issue exists in both the old (MASTER) and your version, so it's somewhere in the common logic/code.
Thanks for the info. In that case, could we move this particular investigation to another bug, leaving this PR for Jeff's improvement only? However, this is what I found from your logs. The code correctly (from what I traced back) picks up and parses info from 11.log:
map_slot() called with $SLOT
Same for 13.log:
map_slot() called with $SLOT
I'm happy to leave this off to another issue. It will be far easier to test with your code running so much quicker.
P.S. Thanks for the breakdown of the code pathing. I have found the answer: my JBOD attaches an "enclosure" device at port 12:
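For anyone checking their own topology, a hedged sketch of how to list the enclosure devices the kernel exposes and where each one attaches (paths and addresses vary by HBA and JBOD):
# Show each SES enclosure device and the SCSI address it hangs off of
for e in /sys/class/enclosure/*; do
    echo "$e -> $(basename "$(readlink -f "$e/device")")"
done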
So far this is working on two types of JBODs we have, and I'm going to test on a third type tomorrow.
Three types! It works fine on my older model 60-bay from Newisys
Hi Tony, Gabriel - Thanks for the verification. We also continue to test. Will update soon on this.
So far this is working on two types of JBODs we have, and I'm going to test on a third type tomorrow.
It works on our third type as well. Approved
@arshad512 if you can just confirm you don't have any additional changes to add to this, I'll go ahead and get it merged. Thanks for all the testing.
Hi Brian - There is one change (improvement) we are currently making, which came out of Jeff's review/testing. We should be force-pushing a new version in a day. Please hold off on the merge.
Force-pushed from 9bfa0ca to 9f26d55
Hi Brian, Tony, Gabriel - Please review the new version. Most changes/fixes are under map_jbod(). The fix/improvement is to read '/sys/class/enclosure/id', which contains a unique ID identifying each JBOD, instead of reading them serially. The final result is 'JBOD' 1 and 2 instead of 1 and 3. There was another bug, also identified by Jeff, where under a dual-HBA config only half of the disks were seen. Before fix
After fix
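A minimal sketch of the idea, not the actual map_jbod() code (only the /sys/class/enclosure/*/id attribute comes from the description above; the loop and naming are illustrative):
# Number JBODs by their unique enclosure ID rather than by discovery order,
# so both expanders of the same chassis resolve to the same JBOD index
jbod_num=0
for id in $(cat /sys/class/enclosure/*/id 2>/dev/null | sort -u); do
    jbod_num=$((jbod_num + 1))
    echo "JBOD ${jbod_num}: enclosure id ${id}"
done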
Force-pushed from 9f26d55 to 9df06cc
Based on our testing, and other comments in this PR, the multiple-JBOD enumeration works on: Western Digital Ultrastar Data 60 JBOD
@arshad512 I'll give it another test.
One of my JBODs worked with the updated vdev_id, one did not work with it, and one more is left to test. I ran out of time before I could dig into why it wasn't working, but I'll try to dig deeper tomorrow.
Thanks for trying the new version. As Jeff pointed out, we tested across multiple JBODs and this version works. Could you please paste the '-x' output of the run that is not working for you?
Within function sas_handler() userspace commands like '/usr/sbin/multipath' have been replaced with sourcing device details from within sysfs, which reduced a significant amount of overhead and processing time. Multiple JBOD enclosures and their order are sourced from the bsg driver (/sys/class/enclosure) to isolate chassis top-level expanders, which are then dynamically indexed based on the host channel of the multipath subordinate disk member device being processed. Additionally, a "mixed" mode was added for slot identification, for environments where a ZFS server system may contain SAS disk slots with no expander (direct connect to HBA) while an attached external JBOD with an expander uses a different slot identifier method.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~
Testing was performed on an AMD EPYC based dual-server high-availability multipath environment with multiple HBAs per ZFS server and four SAS JBODs. The two primary JBODs were multipath/cross-connected between the two ZFS-HA servers. The secondary JBODs were daisy-chained off of the primary JBODs using aligned SAS expander channels (JBOD-0 expanderA--->JBOD-1 expanderA, JBOD-0 expanderB--->JBOD-1 expanderB, etc). Pools were created, exported and re-imported, and imported globally with 'zpool import -a -d /dev/disk/by-vdev'. Low-level udev debug outputs were traced to isolate and resolve errors.

Result:
~~~~~~~
Initial testing of a previous version of this change showed how reliance on userspace utilities like '/usr/sbin/multipath' and '/usr/bin/lsscsi' was exacerbated by increasing numbers of disks and JBODs. With four 60-disk SAS JBODs and 240 disks, the time to process a udevadm trigger was 3 minutes 30 seconds, during which nearly all CPU cores were above 80% utilization. By switching from userspace utilities to sysfs in this version, the udevadm trigger processing time was reduced to 12.2 seconds with negligible CPU load.

This patch also fixes a few shellcheck complaints.

Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
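As a rough illustration of the sysfs approach described above (a sketch only, not the code in this PR; the dm device name is made up):
# Read the subordinate members of a multipath device directly from sysfs
# instead of shelling out to /usr/sbin/multipath
DM=dm-3
for slave in /sys/block/$DM/slaves/*; do
    echo "member: $(basename "$slave")"
done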
Force-pushed from 9df06cc to f2bdbce
Hi Tony - While I wait for your debug output, I went over the code again just to see if I could catch a corner case. I could not. However, I realized the function get_uniq_encl_id() could be optimized a bit, and my new push does that. Nothing functional changed in the new push.
@arshad512 I figured out why it wasn't working - I had copied the new vdev_id to the node without setting ... I tried your latest push with all three of our JBODs and it works fine 👍
@arshad512 @tonyhutter great, then this PR is ready to merge. Thanks!
This is a huge win, thanks @arshad512 and @AeonJJohnson!
Within function sas_handler() userspace commands like '/usr/sbin/multipath' have been replaced with sourcing device details from within sysfs, which reduced a significant amount of overhead and processing time. Multiple JBOD enclosures and their order are sourced from the bsg driver (/sys/class/enclosure) to isolate chassis top-level expanders, which are then dynamically indexed based on the host channel of the multipath subordinate disk member device being processed. Additionally, a "mixed" mode was added for slot identification, for environments where a ZFS server system may contain SAS disk slots with no expander (direct connect to HBA) while an attached external JBOD with an expander uses a different slot identifier method.

How Has This Been Tested?
~~~~~~~~~~~~~~~~~~~~~~~~~
Testing was performed on an AMD EPYC based dual-server high-availability multipath environment with multiple HBAs per ZFS server and four SAS JBODs. The two primary JBODs were multipath/cross-connected between the two ZFS-HA servers. The secondary JBODs were daisy-chained off of the primary JBODs using aligned SAS expander channels (JBOD-0 expanderA--->JBOD-1 expanderA, JBOD-0 expanderB--->JBOD-1 expanderB, etc). Pools were created, exported and re-imported, and imported globally with 'zpool import -a -d /dev/disk/by-vdev'. Low-level udev debug outputs were traced to isolate and resolve errors.

Result:
~~~~~~~
Initial testing of a previous version of this change showed how reliance on userspace utilities like '/usr/sbin/multipath' and '/usr/bin/lsscsi' was exacerbated by increasing numbers of disks and JBODs. With four 60-disk SAS JBODs and 240 disks, the time to process a udevadm trigger was 3 minutes 30 seconds, during which nearly all CPU cores were above 80% utilization. By switching from userspace utilities to sysfs in this version, the udevadm trigger processing time was reduced to 12.2 seconds with negligible CPU load.

This patch also fixes a few shellcheck complaints.

Reviewed-by: Gabriel A. Devenyi <gdevenyi@gmail.com>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Co-authored-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
Closes openzfs#11526