vdev_id: Support daisy-chained JBODs in multipath mode #11520
Conversation
…_id-slot_mix

Conflicts:
	cmd/vdev_id/vdev_id
This looks great, thanks for the hard work de-bashifying the code. There are a few checkbashism complaints.
@arshad512 can you rebase this on master and force-update the PR? In the process it'd be ideal if you could also split this into two commits: one which makes the POSIX changes, and another which adds the new functionality.
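For reference, one possible workflow for the requested rebase and split (branch and remote names here are placeholders, not the actual ones from this PR):

```sh
# Fetch the upstream master and rebase interactively, marking the
# commit as 'edit' so it can be split in two.
git fetch upstream
git rebase -i upstream/master
# At the edit stop: unstage the commit, then stage and commit the
# POSIX cleanups and the new functionality separately.
git reset HEAD^
git add -p && git commit          # POSIX/shellcheck cleanups
git add -A && git commit          # new daisy-chain/mixed-slot feature
git rebase --continue
git push --force-with-lease origin <feature-branch>
```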
And here's the current state of shellcheck; I think there are some more fixups that could be made:
Attaching as well, because GitHub's literal embed still seems to break formatting.
Latest code moved to #11526. I am closing this PR. - Thanks
Description
Within the function sas_handler(), userspace commands like
'/usr/sbin/multipath' have been replaced with sourcing
device details directly from sysfs, which removes a
significant amount of overhead and processing time.
Multiple JBOD enclosures and their order are sourced
from the bsg driver (/sys/class/enclosure) to isolate
the chassis top-level expanders, which are then dynamically
indexed based on the host channel of the multipath
subordinate disk member device being processed. This change
also adds a "mixed" mode for slot identification, for
environments where a ZFS server chassis contains SAS disk
slots with no expander (direct connect to the HBA) while an
attached external JBOD with an expander uses a different
slot identification method.
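As a rough illustration of the sysfs-only approach, the sketch below walks the SES enclosures that the bsg/ses drivers expose under /sys/class/enclosure and prints the occupant of each slot, with no calls to multipath(8) or lsscsi(8). This is a minimal sketch of the general technique, not the code from this patch:

```sh
#!/bin/sh
# Enumerate SES enclosures and their slots purely from sysfs.
for enc in /sys/class/enclosure/*; do
	[ -d "$enc" ] || continue
	echo "enclosure: ${enc##*/}"
	for comp in "$enc"/*/; do
		# Only enclosure components carry a 'slot' attribute;
		# this skips 'device', 'power', and similar entries.
		[ -f "${comp}slot" ] || continue
		bay=$(cat "${comp}slot")
		dev=
		# A populated slot has a 'device' symlink to its SCSI
		# device, whose block/ dir names the disk (e.g. sdab).
		[ -d "${comp}device/block" ] && dev=$(ls "${comp}device/block")
		echo "  slot ${bay}: ${dev:-empty}"
	done
done
```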
How Has This Been Tested?
Testing was performed on an AMD EPYC-based dual-server
high-availability multipath environment with multiple
HBAs per ZFS server and four SAS JBODs. The two primary
JBODs were multipath/cross-connected between the two
ZFS-HA servers. The secondary JBODs were daisy-chained
off of the primary JBODs using aligned SAS expander
channels (JBOD-0 expanderA ---> JBOD-1 expanderA,
JBOD-0 expanderB ---> JBOD-1 expanderB, etc.).
Pools were created, exported, and re-imported, then
imported globally with 'zpool import -a -d /dev/disk/by-vdev'.
Low-level udev debug output was traced to isolate
and resolve errors.
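The PR does not list the exact commands used, but a typical way to capture low-level udev debug output, replay the device events, and verify the resulting by-vdev links would look like this:

```sh
# Raise the udev log level, replay block-device uevents, and wait
# for the event queue to drain.
udevadm control --log-priority=debug
udevadm trigger --subsystem-match=block
udevadm settle

# Inspect the debug log for vdev_id activity, then check the links
# and import the pools from them.
journalctl -u systemd-udevd --since "-5 min" | grep vdev_id
ls -l /dev/disk/by-vdev/
zpool import -a -d /dev/disk/by-vdev
```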
Result:
Initial testing of a previous version of this change
showed how the cost of relying on userspace utilities
like '/usr/sbin/multipath' and '/usr/bin/lsscsi' was
exacerbated by increasing numbers of disks and JBODs.
With four 60-disk SAS JBODs (240 disks), processing a
udevadm trigger took 3 minutes 30 seconds, during which
nearly all CPU cores were above 80% utilization. By
switching from userspace utilities to sysfs in this
version, the udevadm trigger processing time dropped
to 12.2 seconds with negligible CPU load.
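An assumed way to reproduce this measurement (the PR does not state the exact methodology) is to time a full replay of block-device uevents while watching CPU use in another terminal with top(1) or mpstat(1):

```sh
# Time the trigger plus queue drain as one unit.
time sh -c 'udevadm trigger --subsystem-match=block && udevadm settle'
```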
This patch also fixes a few shellcheck complaints
(representative examples appear below).
Signed-off-by: Jeff Johnson <jeff.johnson@aeoncomputing.com>
Signed-off-by: Arshad Hussain <arshad.hussain@aeoncomputing.com>
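The hypothetical before/after pairs below illustrate the classes of checkbashism and shellcheck findings typically fixed in shell scripts like vdev_id; the variable names are placeholders and the real hunks are in the PR diff:

```sh
# checkbashisms: '==' inside test is a bashism; POSIX test uses '='
[ "$mode" == "multipath" ]        # before
[ "$mode" = "multipath" ]         # after

# shellcheck SC2006/SC2086: prefer $(...) to backticks, and quote
# variable expansions to survive spaces in paths.
dev=`basename $devpath`           # before
dev=$(basename "$devpath")        # after
```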