Update vfio-manage to choose best VFIO driver #128
base: main
fcfa6cf to
7829c7f
Compare
internal/nvpci/nvpci.go
Outdated
modAliasPath := filepath.Join(d.Path, "modalias")
modAliasContent, err := os.ReadFile(modAliasPath)
if err != nil {
	return "", fmt.Errorf("failed to read modalias file for %s: %w", d.Address, err)
}

modAliasStr := strings.TrimSpace(string(modAliasContent))
modAlias, err := parseModAliasString(modAliasStr)
if err != nil {
	return "", fmt.Errorf("failed to parse modalias string %q for device %q: %w", modAliasStr, d.Address, err)
}
logrus.Debugf("modalias for device %q: %+v", d.Address, modAlias)

kernelVersion, err := getKernelVersion()
if err != nil {
	return "", fmt.Errorf("failed to get kernel version: %w", err)
}
logrus.Debugf("kernel version: %s", kernelVersion)

modulesAliasFilePath := filepath.Join("/lib/modules", kernelVersion, "modules.alias")
Could we extract our file paths into constants?
const (
	kernelModulesRoot    = "/lib/modules"
	modulesAliasFileName = "modules.alias"
)
We can also create helper functions to create the paths:
func getModulesAliasPath(kernelVersion string) string {
	return filepath.Join(kernelModulesRoot, kernelVersion, modulesAliasFileName)
}
func getDeviceModaliasPath(devicePath string) string {
	return filepath.Join(devicePath, "modalias")
}
This way, callers don't need to know the exact path structure.
I have moved /lib/modules to a constant. I have decided against moving modules.alias to a constant -- for files that we are only opening once, I prefer to read the filenames in-line (instead of jumping to a constant). Obviously, if we need to refer to this filename in multiple parts of the code in the future, I would be happy to move this to a constant.
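For readers following the hunk above, here is a minimal sketch of what the parse step could look like. It is not the PR's `parseModAliasString`; the type name `pciModAlias`, the function name `parsePCIModAlias`, and the regular expression are assumptions for illustration. It relies only on the sysfs PCI modalias layout `pci:v<8 hex>d<8 hex>sv<8 hex>sd<8 hex>bc<2 hex>sc<2 hex>i<2 hex>`, with uppercase hex digits.

```go
package main

import (
	"fmt"
	"regexp"
)

// pciModAlias holds the fields of a PCI modalias string.
// Hypothetical type; the PR's own struct may differ.
type pciModAlias struct {
	vendor, device, subvendor, subdevice string
	baseClass, subClass, iface           string
}

// pciModAliasRe captures the seven fields of a sysfs PCI modalias string.
var pciModAliasRe = regexp.MustCompile(
	`^pci:v([0-9A-F]{8})d([0-9A-F]{8})sv([0-9A-F]{8})sd([0-9A-F]{8})` +
		`bc([0-9A-F]{2})sc([0-9A-F]{2})i([0-9A-F]{2})$`)

// parsePCIModAlias splits a device modalias string into its fields.
// Hypothetical helper; it returns an error for any other format.
func parsePCIModAlias(s string) (*pciModAlias, error) {
	m := pciModAliasRe.FindStringSubmatch(s)
	if m == nil {
		return nil, fmt.Errorf("unexpected modalias format: %q", s)
	}
	return &pciModAlias{
		vendor:    m[1],
		device:    m[2],
		subvendor: m[3],
		subdevice: m[4],
		baseClass: m[5],
		subClass:  m[6],
		iface:     m[7],
	}, nil
}
```

The input is expected to be whitespace-trimmed already, as the hunk does with `strings.TrimSpace` before parsing.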
internal/nvpci/modalias.go
Outdated
if matches, score := matchField(deviceModAlias.vendor, patternModAlias.vendor); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.device, patternModAlias.device); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.subvendor, patternModAlias.subvendor); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.subdevice, patternModAlias.subdevice); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.baseClass, patternModAlias.baseClass); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.subClass, patternModAlias.subClass); !matches {
	return false, 0
} else {
	specificity += score
}

if matches, score := matchField(deviceModAlias.interface_, patternModAlias.interface_); !matches {
	return false, 0
} else {
	specificity += score
}

return true, specificity
Would it make sense to use slices of getters here to shorten this code?
type fieldGetter func(*modAlias) string
fields := []struct {
	getter fieldGetter
}{
	{func(m *modAlias) string { return m.vendor }},
	{func(m *modAlias) string { return m.device }},
	// ... etc
}
for _, field := range fields {
	deviceVal := field.getter(deviceModAlias)
	patternVal := field.getter(patternModAlias)
	if matches, score := matchField(deviceVal, patternVal); !matches {
		return false, 0
	}
	specificity += score
}
I have updated the implementation significantly. I believe it is simpler now. Let me know what you think.
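For reference, a self-contained version of the getter-slice idea from this thread might look like the sketch below. It is not the updated implementation mentioned in the reply. The `modAlias` field names are taken from the outdated hunk, while the `matchField` semantics (a `*` pattern matches anything and scores 0, an exact match scores 1) and the names `fieldGetters` and `matchModAlias` are assumptions.

```go
package main

// modAlias mirrors the field names used in the outdated hunk.
type modAlias struct {
	vendor, device, subvendor, subdevice string
	baseClass, subClass, interface_      string
}

// matchField is an assumed implementation: a "*" pattern matches any value
// with score 0, and an exact match scores 1.
func matchField(deviceVal, patternVal string) (bool, int) {
	if patternVal == "*" {
		return true, 0
	}
	if deviceVal == patternVal {
		return true, 1
	}
	return false, 0
}

// fieldGetters lists every modalias field once, so the match loop
// does not have to repeat itself per field.
var fieldGetters = []func(*modAlias) string{
	func(m *modAlias) string { return m.vendor },
	func(m *modAlias) string { return m.device },
	func(m *modAlias) string { return m.subvendor },
	func(m *modAlias) string { return m.subdevice },
	func(m *modAlias) string { return m.baseClass },
	func(m *modAlias) string { return m.subClass },
	func(m *modAlias) string { return m.interface_ },
}

// matchModAlias reports whether pattern matches device and, if so,
// how many fields matched exactly (the specificity).
func matchModAlias(device, pattern *modAlias) (bool, int) {
	specificity := 0
	for _, get := range fieldGetters {
		ok, score := matchField(get(device), get(pattern))
		if !ok {
			return false, 0
		}
		specificity += score
	}
	return true, specificity
}
```

Dropping the `else` branches also becomes possible here, because `score` is declared in the loop body rather than in the `if` statement's scope.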
internal/nvpci/modalias.go
Outdated
for _, line := range lines {
	line = strings.TrimSpace(line)

	if !strings.HasPrefix(line, "alias vfio_pci:") {
Can we extract this to a named constant?
Done
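As an illustration of the change discussed in this thread, the prefix could be named and used roughly as follows. The constant name `vfioPCIAliasPrefix`, the `aliasEntry` type, and the `parseVFIOAliases` helper are hypothetical and not taken from the PR. The sketch assumes each relevant `modules.alias` line has the form `alias vfio_pci:<pattern> <module>`.

```go
package main

import "strings"

// vfioPCIAliasPrefix marks the modules.alias lines of interest.
// Hypothetical name; the PR may use a different identifier.
const vfioPCIAliasPrefix = "alias vfio_pci:"

// aliasEntry pairs an alias pattern with the module that provides it.
type aliasEntry struct {
	pattern string // e.g. "v*d*sv*sd*bc*sc*i*"
	module  string // e.g. "vfio_pci"
}

// parseVFIOAliases extracts the vfio_pci alias entries from the contents
// of a modules.alias file and skips every other line.
func parseVFIOAliases(content string) []aliasEntry {
	var entries []aliasEntry
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(line, vfioPCIAliasPrefix) {
			continue
		}
		// What remains should be "<pattern> <module>".
		fields := strings.Fields(strings.TrimPrefix(line, vfioPCIAliasPrefix))
		if len(fields) != 2 {
			continue // malformed line, ignore it
		}
		entries = append(entries, aliasEntry{pattern: fields[0], module: fields[1]})
	}
	return entries
}
```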
ff28e1c to
b5da7d9
Compare
a95b741 to
ad775b1
Compare
64f0f28 to
26d70dd
Compare
Rather than always binding GPUs to the vfio-pci driver, this commit
introduces logic to check whether the running kernel has a VFIO variant
driver available that is a better match for the device. This is required
on Grace-based systems, where the nvgrace_gpu_vfio_pci module must be
used instead of the vfio-pci module.
We read the modalias file for a given device, then we look through
/lib/modules/${kernel_version}/modules.alias for the matching vfio_pci
alias with the fewest wildcard ('*') fields.
The code introduced in this commit is inspired by:
https://gitlab.com/libvirt/libvirt/-/commit/82e2fac297105f554f57fb589002933231b4f711
Signed-off-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
26d70dd to
3bafc14
Compare
karthikvetrivel
left a comment

Rather than always binding GPUs to the vfio-pci driver, this commit
introduces logic to check whether the running kernel has a VFIO variant
driver available that is a better match for the device. This is required
on Grace-based systems, where the nvgrace_gpu_vfio_pci module must be
used instead of the vfio-pci module.
We read the modalias file for a given device, then we look through
/lib/modules/${kernel_version}/modules.alias for the matching vfio_pci
alias with the fewest wildcard ('*') fields.
The code introduced in this commit is inspired by:
https://gitlab.com/libvirt/libvirt/-/commit/82e2fac297105f554f57fb589002933231b4f711
Depends on #127
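The selection rule described above can be sketched as follows. This is a simplified illustration, not the PR's implementation. It swaps the per-field comparison for glob matching with the standard library's `path.Match` and ranks candidates by the number of `*` characters in the pattern. The `candidate` type, the `pickVFIODriver` name, and the sample device IDs are assumptions.

```go
package main

import (
	"path"
	"strings"
)

// candidate pairs a vfio_pci alias pattern (without the "vfio_pci:" prefix)
// with the module that provides it. Hypothetical type for this sketch.
type candidate struct {
	pattern string // e.g. "v000010DEd00002342sv*sd*bc*sc*i*"
	module  string // e.g. "nvgrace_gpu_vfio_pci"
}

// pickVFIODriver returns the module whose pattern matches the device
// modalias with the fewest wildcards. The boolean is false when no
// candidate matches at all.
func pickVFIODriver(deviceModAlias string, candidates []candidate) (string, bool) {
	// Device modalias strings start with "pci:"; the alias patterns do not.
	name := strings.TrimPrefix(deviceModAlias, "pci:")

	best, bestWildcards, found := "", 0, false
	for _, c := range candidates {
		ok, err := path.Match(c.pattern, name)
		if err != nil || !ok {
			continue // malformed or non-matching pattern
		}
		n := strings.Count(c.pattern, "*")
		if !found || n < bestWildcards {
			best, bestWildcards, found = c.module, n, true
		}
	}
	return best, found
}
```

With a generic pattern such as `v*d*sv*sd*bc*sc*i*` and a device-specific one both present, the device-specific entry wins for a matching device because it has fewer wildcards. Every other device falls back to the generic entry.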
Testing
On a GB200 compute tray:
On a system with one L40 (configured in graphics mode) and one L4 GPU: