fix: use default pgpu alias when an override is not available #44

Merged
rajatchopra merged 2 commits into NVIDIA:main from rajatchopra:pgpualias
Feb 11, 2026
Conversation

@rajatchopra
Collaborator

@rajatchopra rajatchopra commented Feb 6, 2026

The GFD pod needs a GPU, and it needs to know which device name is used for publishing the devices. When P_GPU_ALIAS is set, that alias is the name to use; otherwise, the native name of one of the GPU devices is picked.
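
The selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `pickGPUResourceName`, the `"pgpu"` alias value, and the `GPU_MODEL_*` device names are all hypothetical, and sorting the discovered names to get a deterministic pick is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// pickGPUResourceName is a hypothetical helper. It returns the alias when one
// is provided; otherwise it falls back to the native name of one discovered
// GPU device. Sorting first is an assumption so the pick is deterministic.
func pickGPUResourceName(alias string, discovered []string) (string, error) {
	if alias != "" {
		return alias, nil
	}
	if len(discovered) == 0 {
		return "", errors.New("no alias set and no GPU devices discovered")
	}
	// Copy before sorting so the caller's slice is left untouched.
	names := append([]string(nil), discovered...)
	sort.Strings(names)
	return names[0], nil
}

func main() {
	// Placeholder device names, not real GPU model identifiers.
	discovered := []string{"GPU_MODEL_B", "GPU_MODEL_A"}

	withAlias, _ := pickGPUResourceName("pgpu", discovered)
	withoutAlias, _ := pickGPUResourceName("", discovered)

	fmt.Println(withAlias)    // the alias wins when present
	fmt.Println(withoutAlias) // otherwise one native device name is used
}
```

Returning an error when nothing is discovered keeps the caller from building a pod spec with an empty resource name.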

@coveralls

coveralls commented Feb 6, 2026

Pull Request Test Coverage Report for Build 21923530054

Details

  • 0 of 31 (0.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.9%) to 28.412%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| cmd/main.go | 0 | 9 | 0.0% |
| pkg/device_plugin/gfd.go | 0 | 22 | 0.0% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 21921289502 | -0.9% |
| Covered Lines | 229 |
| Relevant Lines | 806 |

💛 - Coveralls

@rajatchopra rajatchopra force-pushed the pgpualias branch 4 times, most recently from b473fd1 to bb0fcb6 Compare February 11, 2026 20:00
@zvonkok zvonkok requested a review from Copilot February 11, 2026 20:10
@jojimt
Collaborator

jojimt commented Feb 11, 2026

Copilot AI left a comment

Pull request overview

Updates the NVIDIA sandbox device plugin to pick a usable GPU resource name for the GFD pod when P_GPU_ALIAS isn’t set, and refines CDI spec generation and logging around IOMMUFD/VFIO behavior.

Changes:

  • Use a discovered GPU device name as the default resource name for the GFD pod when P_GPU_ALIAS is not provided.
  • Rework CDI spec generation to support alias vs per-device-type output for both GPUs and NVSwitches.
  • Adjust IOMMUFD-related logging and allocation paths (and make cdiRoot configurable for tests).

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
| File | Description |
|---|---|
| pkg/device_plugin/gfd.go | Builds GFD pod resource name from a derived GPU device name; adds helper to select that name. |
| pkg/device_plugin/generic_device_plugin.go | Refactors allocation branching/logging between IOMMUFD and legacy VFIO modes. |
| pkg/device_plugin/device_plugin.go | Updates logs and changes how IOMMU keys are derived in IOMMUFD mode. |
| pkg/device_plugin/constants.go | Makes cdiRoot mutable for test redirection. |
| pkg/device_plugin/cdi.go | Reworks CDI generation for alias vs heterogeneous modes and changes device naming in CDI specs. |
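
As an illustration of the cdiRoot change listed above, a package-level variable (rather than a constant) lets a test point spec output at a temporary directory. The sketch below is an assumption about the general pattern only: the default path `/var/run/cdi`, and the helpers `specPath` and `writeSpec`, are hypothetical and are not taken from the repository.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cdiRoot is a variable, not a constant, so tests can redirect it.
// The default value here is an assumed path used only for this example.
var cdiRoot = "/var/run/cdi"

// specPath is a hypothetical helper that resolves a spec file under cdiRoot.
func specPath(name string) string {
	return filepath.Join(cdiRoot, name)
}

// writeSpec is a hypothetical helper that writes spec bytes under cdiRoot,
// creating the directory if needed.
func writeSpec(name string, data []byte) error {
	if err := os.MkdirAll(cdiRoot, 0o755); err != nil {
		return err
	}
	return os.WriteFile(specPath(name), data, 0o644)
}

func main() {
	// Redirect cdiRoot to a throwaway directory, as a test might do.
	tmp, err := os.MkdirTemp("", "cdi-example-")
	if err != nil {
		fmt.Println("creating temp dir:", err)
		return
	}
	defer os.RemoveAll(tmp)

	original := cdiRoot
	cdiRoot = tmp
	defer func() { cdiRoot = original }() // restore for later callers

	if err := writeSpec("example.yaml", []byte("# placeholder spec\n")); err != nil {
		fmt.Println("writing spec:", err)
		return
	}
	fmt.Println("wrote", specPath("example.yaml"))
}
```

In a real test, `t.TempDir()` and `t.Cleanup` would typically replace the manual temp-directory handling shown in `main`.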

@rajatchopra
Collaborator Author

@rajatchopra can we add the defaults at source (https://github.com/NVIDIA/sandbox-device-plugin/blob/main/cmd/main.go#L37-L38)

No, I guess, for two reasons:

  1. We would occlude the original device name, if using it was the intent.
  2. The heterogeneous case would never work.

…not set

Signed-off-by: Rajat Chopra <rajatc@nvidia.com>
@jojimt
Collaborator

jojimt commented Feb 11, 2026

We will occlude the original device name if that was the intent

  1. This is something we inherited from kubevirt. I don't think a use case exists.
  2. We can make a distinction between not set vs set to "" by using os.LookupEnv.
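
A minimal sketch of that distinction, assuming the variable is named P_GPU_ALIAS as in the PR description. `os.LookupEnv` is the standard library call; the `aliasState` type and `classifyAlias` function are hypothetical names introduced only for this example, and how each state maps to plugin behavior is an assumption noted in the comments.

```go
package main

import (
	"fmt"
	"os"
)

// aliasState is a hypothetical type naming the three cases that
// os.LookupEnv makes distinguishable.
type aliasState int

const (
	aliasUnset aliasState = iota // variable absent: a default could apply
	aliasEmpty                   // present but "": assumed to mean "no alias wanted"
	aliasValue                   // present with an explicit alias
)

// classifyAlias takes the two results of os.LookupEnv directly.
func classifyAlias(value string, present bool) aliasState {
	switch {
	case !present:
		return aliasUnset
	case value == "":
		return aliasEmpty
	default:
		return aliasValue
	}
}

func main() {
	const key = "P_GPU_ALIAS"

	os.Unsetenv(key)
	fmt.Println(classifyAlias(os.LookupEnv(key)) == aliasUnset) // true

	os.Setenv(key, "")
	fmt.Println(classifyAlias(os.LookupEnv(key)) == aliasEmpty) // true

	os.Setenv(key, "pgpu")
	fmt.Println(classifyAlias(os.LookupEnv(key)) == aliasValue) // true
}
```

By contrast, `os.Getenv` returns `""` both when the variable is absent and when it is set to the empty string, so the first two cases cannot be told apart with it.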

@rajatchopra
Collaborator Author

  1. We can make a distinction between not set vs set to "" by using os.LookupEnv.

Yes, that is a good way to get around it. Fixed.

… set

Signed-off-by: Rajat Chopra <rajatc@nvidia.com>
Collaborator

@jojimt jojimt left a comment

lgtm

@rajatchopra rajatchopra merged commit e35bd45 into NVIDIA:main Feb 11, 2026
8 checks passed
@rajatchopra rajatchopra deleted the pgpualias branch February 12, 2026 19:44
3 participants