Add support for node labels to report GPU mode #768

Merged: 12 commits merged on Jun 27, 2024


visheshtanksale
Contributor

Some GPUs support switching between graphics mode and compute mode.

The mode is switched by a utility called displaymodeselector.

This change identifies the mode of the GPU and labels the node to help end users schedule workloads.

The assumption is that all GPUs on the node have the same mode. If that is not the case, the value of the nvidia.com/gpu.mode label is unknown.
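
As an illustration of the rule described above, here is a minimal sketch of how one mode per node could be derived from the PCI classes of its GPUs. The helper name `modeForClasses` and the mapping of class 0x030000 (VGA-compatible controller) to graphics and 0x030200 (3D controller) to compute are assumptions made for this sketch, not a copy of the merged implementation.

```go
package main

import "fmt"

// PCI class codes for display controllers (base class 0x03).
const (
	pciClassVGA uint32 = 0x030000 // VGA-compatible controller
	pciClass3D  uint32 = 0x030200 // 3D controller
)

// modeForClasses is a hypothetical helper. It returns a single mode for the
// node, or "unknown" when the GPUs disagree or the class is not recognized.
// The class-to-mode mapping is an assumption of this sketch.
func modeForClasses(classes []uint32) string {
	if len(classes) == 0 {
		return "unknown"
	}
	for _, c := range classes[1:] {
		if c != classes[0] {
			return "unknown" // GPUs on this node report different classes
		}
	}
	switch classes[0] {
	case pciClassVGA:
		return "graphics"
	case pciClass3D:
		return "compute"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(modeForClasses([]uint32{pciClass3D, pciClass3D}))  // compute
	fmt.Println(modeForClasses([]uint32{pciClass3D, pciClassVGA})) // unknown
}
```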

Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>
Member

@elezar elezar left a comment

As a general comment, could we update the commit message to indicate that we are reporting the mode and not specifying it?

Then, should MIG devices have unknown reported? Should these not always be compute? What is the intended use of this label?

return resolvePCIAddressToMode(pciID)
}

func resolvePCIAddressToMode(addr string) (string, error) {
Member

This seems like a function that we should have in go-nvpci instead of reimplementing it here.

Contributor Author

I changed the implementation here to just read the class based on the PCI address. Maybe we should still move the function that gets the class into go-nvml.

Member

The place for this to live would always be go-nvlib. The package at go-nvml is a wrapper for NVML specifically and we don't really add additional functionality there.

@@ -209,6 +209,7 @@ their meaning:
| nvidia.com/gpu.machine | String | Machine type | DGX-1 |
| nvidia.com/gpu.memory | Integer | Memory of the GPU in Mb | 2048 |
| nvidia.com/gpu.product | String | Model of the GPU | GeForce-GT-710 |
| nvidia.com/gpu.mode | String | Display or Compute Mode of the GPU | compute |
Member

Could we provide more information on what this value means?

Contributor Author

Agreed. Working on it

Member

We still need to add a reference as to what this means. From the documentation it is also only applicable to specific device types. Do we want to link to the relevant documentation here?

Contributor Author

Fixed

@visheshtanksale visheshtanksale changed the title from "Add support for node labels to specify GPU mode" to "Add support for node labels to report GPU mode" on Jun 14, 2024

jojimt commented Jun 14, 2024

> Then, should MIG devices have unknown reported? Should these not always be compute? What is the intended use of this label?

MIG devices should report compute. The intended use of the label is to let an application select a worker node based on the mode available on the node.

Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>
@visheshtanksale
Contributor Author

> As a general comment, could we update the commit message to indicate that we are reporting the mode and not specifying it.

Fixed this.

> Then, should MIG devices have unknown reported? Should these not always be compute? What is the intended use of this label?

Based on the current list of GPUs that support mode switching, it will always be compute. But I fixed this to pull the mode from the actual device.

Member

@elezar elezar left a comment

Thanks @visheshtanksale.

There are some typos in the function name and I'm also not clear on what the behaviour should be for a MIG device.

@@ -132,3 +133,23 @@ func totalMemory(attr map[string]interface{}) (uint64, error) {
return 0, fmt.Errorf("unsupported attribute type %v", t)
}
}

func (d nvmlMigDevice) GetPIEClass() (uint32, error) {
info, retVal := d.MigDevice.GetPciInfo()
Member

Question: Is PCI info valid for a MIG device? How does the busid differ from that of the parent?

Contributor Author

It is the busID of the parent.

Comment on lines 139 to 148
if retVal != nvml.SUCCESS {
return 0, retVal
}
var bytes []byte
for _, char := range info.BusId {
if char == 0 {
break
}
bytes = append(bytes, byte(char))
}
Member

Instead of implementing this here, it should be pulled into go-nvlib if it is valid.

Member

Happy to do this in a follow-up though.

Contributor Author

We can decide where to put this behavior once we decide what behavior is expected for MIG devices.

Contributor Author

This code has been removed because the PCI class is now hardcoded for MIG devices.
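
For reference, the loop discussed in this thread converts a fixed-size, NUL-terminated bus ID array into a Go string. A standalone sketch of that conversion follows. The `[32]int8` element type is an assumption about how the NVML bindings expose the field, and `busIDToString` is a hypothetical name.

```go
package main

import "fmt"

// busIDToString converts a fixed-size, NUL-terminated array of C chars into a
// Go string, stopping at the first NUL byte. The [32]int8 type is an
// assumption about the binding's PCI info struct.
func busIDToString(busID [32]int8) string {
	b := make([]byte, 0, len(busID))
	for _, c := range busID {
		if c == 0 {
			break
		}
		b = append(b, byte(c))
	}
	return string(b)
}

func main() {
	// Build an example array the way a C API would fill it.
	var raw [32]int8
	for i, c := range []byte("00000000:3B:00.0") {
		raw[i] = int8(c)
	}
	fmt.Println(busIDToString(raw)) // 00000000:3B:00.0
}
```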


}
gpuMode := getModeForClasses(classes)
labels := Labels{
"nvidia.com/gpu.mode": gpuMode,
Member

Question: Since we also extract this label for a MIG device do we expect different labels per MIG profile?

Contributor Author

Devices that support MIG do not support changing the GPU mode. They will always return the compute PCI class, so we can hardcode the class of MIG devices to be compute.

Member

Yes, let's rather do that. It would simplify the implementation quite a bit.

Contributor Author

Updated the implementation
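
The simplification agreed on in this thread could be sketched as follows. The `Device` interface and `migDevice` type are trimmed-down, hypothetical stand-ins for the project's types, and 0x030200 (the PCI class code for a 3D controller) is assumed to be the class that maps to compute.

```go
package main

import "fmt"

// pciClass3D is the PCI class code for a 3D controller, assumed here to be
// the class that corresponds to compute mode.
const pciClass3D uint32 = 0x030200

// Device is a hypothetical, trimmed-down stand-in for the device interface.
type Device interface {
	GetPCIClass() (uint32, error)
}

// migDevice is a hypothetical MIG device type.
type migDevice struct{}

// GetPCIClass returns a fixed class instead of querying the parent GPU,
// because GPUs that support MIG are not expected to switch modes.
func (migDevice) GetPCIClass() (uint32, error) {
	return pciClass3D, nil
}

func main() {
	var d Device = migDevice{}
	class, err := d.GetPCIClass()
	fmt.Printf("class=0x%06x err=%v\n", class, err)
}
```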

for _, d := range devices {
class, err := d.GetPIEClass()
if err != nil {
return nil, err
Member

Question: Do we want to treat errors in getting the class as fatal? This will crash GFD and cause NO labels to be generated. Should we rather return unknown as the label in this case?

Contributor Author

I wanted to keep this behavior consistent with the other labelers, which is why I am returning the error instead of labeling the mode as unknown.

Member

@elezar elezar left a comment

Thanks @visheshtanksale.

I have some minor comments, but these can be addressed in a follow-up.

}
for _, class := range classes {
if class != classes[0] {
return "unknown"
Member

Not a blocker for this PR, but we may want to log the content of classes here as a warning.

Contributor Author

Added.

@@ -204,3 +204,89 @@ func TestSharingLabeler(t *testing.T) {
})
}
}

func TestGPUModeLabeler(t *testing.T) {
Member

Thanks for adding the tests!

@@ -51,6 +51,14 @@ func NewDeviceMock(migEnabled bool) *DeviceMock {
IsMigEnabledFunc: func() (bool, error) { return migEnabled, nil },
IsMigCapableFunc: func() (bool, error) { return migEnabled, nil },
GetMigDevicesFunc: func() ([]resource.Device, error) { return nil, nil },
GetPCIClassFunc: func() (uint32, error) { return 0x030000, nil },
Member

nit: Should this return 0 by default since it's "unknown" / "undefined"?

Contributor Author

Fixed

@@ -30,4 +30,5 @@ nvidia\.com\/gpu\.engines\.jpeg=[0-9]+
nvidia\.com\/gpu\.engines\.ofa=[0-9]+
nvidia\.com\/gpu\.slices\.gi=[0-9]+
nvidia\.com\/gpu\.slices\.ci=[0-9]+
nvidia\.com\/gpu\.mode=[unknown|compute|graphics]
Member

nit: this can only be compute now, correct? Not a blocker.

Contributor Author

Fixed
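
A side note on the pattern shown in this hunk: square brackets form a character class, so `[unknown|compute|graphics]` matches a single character from that set rather than one of the three words. The sketch below shows the difference using Go's `regexp` package with fully anchored patterns. How the project's test harness applies its expected-output patterns is not shown here, so the anchoring is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// Character class: matches exactly one character out of the listed set.
	charClass = regexp.MustCompile(`^nvidia\.com/gpu\.mode=[unknown|compute|graphics]$`)
	// Alternation: matches one of the three whole words.
	alternation = regexp.MustCompile(`^nvidia\.com/gpu\.mode=(unknown|compute|graphics)$`)
)

func main() {
	lines := []string{
		"nvidia.com/gpu.mode=compute",
		"nvidia.com/gpu.mode=c",
		"nvidia.com/gpu.mode=|",
	}
	for _, line := range lines {
		fmt.Printf("%q charClass=%v alternation=%v\n",
			line, charClass.MatchString(line), alternation.MatchString(line))
	}
}
```

With an unanchored search the character-class form would still find a match inside a valid label line, which may explain why such a pattern can go unnoticed in tests.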

Signed-off-by: Vishesh Tanksale <vtanksale@nvidia.com>
@visheshtanksale visheshtanksale merged commit 35ad180 into NVIDIA:main Jun 27, 2024
6 checks passed