feat(reporter-plugin): support report rdma topology #314
f60bd2d to d42ebe2
d42ebe2 to 997b7ad
@@ -50,12 +52,15 @@ func (o *KubeletPluginOptions) AddFlags(fss *cliflag.NamedFlagSets) {
"the path of kubelet resource plugin")
fs.BoolVar(&o.EnableReportTopologyPolicy, "enable-report-topology-policy", o.EnableReportTopologyPolicy,
"whether to report topology policy")
fs.BoolVar(&o.EnableReportRDMATopology, "enable-report-rdma-topology", false, "enable report rdma topology, default false")
Maybe we can use o.EnableReportRDMATopology as the default value instead of false?
pkg/agent/resourcemanager/fetcher/kubelet/topology/topology_adapter.go
Outdated
pkg/util/cnr.go
Outdated
@@ -42,6 +42,8 @@ const (
CNRFieldNameTopologyZone = "TopologyZone"
CNRFieldNameResources = "Resources"
CNRFieldNameTopologyPolicy = "TopologyPolicy"

ResourceRDMA = "vke.volcengine.com/rdma"
Could you make the resource name configurable? We can assign the default value to resource.katalyst.kubewharf.io, and then pass in a custom value.
@@ -315,6 +342,42 @@ func (p *topologyAdapterImpl) addNumaSocketChildrenZoneNodes(generator *util.Top
return nil
}

// addNumaSocketChildrenZoneNodes add the child nodes of socket or numa zone nodes to the generator, the child nodes are
// generated by generateZoneNode according to TopologyLevel, Type and Name in TopologyAwareAllocatableQuantityList
func (p *topologyAdapterImpl) addNICNumaChildrenZoneNodes(generator *util.TopologyZoneGenerator,
We can make this function more generic by using a map configuration (key: device.ResourceName, value: ZoneType). If the user has set this map, the function can construct NUMA's child zones by iterating through allocatableResources.Devices. If device.ResourceName is present in the map configuration, the zone name will be the device ID.
I have refactored this PR. Could you PTAL?
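The map-based approach proposed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the code merged in the PR: `Device`, `ZoneNode`, and `deviceZonesByNuma` are hypothetical, simplified names, and the zone type "NIC" is only an example value for the map.

```go
package main

import "fmt"

// Device is a hypothetical, simplified view of an allocatable device entry.
type Device struct {
	ResourceName string
	DeviceIds    []string
	NumaID       int
}

// ZoneNode is a hypothetical, simplified zone node: a zone type plus a name.
type ZoneNode struct {
	Type string
	Name string
}

// deviceZonesByNuma builds the child zones of each NUMA node. A device is
// reported only if its resource name appears in resourceToZoneType; the zone
// name is the device ID, which is assumed to be unique.
func deviceZonesByNuma(devices []Device, resourceToZoneType map[string]string) map[int][]ZoneNode {
	zones := make(map[int][]ZoneNode)
	for _, d := range devices {
		zoneType, ok := resourceToZoneType[d.ResourceName]
		if !ok {
			continue // resource not configured, so it is not reported as a zone
		}
		for _, id := range d.DeviceIds {
			zones[d.NumaID] = append(zones[d.NumaID], ZoneNode{Type: zoneType, Name: id})
		}
	}
	return zones
}

func main() {
	devices := []Device{
		{ResourceName: "vke.volcengine.com/rdma", DeviceIds: []string{"eth1", "eth2"}, NumaID: 0},
		{ResourceName: "example.com/unmapped", DeviceIds: []string{"x0"}, NumaID: 1},
	}
	cfg := map[string]string{"vke.volcengine.com/rdma": "NIC"}
	fmt.Println(deviceZonesByNuma(devices, cfg))
}
```

Driving the behaviour from a map means an empty map disables device zone reporting entirely, and supporting another device type requires only a new map entry rather than a new function.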
997b7ad to ea4c7c5
Signed-off-by: fjding <dingfangjie@bytedance.com>
Signed-off-by: caohe <caohe9603@gmail.com>
ea4c7c5 to 9153bef
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #314      +/-   ##
==========================================
- Coverage   53.43%   53.29%   -0.14%
==========================================
  Files         437      437
  Lines       48155    48181      +26
==========================================
- Hits        25732    25680      -52
- Misses      19505    19586      +81
+ Partials     2918     2915       -3
@@ -38,6 +39,7 @@ func NewKubeletPluginOptions() *KubeletPluginOptions {
pluginapi.ResourcePluginPath,
},
EnableReportTopologyPolicy: false,
ResourceNameToZoneNameMap: make(map[string]string),
How about naming it DeviceZoneResourceNameToZoneTypeMap?
done
pkg/util/cnr.go
Outdated
@@ -373,3 +373,13 @@ func GenerateSocketZoneNode(socketID int) ZoneNode {
},
}
}

// GenerateDeviceZoneNode generates device zone node through device id, which must be unique
func GenerateDeviceZoneNode(deviceId, zoneName string) ZoneNode {
Would zoneType be a more appropriate name?
done
for _, deviceId := range device.DeviceIds {
deviceNode := util.GenerateDeviceZoneNode(deviceId, targetZoneName)
if _, ok := zoneAllocationsMap[deviceNode]; !ok {
zoneAllocationsMap[deviceNode] = []*nodev1alpha1.Allocation{
All of this logic can be implemented in addContainerDevices, because the zone node should also have allocatable and capacity whenever an allocation has a request.
done
af0d880 to 5c64218
…ices Signed-off-by: caohe <caohe9603@gmail.com>
5c64218 to 7362121
What type of PR is this?
Features
What this PR does / why we need it:
Support reporting devices (such as RDMA, GPU, etc.) as topology zones under NUMA, thereby supporting inter-RDMA affinity at switch level.
Which issue(s) this PR fixes:
Special notes for your reviewer: