-
Notifications
You must be signed in to change notification settings - Fork 109
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(orm): support outofband-resource-manager #406
feat(orm): support outofband-resource-manager #406
Conversation
e371bb1
to
e79e43a
Compare
e79e43a
to
c983a3b
Compare
"github.com/kubewharf/katalyst-core/pkg/agent/resourcemanager/outofband/endpoint" | ||
) | ||
|
||
func TestCheckpoint(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
t.Parallel()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
) | ||
|
||
func InitORM(agentCtx *GenericContext, conf *config.Configuration, _ interface{}, _ string) (bool, Component, error) { | ||
m, err := outofband.NewManager(conf.PluginRegistrationDir+"/kubelet.sock", agentCtx.EmitterPool.GetDefaultMetricsEmitter(), agentCtx.MetaServer, conf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this sock be conflicted with already-existing ones? such as, qrm, agent, sysadvisor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, "/var/lib/katalyst/plugin-socks/kubelet.sock" is only used by ORM now.
cpuPlugin: qrm_cpu_plugin_dynamic.sock
memoryPlugin: qrm_memory_plugin_dynamic.sock
headroom reporter: headroom-reporter-plugin.sock
kubelet QRM: /var/lib/kubelet
|
||
fs.DurationVar(&o.ORMRconcilePeriod, "orm-reconcile-period", | ||
o.ORMRconcilePeriod, "orm resource reconcile period") | ||
fs.Var(cliflag.NewMapStringString(&o.ORMResourceNamesMap), "orm-resource-names-map", "A set of ResourceName=ResourceQuantity (e.g. best-effort-cpu=cpu,best-effort-memory=memory,...) pairs that map resource name \"best-effort-cpu\" to resource name \"cpu\" during QoS Resource Manager allocation period.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add more comments for this flag (for why we need this), since it is kind of wired for upstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
lgtm for me now; you may still need to fix the conflict and then get approve from code owners |
a26b792
to
ca29648
Compare
|
||
fs.DurationVar(&o.ORMRconcilePeriod, "orm-reconcile-period", | ||
o.ORMRconcilePeriod, "orm resource reconcile period") | ||
fs.Var(cliflag.NewMapStringString(&o.ORMResourceNamesMap), "orm-resource-names-map", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use StringToStringVar directly?
return containerType == pluginapi.ContainerType_INIT | ||
} | ||
|
||
func isPodKatalystQoSLevelSystemCores(pod *v1.Pod) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use qos conf instead of annotation directly?
pkg/util/native/resources.go
Outdated
@@ -306,3 +306,7 @@ func MultiplyQuantity(quantity resource.Quantity, y float64) resource.Quantity { | |||
value = int64(float64(value) * y) | |||
return *resource.NewQuantity(value, quantity.Format) | |||
} | |||
|
|||
func ParseQuantityToFloat64(quantity resource.Quantity) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use AsApproximateFloat64 directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not use AsApproximateFloat64 directly?
All comments have been resolved.
|
||
// errUnsupportedVersion is the error raised when the resource plugin uses an API version not | ||
// supported by the Kubelet registry | ||
errUnsupportedVersion = "requested API version %q is not supported by kubelet. Supported version is %q" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by ORM?
MetricAddPodTimeout = "ORM_add_pod_timeout" | ||
MetricDeletePodTImeout = "ORM_delete_pod_timeout" | ||
|
||
MainContainerNameAnnotationKey = "kubernetes.io/main-container-name" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to support the main container annotation in ORM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to support the main container annotation for ORM?
Not necessary, first container in container list will be defaulted as the main container, but we can keep it as a feature. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In vanilla K8s, all containers are treated as main containers. If we keep this manner, will there be any problems in the colocation scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To remain consistent with QRM, we can retain this usage. In the long term, when the K8s version is higher, we can use the native sidecar container feature provided by K8s. https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/
ContainerName: container.Name, | ||
ContainerType: containerType, | ||
ContainerIndex: containerIndex, | ||
// customize for tce, PodRole and PodType should be identified by more general annotations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can remove "customize for tce" ?
errBadSocket = "bad socketPath, must be an absolute path:" | ||
|
||
// errUnsupportedVersion is the error raised when the resource plugin uses an API version not | ||
// supported by the Kubelet registry |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also modify this comment?
5194e72
to
bdebb96
Compare
fs.DurationVar(&o.ORMRconcilePeriod, "orm-reconcile-period", | ||
o.ORMRconcilePeriod, "orm resource reconcile period") | ||
fs.StringToStringVar(&o.ORMResourceNamesMap, "orm-resource-names-map", o.ORMResourceNamesMap, | ||
"A set of ResourceName=ResourceQuantity pairs that map resource name during QoS Resource Manager allocation period. "+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be more accurate to use Out-of-band Resource Manager instead of QoS Resource Manager in the message?
bdebb96
to
2e93922
Compare
What type of PR is this?
Features
What this PR does / why we need it:
orm(outofband-resource-manager) supports dynamically allocating resources to pods with different QoS asynchronously through a bypass approach, without relying on intrusive changes to kubelet.
The current version supports shared_cores and reclaimed_cores pool.
Which issue(s) this PR fixes:
#430