
support service profiling manager #71

Conversation

luomingmeng
Collaborator

What type of PR is this?

Features

What this PR does / why we need it:

We need a service profiling manager, implemented in the meta server, to manage the service-level status of each pod. For example, agents can use it to judge whether a pod can be reclaimed based on its service performance.
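For context, a minimal sketch (in Go) of how an agent might consume such a manager. The interface shape and the canReclaim helper below are illustrative only; apart from ServiceBusinessPerformanceDegraded, which appears later in this PR's diff, none of the names should be read as the PR's exact API.

package example

import (
    "context"

    v1 "k8s.io/api/core/v1"
)

// ServiceProfilingManager sketches the abstraction discussed in this PR:
// it answers service-level questions about a pod from its profiling data.
type ServiceProfilingManager interface {
    // ServiceBusinessPerformanceDegraded reports whether the pod's business
    // indicators are outside their target range.
    ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error)
}

// canReclaim shows how an agent might gate reclamation on service performance.
func canReclaim(ctx context.Context, mgr ServiceProfilingManager, pod *v1.Pod) (bool, error) {
    degraded, err := mgr.ServiceBusinessPerformanceDegraded(ctx, pod)
    if err != nil {
        return false, err
    }
    // A degraded service should not have resources reclaimed from it.
    return !degraded, nil
}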

@luomingmeng luomingmeng self-assigned this May 19, 2023
@luomingmeng luomingmeng added enhancement New feature or request workflow/draft draft: no need to review labels May 19, 2023
@luomingmeng luomingmeng added this to the v0.2 milestone May 19, 2023
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch 4 times, most recently from fff3a3f to 052e1d4 Compare May 19, 2023 17:37

codecov bot commented May 19, 2023

Codecov Report

Patch coverage: 64.32%; project coverage change: +0.15% 🎉

Comparison: base (21d86d9) 51.30% vs. head (4e324d1) 51.46%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #71      +/-   ##
==========================================
+ Coverage   51.30%   51.46%   +0.15%     
==========================================
  Files         318      334      +16     
  Lines       32418    33484    +1066     
==========================================
+ Hits        16632    17231     +599     
- Misses      13840    14219     +379     
- Partials     1946     2034      +88     
Flag Coverage Δ
unittest 51.46% <64.32%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...m-plugins/cpu/dynamicpolicy/allocation_handlers.go 47.02% <0.00%> (-0.24%) ⬇️
pkg/agent/qrm-plugins/cpu/dynamicpolicy/policy.go 38.42% <0.00%> (+0.08%) ⬆️
...ourcemanager/fetcher/kubelet/topology/interface.go 0.00% <0.00%> (ø)
pkg/client/control/cnr.go 37.80% <0.00%> (-1.94%) ⬇️
pkg/config/agent/global/base.go 100.00% <ø> (ø)
pkg/config/agent/global/metaserver.go 100.00% <ø> (ø)
pkg/util/general/error.go 0.00% <0.00%> (ø)
...ysadvisor/plugin/qosaware/server/cpu/cpu_server.go 54.26% <3.44%> (-1.40%) ⬇️
.../agent/resourcemanager/reporter/cnr/cnrreporter.go 62.20% <27.58%> (-2.06%) ⬇️
pkg/agent/resourcemanager/reporter/converter.go 33.33% <33.33%> (ø)
... and 49 more

... and 25 files with indirect coverage changes


f := func(podUID string, containerName string, ci *types.ContainerInfo) bool {
    containerEstimation, err := helper.EstimateContainerMemoryUsage(ci, p.metaReader, p.essentials.EnableReclaim)
    enableReclaim := false
    if p.essentials.EnableReclaim {
Collaborator

Can we move this logic to a util in resource/help.go (or add a new file under the resource/ dir)? It is the same for cpu and memory.
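One possible shape for such a shared helper, purely as a sketch: performanceChecker and PodReclaimEnabled are hypothetical names, and the real helper would additionally need the container-level estimation parameters used by the cpu and memory paths.

package helper

import (
    "context"

    v1 "k8s.io/api/core/v1"
)

// performanceChecker is a local stand-in for the meta server method
// ServiceBusinessPerformanceDegraded referenced elsewhere in this PR.
type performanceChecker interface {
    ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error)
}

// PodReclaimEnabled is a hypothetical shared helper that the cpu and memory
// estimation paths could both call instead of duplicating the check.
func PodReclaimEnabled(ctx context.Context, checker performanceChecker,
    pod *v1.Pod, globalEnableReclaim bool) (bool, error) {
    if !globalEnableReclaim {
        // Reclaim is disabled globally; no per-pod check is needed.
        return false, nil
    }
    // Delegate the per-pod decision to the service profiling manager.
    degraded, err := checker.ServiceBusinessPerformanceDegraded(ctx, pod)
    if err != nil {
        return false, err
    }
    return !degraded, nil
}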

if err != nil {
    return nil, err
}

var serviceProfilingManager spd.ServiceProfilingManager
if conf.EnableServiceProfilingManager {
Collaborator

Why do we need a flag to control whether this manager is enabled or not?
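For reference, one way the flag question can be sidestepped is to always return a usable implementation, falling back to a no-op one when profiling is disabled, so callers never need nil checks. The sketch below is illustrative only; none of these type or function names come from the PR.

package example

import "context"

// profiler is an illustrative stand-in for the ServiceProfilingManager interface.
type profiler interface {
    Degraded(ctx context.Context, podUID string) (bool, error)
}

// realProfiler and noopProfiler are hypothetical implementations used only to
// show the flag-guarded wiring pattern under discussion.
type realProfiler struct{}

func (realProfiler) Degraded(context.Context, string) (bool, error) { return false, nil }

type noopProfiler struct{}

func (noopProfiler) Degraded(context.Context, string) (bool, error) { return false, nil }

// newProfiler always returns a usable profiler, which is one argument for
// dropping the enable flag from callers' code paths entirely.
func newProfiler(enabled bool) profiler {
    if enabled {
        return realProfiler{}
    }
    return noopProfiler{}
}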

@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch 5 times, most recently from d9779e7 to 9130205 Compare May 22, 2023 05:45
waynepeking348
waynepeking348 previously approved these changes May 22, 2023
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 9130205 to 3c67f7a Compare May 22, 2023 06:16
@luomingmeng luomingmeng marked this pull request as ready for review May 22, 2023 06:20
@luomingmeng luomingmeng added workflow/need-review review: test succeeded, need to review and removed workflow/draft draft: no need to review labels May 22, 2023
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 3c67f7a to 4bfefb0 Compare May 22, 2023 06:47
waynepeking348
waynepeking348 previously approved these changes May 22, 2023
pkg/metaserver/spd/manager.go (review thread resolved)
klog.Infof("[spd-manager] spd %s cache has been deleted", key)
return nil

if target.LowerBound != nil && indicatorValue[indicatorName] < *target.LowerBound {
    return true, nil
Member

Actually, it's hard to say whether a higher or lower indicator value represents better business performance.

}

// check whether the pod is degraded
degraded, err := metaServer.ServiceBusinessPerformanceDegraded(ctx, pod)
Member

What about naming it ServiceResourceReclaimEnable? Degraded means a decrease in the performance score, but that does not necessarily mean the pod cannot be co-located with BE pods.

// avoid frequent requests to the api-server in some bad situations
s.spdCache.SetLastFetchRemoteTime(key, now)

for indicatorName, target := range indicatorTarget {
    if target.UpperBound != nil && indicatorValue[indicatorName] > *target.UpperBound {
Collaborator

The logic here is questionable: the service is judged as degraded whenever the indicator value is above the upper bound or below the lower bound.

        return true, nil
    }
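To make the discussion concrete, here is a self-contained paraphrase of the check under review, with simplified types that are not the PR's exact code. Because each indicator may set only the bound that matters to it, the direction question ("is higher or lower better?") can be answered per indicator by whoever authors the SPD rather than by this loop.

package example

// IndicatorTarget mirrors the idea in the PR: an indicator may carry an upper
// and/or lower bound, so either direction of "worse" can be expressed.
type IndicatorTarget struct {
    UpperBound *float64
    LowerBound *float64
}

// degraded reports whether any observed indicator falls outside its target
// range. For an indicator where higher is better, only LowerBound would be
// set, and vice versa.
func degraded(values map[string]float64, targets map[string]IndicatorTarget) bool {
    for name, target := range targets {
        value, ok := values[name]
        if !ok {
            continue
        }
        if target.UpperBound != nil && value > *target.UpperBound {
            return true
        }
        if target.LowerBound != nil && value < *target.LowerBound {
            return true
        }
    }
    return false
}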

// if the pod is degraded, it cannot be reclaimed
sun-yuliang (Collaborator) commented May 23, 2023

This is not true. A pod cannot be reclaimed when it is degraded beyond a threshold, not just when it is degraded at all.
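The threshold idea could look roughly like the following sketch; the function and parameter names are hypothetical, and the ratio-based formula is only one possible definition of "degraded over a threshold".

package example

// reclaimAllowed illustrates the threshold idea from the comment above:
// reclamation is blocked only once the indicator deviates from its target by
// more than a tolerable ratio, not as soon as any degradation appears.
// It assumes a "lower is better" indicator such as latency; a
// maxDegradedRatio of 0.1 would tolerate up to 10% degradation.
func reclaimAllowed(observed, target, maxDegradedRatio float64) bool {
    if target <= 0 {
        // No meaningful target to compare against; allow reclaim by default.
        return true
    }
    degradedRatio := (observed - target) / target
    return degradedRatio <= maxDegradedRatio
}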

@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 1900cb0 to 04b7d8e Compare May 23, 2023 12:54
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 04b7d8e to 4e324d1 Compare May 23, 2023 13:11
spdName, err := s.getPodSPDNameFunc(pod)

func NewServiceProfilingManager(clientSet *client.GenericClientSet, emitter metrics.MetricEmitter,
    cncFetcher cnc.CNCFetcher, conf *pkgconfig.Configuration) (ServiceProfilingManager, error) {
    fetcher, err := NewSPDFetcher(clientSet, emitter, cncFetcher, conf)
Member

How about putting spdFetcher into MetaAgent, so that we can use spdFetcher to create a new ServiceProfilingManager and replace the native ServiceProfilingManager with it if needed?

Collaborator Author

You can also implement a custom ServiceProfilingManager by using NewSPDFetcher, and we don't want anyone to use the SPD fetcher directly, because SPD is just one implementation of the service profiling manager.
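As a sketch of that composition (all names except NewSPDFetcher and ServiceProfilingManager are hypothetical, and the fetcher method shown is assumed rather than taken from the PR): a custom manager embeds an SPD fetcher as one data source, so external callers depend only on the manager abstraction.

package example

import (
    "context"

    v1 "k8s.io/api/core/v1"
)

// spdFetcher stands in for the fetcher returned by spd.NewSPDFetcher;
// the single method here is illustrative only.
type spdFetcher interface {
    GetSPD(ctx context.Context, pod *v1.Pod) (interface{}, error)
}

// serviceProfilingManager stands in for the interface this PR introduces.
type serviceProfilingManager interface {
    ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error)
}

// customManager demonstrates the composition described above: SPD is just one
// possible source hidden behind the profiling-manager abstraction.
type customManager struct {
    fetcher spdFetcher
}

func (m *customManager) ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error) {
    if _, err := m.fetcher.GetSPD(ctx, pod); err != nil {
        return false, err
    }
    // Apply custom degradation rules against the fetched SPD here.
    return false, nil
}

var _ serviceProfilingManager = &customManager{}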

@luomingmeng luomingmeng added workflow/merge-ready merge-ready: code is ready and can be merged and removed workflow/need-review review: test succeeded, need to review labels May 24, 2023
@waynepeking348 waynepeking348 merged commit bc6fe43 into kubewharf:main May 24, 2023
luomingmeng added a commit to luomingmeng/katalyst-core that referenced this pull request Oct 11, 2024
Labels
enhancement New feature or request workflow/merge-ready merge-ready: code is ready and can be merged
4 participants