
support service profiling manager #71

Conversation

luomingmeng
Collaborator

What type of PR is this?

Features

What this PR does / why we need it:

We need a service profiling manager, implemented in the meta server, to manage the service-level status of each pod. For example, agents can use it to judge whether a pod can be reclaimed based on its service performance.
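For context, a minimal sketch (in Go) of how an agent might consume such a manager. The interface shape and the canReclaim helper below are illustrative only; apart from ServiceBusinessPerformanceDegraded, which appears later in this PR's diff, none of the names should be read as the PR's exact API.

package example

import (
    "context"

    v1 "k8s.io/api/core/v1"
)

// ServiceProfilingManager sketches the abstraction discussed in this PR:
// it answers service-level questions about a pod from its profiling data.
type ServiceProfilingManager interface {
    // ServiceBusinessPerformanceDegraded reports whether the pod's business
    // indicators are outside their target range.
    ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error)
}

// canReclaim shows how an agent might gate reclamation on service performance.
func canReclaim(ctx context.Context, mgr ServiceProfilingManager, pod *v1.Pod) (bool, error) {
    degraded, err := mgr.ServiceBusinessPerformanceDegraded(ctx, pod)
    if err != nil {
        return false, err
    }
    // A degraded service should not have resources reclaimed from it.
    return !degraded, nil
}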

@luomingmeng luomingmeng self-assigned this May 19, 2023
@luomingmeng luomingmeng added enhancement New feature or request workflow/draft draft: no need to review labels May 19, 2023
@luomingmeng luomingmeng added this to the v0.2 milestone May 19, 2023
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch 4 times, most recently from fff3a3f to 052e1d4 Compare May 19, 2023 17:37

codecov bot commented May 19, 2023

Codecov Report

Patch coverage: 64.32%; project coverage change: +0.15% 🎉

Comparison: base (21d86d9) 51.30% vs. head (4e324d1) 51.46%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #71      +/-   ##
==========================================
+ Coverage   51.30%   51.46%   +0.15%     
==========================================
  Files         318      334      +16     
  Lines       32418    33484    +1066     
==========================================
+ Hits        16632    17231     +599     
- Misses      13840    14219     +379     
- Partials     1946     2034      +88     
Flag Coverage Δ
unittest 51.46% <64.32%> (+0.15%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...m-plugins/cpu/dynamicpolicy/allocation_handlers.go 47.02% <0.00%> (-0.24%) ⬇️
pkg/agent/qrm-plugins/cpu/dynamicpolicy/policy.go 38.42% <0.00%> (+0.08%) ⬆️
...ourcemanager/fetcher/kubelet/topology/interface.go 0.00% <0.00%> (ø)
pkg/client/control/cnr.go 37.80% <0.00%> (-1.94%) ⬇️
pkg/config/agent/global/base.go 100.00% <ø> (ø)
pkg/config/agent/global/metaserver.go 100.00% <ø> (ø)
pkg/util/general/error.go 0.00% <0.00%> (ø)
...ysadvisor/plugin/qosaware/server/cpu/cpu_server.go 54.26% <3.44%> (-1.40%) ⬇️
.../agent/resourcemanager/reporter/cnr/cnrreporter.go 62.20% <27.58%> (-2.06%) ⬇️
pkg/agent/resourcemanager/reporter/converter.go 33.33% <33.33%> (ø)
... and 49 more

... and 25 files with indirect coverage changes


f := func(podUID string, containerName string, ci *types.ContainerInfo) bool {
    containerEstimation, err := helper.EstimateContainerMemoryUsage(ci, p.metaReader, p.essentials.EnableReclaim)
    enableReclaim := false
    if p.essentials.EnableReclaim {
Collaborator

Can we move this logic to a util in resource/help.go (or add a new file under the resource/ dir)? It is the same for cpu and memory.
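One possible shape for such a shared helper, purely as a sketch: performanceChecker and PodReclaimEnabled are hypothetical names, and the real helper would additionally need the container-level estimation parameters used by the cpu and memory paths.

package helper

import (
    "context"

    v1 "k8s.io/api/core/v1"
)

// performanceChecker is a local stand-in for the meta server method
// ServiceBusinessPerformanceDegraded referenced elsewhere in this PR.
type performanceChecker interface {
    ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error)
}

// PodReclaimEnabled is a hypothetical shared helper that the cpu and memory
// estimation paths could both call instead of duplicating the check.
func PodReclaimEnabled(ctx context.Context, checker performanceChecker,
    pod *v1.Pod, globalEnableReclaim bool) (bool, error) {
    if !globalEnableReclaim {
        // Reclaim is disabled globally; no per-pod check is needed.
        return false, nil
    }
    // Delegate the per-pod decision to the service profiling manager.
    degraded, err := checker.ServiceBusinessPerformanceDegraded(ctx, pod)
    if err != nil {
        return false, err
    }
    return !degraded, nil
}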

if err != nil {
    return nil, err
}

var serviceProfilingManager spd.ServiceProfilingManager
if conf.EnableServiceProfilingManager {
Collaborator

Why do we need a flag to control whether this manager is enabled or not?
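For reference, one way the flag question can be sidestepped is to always return a usable implementation, falling back to a no-op one when profiling is disabled, so callers never need nil checks. The sketch below is illustrative only; none of these type or function names come from the PR.

package example

import "context"

// profiler is an illustrative stand-in for the ServiceProfilingManager interface.
type profiler interface {
    Degraded(ctx context.Context, podUID string) (bool, error)
}

// realProfiler and noopProfiler are hypothetical implementations used only to
// show the flag-guarded wiring pattern under discussion.
type realProfiler struct{}

func (realProfiler) Degraded(context.Context, string) (bool, error) { return false, nil }

type noopProfiler struct{}

func (noopProfiler) Degraded(context.Context, string) (bool, error) { return false, nil }

// newProfiler always returns a usable profiler, which is one argument for
// dropping the enable flag from callers' code paths entirely.
func newProfiler(enabled bool) profiler {
    if enabled {
        return realProfiler{}
    }
    return noopProfiler{}
}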

@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch 5 times, most recently from d9779e7 to 9130205 Compare May 22, 2023 05:45
waynepeking348
waynepeking348 previously approved these changes May 22, 2023
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 9130205 to 3c67f7a Compare May 22, 2023 06:16
@luomingmeng luomingmeng marked this pull request as ready for review May 22, 2023 06:20
@luomingmeng luomingmeng added workflow/need-review review: test succeeded, need to review and removed workflow/draft draft: no need to review labels May 22, 2023
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 3c67f7a to 4bfefb0 Compare May 22, 2023 06:47
waynepeking348
waynepeking348 previously approved these changes May 22, 2023
pkg/metaserver/spd/manager.go (review thread resolved)
klog.Infof("[spd-manager] spd %s cache has been deleted", key)
return nil

if target.LowerBound != nil && indicatorValue[indicatorName] < *target.LowerBound {
    return true, nil
Member

Actually, it's hard to say whether a higher or lower indicator value represents better business performance.

}

// check whether the pod is degraded
degraded, err := metaServer.ServiceBusinessPerformanceDegraded(ctx, pod)
Member

What about naming it ServiceResourceReclaimEnable? Degraded means a decrease in the performance score, but that does not necessarily mean the pod cannot be co-located with BE pods.

// avoid frequent requests to the api-server in some bad situations
s.spdCache.SetLastFetchRemoteTime(key, now)

for indicatorName, target := range indicatorTarget {
    if target.UpperBound != nil && indicatorValue[indicatorName] > *target.UpperBound {
Collaborator

The logic here is questionable: the service is judged as degraded whenever the indicator value is above the upper bound or below the lower bound.

        return true, nil
    }
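To make the discussion concrete, here is a self-contained paraphrase of the check under review, with simplified types that are not the PR's exact code. Because each indicator may set only the bound that matters to it, the direction question ("is higher or lower better?") can be answered per indicator by whoever authors the SPD rather than by this loop.

package example

// IndicatorTarget mirrors the idea in the PR: an indicator may carry an upper
// and/or lower bound, so either direction of "worse" can be expressed.
type IndicatorTarget struct {
    UpperBound *float64
    LowerBound *float64
}

// degraded reports whether any observed indicator falls outside its target
// range. For an indicator where higher is better, only LowerBound would be
// set, and vice versa.
func degraded(values map[string]float64, targets map[string]IndicatorTarget) bool {
    for name, target := range targets {
        value, ok := values[name]
        if !ok {
            continue
        }
        if target.UpperBound != nil && value > *target.UpperBound {
            return true
        }
        if target.LowerBound != nil && value < *target.LowerBound {
            return true
        }
    }
    return false
}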

// if the pod is degraded, it cannot be reclaimed
sun-yuliang (Collaborator) commented May 23, 2023

This is not true. A pod cannot be reclaimed when it is degraded beyond a threshold, not just when it is degraded at all.
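The threshold idea could look roughly like the following sketch; the function and parameter names are hypothetical, and the ratio-based formula is only one possible definition of "degraded over a threshold".

package example

// reclaimAllowed illustrates the threshold idea from the comment above:
// reclamation is blocked only once the indicator deviates from its target by
// more than a tolerable ratio, not as soon as any degradation appears.
// It assumes a "lower is better" indicator such as latency; a
// maxDegradedRatio of 0.1 would tolerate up to 10% degradation.
func reclaimAllowed(observed, target, maxDegradedRatio float64) bool {
    if target <= 0 {
        // No meaningful target to compare against; allow reclaim by default.
        return true
    }
    degradedRatio := (observed - target) / target
    return degradedRatio <= maxDegradedRatio
}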

@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 1900cb0 to 04b7d8e Compare May 23, 2023 12:54
@luomingmeng luomingmeng force-pushed the dev/support_service_profiling_manager branch from 04b7d8e to 4e324d1 Compare May 23, 2023 13:11
spdName, err := s.getPodSPDNameFunc(pod)

func NewServiceProfilingManager(clientSet *client.GenericClientSet, emitter metrics.MetricEmitter,
    cncFetcher cnc.CNCFetcher, conf *pkgconfig.Configuration) (ServiceProfilingManager, error) {
    fetcher, err := NewSPDFetcher(clientSet, emitter, cncFetcher, conf)
Member

How about putting spdFetcher into MetaAgent, so that we can use spdFetcher to create a new ServiceProfilingManager and replace the native ServiceProfilingManager with it if needed?

Collaborator Author

You can also implement a custom ServiceProfilingManager by using NewSPDFetcher, and we don't want anyone to use the SPD fetcher directly, because SPD is just one implementation of the service profiling manager.
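As a sketch of that composition (all names except NewSPDFetcher and ServiceProfilingManager are hypothetical, and the fetcher method shown is assumed rather than taken from the PR): a custom manager embeds an SPD fetcher as one data source, so external callers depend only on the manager abstraction.

package example

import (
    "context"

    v1 "k8s.io/api/core/v1"
)

// spdFetcher stands in for the fetcher returned by spd.NewSPDFetcher;
// the single method here is illustrative only.
type spdFetcher interface {
    GetSPD(ctx context.Context, pod *v1.Pod) (interface{}, error)
}

// serviceProfilingManager stands in for the interface this PR introduces.
type serviceProfilingManager interface {
    ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error)
}

// customManager demonstrates the composition described above: SPD is just one
// possible source hidden behind the profiling-manager abstraction.
type customManager struct {
    fetcher spdFetcher
}

func (m *customManager) ServiceBusinessPerformanceDegraded(ctx context.Context, pod *v1.Pod) (bool, error) {
    if _, err := m.fetcher.GetSPD(ctx, pod); err != nil {
        return false, err
    }
    // Apply custom degradation rules against the fetched SPD here.
    return false, nil
}

var _ serviceProfilingManager = &customManager{}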

@luomingmeng luomingmeng added workflow/merge-ready merge-ready: code is ready and can be merged and removed workflow/need-review review: test succeeded, need to review labels May 24, 2023
@waynepeking348 waynepeking348 merged commit bc6fe43 into kubewharf:main May 24, 2023
luomingmeng added a commit to luomingmeng/katalyst-core that referenced this pull request Oct 11, 2024
Labels
enhancement New feature or request workflow/merge-ready merge-ready: code is ready and can be merged
4 participants