Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the issue of missing workqueue metrics in the Karmada controller #5972

Conversation

XiShanYongYe-Chang
Copy link
Member

@XiShanYongYe-Chang XiShanYongYe-Chang commented Dec 24, 2024

What type of PR is this?

/kind bug

What this PR does / why we need it:

This pr removes the apiserver dependency on resourceinterpreter. For details about why to remove the apiserver reference, see #5945 (comment).

This pr is part of #5945. It focuses on removing only the dependency of the apiserver repository, thus directly solving the problem of missing workqueue metrics and facilitating the prcherry-pick to the previous version (v1.10, v1.11 and v1.12).

Which issue(s) this PR fixes:
Fixes #5696
Part of #5954

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

`karmada-controller-manager`: Fixed the issue of missing work queue metrics.

@karmada-bot karmada-bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 24, 2024
@karmada-bot karmada-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 24, 2024
@XiShanYongYe-Chang
Copy link
Member Author

/cc @RainbowMango @chaosi-zju

@codecov-commenter
Copy link

codecov-commenter commented Dec 24, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 65.00000% with 14 lines in your changes missing coverage. Please review.

Project coverage is 48.26%. Comparing base (c9ca6ac) to head (5bd3d01).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
...interpreter/customized/webhook/serviceresolvers.go 66.66% 13 Missing ⚠️
...sourceinterpreter/customized/webhook/customized.go 0.00% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5972      +/-   ##
==========================================
+ Coverage   48.23%   48.26%   +0.02%     
==========================================
  Files         664      665       +1     
  Lines       54749    54788      +39     
==========================================
+ Hits        26410    26444      +34     
- Misses      26624    26630       +6     
+ Partials     1715     1714       -1     
Flag Coverage Δ
unittests 48.26% <65.00%> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@XiShanYongYe-Chang XiShanYongYe-Chang force-pushed the remove-apiserver-dependency-in-resourceinterpreter branch from 4c493bd to b0ba3cb Compare December 24, 2024 12:41
@XiShanYongYe-Chang
Copy link
Member Author

/retest

@XiShanYongYe-Chang XiShanYongYe-Chang force-pushed the remove-apiserver-dependency-in-resourceinterpreter branch from b0ba3cb to 5ed3932 Compare December 25, 2024 01:32
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/assign

@RainbowMango
Copy link
Member

cc @lxtywypc
Could you please take a look?

@XiShanYongYe-Chang XiShanYongYe-Chang force-pushed the remove-apiserver-dependency-in-resourceinterpreter branch from 5ed3932 to 8f4414b Compare December 25, 2024 03:38
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good, but please

  1. organize the tests case by order normal cases --> abnormal cases, and make sure each test has a meaningful name.
  2. Please provide a test report, to show the metrics are emitted as expected from both karmada-controller-manager and karmada-agent.

@XiShanYongYe-Chang
Copy link
Member Author

organize the tests case by order normal cases --> abnormal cases

Test cases are organized by service type. If all abnormal cases are placed at the end, the organization relationship will be broken.

@XiShanYongYe-Chang XiShanYongYe-Chang force-pushed the remove-apiserver-dependency-in-resourceinterpreter branch from 8f4414b to a029089 Compare December 25, 2024 06:39
@XiShanYongYe-Chang
Copy link
Member Author

karmada-controller-manager:

curl http://127.0.0.1:8080/metrics | grep workqueue_work_duration_seconds_sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22344    0 22344    0     0  10.2M      0 --:--:-- --:--:-- --:--:-- 10.6M
workqueue_work_duration_seconds_sum{controller="endpointslice-collect",name="endpointslice-collect"} 0
workqueue_work_duration_seconds_sum{controller="scale target worker",name="scale target worker"} 0
workqueue_work_duration_seconds_sum{controller="service-export",name="service-export"} 0
workqueue_work_duration_seconds_sum{controller="work-status",name="work-status"} 0

karmada-agent:

 curl http://127.0.0.1:8080/metrics | grep workqueue_work_duration_seconds_sum
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 18997    0 18997    0     0  8576k      0 --:--:-- --:--:-- --:--:-- 9275k
workqueue_work_duration_seconds_sum{controller="endpointslice-collect",name="endpointslice-collect"} 0
workqueue_work_duration_seconds_sum{controller="service-export",name="service-export"} 0
workqueue_work_duration_seconds_sum{controller="work-status",name="work-status"} 0

Signed-off-by: changzhen <changzhen5@huawei.com>
@XiShanYongYe-Chang XiShanYongYe-Chang force-pushed the remove-apiserver-dependency-in-resourceinterpreter branch from a029089 to 5bd3d01 Compare December 25, 2024 07:52
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

wait @lxtywypc for a while

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 25, 2024
@CharlesQQ
Copy link
Member

Test results were as expected @RainbowMango

# curl http://127.0.0.1:10358/metrics |grep workqueue_depth
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  3927    0  3927    0     0   6756      0 --:--:-- --:--:-- --:--:--  6759# HELP workqueue_depth Current depth of workqueue
# TYPE workqueue_depth gauge
workqueue_depth{controller="agent-csr-approving-controller",name="agent-csr-approving-controller"} 0
workqueue_depth{controller="binding-controller",name="binding-controller"} 1712
workqueue_depth{controller="binding-eviction",name="binding-eviction"} 0
workqueue_depth{controller="cluster-binding-eviction",name="cluster-binding-eviction"} 0
workqueue_depth{controller="cluster-controller",name="cluster-controller"} 0
workqueue_depth{controller="cluster-resource-binding-application-failover-controller",name="cluster-resource-binding-application-failover-controller"} 0
workqueue_depth{controller="cluster-resource-binding-controller",name="cluster-resource-binding-controller"} 0
workqueue_depth{controller="cluster-resource-binding-graceful-eviction-controller",name="cluster-resource-binding-graceful-eviction-controller"} 0
workqueue_depth{controller="cluster-resource-binding-status-controller",name="cluster-resource-binding-status-controller"} 0
workqueue_depth{controller="cluster-status-controller",name="cluster-status-controller"} 0
workqueue_depth{controller="clusterPropagationPolicy reconciler",name="clusterPropagationPolicy reconciler"} 0
workqueue_depth{controller="cronfederatedhpa-controller",name="cronfederatedhpa-controller"} 0
workqueue_depth{controller="dependencies resource detector",name="dependencies resource detector"} 993
workqueue_depth{controller="dependencies-distributor",name="dependencies-distributor"} 941
workqueue_depth{controller="endpointslice-controller",name="endpointslice-controller"} 0
workqueue_depth{controller="execution-controller",name="execution-controller"} 4005
workqueue_depth{controller="federated-resource-quota-status-controller",name="federated-resource-quota-status-controller"} 0
workqueue_depth{controller="federated-resource-quota-sync-controller",name="federated-resource-quota-sync-controller"} 0
workqueue_depth{controller="federatedHPA-controller",name="federatedHPA-controller"} 0
workqueue_depth{controller="namespace-sync-controller",name="namespace-sync-controller"} 0
workqueue_depth{controller="propagationPolicy reconciler",name="propagationPolicy reconciler"} 778
workqueue_depth{controller="remedy-controller",name="remedy-controller"} 0
workqueue_depth{controller="resource detector",name="resource detector"} 1019
workqueue_depth{controller="resource-binding-application-failover-controller",name="resource-binding-application-failover-controller"} 0
workqueue_depth{controller="resource-binding-graceful-eviction-controller",name="resource-binding-graceful-eviction-controller"} 0
workqueue_depth{controller="resource-binding-status-controller",name="resource-binding-status-controller"} 0
workqueue_depth{controller="service-export",name="service-export"} 0
workqueue_depth{controller="service-export-controller",name="service-export-controller"} 0
workqueue_depth{controller="service-import-controller",name="service-import-controller"} 0
workqueue_depth{controller="taint-manager",name="taint-manager"} 0
workqueue_depth{controller="unified-auth-controller",name="unified-auth-controller"} 0
workqueue_depth{controller="work-status",name="work-status"} 1992
workqueue_depth{controller="work-status-controller",name="work-status-controller"} 4002
workqueue_depth{controller="workload-rebalancer",name="workload-rebalancer"} 0

@RainbowMango RainbowMango added this to the v1.13 milestone Dec 26, 2024
@RainbowMango
Copy link
Member

Thanks @CharlesQQ for the feedback.

@lxtywypc
Copy link
Contributor

lxtywypc commented Dec 26, 2024

I have finished test in our environment, it works well.

LGTM.

Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!
Thanks for the confirmation.

Now, let's fix this on master, and then this will be backported to all affected releases.

/approve

@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 26, 2024
@karmada-bot karmada-bot merged commit e8a6ab3 into karmada-io:master Dec 26, 2024
21 checks passed
karmada-bot added a commit that referenced this pull request Dec 26, 2024
…k-of-#5972-upstream-release-1.12

Automated cherry pick of #5972: fix the issue of missing workqueue metrics in the Karmada controller
karmada-bot added a commit that referenced this pull request Dec 26, 2024
…k-of-#5972-upstream-release-1.11

Automated cherry pick of #5972: fix the issue of missing workqueue metrics in the Karmada controller
karmada-bot added a commit that referenced this pull request Dec 26, 2024
…k-of-#5972-upstream-release-1.10

Automated cherry pick of #5972: fix the issue of missing workqueue metrics in the Karmada controller
@XiShanYongYe-Chang XiShanYongYe-Chang deleted the remove-apiserver-dependency-in-resourceinterpreter branch December 27, 2024 01:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

karmada controller can't find metrics workqueue_depth
6 participants