feat(detector): resource detector matched policy optimization #5802

Open
wants to merge 1 commit into
base: master
from


CharlesQQ
Member

@CharlesQQ CharlesQQ commented Nov 11, 2024

What type of PR is this?
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:
part of #5790

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


@karmada-bot karmada-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Nov 11, 2024
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rainbowmango for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 11, 2024
@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch 2 times, most recently from 7527032 to cfc8912 Compare November 11, 2024 08:00
@codecov-commenter

codecov-commenter commented Nov 11, 2024


Codecov Report

Attention: Patch coverage is 28.15534% with 74 lines in your changes missing coverage. Please review.

Project coverage is 48.05%. Comparing base (8d43fe2) to head (23dc788).
Report is 9 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/detector/detector.go | 22.61% | 63 Missing and 2 partials ⚠️ |
| pkg/detector/policy.go | 0.00% | 4 Missing ⚠️ |
| pkg/detector/preemption.go | 71.42% | 4 Missing ⚠️ |
| cmd/controller-manager/app/controllermanager.go | 0.00% | 1 Missing ⚠️ |


Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5802      +/-   ##
==========================================
+ Coverage   47.97%   48.05%   +0.07%     
==========================================
  Files         674      674              
  Lines       55841    55823      -18     
==========================================
+ Hits        26789    26825      +36     
+ Misses      27305    27254      -51     
+ Partials     1747     1744       -3     
Flag Coverage Δ
unittests 48.05% <28.15%> (+0.07%) ⬆️

@chaosi-zju
Member

Do d.Client and d.InformerManager both have a cache, and are those caches separate?

Is it possible for a policy to be present in the d.InformerManager cache but not yet synced to the d.Client cache?

In that case, if the creation times of the deployment and the policy are very close, d.Client might not find the policy. (The failed e2e job 32793260336 may be caused by this.)
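
The concern raised here can be illustrated with a minimal, standard-library-only sketch. It assumes two independent caches that observe the same event at different times; `store`, `apply`, `has`, and `matchPolicy` are hypothetical names for illustration and are not Karmada or client-go APIs. Whether the real d.Client and d.InformerManager behave this way is exactly the open question in this comment.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a hypothetical stand-in for one client-side cache, for example
// the cache behind d.Client or the one behind d.InformerManager.
type store struct {
	mu    sync.RWMutex
	items map[string]struct{}
}

func newStore() *store { return &store{items: map[string]struct{}{}} }

// apply records that this cache has observed the object identified by key.
func (s *store) apply(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.items[key] = struct{}{}
}

// has reports whether key is currently visible in this cache.
func (s *store) has(key string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.items[key]
	return ok
}

// matchPolicy looks the policy up in the cache used for matching and
// returns false when that cache has not observed the policy yet.
func matchPolicy(lookup *store, policyKey string) bool {
	return lookup.has(policyKey)
}

func main() {
	informerCache := newStore() // assumed to deliver the "policy added" event
	clientCache := newStore()   // assumed to serve lookups during reconcile

	const policyKey = "karmadatest-xnw86/deploy-4dpmc"

	// The policy event reaches the first cache only.
	informerCache.apply(policyKey)

	// A reconcile triggered now reads from the other cache and misses.
	fmt.Println("matched before sync:", matchPolicy(clientCache, policyKey))

	// Once the second cache catches up, the same lookup succeeds.
	clientCache.apply(policyKey)
	fmt.Println("matched after sync:", matchPolicy(clientCache, policyKey))
}
```

If the two caches really are independent, a lookup that misses in this window would need to be retried or served from the same cache that produced the event.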

@chaosi-zju
Member

The failed e2e job 32793260336 may be caused by this.

My guess may be correct, as the corresponding detector log shows:

2024-11-11T08:19:26.656651192Z stderr F I1111 08:19:26.656575       1 detector.go:236] Reconciling object: apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc
2024-11-11T08:19:26.656739492Z stderr F I1111 08:19:26.656663       1 detector.go:919] PropagationPolicy(karmadatest-xnw86/deploy-4dpmc) has been added or updated.
2024-11-11T08:19:26.65699674Z stderr F I1111 08:19:26.656902       1 detector.go:361] Attempts to match policy for resource(apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc)
2024-11-11T08:19:26.657175843Z stderr F I1111 08:19:26.657084       1 compare.go:66] No propagationpolicy match for resource(apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc)
2024-11-11T08:19:26.657185351Z stderr F I1111 08:19:26.657101       1 detector.go:378] Attempts to match cluster policy for resource(apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc)
2024-11-11T08:19:26.65742698Z stderr F I1111 08:19:26.657118       1 detector.go:386] No clusterpropagationpolicy find.
2024-11-11T08:19:26.660140465Z stderr F I1111 08:19:26.660036       1 detector.go:1160] Matched 0 resources by policy(karmadatest-xnw86/deploy-4dpmc)
2024-11-11T08:19:26.660781832Z stderr F I1111 08:19:26.660194       1 detector.go:785] Add object(apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc) to waiting list, length of list is: 215
2024-11-11T08:19:26.667659376Z stderr F I1111 08:19:26.666134       1 detector.go:236] Reconciling object: apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc
2024-11-11T08:19:26.667677359Z stderr F I1111 08:19:26.666184       1 detector.go:361] Attempts to match policy for resource(apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc)
2024-11-11T08:19:26.667682419Z stderr F I1111 08:19:26.666302       1 compare.go:66] No propagationpolicy match for resource(apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc)
2024-11-11T08:19:26.667686015Z stderr F I1111 08:19:26.666311       1 detector.go:378] Attempts to match cluster policy for resource(apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc)
2024-11-11T08:19:26.667689341Z stderr F I1111 08:19:26.666326       1 detector.go:386] No clusterpropagationpolicy find.
2024-11-11T08:19:26.667692678Z stderr F I1111 08:19:26.666461       1 policy.go:98] No matched policy for object: apps/v1, kind=Deployment, karmadatest-xnw86/deploy-4dpmc

@CharlesQQ
Member Author

In that case, if the creation times of the deployment and the policy are very close, d.Client might not find the policy.

Both obtain resource objects from a cache. I think that if d.Client has this problem, d.InformerManager may have it too.

@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch 5 times, most recently from 5761d96 to a03b852 Compare November 12, 2024 07:19
@chaosi-zju
Member

chaosi-zju commented Nov 12, 2024

The reason for the failure can be explained as follows:

image

@XiShanYongYe-Chang
Member

I suggest obtaining the policy list directly from karmada-apiserver and measuring the extra load this puts on karmada-apiserver, to see whether the change is still a clear net benefit.

@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch 8 times, most recently from 8e1d81b to a3b488b Compare November 19, 2024 08:25
@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch from a3b488b to fe7afcd Compare November 22, 2024 10:48
@karmada-bot karmada-bot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 22, 2024
@karmada-bot karmada-bot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Nov 22, 2024
@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch 5 times, most recently from c345c81 to 0345eed Compare November 23, 2024 16:04
@XiShanYongYe-Chang
Member

/retest

@zach593
Contributor

zach593 commented Nov 30, 2024

Is the purpose of this PR to remove the dependency on unstructured?
Why do we need this generated typed informer manager instead of using the controller-runtime informer?

@CharlesQQ
Member Author

Why do we need this generated typed informer manager instead of using the controller-runtime informer?

If controller-runtime is used, the logic that puts resources into the resource detector's workqueue would also need to use controller-runtime.

@zach593
Contributor

zach593 commented Feb 23, 2025

If controller-runtime is used, the logic that puts resources into the resource detector's workqueue would also need to use controller-runtime.

That is not the case. I tried replacing the generic InformerManager with the controller-runtime cache, and it passes all the tests and works perfectly. This did not require replacing the asyncWorker; replacing the lister and the informer was enough.

Here is the code: ctripcloud@e358fd1

@CharlesQQ
Member Author

CharlesQQ commented Feb 24, 2025

I tried replacing the generic InformerManager with the controller-runtime cache, and it passes all the tests and works perfectly.

You replaced the informer with controller-runtime, so do you think using controller-runtime is better than the genericInformer?

@zach593
Contributor

zach593 commented Feb 24, 2025

So do you think using controller-runtime is better than the genericInformer?

Yes, because the controller-runtime informer is also a typed informer.
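
To make the typed versus unstructured distinction concrete, here is a minimal sketch using only the standard library. It does not use client-go or controller-runtime; the `policy` and `policySpec` types and the `priority` field are hypothetical stand-ins, not Karmada's API types. It only shows why consumers of a typed cache can avoid walking nested maps and asserting types by hand.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// policySpec and policy are hypothetical typed objects for illustration.
type policySpec struct {
	Priority int32 `json:"priority"`
}

type policy struct {
	Name string     `json:"name"`
	Spec policySpec `json:"spec"`
}

// priorityFromUnstructured walks nested maps and asserts types at each
// step, which is what a consumer of an unstructured object has to do.
func priorityFromUnstructured(obj map[string]interface{}) (int32, error) {
	spec, ok := obj["spec"].(map[string]interface{})
	if !ok {
		return 0, errors.New("spec is missing or not an object")
	}
	// encoding/json decodes JSON numbers into float64 for interface{} targets.
	p, ok := spec["priority"].(float64)
	if !ok {
		return 0, errors.New("spec.priority is missing or not a number")
	}
	return int32(p), nil
}

// decodeTyped decodes the same payload straight into the typed struct.
func decodeTyped(raw []byte) (*policy, error) {
	p := &policy{}
	if err := json.Unmarshal(raw, p); err != nil {
		return nil, err
	}
	return p, nil
}

func main() {
	raw := []byte(`{"name":"example-policy","spec":{"priority":3}}`)

	var obj map[string]interface{}
	if err := json.Unmarshal(raw, &obj); err != nil {
		panic(err)
	}
	fromMap, err := priorityFromUnstructured(obj)
	fmt.Println("unstructured:", fromMap, err)

	typed, err := decodeTyped(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("typed:", typed.Spec.Priority)
}
```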

@CharlesQQ
Member Author

CharlesQQ commented Feb 24, 2025

Yes, because the controller-runtime informer is also a typed informer.

What do you think? @XiShanYongYe-Chang

@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch from 0345eed to e4064c0 Compare March 8, 2025 14:32
@karmada-bot karmada-bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 8, 2025
@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch 4 times, most recently from 566a305 to d6d5308 Compare March 8, 2025 16:10
if err != nil {
	klog.Errorf("Failed to add event handler for ClusterPropagationPolicy: %v", err)
	return err
}
Contributor

You need to move those two AddEventHandler() calls after d.Processor.Run(d.ConcurrentResourceTemplateSyncs, d.stopCh), because AddEventHandler() immediately starts the workers, and at the end of their logic the workers call d.Processor.Add().

Member Author

Do you mean d.Processor must be running before the d.ReconcilePropagationPolicy or d.ReconcileClusterPropagationPolicy logic is triggered? If d.Processor has not been created yet, could d.Processor.Add() fail?

Contributor

Yes, it causes an occasional panic on startup.

Member Author

Got it, thanks.
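
The ordering issue discussed in this thread can be sketched as follows. This is an illustration under stated assumptions, not the detector's actual code: `worker`, `detector`, `addEventHandler`, and `start` are hypothetical names, and the sketch returns an error where the reviewer reports a panic, so that both orderings can be run side by side. It assumes, as the reviewer describes, that handlers may fire as soon as they are registered.

```go
package main

import "fmt"

// worker is a hypothetical stand-in for the detector's processor.
type worker struct {
	queue chan string
}

// run allocates the queue; before run is called the queue is nil.
func (w *worker) run(size int) { w.queue = make(chan string, size) }

// add enqueues key, or reports an error if run has not been called yet.
// (The thread reports a panic here; an error keeps the sketch runnable.)
func (w *worker) add(key string) error {
	if w.queue == nil {
		return fmt.Errorf("worker not running, cannot enqueue %q", key)
	}
	w.queue <- key
	return nil
}

// detector is a hypothetical, heavily reduced detector.
type detector struct {
	processor *worker
}

// addEventHandler registers handler and immediately invokes it for every
// already-known object, mimicking handlers that fire on registration.
func (d *detector) addEventHandler(existing []string, handler func(string) error) error {
	for _, key := range existing {
		if err := handler(key); err != nil {
			return err
		}
	}
	return nil
}

// start wires the detector in one of the two orders discussed above.
func start(handlersFirst bool, existing []string) (*detector, error) {
	d := &detector{processor: &worker{}}
	enqueue := func(key string) error { return d.processor.add(key) }

	if !handlersFirst {
		d.processor.run(len(existing) + 1) // processor is ready before any event
	}
	if err := d.addEventHandler(existing, enqueue); err != nil {
		return d, err
	}
	if handlersFirst {
		d.processor.run(len(existing) + 1) // too late for events already delivered
	}
	return d, nil
}

func main() {
	existing := []string{"ns/policy-a", "ns/policy-b"}

	if _, err := start(true, existing); err != nil {
		fmt.Println("handlers first:", err)
	}

	d, err := start(false, existing)
	fmt.Println("processor first:", len(d.processor.queue), "queued, err =", err)
}
```

Starting the processor before registering handlers means every event delivered during registration finds a ready queue, which is the ordering the reviewer asks for.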

@CharlesQQ
Member Author

/retest

@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch from d6d5308 to 962701d Compare March 10, 2025 03:31
Signed-off-by: chang.qiangqiang <chang.qiangqiang@immomo.com>
@CharlesQQ CharlesQQ force-pushed the resource-detector-optimization branch from 962701d to 23dc788 Compare March 10, 2025 03:40
Labels
kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants