fix: avoid resources lock contention (#8172) #20329
The integration tests are failing. Can you fix the CI failures?
Sure, I’ll examine the logs to determine the root cause of the failing integration tests.
@mpelekh lmk if you need any help running e2e tests locally to speed up iteration.
From discussion at SIG Scalability:
@crenshaw-dev Here are the test results when the ticker was increased to 10s:
Test results when the ticker was decreased to 0.1s:
This is what I see in the code:
root@017f9686563c:/go/src/github.com/argoproj/argo-cd# dist/argocd app get argocd-e2e-external/test-namespaced-config-map --plaintext --server localhost:8080 --auth-token TOKEN --insecure
Name: argocd-e2e-external/test-namespaced-config-map
Project: default
Server: https://kubernetes.default.svc/
Namespace: argocd-e2e--test-namespaced-config-map-dwnet
URL: http://localhost:8080/applications/test-namespaced-config-map
Source:
- Repo: file:///tmp/argo-e2e/testdata.git
Target:
Path: config-map
SyncWindow: Sync Allowed
Sync Policy: Manual
Sync Status: OutOfSync from (bf5ea9e)
Health Status: Healthy
GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
ConfigMap argocd-e2e--test-namespaced-config-map-dwnet my-map OutOfSync Missing
root@017f9686563c:/go/src/github.com/argoproj/argo-cd# dist/argocd app sync argocd-e2e-external/test-namespaced-config-map --prune --plaintext --server localhost:8080 --auth-token TOKEN --insecure
TIMESTAMP GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
2024-10-30T21:55:03+00:00 ConfigMap argocd-e2e--test-namespaced-config-map-dwnet my-map OutOfSync Missing
2024-10-30T21:55:03+00:00 ConfigMap argocd-e2e--test-namespaced-config-map-dwnet my-map OutOfSync Missing configmap/my-map created
Name: argocd-e2e-external/test-namespaced-config-map
Project: default
Server: https://kubernetes.default.svc/
Namespace: argocd-e2e--test-namespaced-config-map-dwnet
URL: http://localhost:8080/applications/argocd-e2e-external/test-namespaced-config-map
Source:
- Repo: file:///tmp/argo-e2e/testdata.git
Target:
Path: config-map
SyncWindow: Sync Allowed
Sync Policy: Manual
Sync Status: OutOfSync from (bf5ea9e)
Health Status: Healthy
Operation: Sync
Sync Revision: bf5ea9e6c4184bb3ee436b957f54c1bc8adb3ce3
Phase: Succeeded
Start: 2024-10-30 21:55:03 +0000 UTC
Finished: 2024-10-30 21:55:03 +0000 UTC
Duration: 0s
Message: successfully synced (all tasks run)
GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
ConfigMap argocd-e2e--test-namespaced-config-map-dwnet my-map OutOfSync Missing configmap/my-map created
root@017f9686563c:/go/src/github.com/argoproj/argo-cd# dist/argocd app get argocd-e2e-external/test-namespaced-config-map --plaintext --server localhost:8080 --auth-token TOKEN --insecure
Name: argocd-e2e-external/test-namespaced-config-map
Project: default
Server: https://kubernetes.default.svc/
Namespace: argocd-e2e--test-namespaced-config-map-dwnet
URL: http://localhost:8080/applications/test-namespaced-config-map
Source:
- Repo: file:///tmp/argo-e2e/testdata.git
Target:
Path: config-map
SyncWindow: Sync Allowed
Sync Policy: Manual
Sync Status: Synced to (bf5ea9e)
Health Status: Healthy
GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
ConfigMap argocd-e2e--test-namespaced-config-map-dwnet my-map Synced configmap/my-map created
Resources get synced once the related events are received. The receiving callback requests the app refresh, which in turn puts an item in the queue.
Then this item is received from the queue in a loop and processed. Only after that does the sync status become Synced, as we can see from the output. So the sync status and the operation status are not updated atomically; they are just updated in very quick succession.
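The ordering described here can be illustrated with a minimal Go sketch. It is not Argo CD's controller code: the `app` type, `runSync`, and `processRefreshQueue` are hypothetical names, and a buffered channel stands in for the refresh queue. It only shows that when the operation phase is written first and the sync status is written later by a queue consumer, an observer can briefly see "Succeeded" together with "OutOfSync".

```go
package main

import (
	"fmt"
	"sync"
)

// app is a hypothetical stand-in for an Application's status fields.
type app struct {
	mu             sync.Mutex
	operationPhase string
	syncStatus     string
}

// snapshot returns both fields as an observer (for example the CLI) would see them.
func (a *app) snapshot() (string, string) {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.operationPhase, a.syncStatus
}

// runSync marks the operation as finished and then requests a refresh by
// queueing the app name. It does not touch syncStatus itself.
func runSync(a *app, refreshQueue chan<- string) {
	a.mu.Lock()
	a.operationPhase = "Succeeded"
	a.mu.Unlock()
	refreshQueue <- "test-namespaced-config-map"
}

// processRefreshQueue drains the queue and updates syncStatus for each item,
// then signals completion once the queue is closed.
func processRefreshQueue(a *app, refreshQueue <-chan string, done chan<- struct{}) {
	for range refreshQueue {
		a.mu.Lock()
		a.syncStatus = "Synced"
		a.mu.Unlock()
	}
	close(done)
}

func main() {
	a := &app{operationPhase: "Running", syncStatus: "OutOfSync"}
	queue := make(chan string, 1)
	done := make(chan struct{})

	runSync(a, queue)
	phase, status := a.snapshot()
	fmt.Println(phase, status) // prints "Succeeded OutOfSync": refresh not processed yet

	go processRefreshQueue(a, queue, done)
	close(queue)
	<-done
	phase, status = a.snapshot()
	fmt.Println(phase, status) // prints "Succeeded Synced"
}
```

In the sketch the gap between the two writes is made visible by starting the consumer late; in a running controller the same gap exists but is normally very short.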
Looks good, though I'd convert this to draft until the Gitops engine reference is updated back.
controller/metrics/metrics.go
)

resourceEventsNumberGauge = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "argocd_resource_events_number",
The name can be a bit misleading, I'd use something like argocd_resource_events_processed_in_batch
Completely agreed. Updated.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #20329      +/-   ##
==========================================
+ Coverage   53.80%   55.23%   +1.42%
==========================================
  Files         324      324
  Lines       55603    55676      +73
==========================================
+ Hits        29918    30753     +835
+ Misses      23082    22294     -788
- Partials     2603     2629      +26
Approve with expectation that gitops engine override would be removed.
@@ -114,6 +120,7 @@ func init() {
	clusterCacheListSemaphoreSize = env.ParseInt64FromEnv(EnvClusterCacheListSemaphore, clusterCacheListSemaphoreSize, 0, math.MaxInt64)
	clusterCacheAttemptLimit = int32(env.ParseNumFromEnv(EnvClusterCacheAttemptLimit, int(clusterCacheAttemptLimit), 1, math.MaxInt32))
	clusterCacheRetryUseBackoff = env.ParseBoolFromEnv(EnvClusterCacheRetryUseBackoff, false)
	clusterCacheBatchEventsProcessing = env.ParseBoolFromEnv(EnvClusterCacheBatchEventsProcessing, false)
Two broader questions:
- Should we expose this as part of the argocd-cmd-params config map?
- Should we default it to true?
Thank you for the review, @rumstead.
- Should we expose this as part of the argocd-cmd-params config map?
The ARGOCD_CLUSTER_CACHE_BATCH_EVENTS_PROCESSING option is exposed as an env variable, the same way as all other clustercache (part of gitops-engine) options are exposed.
- Should we default it to true?
For now, the default value should be false; that leaves the event processing mode unchanged, so no one is affected by this change. To change the event processing mode, you must explicitly set the ARGOCD_CLUSTER_CACHE_BATCH_EVENTS_PROCESSING env variable to true.
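The opt-in behaviour described in this reply can be sketched with the standard library alone. The helper below, `parseBoolEnv`, is a hypothetical stand-in written for illustration; it is not Argo CD's `env.ParseBoolFromEnv`, whose exact handling of malformed values may differ. Only the variable name comes from the discussion.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// envBatchEvents is the variable name mentioned in the discussion above.
const envBatchEvents = "ARGOCD_CLUSTER_CACHE_BATCH_EVENTS_PROCESSING"

// parseBoolEnv is a hypothetical helper (an assumption for this sketch).
// It returns def when the variable is unset or cannot be parsed as a bool.
func parseBoolEnv(name string, def bool) bool {
	raw, ok := os.LookupEnv(name)
	if !ok {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

func main() {
	// With a default of false, batch processing stays off unless the
	// variable is explicitly set to a true value.
	enabled := parseBoolEnv(envBatchEvents, false)
	fmt.Println("batch events processing enabled:", enabled)
}
```

Because the default is false, an existing installation that never sets the variable keeps its current event processing mode.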
m.resourceEventsProcessingHistogram.WithLabelValues(server).Observe(duration.Seconds())
m.resourceEventsNumberGauge.WithLabelValues(server).Set(float64(processedEventsNumber))
Please add docs for these new metrics: https://argo-cd.readthedocs.io/en/stable/operator-manual/metrics/
Thank you. I should have done it before. Done.
Signed-off-by: Mykola Pelekh <mpelekh@demonware.net>
Signed-off-by: Michael Crenshaw <350466+crenshaw-dev@users.noreply.github.com>
Closes #8172
Improve reconciliation performance for large clusters by avoiding resource lock contention - argoproj/gitops-engine#629.
Checklist: