Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: Replace status.observedAt with redis pub/sub channels for resource tree updates (#1340) #4208

Merged
merged 1 commit into from
Aug 31, 2020

Conversation

alexmt
Copy link
Collaborator

@alexmt alexmt commented Aug 31, 2020

Closes #1340

More information about why this PR is required now:

The Application CRD status has two fields observedAt and reconciledAt.

  • the reconciledAt is updated when application resources are compared with Git and sync/health status got refreshed.
  • the observedAt is updated every time when reconciledAt is updated and when any "child" application resource (e.g. Pod, ReplicaSet etc) has changed.

The observedAt is changed very frequently and change is required just to notify UI that it should reload resources tree. If Argo CD manages thousands of application then observedAt field updates cause thousands of K8S PATCH requests every minute. This slowdown K8S API server and affects all controllers installed in the cluster.

This PR deprecates observedAt - controller no longer updates it. The UI is using /v1/stream/applications/<app-name>/resource-tree stream API which is powered by redis pub-sub.

@alexmt alexmt requested review from jannfis and jessesuen August 31, 2020 01:58
@alexmt
Copy link
Collaborator Author

alexmt commented Aug 31, 2020

PR should resolve or at least improve #3924 and #3556 . Argo CD should make significantly less K8S requests and reduce etcd load.

@alexmt alexmt force-pushed the 1340-redis-pub-sub branch from 0364358 to 593ae01 Compare August 31, 2020 02:10
@alexmt alexmt force-pushed the 1340-redis-pub-sub branch from 593ae01 to 628da9d Compare August 31, 2020 02:23
@codecov-commenter
Copy link

Codecov Report

Merging #4208 into master will decrease coverage by 0.13%.
The diff coverage is 34.61%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4208      +/-   ##
==========================================
- Coverage   42.98%   42.84%   -0.14%     
==========================================
  Files         124      124              
  Lines       18332    18377      +45     
==========================================
- Hits         7880     7874       -6     
- Misses       9438     9492      +54     
+ Partials     1014     1011       -3     
Impacted Files Coverage Δ
pkg/apis/application/v1alpha1/types.go 59.01% <ø> (ø)
server/application/application.go 29.25% <0.00%> (-0.23%) ⬇️
server/cache/cache.go 61.70% <0.00%> (-2.75%) ⬇️
util/cache/cache.go 69.09% <0.00%> (-5.42%) ⬇️
util/cache/inmemory.go 68.00% <0.00%> (-12.96%) ⬇️
util/cache/redis.go 56.00% <7.14%> (-19.00%) ⬇️
util/cache/appstate/cache.go 68.29% <33.33%> (-6.71%) ⬇️
controller/appcontroller.go 49.14% <100.00%> (-1.98%) ⬇️
server/server.go 57.45% <100.00%> (+0.94%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d7a70bf...628da9d. Read the comment docs.

Copy link
Member

@jannfis jannfis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, quite awesome fix!

@darshanime
Copy link
Member

The cache is throwaway in the current implementation, we can delete it and nothing will break. Will that change by this patch?

@alexmt
Copy link
Collaborator Author

alexmt commented Aug 31, 2020

@darshanime , it is still safe to lose cache. Redis recreates pub-sub channel on the fly. Also redis-go automatically re-connect to channel, so UI won't notice if redis restart.

@alexmt alexmt merged commit fca0f69 into argoproj:master Aug 31, 2020
@alexmt alexmt deleted the 1340-redis-pub-sub branch August 31, 2020 17:18
alexmt pushed a commit that referenced this pull request Aug 31, 2020
@alexmt
Copy link
Collaborator Author

alexmt commented Aug 31, 2020

The change significantly improved the performance of the busy Argo CD. Number of K8S requests reduced by 5x ~ 7x times:

image (3)

Avg reconciliation duration changed from 1s~2s to < 0.25s:

image (4)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace status.observedAt with redis pub/sub channels for resource tree updates
4 participants