Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Use more optimal iterate hierarchy v2 (#18929) #18972

Conversation

andrii-korotkov-verkada
Copy link
Contributor

@andrii-korotkov-verkada andrii-korotkov-verkada commented Jul 7, 2024

Closes #18929
Helps with #18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this does not need to be in the release notes.
  • The title of the PR states what changed and the related issues number (used for the release note).
  • The title of the PR conforms to the Toolchain Guide
  • I've included "Closes [ISSUE #]" or "Fixes [ISSUE #]" in the description to automatically close the associated issue.
  • I've updated both the CLI and UI to expose my feature, or I plan to submit a second PR with them.
  • Does this PR require documentation updates?
  • I've updated documentation as required by this PR.
  • I have signed off all my commits as required by DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My build is green (troubleshooting builds).
  • My new feature complies with the feature status guidelines.
  • I have added a brief description of why this PR is necessary and/or what this PR solves.
  • Optional. My organization is added to USERS.md.
  • Optional. For bug fixes, I've indicated what older releases this fix should be cherry-picked into (this may or may not happen depending on risk/complexity).

@andrii-korotkov-verkada andrii-korotkov-verkada requested a review from a team as a code owner July 7, 2024 20:46
Copy link

bunnyshell bot commented Jul 7, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

Copy link

bunnyshell bot commented Jul 7, 2024

❌ Preview Environment deleted from Bunnyshell

Available commands (reply to this comment):

  • 🚀 /bns:deploy to deploy the environment

Copy link

codecov bot commented Jul 7, 2024

Codecov Report

Attention: Patch coverage is 32.50000% with 27 lines in your changes missing coverage. Please review.

Project coverage is 50.65%. Comparing base (ea725a9) to head (2d1db69).
Report is 545 commits behind head on master.

Files with missing lines Patch % Lines
controller/appcontroller.go 40.62% 14 Missing and 5 partials ⚠️
controller/cache/cache.go 0.00% 8 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18972      +/-   ##
==========================================
- Coverage   50.66%   50.65%   -0.01%     
==========================================
  Files         315      315              
  Lines       43347    43359      +12     
==========================================
+ Hits        21961    21964       +3     
- Misses      18880    18887       +7     
- Partials     2506     2508       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@agaudreault agaudreault left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For performance improvements, can you provide the pprof CPU graph or something similar of before and after the changes to compare the CPU time used by IterateHierarchy compared to IterateHierarchyV2 as well as the memory tradeoff.

I will look a bit more in depth at https://github.dev/argoproj/gitops-engine/pull/601 when I have more time.

go.mod Outdated Show resolved Hide resolved
@andrii-korotkov-verkada
Copy link
Contributor Author

For performance improvements, can you provide the pprof CPU graph or something similar of before and after the changes to compare the CPU time used by IterateHierarchy compared to IterateHierarchyV2 as well as the memory tradeoff.

I will look a bit more in depth at https://github.dev/argoproj/gitops-engine/pull/601 when I have more time.

I was looking at timing information from the new logs "Finished setting resource tree" for the same app. In our staging env, it was consistently taking 3-4 min on processing managed resources for one of the apps when calling IterateHierarchy and <1 sec when using IterateHierarchyV2 (issue #18929 has some screenshots). The same app was taking 30-60 min to refresh in production environment with more namespace resources, now it take ~1 min (mostly compare app state).

@andrii-korotkov-verkada
Copy link
Contributor Author

andrii-korotkov-verkada commented Jul 9, 2024

Here are some perf views of the system collected following #13534 (comment).

The build is from master on 2024/07/07 including #18972, #18694, argoproj/gitops-engine#601, argoproj/gitops-engine#603.

Screenshot 2024-07-09 at 3 47 18 PM
Screenshot 2024-07-09 at 3 48 02 PM
Screenshot 2024-07-09 at 3 48 39 PM
Screenshot 2024-07-09 at 4 21 08 PM
Screenshot 2024-07-09 at 9 35 32 PM
Screenshot 2024-07-09 at 9 35 20 PM

@crenshaw-dev
Copy link
Member

crenshaw-dev commented Jul 16, 2024

Some local testing indicates that the memory cost could be substantial (like 10x the current cost). Do you have any live-environment before/after stats?

Here's the result of a benchmark I just cooked up. It's pretty contrived, but a big enough cost to be concerning.

BenchmarkIterateHierarchyV2-16
7         145114196 ns/op        71001336 B/op     375845 allocs/op
BenchmarkIterateHierarchy-16
1        111452504750 ns/op       6927424 B/op      22221 allocs/op

@andrii-korotkov-verkada
Copy link
Contributor Author

In live environment the memory usage actually seems lower after my changes. Below are examples for our main staging and production environments.
Screenshot 2024-07-16 at 4 41 06 PM
Screenshot 2024-07-16 at 4 41 20 PM

@crenshaw-dev
Copy link
Member

Nice, I wonder if it's because we simply spend less time in that code, so there are fewer concurrent iterations using up memory. I'll try to get some data from Intuit as well.

@andrii-korotkov-verkada
Copy link
Contributor Author

Nice, I wonder if it's because we simply spend less time in that code, so there are fewer concurrent iterations using up memory. I'll try to get some data from Intuit as well.

Note that I tested with a locking improvement included as well, though that would have more impact on doing more updates and keeping apps fresher essentially.

@crenshaw-dev
Copy link
Member

Ah sure. No worries, I'll try to isolate the iterate change when I test internally. Not sure whether I'll get to the lock improvements this week.

@gazidizdaroglu
Copy link
Contributor

Hey, I was facing a performance issue with the resource tree. Thanks to Michael, we ran a pprof against the application controller and discovered a bottleneck in the iterateChildren function. He recommended trying out this new implementation.

After integrating your branch, the issue has been resolved 🎉.

In terms of resource consumption, memory usage remains about the same, but CPU usage has nearly doubled.

@andrii-korotkov-verkada
Copy link
Contributor Author

@gazidizdaroglu, I've noticed cpu drop in staging environment, but increase in a prod environment. Since this PR integrates locking changes in gitops-changes as well, I've observed just more updates being done in prod and things being much fresher - longest running processor has dropped from 30-60min to 1-2min. I wonder if you experience the same thing. If yes, I think cpu increase is worth it given much less staleness. Let me know if you integrated only iterate hierarchy changes and specifically avoided lock changes.

Screenshot 2024-07-19 at 3 32 29 AM
Screenshot 2024-07-19 at 3 32 34 AM

@andrii-korotkov-verkada
Copy link
Contributor Author

Also, with @crenshaw-dev optimizations to iterate hierarchy v2 on top of my changes we may get even better performance.

@andrii-korotkov-verkada
Copy link
Contributor Author

I've added a benchmark to argoproj/gitops-engine#603 with some goroutines doing processing of updates. Seems like a new locking doesn't benefit us and even has some regression when using less updates or having cheaper updates. I wonder if the wins I saw were from having locking acquired for less time while iterating hierarchy more efficiently.

New BenchmarkProcessEvents-10    	       1	3399116791 ns/op	128337808 B/op	 1931210 allocs/op
Old BenchmarkProcessEvents-10    	       1	3241700500 ns/op	128167704 B/op	 1930942 allocs/op

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
@andrii-korotkov-verkada andrii-korotkov-verkada force-pushed the 18929-use-more-optimal-iterate-hierarchy-and-better-locking branch from 00e4428 to 2d1db69 Compare July 19, 2024 13:08
@andrii-korotkov-verkada
Copy link
Contributor Author

I've updated this PR to use gitops-engine master and skip the locking change. I'll try it out, probably with #18694.

@andrii-korotkov-verkada andrii-korotkov-verkada changed the title chore: Use more optimal iterate hierarchy v2 and better locking (#18929) chore: Use more optimal iterate hierarchy v2 (#18929) Jul 19, 2024
@andrii-korotkov-verkada
Copy link
Contributor Author

@crenshaw-dev, I've updated the PR to only reference our gitops-engine changes in master and to use iterate hierarchy v2. Since the changes were already tested, should be a short review, when you have a bit of time. The license check failure is plaguing many PRs and is probably unrelated.

@crenshaw-dev crenshaw-dev merged commit c59cb52 into argoproj:master Jul 19, 2024
30 of 31 checks passed
ggjulio pushed a commit to ggjulio/argo-cd that referenced this pull request Jul 21, 2024
…oj#18972)

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
vfaergestad pushed a commit to vfaergestad/argo-cd that referenced this pull request Jul 22, 2024
…oj#18972)

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Signed-off-by: Vegard Færgestad <vegardf@mnemonic.no>
mpelekh added a commit to mpelekh/argo-cd that referenced this pull request Aug 1, 2024
…oj#18972)

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
mpelekh added a commit to mpelekh/argo-cd that referenced this pull request Sep 10, 2024
…oj#18972)

Closes argoproj#18929
Helps with argoproj#18500
Use iterate hierarchy v2 to have a roughly linear performance for getting the resource tree instead of up to quadratic.

Signed-off-by: Andrii Korotkov <andrii.korotkov@verkada.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize processing managed resources in getResourceTree, ideally from quadratic to linear
4 participants