-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(repo-server): excess git requests, add shared cache lock on revisions (Issue #14725) #17109
Conversation
18eb32c
to
821ce1e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nromriell Thanks for following up with this improvement PRs.
Took an initial quick view on the PR. Left some minor comments. Will test this PR and share feedback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall PR looks great. @nromriell just minor comments .
@leoluz @crenshaw-dev Could you provide an additional set of eyes to review in case I overlooked anything?
821ce1e
to
fd0fa44
Compare
8ee25e3
to
cdddedd
Compare
With the refactor I need to re-run the tests with the changes live, I'll try and take a look at that tonight |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nromriell left some minor nits.
@agaudreault do you have any other concerns on the PR?
No new comments, LGTM :) Good work and good refactor 👍 |
Signed-off-by: nromriell <nateromriell@gmail.com>
Signed-off-by: nromriell <nateromriell@gmail.com>
Signed-off-by: nromriell <nateromriell@gmail.com>
Signed-off-by: nromriell <nateromriell@gmail.com>
cdddedd
to
d8c2313
Compare
Thanks @ishitasequeira @agaudreault for all your time on these great reviews! Fixed that PR feedback @ishitasequeira and I was able to finally test this out and it's looking good to me after fixing one more issue I found where the I had to swap the rollup period to actually show the difference between 2.10.4 and this change since it includes the previous work, but here's the metrics between 2.10.4 and this PR. The swap is at about the 10:37 mark. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nromriell for addressing the comments. I missed this one last check earlier
reposerver/cache/cache.go
Outdated
case err != nil && err != ErrCacheMiss: | ||
return "", err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to log the err if it's not ErrCacheMiss
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As is this will always log in util/git/client.go#getRefs
for debug but this is treated as a critical failure so definitely seems like a good idea to have here to make sure it gets logged. Added in an error log for it
Signed-off-by: nromriell <nateromriell@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @nromriell, LGTM!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Good job 💪
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Good job 💪
…ions (Issue argoproj#14725) (argoproj#17109) * fix(repo-server): excess git requests, cache lock on revisions Signed-off-by: nromriell <nateromriell@gmail.com> * fix: pr feedback, simplify, add configurable variable Signed-off-by: nromriell <nateromriell@gmail.com> * fix: codegen, lint Signed-off-by: nromriell <nateromriell@gmail.com> * fix: test print, no opts set, var type nit Signed-off-by: nromriell <nateromriell@gmail.com> * chore: add additional logging for unexpected cache error Signed-off-by: nromriell <nateromriell@gmail.com> --------- Signed-off-by: nromriell <nateromriell@gmail.com> Co-authored-by: Ishita Sequeira <46771830+ishitasequeira@users.noreply.github.com> Signed-off-by: Kevin Lyda <kevin@lyda.ie>
…ions (Issue argoproj#14725) (argoproj#17109) * fix(repo-server): excess git requests, cache lock on revisions Signed-off-by: nromriell <nateromriell@gmail.com> * fix: pr feedback, simplify, add configurable variable Signed-off-by: nromriell <nateromriell@gmail.com> * fix: codegen, lint Signed-off-by: nromriell <nateromriell@gmail.com> * fix: test print, no opts set, var type nit Signed-off-by: nromriell <nateromriell@gmail.com> * chore: add additional logging for unexpected cache error Signed-off-by: nromriell <nateromriell@gmail.com> --------- Signed-off-by: nromriell <nateromriell@gmail.com> Co-authored-by: Ishita Sequeira <46771830+ishitasequeira@users.noreply.github.com>
…ions (Issue argoproj#14725) (argoproj#17109) * fix(repo-server): excess git requests, cache lock on revisions Signed-off-by: nromriell <nateromriell@gmail.com> * fix: pr feedback, simplify, add configurable variable Signed-off-by: nromriell <nateromriell@gmail.com> * fix: codegen, lint Signed-off-by: nromriell <nateromriell@gmail.com> * fix: test print, no opts set, var type nit Signed-off-by: nromriell <nateromriell@gmail.com> * chore: add additional logging for unexpected cache error Signed-off-by: nromriell <nateromriell@gmail.com> --------- Signed-off-by: nromriell <nateromriell@gmail.com> Co-authored-by: Ishita Sequeira <46771830+ishitasequeira@users.noreply.github.com>
Additional Fixes #14725
Third part of a split of the #16309 and follow-up to #16410
Another piece of the fixes for #14725. This resolves the excess ls-remote requests that still occur in high application count environments due to the fact that all active processes get their cache invalidated at the same time. When this happens they all try to re-fetch the remotes at the same time and all miss the cache.
The solution is to add a soft cache level lock on the key, the first process to try and access that key on a miss will obtain the lock and attempt to refresh the cache.
This changed required adding the ability to use "set only if not exists" on the util/cache (redis NX or memcached add) interface. Without this option all of the processes could still end up in a position where they think they own the lock, so there are some minor changes to the util/cache interface:
DisableOverwrite
option to represent the NX/add during set optionItem
struct now also contains options for the cache set actions. This allows passingDelete
andDisableOverwrite
in a shared options value and replaces the position of the currentdelete
option in the SetItem callThis is the final major logic portion of the changes to remove all the excess git calls, but we can get some good performance improvements by also adding the two layer cache to the caching layer for revisions. So I'll be following up with another PR to add that. Most of the abstraction to prepare for that was handled as part of this PR.
Baseline without changes with 200 multi-source applications with a ref only source:
data:image/s3,"s3://crabby-images/afab4/afab4c62223051528aa1ab863927239449eeece0" alt="Screenshot from 2023-11-18 18-24-30"
With all the changes up to and including this PR under the same conditions:
data:image/s3,"s3://crabby-images/65041/650414b95fcd39ef3a32b7d86e79cc2e34e29361" alt="Screenshot from 2023-11-21 03-19-14"
Checklist: