Sync operation getting stuck with high number of resources in Application #14224
Comments
There were some deadlock fixes that came in
Running Argo CD + Rollouts + Image Updater, all on the latest release, doing blue/green with pre- and post-analysis. We routinely see similar behavior in a large but significantly smaller app (a few hundred resources). Providing context in case it helps, since this is the closest bug report I've found to what we see:

We are still root-causing, but the main log we see during this "stuck" state is "another operation is in progress", which brought me here.

Updating to 2.6.8 didn't help.
@deadlysyn I think your issue may be different from @bartoszbryk's. Just a theory, but I have recent experience which makes me think this.

@bartoszbryk I think what's happening is that, when Argo CD is trying to sync your ~4000 resources, it's trying to patch

You will not see corresponding Argo CD error logs because, currently, we don't log errors encountered when updating the operation state. We retry indefinitely. I've put up a PR to fix this.

Incidentally, the retry spams the k8s API with requests that are doomed to fail. Intuit's k8s team was the first to notice the issue, due to an elevated number of error responses. We bought ourselves a little time by setting

But we quickly hit the limit again as the number of resources increased. We ended up splitting the app into two apps to get back under the limit. But it's just a band-aid.

I've opened an issue to brainstorm ways to get Argo CD to gracefully handle large apps. I've scheduled time at the next SIG Scalability meeting to discuss as well.

Please let me know if this theory matches what you're seeing. I'd love to help work out a solution.
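The size-limit theory above can be sanity-checked offline with back-of-the-envelope math. A rough sketch, assuming the bottleneck is the Application object's serialized operation state growing with resource count, and using etcd's default ~1.5 MiB request limit as the ceiling (the field layout and the ~400 bytes/resource figure are illustrative assumptions, not Argo CD's actual schema):

```python
import json

# etcd's default --max-request-bytes is 1.5 MiB; writes beyond it are rejected.
ETCD_DEFAULT_MAX_REQUEST_BYTES = 1.5 * 1024 * 1024

def estimate_operation_state_bytes(num_resources: int, bytes_per_resource: int = 400) -> int:
    """Rough size of a sync-result payload, assuming ~400 bytes of JSON
    (group/kind/name/namespace/status/message) per synced resource."""
    resources = [
        {
            "group": "kafka.strimzi.io",
            "kind": "KafkaTopic",
            "name": f"topic-{i}",
            "namespace": "kafka",
            "status": "Synced",
            # Pad the message so each entry lands near the assumed size.
            "message": "x" * (bytes_per_resource - 120),
        }
        for i in range(num_resources)
    ]
    return len(json.dumps({"operationState": {"syncResult": {"resources": resources}}}))

size = estimate_operation_state_bytes(4000)
print(size, size > ETCD_DEFAULT_MAX_REQUEST_BYTES)
```

Under these assumptions, ~4000 resources lands above the limit while a few hundred stays comfortably below it, which would match both reports in this thread.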
@crenshaw-dev Were there any findings/actions from the SIG meeting? |
@PavelPikat only that the idea of compressing the status seems like a reasonable way to counteract the problem. |
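The status-compression idea is easy to illustrate with a toy benchmark. A sketch, purely for illustration (gzip from Python's stdlib on a synthetic JSON status; Argo CD's actual implementation may differ):

```python
import gzip
import json

# Synthetic stand-in for a large, highly repetitive sync status.
status = json.dumps({
    "resources": [
        {"kind": "KafkaTopic", "name": f"topic-{i}", "status": "Synced"}
        for i in range(4000)
    ]
}).encode()

compressed = gzip.compress(status)
ratio = len(compressed) / len(status)
print(f"{len(status)} -> {len(compressed)} bytes ({ratio:.0%})")
```

Repetitive JSON like a per-resource sync result compresses to a small fraction of its original size, which is why compression buys significant headroom under a fixed object-size limit.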
Any updates or solutions to this issue? We are facing this issue too.
Checklist:

- [x] I've pasted the output of `argocd version`.

Describe the bug
Argo CD doesn't finish syncing the application; the sync gets stuck with a high number of resources (around 4000) in the Application. The sync also cannot be terminated in this state; only deleting the application-controller pod helps. However, all the resources in the application appear to be synced, and the log doesn't indicate any reason for the sync being stuck.
To Reproduce
Create an application with a high number of resources (in our case, 4000 Kafka topics and users) and try to sync it automatically.
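To approximate this reproduction without a Kafka operator, a throwaway script can generate an Application's worth of trivial resources. A sketch, using plain ConfigMaps in place of Strimzi KafkaTopics (the names, count, and use of ConfigMaps are arbitrary choices, not part of the original report):

```python
# gen_manifests.py: emit N ConfigMaps as one multi-document YAML stream,
# to stress-test an Argo CD Application with many resources.
import sys

def manifest(i: int) -> str:
    return (
        "apiVersion: v1\n"
        "kind: ConfigMap\n"
        "metadata:\n"
        f"  name: stress-{i}\n"
        "data:\n"
        f'  index: "{i}"\n'
    )

def render(n: int) -> str:
    # "---" separates documents in a YAML stream.
    return "---\n".join(manifest(i) for i in range(n))

if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 4000
    sys.stdout.write(render(n))
```

Pointing an Application at a directory containing the generated file lets you watch sync behavior degrade as N grows.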
Expected behavior
The sync finishes successfully.
Screenshots
(screenshots not recovered)
Version
Logs