Failed to sync cluster #1006
I just upgraded our Argo CD instance to 0.11.0 today. The overall experience was smooth, but I hit one issue that did not exist in 0.10.5.

Argo CD is running in one of our k8s clusters and manages applications in other k8s clusters. One cluster failed to sync, and here is the log:

It looks like the sync process timed out after around 1 minute, which is possible because those two clusters are not in the same location. We didn't have this issue before the latest release, and I tried to identify the timed-out request but failed.

I also saw this log on the apiserver of the failing cluster:

Not sure if it's related.

Comments
@adieu does the other cluster have any API resource that extends Kubernetes using API server aggregation (instead of CRDs), where the deployment that is supposed to handle that API resource is down? An example of such a resource is Service Catalog. We have seen issues where, if the deployment backing Service Catalog is down, it causes problems for Argo CD. This may be similar for your cluster.
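For reference, one way to check whether any aggregated API is backed by an unavailable service is to list the registered APIService objects (this command is an illustration, not one posted in the thread):

```sh
# An aggregated API whose backing Deployment is down shows AVAILABLE=False,
# typically with a reason such as FailedDiscoveryCheck.
kubectl get apiservice
```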
This was our previous issue with Service Catalog: #650. One way to test is to run:
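The command itself did not survive extraction; a plausible check (an assumption, based on how aggregated-API outages usually surface) is a full discovery request, which errors out for any API group whose aggregated server is unreachable:

```sh
# Discovery touches every registered API group, including aggregated ones,
# so an unreachable aggregated API server produces an explicit error here.
kubectl api-resources
```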
Also, it will help to enable Kubernetes-related logs in the argocd-application-controller.
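A sketch of one way to do that, assuming the controller binary forwards standard klog/glog flags such as `--v` (an assumption; the flags accepted in that release may differ):

```sh
# Append --v=6 (request-level client-go logging) to the controller's args.
kubectl -n argocd patch deployment argocd-application-controller --type json \
  -p '[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--v=6"}]'
```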
@jessesuen Thank you for your reply. I tested that, and it does not appear to be the Service Catalog problem.
Ok, it may be another one. Can you share:
I can confirm there are requests timing out, but which resource caused the problem is unknown. I'll try the CRD resources one by one.

EDITED: I can confirm it's the API server aggregation that caused the problem. The backend service is not down, but it times out when we have lots of snapshots. Maybe we need to filter it out.
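One way to narrow down which API group is slow, sketched here as an assumption rather than a command from the thread, is to time a discovery call against each group:

```sh
# Time discovery per API group; an aggregated API whose backend is
# overloaded (e.g. by a large number of snapshot objects) shows up as
# the slow or timed-out entry. Requires jq.
for gv in $(kubectl get --raw /apis | jq -r '.groups[].preferredVersion.groupVersion'); do
  printf '%s: ' "$gv"
  time kubectl get --raw "/apis/$gv" > /dev/null
done
```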
This fix may help. We also want to add a way to exclude resource kinds from our watch.
@adieu you may want to try the latest image.
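As an illustration only (the exact image tag in the comment was lost; the component and image names below are assumptions based on a manifest-based install):

```sh
# Point the application controller at a newer image to pick up the fix.
kubectl -n argocd set image deployment/argocd-application-controller \
  argocd-application-controller=argoproj/argocd:latest
```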
Sadly the latest image does not help. I guess that's because the backend server is alive and responding to discovery requests correctly.
Ok. Then we need to implement resource exclusion (issue #1010).
No, I don't need the snapshot resource. I managed to work around the problem by patching the code to exclude the snapshots resource.
The resource exclusion feature has been implemented.
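For anyone landing here later: current Argo CD releases expose this as a `resource.exclusions` key in the `argocd-cm` ConfigMap. A minimal sketch, with placeholder group/kind names (the aggregated snapshot API in this thread used its own group) and assuming the key is available in your version:

```sh
# Exclude a snapshot kind from Argo CD's watches on all clusters.
kubectl -n argocd patch configmap argocd-cm --type merge -p '
data:
  resource.exclusions: |
    - apiGroups:
      - "snapshot.storage.k8s.io"
      kinds:
      - "VolumeSnapshot"
      clusters:
      - "*"
'
```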