Source controller is unstable #2072
Comments
What's the reason for the source-controller restarts? If it's OOM, you should increase the memory limits.
How can we check whether it is failing due to OOM? I don't see any OOM errors in the logs, but I do see pod memory utilization reaching around 98% at times.
Please see https://sysdig.com/blog/troubleshoot-kubernetes-oom/ If the memory goes to 98%, then you need to increase the limit; setting it to 2Gi should fix it.
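An OOM kill usually does not appear in the controller's own logs, because the kernel terminates the process; Kubernetes records it in the pod status instead, as a terminated state with reason `OOMKilled`. The following is a minimal sketch of that check in Python, assuming the pod JSON comes from `kubectl get pod -o json`. The helper name `oom_killed_containers` and the sample pod (including the container name `manager`) are made up for illustration and are not taken from this cluster.

```python
from typing import Dict, List


def oom_killed_containers(pod: Dict) -> List[str]:
    """Return names of containers whose current or last state is an OOM kill."""
    names = []
    statuses = pod.get("status", {}).get("containerStatuses") or []
    for cs in statuses:
        # "lastState" holds the previous termination; "state" holds the current one.
        for key in ("lastState", "state"):
            terminated = (cs.get(key) or {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                names.append(cs["name"])
                break
    return names


# Illustrative data only (hypothetical container name and counts).
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "manager",
                "restartCount": 12,
                "state": {"running": {}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            }
        ]
    }
}

print(oom_killed_containers(sample_pod))
```

To run it against a real pod, load the output of `kubectl -n flux-system get pod <pod-name> -o json` with `json.load` and pass the resulting dict to the function. An empty list means the restarts were not recorded as OOM kills and another cause should be investigated.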
@ANKAMBLE my question might not be directly related to your issue. Did you use
We used flux bootstrap. The issue was resolved after increasing the memory.
@ANKAMBLE glad to know your problem got solved.
source-controller memory usage is governed by the number of Helm charts it needs to build. The default limit of
Is it possible to add a VPA object to govern the source-controller memory limit?
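For reference, a VerticalPodAutoscaler targeting the controller could look roughly like the manifest built below. This is only a sketch and makes several assumptions: the VPA CRDs and controller are installed in the cluster, the Deployment is named `source-controller` in `flux-system`, and the `256Mi` lower bound is an arbitrary illustrative value (the `2Gi` upper bound is the figure suggested earlier in this thread). The helper `build_vpa` is hypothetical. Whether Flux supports or recommends this is not answered here.

```python
import json


def build_vpa(name: str, namespace: str, min_memory: str, max_memory: str) -> dict:
    """Build a VerticalPodAutoscaler manifest that manages memory only."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "updatePolicy": {"updateMode": "Auto"},
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        # "*" applies the policy to every container in the pod.
                        "containerName": "*",
                        "controlledResources": ["memory"],
                        "minAllowed": {"memory": min_memory},
                        "maxAllowed": {"memory": max_memory},
                    }
                ]
            },
        },
    }


# 256Mi is an assumed lower bound; 2Gi is the limit suggested in this thread.
vpa = build_vpa("source-controller", "flux-system", "256Mi", "2Gi")
print(json.dumps(vpa, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since kubectl accepts JSON manifests as well as YAML.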
Describe the bug
We are seeing the source-controller pod restart frequently, roughly every hour, and it takes 15-30 minutes to recover from the error below:
failed to download artifact, error: GET http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/ocp-fleet/2998cb88140643569d3264e00b51a0ac552d856f.tar.gz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/ocp-fleet/2998cb88140643569d3264e00b51a0ac552d856f.tar.gz": dial tcp 172.30.237.29:80: i/o timeout
Steps to reproduce
We have Flux multi-tenancy enabled on an OpenShift cluster with multiple namespaces. We created the four repositories below for maintaining infrastructure and application release files and Helm charts.
Expected behavior
Flux should be able to connect to the four defined repositories and update the cluster/namespaces accordingly on each attempt. Instead, we randomly see failures with the error below:
failed to download artifact, error: GET http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/ocp-fleet/2998cb88140643569d3264e00b51a0ac552d856f.tar.gz giving up after 10 attempt(s): Get "http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/ocp-fleet/2998cb88140643569d3264e00b51a0ac552d856f.tar.gz": dial tcp 172.30.237.29:80: i/o timeout
Screenshots and recordings
OS / Distro
RHCOS, OpenShift 4.6
Flux version
v0.17.2
Flux check
Git provider
No response
Container Registry provider
No response
Additional context
No response