
source-controller pod restarting (OOMKilled) #192

Closed
avacaru opened this issue Nov 2, 2020 · 47 comments
@avacaru

avacaru commented Nov 2, 2020

I have noticed that the source-controller pod of my gotk deployment has been restarting a huge number of times over the weekend (148 times on version 0.1.1). I've re-deployed a newer version (0.2.1) but the restarts keep happening (about two every half hour).

$> k describe po -n gotk-system source-controller-5cc54c757c-ccwz8
Name:         source-controller-5cc54c757c-ccwz8
Namespace:    gotk-system
Priority:     0
Node:         my-node/10.0.10.11
Start Time:   Mon, 02 Nov 2020 13:57:18 +0000
Labels:       app=source-controller
              pod-template-hash=5cc54c757c
Annotations:  prometheus.io/port: 8080
              prometheus.io/scrape: true
Status:       Running
IP:           10.0.10.12
IPs:
  IP:           10.0.10.12
Controlled By:  ReplicaSet/source-controller-5cc54c757c
Containers:
  manager:
    Container ID:  docker://6b4a1a89311360cb832fe1d540b4f4cb96c9b8a6591fb01349390ffcdfc99b90
    Image:         my-registry.com/fluxcd/source-controller:v0.2.1
    Image ID:      docker-pullable://my-registry.com/fluxcd/source-controller@sha256:e8b708159f6d651a9577695af14bf3291ef844ca5cd7e85f182416b76561d27c
    Ports:         9090/TCP, 8080/TCP
    Host Ports:    0/TCP, 0/TCP
    Args:
      --events-addr=
      --watch-all-namespaces=true
      --log-level=info
      --log-json
      --enable-leader-election
      --storage-path=/data
    State:          Running
      Started:      Mon, 02 Nov 2020 14:34:16 +0000
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Mon, 02 Nov 2020 14:13:59 +0000
      Finished:     Mon, 02 Nov 2020 14:34:15 +0000
    Ready:          True
    Restart Count:  2
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      50m
      memory:   64Mi
    Liveness:   http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      RUNTIME_NAMESPACE:  gotk-system (v1:metadata.namespace)
      HTTPS_PROXY:        http://http.my-proxy.com:8000
      NO_PROXY:           10.0.0.0/8,172.0.0.0/8
    Mounts:
      /data from data (rw)
      /tmp from tmp (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bfr47 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  default-token-bfr47:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bfr47
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/arch=amd64
                 kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age                 From                                       Message
  ----    ------     ----                ----                                       -------
  Normal  Scheduled  <unknown>           default-scheduler                          Successfully assigned gotk-system/source-controller-5cc54c757c-ccwz8 to my-node
  Normal  Pulled     4m6s (x3 over 41m)  kubelet, my-node  Container image "my-registry.com/fluxcd/source-controller:v0.2.1" already present on machine
  Normal  Created    4m6s (x3 over 41m)  kubelet, my-node  Created container manager
  Normal  Started    4m6s (x3 over 41m)  kubelet, my-node  Started container manager

This causes the helm-controller to not be able to reconcile HelmReleases:

$> k get hr --all-namespaces
NAMESPACE                NAME                                       READY   STATUS                                                                                                                                                                                                                                      AGE
namespace1           chart1        False   Get "http://source-controller.gotk-system/helmchart/namespace1/chart1/chart1-v0.15.5.tgz": dial tcp 172.20.225.87:80: connect: connection refused              2d18h
( . . .)
( . . .)
( . . .)
namespace2          chart11       False   Get "http://source-controller.gotk-system/helmchart/namespace2/chart11/chart11-v0.1.3.tgz": dial tcp 172.20.225.87:80: connect: connection refused              2d18h

The source controller manages one GitRepository and two HelmRepositories.
The helm controller takes care of 11 HelmReleases, each with similar configuration:

apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: my-release
  namespace: namespace1
spec:
  install:
    remediation:
      retries: -1
  upgrade:
    remediation:
      retries: -1
  interval: 1m0s
  releaseName: my-release
  chart:
    spec:
      version: 1.0.2
      chart: my-chart
      sourceRef:
        kind: HelmRepository
        name: my-repository
        namespace: namespace1
  valuesFrom:
  - kind: ConfigMap
    name: my-values
    valuesKey: environment
    targetPath: myEnv
  values:
    my-value: 30

While writing up this issue the source-controller restarted 3 more times.
Logs from the source-controller don't indicate any errors:

{"level":"info","ts":"2020-11-02T15:00:17.646Z","logger":"controllers.HelmChart","msg":"Reconciliation finished in 364.572799ms, next run in 1m0s","controller":"helmchart","request":"namespace1/chart1"}
( . . . )
( . . . )
( . . . )
{"level":"info","ts":"2020-11-02T15:00:17.646Z","logger":"controllers.HelmChart","msg":"Reconciliation finished in 364.572799ms, next run in 1m0s","controller":"helmchart","request":"namespace1/chart11"}
{"level":"info","ts":"2020-11-02T15:01:12.527Z","logger":"controllers.GitRepository","msg":"Reconciliation finished in 1.631398488s, next run in 3m0s","controller":"gitrepository","request":"namespace1/my-git-repo"}
{"level":"info","ts":"2020-11-02T15:01:12.870Z","logger":"controllers.HelmRepository","msg":"Reconciliation finished in 1.165803995s, next run in 3m0s","controller":"helmrepository","request":"namespace1/my-repository"}
@stefanprodan
Member

stefanprodan commented Nov 2, 2020

Can you post the kubelet error here? It should be under the ReplicaSet or Pod description. Can you also post what interval you are using in the GitRepository?

@avacaru
Author

avacaru commented Nov 2, 2020

The pod description doesn't show any error, just that the pod was Terminated with reason OOMKilled.
Here's the ReplicaSet description:

$> k describe replicasets.apps -n gotk-system source-controller-5cc54c757c
Name:           source-controller-5cc54c757c
Namespace:      gotk-system
Selector:       app=source-controller,pod-template-hash=5cc54c757c
Labels:         app=source-controller
                pod-template-hash=5cc54c757c
Annotations:    deployment.kubernetes.io/desired-replicas: 1
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 1
Controlled By:  Deployment/source-controller
Replicas:       1 current / 1 desired
Pods Status:    1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       app=source-controller
                pod-template-hash=5cc54c757c
  Annotations:  prometheus.io/port: 8080
                prometheus.io/scrape: true
  Containers:
   manager:
    Image:       my-registry.com/fluxcd/source-controller:v0.2.1
    Ports:       9090/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      --events-addr=
      --watch-all-namespaces=true
      --log-level=info
      --log-json
      --enable-leader-election
      --storage-path=/data
    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:      50m
      memory:   64Mi
    Liveness:   http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/ delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      RUNTIME_NAMESPACE:   (v1:metadata.namespace)
      HTTPS_PROXY:        http://http.my-proxy.com:8000
      NO_PROXY:           10.0.0.0/8,172.0.0.0/8
    Mounts:
      /data from data (rw)
      /tmp from tmp (rw)
  Volumes:
   data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
   tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
Events:         <none>

And here are the events in gotk-system:

$> k get events -n gotk-system
LAST SEEN   TYPE      REASON           OBJECT                                   MESSAGE
60m         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_37899998-7d0f-4107-8ec2-cfb907cfe7a3 became leader
39m         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_99debb56-a4d6-43d7-90c1-d44ee06d08e1 became leader
29m         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_5c11d40a-c436-4281-97c3-66205ab3d16d became leader
19m         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_1a1aa41c-5a5e-440e-941b-498aaf80d59a became leader
18m         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_c48a4f03-14f8-4d66-87dc-a38fdb8cc608 became leader
16m         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_8f619f94-12ba-4acd-ae8c-346325b1e77a became leader
10m         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_16f7528e-336f-4f78-9ce5-c6cc424658b5 became leader
70s         Normal    LeaderElection   configmap/305740c0.fluxcd.io             source-controller-5cc54c757c-ccwz8_bf7cc933-c400-46b6-a7d4-fcbf30e6fd49 became leader
16m         Normal    Pulled           pod/source-controller-5cc54c757c-ccwz8   Container image "my-registry.com/fluxcd/source-controller:v0.2.1" already present on machine
16m         Normal    Created          pod/source-controller-5cc54c757c-ccwz8   Created container manager
16m         Normal    Started          pod/source-controller-5cc54c757c-ccwz8   Started container manager
4m8s        Warning   BackOff          pod/source-controller-5cc54c757c-ccwz8   Back-off restarting failed container
16m         Warning   Unhealthy        pod/source-controller-5cc54c757c-ccwz8   Readiness probe failed: Get http://10.0.10.11:9090/: dial tcp 10.0.10.11:9090: connect: connection refused
16m         Warning   Unhealthy        pod/source-controller-5cc54c757c-ccwz8   Liveness probe failed: Get http://10.0.10.11:9090/: dial tcp 10.0.10.11:9090: connect: connection refused

This is how the GitRepository is defined:

apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: my-git-repo
spec:
  url: https://my-git-server.com/scm/my-repo
  secretRef:
    name: git-secret
  interval: 3m
  timeout: 60s
  ref:
    branch: my-branch

@stefanprodan
Member

Are all your HelmReleases coming from HelmRepositories, or do you have charts in GitRepositories?

@avacaru
Author

avacaru commented Nov 2, 2020

All the HelmReleases have a HelmRepository as their source (the same repository reference).

@hiddeco
Member

hiddeco commented Nov 2, 2020

Can you share the sizes of the .tgz files as produced by the source-controller for the HelmChart resources, the size of the YAML files produced for the HelmRepository resources, and the size of the artifact created for the GitRepository?

Also: note that the interval you have set for the HelmRelease is extremely low, and it is inherited by the template for the HelmChart defined in spec.chart.spec. This means the source-controller will load (parts of) the chart into memory every minute to make observations, in addition to the index file for the repository (times 11 in a short time span, though not simultaneously given the limited number of workers).
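If the inherited interval is the problem, the chart source check can be given its own, longer interval. A hedged sketch (assuming the v2beta1 spec.chart.spec.interval field overrides the inherited value; all values are illustrative):

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: my-release
spec:
  interval: 5m0s          # how often the release itself is reconciled
  chart:
    spec:
      chart: my-chart
      version: 1.0.2
      interval: 30m0s     # assumption: decouples the HelmChart check from the release interval
      sourceRef:
        kind: HelmRepository
        name: my-repository
```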

@avacaru
Author

avacaru commented Nov 2, 2020

I'm not sure if this is exactly what you're asking for, but here are the sizes on the source-controller pod:

data/helmchart: 304K
data/helmrepository: 44.4M
data/gitrepository: 40K
data: 44.7M

@stefanprodan
Member

stefanprodan commented Nov 2, 2020

A 44MB index would explain the OOM: every minute the index is loaded into memory and parsed for each release. With the default number of workers that means 3 * 4 * 44MB = 528MB, and if GC is slow for some reason (busy Kubernetes node), at the 2nd run it will OOM.
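The estimate above as a quick shell check (the 3 and 4 factors are taken from the comment and are assumptions about concurrent loads, not measured values):

```shell
# 44 MB index, loaded by roughly 3 * 4 concurrent reconciliations
echo "$(( 3 * 4 * 44 )) MB"   # 528 MB, over half of the 1Gi limit
```

Two such bursts overlapping before GC frees the first would exceed the 1Gi limit.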

@avacaru
Author

avacaru commented Nov 2, 2020

I've just increased the interval on the HelmReleases to 3m0s; I will update if I still see the issue.

@avacaru
Author

avacaru commented Nov 3, 2020

Increasing the interval from 1m to 3m in the HelmReleases didn't solve the issue. There have been over 70 restarts in 17h. When doing kubectl get hr --all-namespaces, all of them are Ready=False with the error message: Get "http://source-controller.gotk-system/helmchart/namespace1/chart1/chart1-v0.15.5.tgz": dial tcp 172.70.139.122:80: connect: connection refused.

@stefanprodan
Member

Yeah, that's expected; the interval doesn't matter if it's the same for all HRs. You either increase the memory limit or you trim down the 44MB index. For reference, the stable Helm repository index is 7MB.

@brianpham

We are experiencing a similar issue too. The repo that we clone is pretty big (443.7M for a15bf30c07b0378b262003cd99ce4c3fb19f0c8a.tar.gz), which causes us to see this error every once in a while:

failed to download artifact from http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/firespotter-bpham/41b16bd8b344a40836eec3ced8b8a031d78c7c4c.tar.gz, error: Get "http://source-controller.flux-system.svc.cluster.local./gitrepository/flux-system/firespotter-bpham/41b16bd8b344a40836eec3ced8b8a031d78c7c4c.tar.gz": dial tcp 10.239.249.31:80: connect: connection refused

Is the only way to fix this to increase the limit on the source-controller? What did you end up setting your limit to? @avacaru

@stefanprodan
Member

You can change any field of the Flux manifests with Kustomize patches without interfering with bootstrap; please read the docs: https://toolkit.fluxcd.io/guides/installation/#customize-flux-manifests
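For example, raising the source-controller memory limit could look like this (a hedged sketch following the pattern in the linked guide; the file names and the 2Gi value are illustrative, not prescribed):

```yaml
# kustomization.yaml, next to the bootstrap manifests
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patchesStrategicMerge:
  - source-controller-patch.yaml
---
# source-controller-patch.yaml (a separate file, shown inline for brevity)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: source-controller
  namespace: flux-system
spec:
  template:
    spec:
      containers:
        - name: manager
          resources:
            limits:
              memory: 2Gi   # illustrative value; size to your largest artifact
```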

@stefanprodan
Member

@brianpham make sure you use .sourceignore and exclude everything but the YAML manifests, or consider keeping the manifests in a dedicated branch.
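A .sourceignore at the repository root could look like this (a sketch assuming gitignore-style patterns, which .sourceignore follows; the directory name is hypothetical):

```
# exclude everything from the artifact...
/*
# ...except the directory holding the manifests the cluster needs
!/manifests/
```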

@billimek

billimek commented Feb 6, 2021

I believe that I'm encountering this issue as well with source-controller (v0.7.4).

From the latest OOM kill last night:
(screenshot: memory usage graph)

Last 5 OOM kills:
(screenshot: list of recent OOM kill events)

Interestingly, the memory usage 'spike' coincides with a bunch of errors logged from source-controller, but I'm not certain whether the errors are the cause or a symptom of the memory issue:

2021-02-06T06:07:18.849711004Z stderr F {"level":"info","ts":"2021-02-06T06:07:18.845Z","logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
2021-02-06T06:07:13.474216905Z stderr F {"level":"info","ts":"2021-02-06T06:07:13.474Z","logger":"controller.helmchart","msg":"Reconciliation finished in 12.27240148s, next run in 5m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-coredns","namespace":"flux-system"}
2021-02-06T06:07:13.082799388Z stderr F {"level":"info","ts":"2021-02-06T06:07:13.082Z","logger":"controller.helmchart","msg":"Reconciliation finished in 12.153198745s, next run in 5m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-nfs-client-provisioner","namespace":"flux-system"}
2021-02-06T06:07:01.201897549Z stderr F {"level":"info","ts":"2021-02-06T06:07:01.201Z","logger":"controller.helmchart","msg":"Reconciliation finished in 6.532376284s, next run in 5m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system"}
2021-02-06T06:07:00.931256162Z stderr F {"level":"info","ts":"2021-02-06T06:07:00.929Z","logger":"controller.helmchart","msg":"Reconciliation finished in 6.283877764s, next run in 5m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-oauth2-proxy","namespace":"flux-system"}
2021-02-06T06:06:54.664863416Z stderr F {"level":"error","ts":"2021-02-06T06:06:54.664Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-coredns","namespace":"flux-system","error":"Get \"https://charts.helm.sh/stable/packages/coredns-1.13.8.tgz\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:54.657228588Z stderr F {"level":"error","ts":"2021-02-06T06:06:54.656Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"kubernetes-stable-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.helm.sh/stable/index.yaml\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:54.645472257Z stderr F {"level":"error","ts":"2021-02-06T06:06:54.645Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-nfs-client-provisioner","namespace":"flux-system","error":"Get \"https://charts.helm.sh/stable/packages/nfs-client-provisioner-1.2.11.tgz\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:50.463077309Z stderr F {"level":"error","ts":"2021-02-06T06:06:50.462Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:50.462720564Z stderr F {"level":"error","ts":"2021-02-06T06:06:50.462Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:48.402854269Z stderr F {"level":"info","ts":"2021-02-06T06:06:48.402Z","logger":"controller.gitrepository","msg":"Reconciliation finished in 6.915541887s, next run in 1m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system"}
2021-02-06T06:06:46.600008065Z stderr F {"level":"error","ts":"2021-02-06T06:06:46.599Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"kubernetes-stable-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.helm.sh/stable/index.yaml\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:46.597701935Z stderr F {"level":"error","ts":"2021-02-06T06:06:46.597Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-coredns","namespace":"flux-system","error":"Get \"https://charts.helm.sh/stable/packages/coredns-1.13.8.tgz\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:41.158465791Z stderr F {"level":"error","ts":"2021-02-06T06:06:41.158Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:41.158411393Z stderr F {"level":"error","ts":"2021-02-06T06:06:41.158Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:40.212160284Z stderr F {"level":"error","ts":"2021-02-06T06:06:40.207Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:38.560144001Z stderr F {"level":"error","ts":"2021-02-06T06:06:38.560Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"kubernetes-stable-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.helm.sh/stable/index.yaml\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:38.549495113Z stderr F {"level":"error","ts":"2021-02-06T06:06:38.548Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-coredns","namespace":"flux-system","error":"Get \"https://charts.helm.sh/stable/packages/coredns-1.13.8.tgz\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:32.486828454Z stderr F {"level":"error","ts":"2021-02-06T06:06:32.486Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:32.485786292Z stderr F {"level":"error","ts":"2021-02-06T06:06:32.485Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:31.450189766Z stderr F {"level":"error","ts":"2021-02-06T06:06:31.450Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:30.517170051Z stderr F {"level":"error","ts":"2021-02-06T06:06:30.517Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"kubernetes-stable-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.helm.sh/stable/index.yaml\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:30.512759542Z stderr F {"level":"error","ts":"2021-02-06T06:06:30.512Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-coredns","namespace":"flux-system","error":"Get \"https://charts.helm.sh/stable/packages/coredns-1.13.8.tgz\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:24.129828108Z stderr F {"level":"error","ts":"2021-02-06T06:06:24.129Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:24.12976421Z stderr F {"level":"error","ts":"2021-02-06T06:06:24.129Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:23.108242718Z stderr F {"level":"error","ts":"2021-02-06T06:06:23.108Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:20.840575366Z stderr F {"level":"error","ts":"2021-02-06T06:06:20.840Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-coredns","namespace":"flux-system","error":"Get \"https://charts.helm.sh/stable/packages/coredns-1.13.8.tgz\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:15.933484828Z stderr F {"level":"error","ts":"2021-02-06T06:06:15.933Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:15.909249047Z stderr F {"level":"error","ts":"2021-02-06T06:06:15.908Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:14.926537152Z stderr F {"level":"error","ts":"2021-02-06T06:06:14.926Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:11.211943496Z stderr F {"level":"error","ts":"2021-02-06T06:06:11.211Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-coredns","namespace":"flux-system","error":"Get \"https://charts.helm.sh/stable/packages/coredns-1.13.8.tgz\": dial tcp: lookup charts.helm.sh on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:07.791478986Z stderr F {"level":"error","ts":"2021-02-06T06:06:07.791Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:07.790164871Z stderr F {"level":"error","ts":"2021-02-06T06:06:07.790Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:06:06.79442691Z stderr F {"level":"error","ts":"2021-02-06T06:06:06.794Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:59.722096551Z stderr F {"level":"error","ts":"2021-02-06T06:05:59.721Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:59.720861799Z stderr F {"level":"error","ts":"2021-02-06T06:05:59.720Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:58.71582877Z stderr F {"level":"error","ts":"2021-02-06T06:05:58.715Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:51.674075128Z stderr F {"level":"error","ts":"2021-02-06T06:05:51.673Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:51.674032835Z stderr F {"level":"error","ts":"2021-02-06T06:05:51.673Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:50.652427545Z stderr F {"level":"error","ts":"2021-02-06T06:05:50.652Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:43.640698151Z stderr F {"level":"error","ts":"2021-02-06T06:05:43.640Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
2021-02-06T06:05:43.640360015Z stderr F {"level":"error","ts":"2021-02-06T06:05:43.640Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
  |   | 2021-02-06 01:05:42 | 2021-02-06T06:05:42.617590202Z stderr F {"level":"error","ts":"2021-02-06T06:05:42.617Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
  |   | 2021-02-06 01:05:35 | 2021-02-06T06:05:35.610319682Z stderr F {"level":"error","ts":"2021-02-06T06:05:35.609Z","logger":"controller.helmrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"bitnami-charts","namespace":"flux-system","error":"failed to download repository index: Get \"https://charts.bitnami.com/bitnami/index.yaml\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
  |   | 2021-02-06 01:05:35 | 2021-02-06T06:05:35.608077914Z stderr F {"level":"error","ts":"2021-02-06T06:05:35.607Z","logger":"controller.helmchart","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"kube-system-metallb","namespace":"flux-system","error":"Get \"https://charts.bitnami.com/bitnami/metallb-1.1.0.tgz\": dial tcp: lookup charts.bitnami.com on 10.43.0.10:53: server misbehaving"}
  |   | 2021-02-06 01:05:34 | 2021-02-06T06:05:34.588944813Z stderr F {"level":"error","ts":"2021-02-06T06:05:34.588Z","logger":"controller.gitrepository","msg":"Reconciler error","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"GitRepository","name":"flux-system","namespace":"flux-system","error":"unable to clone 'ssh://git@github.com/billimek/k8s-gitops', error: dial tcp: lookup github.com on 10.43.0.10:53: server misbehaving"}
  |   | 2021-02-06 01:05:01 | 2021-02-06T06:05:01.722442354Z stderr F {"level":"info","ts":"2021-02-06T06:05:01.722Z","logger":"controller.helmrepository","msg":"Reconciliation finished in 6.71510648s, next run in 10m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmRepository","name":"banzaicloud-charts","namespace":"flux-system"}
  |   | 2021-02-06 01:04:50 | 2021-02-06T06:04:50.10747884Z stderr F {"level":"info","ts":"2021-02-06T06:04:50.106Z","logger":"controller.helmchart","msg":"Reconciliation finished in 738.712206ms, next run in 5m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"cert-manager-cert-manager","namespace":"flux-system"}
  |   | 2021-02-06 01:04:44 | 2021-02-06T06:04:44.664699112Z stderr F {"level":"info","ts":"2021-02-06T06:04:44.664Z","logger":"controller.helmchart","msg":"Reconciliation finished in 423.6548ms, next run in 5m0s","reconciler group":"source.toolkit.fluxcd.io","reconciler kind":"HelmChart","name":"default-ser2sock","namespace":"flux-system"}

Is the appropriate remedy to increase the memory limit for source controller? It's currently set to 1Gi.

@hiddeco
Copy link
Member

hiddeco commented Feb 6, 2021

@billimek if you have many Helm-related resources in your cluster, you may want to try this, as for some operations we need to read e.g. whole repository indexes into memory.

@billimek
Copy link

billimek commented Feb 6, 2021

@billimek if you have many Helm-related resources in your cluster, you may want to try this, as for some operations we need to read e.g. whole repository indexes into memory.

Thanks @hiddeco, I believe there are probably a lot in this case. I bumped the limit to 2Gi. Appreciate the super fast response!

❯ k get helmreleases.helm.toolkit.fluxcd.io -A | wc -l
44

❯ k get helmcharts.source.toolkit.fluxcd.io -A | wc -l
44

❯ k get helmrepositories.source.toolkit.fluxcd.io -A | wc -l
24
/data $ du -hs /data/*
124.0K  /data/gitrepository
1.6M    /data/helmchart
19.9M   /data/helmrepository


@hiddeco
Copy link
Member

hiddeco commented Feb 6, 2021

@billimek the files are not as enormous as I would have expected (I have seen indexes of ~50MiB).

I have created a PR to enable pprof endpoints on the metrics server so that we can get a better insight into the resource consumption of your controller.

@Ayatallah
Copy link

You can change any field of Flux manifests with Kustomize patches without interfering with bootstrap, please read the docs https://toolkit.fluxcd.io/guides/installation/#customize-flux-manifests

Should I add the following part to the kustomization.yaml that is in the same directory as gotk-sync.yaml and gotk-components.yaml?

patchesStrategicMerge:
- flux-patch.yaml

Because I did that, and the Flux instance in my cluster still did not reflect the customization I wrote in flux-patch.yaml.

@stefanprodan

@onedr0p
Copy link
Contributor

onedr0p commented Apr 5, 2021

@Ayatallah see my example here and here.

@Ayatallah
Copy link

@Ayatallah see my example here and here.

Thank you! I did almost the same, but the source-controller memory limit is still unchanged. Do you do anything specific for the Flux instance to sync these customizations, or do you just commit and push them to Git and it syncs automatically?!

@onedr0p

@onedr0p
Copy link
Contributor

onedr0p commented Apr 5, 2021

@Ayatallah see my example here and here.

Thank you! I almost did the same and source-controller memory limit still the same, do you do anything specific for the flux instance to sync with these kustomization or just commit and push them to git and it sync automatically?!

@onedr0p

IIRC it was synced automatically.

@Ayatallah
Copy link

Ayatallah commented Apr 5, 2021

@Ayatallah see my example here and here.

Thank you! I almost did the same and source-controller memory limit still the same, do you do anything specific for the flux instance to sync with these kustomization or just commit and push them to git and it sync automatically?!

@onedr0p

IIRC it was synced automatically.

Okay, can you let me know if I'm missing anything:
-- I bootstrapped a Flux instance called staging using the bootstrap command, and got the following directory created automatically
staging/
gotk-sync.yaml
gotk-components.yaml
kustomization.yaml

I added the following to kustomization.yaml:
patchesStrategicMerge:
- gotk-patches.yaml

so it now looks like this:
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-components.yaml
-gotk-sync.yaml
patchesStrategicMerge:
-gotk-patches.yam

and gotk-patches.yaml content is as follows:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: source-controller
  namespace: staging
spec:
  template:
    spec:
      containers:
      - name: manager
        resources:
          limits:
            memory: 1.3Gi

so staging directory now contains 4 files:
staging/
gotk-sync.yaml
gotk-components.yaml
kustomization.yaml
gotk-patches.yaml

then committed and pushed to Git, but no automatic sync is happening!

@stefanprodan
Copy link
Member

@Ayatallah many things look wrong in there: there is a typo in the patch file name, and the namespace is wrong; it should be flux-system. Please use code blocks and paste the YAML inside them.
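For reference, a corrected layout might look like this (a sketch assuming the default flux-system namespace; the memory value is illustrative):

```yaml
# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- gotk-components.yaml
- gotk-sync.yaml
patchesStrategicMerge:
- gotk-patches.yaml
```

```yaml
# gotk-patches.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: source-controller
  namespace: flux-system
spec:
  template:
    spec:
      containers:
      - name: manager
        resources:
          limits:
            memory: 2Gi
```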

@Ayatallah
Copy link

@Ayatallah many things look wrong in there, there is a typo in the patch file name, also the namespace is wrong, should be flux-system. Please use code blocks and paste the YAML inside them.

Should I use flux-system as the namespace even if it's not the namespace I bootstrapped the Flux instance in?!

flux bootstrap gitlab --owner=name --repository=name --branch=name --path=path/to/ --token-auth --namespace=staging

@Ayatallah
Copy link

Is it a must to use namespace=flux-system?
Like, if I need multiple Flux environments in my cluster, one per namespace, should I not create a Flux instance in each of those namespaces, but instead create a single Flux instance in the flux-system namespace and replicate its behavior with Kustomization resources, giving each Kustomization a different path to track?

@stefanprodan
Copy link
Member

@Ayatallah Flux v2 is not meant to be installed more than once per cluster. See https://github.com/fluxcd/flux2-multi-tenancy on how to do multi-tenancy if that's what you're after.

@Ayatallah
Copy link

Ayatallah commented Apr 6, 2021

@Ayatallah Flux v2 is not meant to be installed more than once per cluster. See https://github.com/fluxcd/flux2-multi-tenancy on how to do multi-tenancy if that's what you're after.
@stefanprodan
Yes, I'm looking to do multi-tenancy: I have one cluster with several namespaces (prod, dev, etc.), and the dev namespace should run the dev instances of APP1 and APP2, and likewise for prod. But my tenant repository architecture is not:
Base/
       app1/
       app2/
Overlay/
       app1/
       app2/
However, its the following
App1/
       base/
       overlays/
             dev/
             prod/
App2/
       base/
       overlays/
             dev/
             prod/
That's why I thought of creating more than one Flux instance, one per namespace (prod, dev, etc.). Would that be manageable with one Flux instance in the cluster? It seems the steps in the link are based on a specific architecture for the tenant repository.

Or, for Flux v2 multi-tenancy to be applied properly, do I have to re-structure my repo?

@hiddeco
Copy link
Member

hiddeco commented Apr 6, 2021

No, you can create multiple Kustomization resources (in different namespaces) that all select a different (environment) folder.
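A sketch of that, using hypothetical names and the repository layout described above (one Kustomization per environment, both pointing at the same GitRepository):

```yaml
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: app1-dev
  namespace: dev
spec:
  interval: 5m
  path: ./App1/overlays/dev
  prune: true
  sourceRef:
    kind: GitRepository
    name: tenant-repo
    namespace: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: app1-prod
  namespace: prod
spec:
  interval: 5m
  path: ./App1/overlays/prod
  prune: true
  sourceRef:
    kind: GitRepository
    name: tenant-repo
    namespace: flux-system
```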

@gdoctor
Copy link

gdoctor commented Apr 27, 2021

[screenshots: repository sizes and source-controller memory usage]

Nothing in my Git or Helm repositories is particularly large (screenshots); yet during source-controller pod startup, the pod spikes past 2.5GB of memory. What known implementation decisions would cause a large memory spike on startup? After a couple of minutes it comes back down to around 1GB, where it seems to stay. I do have fairly fast intervals on my GitRepository (2 objects, 1 min interval each) and HelmRepository (3 objects, 1 min interval each) resources. But I also have the same configuration running in about 8 other clusters with no issues.

So my question is: what is causing this in only one cluster and not elsewhere? I suspect it could be related to the way my Helm charts are configured in this cluster. I have 3 charts being fetched and packaged directly from a GitRepository. I am not doing this in other clusters, so I am guessing that could be the root cause. Are there known performance trade-offs to using a GitRepository as a Helm chart source?
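For context, fetching and packaging a chart from Git looks roughly like this (names and path are hypothetical):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: HelmChart
metadata:
  name: my-app
  namespace: flux-system
spec:
  interval: 1m
  chart: ./charts/my-app  # path to the chart inside the Git repository
  sourceRef:
    kind: GitRepository
    name: charts-repo
```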

@stefanprodan
Copy link
Member

I have 3 charts being fetched and packaged directly from a GitRepository

Do you have .helmignore files in there that exclude .git/?

@gdoctor
Copy link

gdoctor commented Apr 28, 2021

I have 3 charts being fetched and packaged directly from a GitRepository

Do you have .helmignore files in there that exclude .git/?

Yes some of my charts have a .helmignore that excludes .git/; however, that is only true for the charts that have a Source of HelmRepository. None of the charts that have a Source of GitRepository have that condition. Could that still create issues?

Edit: I should add that ALL of the charts are stored in the same git repo. So there are definitely charts that have a .helmignore excluding .git/ in that git repo.

@kingdonb
Copy link
Member

Is .helmignore actually honored? I don't find any references to it in the source or documentation, so I would assume it is still a helm-client-only feature and not something you can count on Flux to honor when using a GitRepository Helm source. (This is perhaps unfortunate, but it looks like this feature hasn't been requested before!)

Documentation at helm.sh about the purpose of this file indicates that .helmignore is for use by helm chart packagers, this is ostensibly something that source-controller should consider doing, but I don't think it is implemented currently at all.

What would be really nice is if you could forward .helmignore to source-controller (since that's where the OOM condition is encountered), but at that point source-controller doesn't really even know if it is being used to carry a Helm chart; to it, it is simply a Git repository.

If source-controller GitRepositories could somehow know that they'll be used for serving a Helm chart, and honor .helmignore as an equivalent of .sourceignore, that would make it possible to use repos like this one as an upstream, which is currently not possible according to a report from a Slack user: https://github.com/neo4j-contrib/neo4j-helm

Right now I think the only other way to accomplish this installation is to fork the repo and add a .sourceignore, manually copying the content from the .helmignore file, or to write that content into the spec of a GitRepository source at spec.ignore and keep it up to date somehow.
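That last workaround could be sketched as follows (repository URL from the example above; the ignore patterns are illustrative entries copied by hand from the chart's .helmignore):

```yaml
apiVersion: source.toolkit.fluxcd.io/v1beta1
kind: GitRepository
metadata:
  name: neo4j-helm
  namespace: flux-system
spec:
  interval: 5m
  url: https://github.com/neo4j-contrib/neo4j-helm
  ref:
    branch: master
  # acts like .sourceignore; copy the chart's .helmignore entries here
  ignore: |
    .git/
    *.tgz
```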

@rajivchirania
Copy link

@stefanprodan I'm also facing the same OOM issue with source-controller.
However, in our case Flux v2 is installed using the Terraform provider.
Can you please tell me how to set the limit for source-controller in this case?

@stefanprodan
Copy link
Member

@rajivchirania see fluxcd/terraform-provider-flux#178

@rajivchirania
Copy link

@rajivchirania see fluxcd/terraform-provider-flux#178

@stefanprodan

So I made the changes, but it still does not apply the limit or the request that I have set for source-controller.

This is my kustomization template file

apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: ${sync_name}
  namespace: flux-system
spec:
  force: false
  interval: ${interval}
  path: ./${target}
  prune: true
  sourceRef:
    kind: GitRepository
    name: ${sync_name}
  validation: client
  patchesStrategicMerge:
  - apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: source-controller
      namespace: flux-system
    spec:
      template:
        spec:
          containers:
            resources:
              limits:
                cpu: "1"
                memory: 2Gi
              requests:
                cpu: 50m
                memory: 256Mi

My kustomization.tf file is like this where i apply this above template

locals {
  kustomization_template = templatefile("${path.module}/values/kustomization.yaml.tpl", {
    sync_name = var.sync_name
    target    = var.flux_target_path
    interval  = var.interval_default_kustomization
  })
}

resource "kubectl_manifest" "kustomization" {
  yaml_body = local.kustomization_template
}

Please let me know if i am doing something wrong
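One issue visible in the template above: in a strategic-merge patch, containers must be a list, and each entry needs a name so the patch can be matched against the existing container. A sketch of the corrected section (same values as above):

```yaml
  patchesStrategicMerge:
  - apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: source-controller
      namespace: flux-system
    spec:
      template:
        spec:
          containers:
          - name: manager
            resources:
              limits:
                cpu: "1"
                memory: 2Gi
              requests:
                cpu: 50m
                memory: 256Mi
```

Note also that patchesStrategicMerge on a Flux Kustomization only patches the manifests rendered from its spec.path; if the source-controller Deployment is not part of that path, the patch will have no effect.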

@matt-woodruff-f3
Copy link

Hey all. I'm currently experiencing the same issue, along with @ekosov-form3, at work. This thread has been helpful, and increasing the memory limit is our solution for now.

However, we'd like to understand more about the source controller's memory requirements so we can see why usage is so high and look at alternative solutions like reducing the index size.

We'd like to know

  • roughly how to calculate the memory requirements for a reconciliation. @stefanprodan you touched on it here but we were unable to understand the significance of the 3 in the equation. Also we're assuming the 4 relates to the number of workers.
  • is the index loaded for every reconciliation or is it cached for other reconciliations using the same source?
  • does the reconciliation interval have any effect on memory? Maybe if it's too frequent, garbage collection may not trigger in time?

Currently we're seeing memory fluctuate roughly between 600MB and 1200MB with the following resources and a default installation except for the source controller's memory limit.

All helm charts have 1m interval. All git/helm repositories used by helm releases have 5m interval.

73	helm charts
14	git repositories
5	helm repositories

Total sizes in /data
21M	gitrepository
804K	helmchart
21M	helmrepository

1 helm chart    -> helm repo 1 [9.4MB]
1 helm chart    -> helm repo 2 [9.4MB]
1 helm chart    -> helm repo 3 [1.5MB]
1 helm chart    -> helm repo 4 [148k]
1 helm chart    -> helm repo 5 [8k]

2 helm charts	-> git repo 1	[9.7MB]
1 helm chart	-> git repo 2	[548k]
2 helm charts	-> git repo 3	[548k]
1 helm chart	-> git repo 4	[548k]
1 helm chart	-> git repo 5	[548k]
1 helm chart	-> git repo 6	[548k]
1 helm chart	-> git repo 7	[548k]
5 helm charts	-> git repo 8	[188k]
20 helm charts  -> git repo 9	[188k]
5 helm charts	-> git repo 10	[188k]
22 helm charts  -> git repo 11	[188k]
3 helm charts	-> git repo 12	[188k]
4 helm charts	-> git repo 13	[188k]

1 kustomization -> git repo 14  [2.2MB, 30s interval]

Thanks

@hiddeco
Copy link
Member

hiddeco commented Nov 17, 2021

@matt-woodruff-f3 given you have collected such detailed statistics about your Helm usage, would you be willing to give an image based on #485 a spin? This is getting into a shape that it'll likely end up in a release soon, and will greatly affect the answers to your questions (and should heavily improve performance).

If so, please reach out to me on Slack (@hidde), or comment here.

@hiddeco
Copy link
Member

hiddeco commented Nov 18, 2021

Release candidate for the above PR has been made available, and instructions are added to the PR for testing purposes. It would be great if some of you could try this out and share results, as simulating real-world Helm setups has proven to be extremely difficult.

@kingdonb
Copy link
Member

kingdonb commented Dec 7, 2021

I believe these changes are in source-controller 0.19.0 and Flux 0.24.0, so this issue can be closed out now.

(Is that correct?)

@hiddeco
Copy link
Member

hiddeco commented Dec 8, 2021

The changes have indeed been released in 0.19.x, but I would like to see a confirmation from e.g. @matt-woodruff-f3 around resource usage reduction before I think this can be closed.

@matt-woodruff-f3
Copy link

matt-woodruff-f3 commented Dec 8, 2021

@hiddeco Thanks for the update! We've been running 0.19.0 in 3 of our environments for a few days now and can report no OOM issues. We've even reverted the memory limit back to the default, from 2Gi to 1Gi.

@kingdonb
Copy link
Member

kingdonb commented Dec 8, 2021

Awesome. Thanks for the confirmation @matt-woodruff-f3 – I'll close this now, based on your confirmation!

@kingdonb kingdonb closed this as completed Dec 8, 2021