RabbitMQ operator restarts on Flux's labels change #414

Closed
Tolsto opened this issue Nov 3, 2020 · 5 comments

Tolsto commented Nov 3, 2020

I'm experiencing an issue where Flux seems to change resources in the cluster that were not changed in the commit pushed to the source repository. I noticed this because the RabbitMQ operator in the cluster senses a change that requires a restart of the RabbitMQ pods. In other words, every push currently causes a restart of our RabbitMQ cluster even though no RabbitMQ manifests were changed.

This is a log excerpt for adding a label to a single deployment:

2020-11-03T10:34:51.816882733Z {"level":"info","ts":"2020-11-03T10:34:51.816Z","logger":"controllers.Kustomization","msg":"requesting reconciliation due to GitRepository revision change","kustomization":"flux-system/flux-system","revision":"master/1b41d7c408060c7631b48bb456f8f18ee7790c93"}
2020-11-03T10:34:59.373028068Z {"level":"info","ts":"2020-11-03T10:34:59.372Z","logger":"controllers.Kustomization","msg":"Kustomization applied in 3.791725111s","kustomization":"flux-system/flux-system","output":{"alert.notification.toolkit.fluxcd.io/on-call-webapp":"configured","apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io":"configured","clusterissuer.cert-manager.io/letsencrypt-live":"configured","clusterissuer.cert-manager.io/letsencrypt-staging":"configured","clusterrole.rbac.authorization.k8s.io/argo-cluster-role":"configured","clusterrole.rbac.authorization.k8s.io/argo-clusterworkflowtemplate-role":"configured","clusterrole.rbac.authorization.k8s.io/calico-node":"configured","clusterrole.rbac.authorization.k8s.io/cluster-autoscaler":"configured","clusterrole.rbac.authorization.k8s.io/ebs-external-attacher-role":"configured","clusterrole.rbac.authorization.k8s.io/ebs-external-provisioner-role":"configured",[...]
2020-11-03T10:34:59.843431298Z {"level":"error","ts":"2020-11-03T10:34:59.843Z","logger":"controllers.Kustomization","msg":"unable to update status after reconciliation","controller":"kustomization","request":"flux-system/flux-system","error":"Operation cannot be fulfilled on kustomizations.kustomize.toolkit.fluxcd.io \"flux-system\": the object has been modified; please apply your changes to the latest version and try again"}
2020-11-03T10:34:59.843460554Z {"level":"error","ts":"2020-11-03T10:34:59.843Z","logger":"controller","msg":"Reconciler error","reconcilerGroup":"kustomize.toolkit.fluxcd.io","reconcilerKind":"Kustomization","controller":"kustomization","name":"flux-system","namespace":"flux-system","error":"Operation cannot be fulfilled on kustomizations.kustomize.toolkit.fluxcd.io \"flux-system\": the object has been modified; please apply your changes to the latest version and try again"}
2020-11-03T10:35:05.852641791Z {"level":"info","ts":"2020-11-03T10:35:05.852Z","logger":"controllers.Kustomization","msg":"Kustomization applied in 2.235112033s","kustomization":"flux-system/flux-system","output":{"alert.notification.toolkit.fluxcd.io/on-call-webapp":"unchanged","apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io":"unchanged","clusterissuer.cert-manager.io/letsencrypt-live":"unchanged","clusterissuer.cert-manager.io/letsencrypt-staging":"unchanged","clusterrole.rbac.authorization.k8s.io/argo-cluster-role":"unchanged","clusterrole.rbac.authorization.k8s.io/argo-clusterworkflowtemplate-role":"unchanged","clusterrole.rbac.authorization.k8s.io/calico-node":"unchanged","clusterrole.rbac.authorization.k8s.io/cluster-autoscaler":"unchanged","clusterrole.rbac.authorization.k8s.io/ebs-external-attacher-role":"unchanged","clusterrole.rbac.authorization.k8s.io/ebs-external-provisioner-role":"unchanged",[...]

I've shortened the second and last lines. The second line shows every resource in the cluster as configured.
The last line shows most resources as unchanged, including the deployment that the label was added to. Some resources, especially CRDs, show up as configured.
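
To compare the two "Kustomization applied" entries without reading the whole map by eye, the per-object statuses in the `output` field can be tallied. The following is a minimal sketch, assuming each log line is a container timestamp followed by one complete JSON object, as in the excerpt above. The function name `tally_apply_output` and the sample line are made up for illustration. The excerpt's own lines are truncated with `[...]`, so they would need to be captured in full before they parse.

```python
import json
from collections import Counter


def tally_apply_output(log_line):
    """Count objects per status in one "Kustomization applied" log entry."""
    # Each line starts with a container timestamp, then a space, then JSON.
    _, _, payload = log_line.partition(" ")
    entry = json.loads(payload)
    # Entries without an "output" map (e.g. error lines) yield an empty tally.
    return Counter(entry.get("output", {}).values())


# Abridged, made-up sample in the same shape as the excerpt (an assumption).
sample = (
    '2020-11-03T10:35:05.852641791Z {"level":"info",'
    '"msg":"Kustomization applied in 2.235112033s",'
    '"output":{"deployment.apps/webapp":"unchanged",'
    '"clusterrole.rbac.authorization.k8s.io/calico-node":"unchanged",'
    '"customresourcedefinition.apiextensions.k8s.io/example":"configured"}}'
)

counts = tally_apply_output(sample)
print(counts)
```

Running this over both entries would show at a glance how many objects were reported as configured on the first apply compared with the second.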

Any directions for debugging this issue further are appreciated.

The installation was bootstrapped with Flux version 0.2.1.

stefanprodan commented Nov 3, 2020

All objects coming from the repo are labeled, but that shouldn't cause any restarts unless the RabbitMQ operator reacts to changes in metadata.labels. The label value is updated only when there is a new commit and the content of the YAML files has changed.
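
Why a change to one manifest can touch every object is easier to see in a small model. This is only a sketch of the idea, not Flux's actual implementation: the label key `example.invalid/checksum`, the function `stamp_checksum`, and the object names are hypothetical. It assumes a single digest is computed over all manifests and stamped on every object.

```python
import hashlib

# Hypothetical label key, used for illustration only.
CHECKSUM_LABEL = "example.invalid/checksum"


def stamp_checksum(manifests):
    """Return the labels each object would receive for the given manifests.

    One digest is computed over every manifest, and the same value is
    stamped on every object (an assumption made for this sketch).
    """
    digest = hashlib.sha1()
    for name in sorted(manifests):
        digest.update(name.encode())
        digest.update(manifests[name].encode())
    checksum = digest.hexdigest()
    return {name: {CHECKSUM_LABEL: checksum} for name in manifests}


before = stamp_checksum({
    "deployment/webapp": "replicas: 1",
    "rabbitmqcluster/rabbitmq": "replicas: 3",
})
after = stamp_checksum({
    "deployment/webapp": "replicas: 2",  # only this manifest changed
    "rabbitmqcluster/rabbitmq": "replicas: 3",
})

# The RabbitMQ manifest is identical, yet its label value differs.
rabbit_label_changed = (
    before["rabbitmqcluster/rabbitmq"] != after["rabbitmqcluster/rabbitmq"]
)
print(rabbit_label_changed)
```

Under that assumption, an operator that watches metadata.labels on its custom resource would see a change on every commit that alters any manifest.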

Tolsto commented Nov 3, 2020

Thanks a lot for pointing that out. I didn't realize that a new commit changes the label on all resources. I just tried adding a label to the custom resource that defines the RabbitMQ cluster, and sure enough, that causes the operator to restart the RabbitMQ pods.
The documentation also says: "Modifying labels triggers a rolling restart of StatefulSet."
I guess I'll raise this issue with the RabbitMQ folks. Thanks again.

@stefanprodan

For now, the workaround is to move the RabbitMQ manifests to the root of your repo, e.g. ./base/rabbitmq, then create a Kustomization for them inside your cluster dir:

apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: rabbitmq
  namespace: flux-system
spec:
  interval: 5m
  path: "./base/rabbitmq"
  prune: false
  sourceRef:
    kind: GitRepository
    name: flux-system

After you commit the change and rabbitmq is reconciled, you can suspend it with flux suspend kustomization rabbitmq.

I'm going to change how labels work so that the checksum label is not added when prune: false is set. That way you can disable pruning and prevent RabbitMQ restarts in the future.

@stefanprodan stefanprodan changed the title Flux touches unchanged resources RabbitMQ operator restarts on Flux's labels change Nov 3, 2020

Tolsto commented Nov 3, 2020

Thank you, the workaround works. I'll then remove the suspension after the next Flux release.
