Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Oscillating reconciles for aggregated clusterroles #1041

Closed
Tracked by #4556
erikgb opened this issue Dec 21, 2023 · 5 comments · Fixed by fluxcd/pkg#719
Closed
Tracked by #4556

Oscillating reconciles for aggregated clusterroles #1041

erikgb opened this issue Dec 21, 2023 · 5 comments · Fixed by fluxcd/pkg#719
Labels
area/server-side-apply SSA related issues and pull requests bug Something isn't working

Comments

@erikgb
Copy link
Contributor

erikgb commented Dec 21, 2023

After upgrading Flux from version 2.1.2 to 2.2.1 (kustomize-controller from version 1.1.1 to 1.2.1), we observe oscillating reconciles for all our aggregated clusterroles. I asked on Slack about this but haven't gotten anything back yet.

This happens for all our aggregated clusterroles originating from Kyverno and Crossplane Helm chart. Note: even if the resource examples indicate that we are using Helm, this is untrue. We inflate all Helm charts into YAML manifests using kustomize (CLI) and use a Flux Kustomization to provision. I will use one of the Kyverno aggregated clusterroles as an example, but the symptom is the same for all aggregated clusterroles in our Flux installation.

Expected behavior (Flux 2.1.2): When Flux reconciles a Kustomization containing an aggregated clusterrole (without any rules in source), the SSA should be a no-op and don't create any update event.

Actualt behavior (Flux 2.2.1): When Flux reconciles a Kustomization containing an aggregated clusterrole (without any rules in source), it seems like the SSA performed by Flux is resetting the rules field creating an update event for each aggregated clusterrole. The rules field is immediately populated by the aggregated clusterrole controller. So the next time Flux performs a reconcile, this happens again....

Example clusterrole in source:

aggregationRule:
  clusterRoleSelectors:
  - matchLabels:
      app.kubernetes.io/component: reports-controller
      app.kubernetes.io/instance: kyverno
      app.kubernetes.io/part-of: kyverno
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: reports-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
    app.kubernetes.io/version: 3.1.1
  name: kyverno:reports-controller

Resulting clusterrole in one of our clusters (note the timestamps on the managed fields)

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: 'kyverno:reports-controller'
  uid: 435e6c86-269e-41a6-abc7-f621d70a0e3a
  resourceVersion: '845187637'
  creationTimestamp: '2023-08-10T11:22:40Z'
  labels:
    app.kubernetes.io/component: reports-controller
    app.kubernetes.io/instance: kyverno
    app.kubernetes.io/part-of: kyverno
    app.kubernetes.io/version: 3.1.1
    kustomize.toolkit.fluxcd.io/name: kyverno
    kustomize.toolkit.fluxcd.io/namespace: flux-system
  managedFields:
    - manager: clusterrole-aggregation-controller
      operation: Apply
      apiVersion: rbac.authorization.k8s.io/v1
      time: '2023-12-21T13:53:25Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:rules': {}
    - manager: kustomize-controller
      operation: Apply
      apiVersion: rbac.authorization.k8s.io/v1
      time: '2023-12-21T13:53:25Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:aggregationRule':
          'f:clusterRoleSelectors': {}
        'f:metadata':
          'f:labels':
            'f:app.kubernetes.io/component': {}
            'f:app.kubernetes.io/instance': {}
            'f:app.kubernetes.io/part-of': {}
            'f:app.kubernetes.io/version': {}
            'f:kustomize.toolkit.fluxcd.io/name': {}
            'f:kustomize.toolkit.fluxcd.io/namespace': {}
rules:
  - verbs:
      - get
      - list
      - watch
    apiGroups:
      - '*'
    resources:
      - '*'
  - verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
      - deletecollection
    apiGroups:
      - kyverno.io
    resources:
      - admissionreports
      - clusteradmissionreports
      - backgroundscanreports
      - clusterbackgroundscanreports
  - verbs:
      - create
      - delete
      - get
      - list
      - patch
      - update
      - watch
      - deletecollection
    apiGroups:
      - wgpolicyk8s.io
    resources:
      - policyreports
      - policyreports/status
      - clusterpolicyreports
      - clusterpolicyreports/status
  - verbs:
      - create
      - patch
    apiGroups:
      - ''
      - events.k8s.io
    resources:
      - events
aggregationRule:
  clusterRoleSelectors:
    - matchLabels:
        app.kubernetes.io/component: reports-controller
        app.kubernetes.io/instance: kyverno
        app.kubernetes.io/part-of: kyverno
@stefanprodan
Copy link
Member

As a workaround, you can add the following annotation for Flux to stop reconciling the cluster roles kustomize.toolkit.fluxcd.io/ssa: IfNotPresent.

I suspect this change fluxcd/pkg#658 is the root cause, the change @hiddeco made to SSA is to normalize all objects, and I guess that the rules filed is now added to the objects, resulting in a fully wipe of all existing rules.

Can you please post here the output of flux diff kustomization.

@erikgb
Copy link
Contributor Author

erikgb commented Dec 21, 2023

$ flux diff kustomization kyverno --path infrastructure/kyverno/default/
✓  Kustomization diffing...
► ClusterRole/kyverno:admission-controller drifted

rules
  ± type change from list to <nil>
    - - apiGroups:
        - admissionregistration.k8s.io
        resources:
        - mutatingwebhookconfigurations
        - validatingwebhookconfigurations
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        - rbac.authorization.k8s.io
        resources:
        - roles
        - clusterroles
        - rolebindings
        - clusterrolebindings
        verbs:
        - list
        - watch
      - apiGroups:
        - kyverno.io
        resources:
        - policies
        - policies/status
        - clusterpolicies
        - clusterpolicies/status
        - updaterequests
        - updaterequests/status
        - admissionreports
        - clusteradmissionreports
        - backgroundscanreports
        - clusterbackgroundscanreports
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        - wgpolicyk8s.io
        resources:
        - policyreports
        - policyreports/status
        - clusterpolicyreports
        - clusterpolicyreports/status
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        -
        - events.k8s.io
        resources:
        - events
        verbs:
        - create
        - update
        - patch
      - apiGroups:
        - authorization.k8s.io
        resources:
        - subjectaccessreviews
        verbs:
        - create
      - apiGroups:
        - "*"
        resources:
        - "*"
        verbs:
        - get
        - list
        - watch
    + <nil>

► ClusterRole/kyverno:cleanup-controller drifted

rules
  ± type change from list to <nil>
    - - apiGroups:
        -
        resources:
        - namespaces
        verbs:
        - delete
        - get
        - list
        - watch
      - apiGroups:
        - admissionregistration.k8s.io
        resources:
        - validatingwebhookconfigurations
        verbs:
        - create
        - delete
        - get
        - list
        - update
        - watch
      - apiGroups:
        -
        resources:
        - namespaces
        verbs:
        - get
        - list
        - watch
      - apiGroups:
        - kyverno.io
        resources:
        - clustercleanuppolicies
        - cleanuppolicies
        verbs:
        - list
        - watch
      - apiGroups:
        - kyverno.io
        resources:
        - clustercleanuppolicies/status
        - cleanuppolicies/status
        verbs:
        - update
      - apiGroups:
        -
        resources:
        - configmaps
        verbs:
        - get
        - list
        - watch
      - apiGroups:
        -
        - events.k8s.io
        resources:
        - events
        verbs:
        - create
        - patch
        - update
      - apiGroups:
        - authorization.k8s.io
        resources:
        - subjectaccessreviews
        verbs:
        - create
    + <nil>

► ClusterRole/kyverno:reports-controller drifted

rules
  ± type change from list to <nil>
    - - apiGroups:
        - "*"
        resources:
        - "*"
        verbs:
        - get
        - list
        - watch
      - apiGroups:
        - kyverno.io
        resources:
        - admissionreports
        - clusteradmissionreports
        - backgroundscanreports
        - clusterbackgroundscanreports
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        - wgpolicyk8s.io
        resources:
        - policyreports
        - policyreports/status
        - clusterpolicyreports
        - clusterpolicyreports/status
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        -
        - events.k8s.io
        resources:
        - events
        verbs:
        - create
        - patch
    + <nil>

► ClusterRole/kyverno:background-controller drifted

rules
  ± type change from list to <nil>
    - - apiGroups:
        - kyverno.io
        resources:
        - policies
        - clusterpolicies
        - policyexceptions
        - updaterequests
        - updaterequests/status
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        -
        resources:
        - namespaces
        - configmaps
        verbs:
        - get
        - list
        - watch
      - apiGroups:
        -
        - events.k8s.io
        resources:
        - events
        verbs:
        - create
        - get
        - list
        - patch
        - update
        - watch
      - apiGroups:
        -
        resources:
        - serviceaccounts
        verbs:
        - get
        - list
        - watch
      - apiGroups:
        -
        resources:
        - serviceaccounts
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        -
        resources:
        - namespaces
        - serviceaccounts
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        - rbac.authorization.k8s.io
        resources:
        - rolebindings
        verbs:
        - create
        - delete
        - get
        - list
        - patch
        - update
        - watch
        - deletecollection
      - apiGroups:
        - rbac.authorization.k8s.io
        resourceNames:
        - admin
        - edit
        - view
        resources:
        - clusterroles
        verbs:
        - bind
    + <nil>
⚠️ identified at least one change, exiting with non-zero exit code

@stefanprodan
Copy link
Member

stefanprodan commented Dec 21, 2023

Ok thanks for posting, we'll need to decide how to fix this, we can either skip normalisation for ClusterRoles but it may be that we have broken many other Kubernetes native kinds with optional fields that aren't marked as such with omitempty... Maybe the best way is to rollback the changes made in fluxcd/pkg#658.

@erikgb for now please add the kustomize.toolkit.fluxcd.io/ssa: IfNotPresent annotation, we'll discuss this issue next year.

@erikgb
Copy link
Contributor Author

erikgb commented Dec 21, 2023

Thanks @stefanprodan. I was a bit surprised to see that the rules field in ClusterRole didn't have omitempty in the json serialization markers, but there might be a reason for this?

https://github.com/kubernetes/kubernetes/blob/bfcf8d3966e86e65439069ddb3a027aa54045ef5/staging/src/k8s.io/api/rbac/v1/types.go#L183-L186

@stefanprodan stefanprodan added bug Something isn't working area/server-side-apply SSA related issues and pull requests labels Dec 21, 2023
@stefanprodan
Copy link
Member

Given that the rules field has the // +optional Go comment, I would say this is a bug in Kubernetes, they missed adding omitempty. I have no hope this will ever be fixed as RBAC is v1 GA. So we need to fix it in Flux.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/server-side-apply SSA related issues and pull requests bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants