ACM-11453 Fix flaky subscription constraints not satisfiable condition #258

JustinKuli
Member

@JustinKuli JustinKuli commented May 29, 2024

See each commit for details

@JustinKuli
Member Author

Weird flakes? 😭

Attempt 1

[FAIL] Test an objectDefinition with an invalid field [It] Fails when an invalid field is provided
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case23_invalid_field_test.go:76

• [FAILED] [9.295 seconds]
Test an objectDefinition with an invalid field [It] Fails when an invalid field is provided
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case23_invalid_field_test.go:32

  Timeline >>
  STEP: Creating the case23-invalid-field policy @ 06/04/24 18:48:09.458
  STEP: Verifying that the case23-invalid-field policy is noncompliant @ 06/04/24 18:48:09.601
  STEP: Verifying events do not continue to be created after the first violation for created objects @ 06/04/24 18:48:18.657
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case23_invalid_field_test.go:76 @ 06/04/24 18:48:18.716
  << Timeline

  [FAILED] Failed after 0.059s.
  Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case23_invalid_field_test.go:76 @ 06/04/24 18:48:18.716
Attempt 2

[FAIL] Testing OperatorPolicy Testing general OperatorPolicy mustnothave behavior [It] Should be NonCompliant and report resources when the operator is installed [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:1836

• [FAILED] [65.322 seconds]
Testing OperatorPolicy Testing general OperatorPolicy mustnothave behavior [It] Should be NonCompliant and report resources when the operator is installed [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:1824

  Timeline >>
  STEP: Waiting for a CRD to appear, which should indicate the operator is installing @ 06/04/24 20:11:28.273
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:1836 @ 06/04/24 20:12:28.274
  << Timeline

  [FAILED] Timed out after 60.001s.
  Expected
      <*unstructured.Unstructured | 0x0>: nil
  not to be nil
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:1836 @ 06/04/24 20:12:28.274
Attempt 3

[FAIL] Testing OperatorPolicy Testing templates in an OperatorPolicy [It] Should update the subscription after the configmap is updated [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3232

• [FAILED] [70.412 seconds]
Testing OperatorPolicy Testing templates in an OperatorPolicy [It] Should update the subscription after the configmap is updated [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3228

  Timeline >>
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3232 @ 06/04/24 21:57:05.453
  Debug info for failure.
  policy JSON: {
    "apiVersion": "policy.open-cluster-management.io/v1beta1",
    "kind": "OperatorPolicy",
    "metadata": {
      "annotations": {
        "policy.open-cluster-management.io/parent-policy-compliance-db-id": "124",
        "policy.open-cluster-management.io/policy-compliance-db-id": "64"
      },
      "creationTimestamp": "2024-06-04T21:55:57Z",
      "generation": 2,
      "name": "oppol-with-templates",
      "namespace": "managed",
      "ownerReferences": [
        {
          "apiVersion": "policy.open-cluster-management.io/v1",
          "kind": "Policy",
          "name": "parent-policy",
          "uid": "66fd926d-9e37-4cb1-8ace-6b82cb9e837d"
        }
      ],
      "resourceVersion": "11081",
      "uid": "de5f787c-80b3-4136-ab00-db8e40f00e77"
    },
    "spec": {
      "complianceType": "musthave",
      "operatorGroup": {
        "name": "scoped-operator-group",
        "namespace": "operator-policy-testns",
        "targetNamespaces": "{{ (fromConfigMap \"operator-policy-testns\" \"op-config\" \"namespaces\") | toLiteral }}"
      },
      "remediationAction": "enforce",
      "removalBehavior": {
        "clusterServiceVersions": "Delete",
        "customResourceDefinitions": "Keep",
        "operatorGroups": "DeleteIfUnused",
        "subscriptions": "Delete"
      },
      "severity": "medium",
      "subscription": {
        "channel": "{{ (lookup \"v1\" \"ConfigMap\" \"operator-policy-testns\" \"op-config\").data.channel }}",
        "name": "project-quay",
        "namespace": "operator-policy-testns",
        "source": "operatorhubio-catalog",
        "sourceNamespace": "olm",
        "startingCSV": "quay-operator.v3.10.0"
      },
      "upgradeApproval": "Automatic"
    },
    "status": {
      "compliant": "NonCompliant",
      "conditions": [
        {
          "lastTransitionTime": "2024-06-04T21:55:57Z",
          "message": "CatalogSource was found",
          "reason": "CatalogSourcesFound",
          "status": "False",
          "type": "CatalogSourcesUnhealthy"
        },
        {
          "lastTransitionTime": "2024-06-04T21:55:57Z",
          "message": "the ClusterServiceVersion required by the policy was not found",
          "reason": "ClusterServiceVersionMissing",
          "status": "False",
          "type": "ClusterServiceVersionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-04T21:56:05Z",
          "message": "NonCompliant; the policy spec is valid, the OperatorGroup matches what is required by the policy, constraints not satisfiable: refer to the Subscription for more details, there are no relevant InstallPlans in the namespace, the ClusterServiceVersion required by the policy was not found, no CRDs were found for the operator, there are no relevant deployments because the ClusterServiceVersion is missing, CatalogSource was found",
          "reason": "NonCompliant",
          "status": "False",
          "type": "Compliant"
        },
        {
          "lastTransitionTime": "2024-06-04T21:55:57Z",
          "message": "no CRDs were found for the operator",
          "reason": "RelevantCRDNotFound",
          "status": "True",
          "type": "CustomResourceDefinitionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-04T21:55:57Z",
          "message": "there are no relevant deployments because the ClusterServiceVersion is missing",
          "reason": "NoRelevantDeployments",
          "status": "True",
          "type": "DeploymentCompliant"
        },
        {
          "lastTransitionTime": "2024-06-04T21:55:57Z",
          "message": "there are no relevant InstallPlans in the namespace",
          "reason": "NoInstallPlansFound",
          "status": "True",
          "type": "InstallPlanCompliant"
        },
        {
          "lastTransitionTime": "2024-06-04T21:55:57Z",
          "message": "the OperatorGroup matches what is required by the policy",
          "reason": "OperatorGroupMatches",
          "status": "True",
          "type": "OperatorGroupCompliant"
        },
        {
          "lastTransitionTime": "2024-06-04T21:56:05Z",
          "message": "constraints not satisfiable: refer to the Subscription for more details",
          "reason": "ConstraintsNotSatisfiable",
          "status": "False",
          "type": "SubscriptionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-04T21:55:57Z",
          "message": "the policy spec is valid",
          "reason": "PolicyValidated",
          "status": "True",
          "type": "ValidPolicySpec"
        }
      ],
      "relatedObjects": [
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "CatalogSource",
            "metadata": {
              "name": "operatorhubio-catalog",
              "namespace": "olm"
            }
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "NonCompliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "ClusterServiceVersion",
            "metadata": {
              "name": "project-quay",
              "namespace": "operator-policy-testns"
            }
          },
          "reason": "Resource not found but should exist"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "apiextensions.k8s.io/v1",
            "kind": "CustomResourceDefinition",
            "metadata": {
              "name": "-"
            }
          },
          "reason": "No relevant CustomResourceDefinitions found"
        },
        {
          "compliant": "UnknownCompliancy",
          "object": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
              "name": "-"
            }
          },
          "reason": "No relevant deployments found"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "InstallPlan",
            "metadata": {
              "name": "-",
              "namespace": "operator-policy-testns"
            }
          },
          "reason": "There are no relevant InstallPlans in this namespace"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1",
            "kind": "OperatorGroup",
            "metadata": {
              "name": "scoped-operator-group",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "d12a9231-872c-4f75-8012-25f2aab1ac71"
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "NonCompliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "Subscription",
            "metadata": {
              "name": "project-quay",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "70f0da5a-03b7-4ccc-a60c-4a4a9be46f0e"
          },
          "reason": "ConstraintsNotSatisfiable"
        }
      ],
      "resolvedSubscriptionLabel": "project-quay.operator-policy-testns",
      "subscriptionInterventionTime": "2024-06-04T21:56:35Z"
    }
  }
  wanted related objects: [{Object:{Kind:Subscription APIVersion:operators.coreos.com/v1alpha1 Metadata:{Name:project-quay Namespace:operator-policy-testns}} Compliant:Compliant Reason:Resource found as expected Properties:<nil>}]
  wanted condition: {Type:SubscriptionCompliant Status:True ObservedGeneration:0 LastTransitionTime:0001-01-01 00:00:00 +0000 UTC Reason:SubscriptionMatches Message:the Subscription matches what is required by the policy}

  << Timeline

  [FAILED] Timed out after 60.001s.
  The function passed to Eventually failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:135 with:
  Expected
      <string>: NonCompliant
  to equal
      <string>: Compliant
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3232 @ 06/04/24 21:57:05.453

@JustinKuli
Member Author

@mprahl this now has the change you requested in Slack, to only update a smaller part of the Subscription status.

@@ -216,14 +219,14 @@ func (r *OperatorPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Reque
errs = append(errs, err)
}

if err := r.Status().Update(ctx, policy); err != nil {
Member

Could we avoid always updating the status if nothing changed? This could increase API server hits a lot. Maybe a reflect.DeepEqual comparing before and after would suffice.

Member Author

I can add the DeepEqual. I'm surprised controller-runtime (or whatever) wouldn't do that for us...

For extra context, the cause of this was the intervention timestamp not being in a condition. It felt like we were dealing with confusing cases where OLM didn't always update the Subscription status correctly, so I wanted to be more paranoid about making sure our status was accurate.
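
A minimal sketch of the reflect.DeepEqual guard suggested above, assuming the reconciler keeps a copy of the status taken before it starts mutating it. PolicyStatus, Condition, statusWriter, copyStatus, and updateIfChanged are hypothetical stand-ins for illustration, not the controller's real types or the controller-runtime client API:

```go
package main

import "reflect"

// Condition and PolicyStatus are simplified, hypothetical stand-ins for the
// real OperatorPolicy status types.
type Condition struct {
	Type, Status, Reason, Message string
}

type PolicyStatus struct {
	Compliant  string
	Conditions []Condition
}

// statusWriter is a hypothetical abstraction over whatever call persists the
// status to the API server.
type statusWriter interface {
	UpdateStatus(status PolicyStatus) error
}

// copyStatus snapshots the status so later in-place edits to the conditions
// do not leak into the "before" value. A nil slice is kept nil, because
// reflect.DeepEqual treats a nil slice and an empty slice as different.
func copyStatus(s PolicyStatus) PolicyStatus {
	out := s
	if s.Conditions != nil {
		out.Conditions = make([]Condition, len(s.Conditions))
		copy(out.Conditions, s.Conditions)
	}
	return out
}

// updateIfChanged writes the status only when it differs from the snapshot,
// and reports whether a write happened.
func updateIfChanged(w statusWriter, before, after PolicyStatus) (bool, error) {
	if reflect.DeepEqual(before, after) {
		return false, nil // nothing changed: skip the API call
	}
	if err := w.UpdateStatus(after); err != nil {
		return false, err
	}
	return true, nil
}

// countingWriter is a test double that only counts writes.
type countingWriter struct{ calls int }

func (c *countingWriter) UpdateStatus(PolicyStatus) error {
	c.calls++
	return nil
}
```

With this shape, a reconcile that ends up with an identical status makes no write at all, which is the API server load concern raised in the review.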

return mergedSub, nil, changed, nil
}

csvIdx := 0
Member

Wouldn't this always be 0 because of the if len(relatedCSVs) != 1 { check?

Member Author

RelatedObjsOfKind gives back a map 😅 I thought it would be a good idea because then it's easy to update the condition in-place... I mostly stand by that, but it does make this usage look odd.

Usually index 0 is the CatalogSource.
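
To illustrate why the index is not always 0, here is a rough sketch under the assumption that the map is keyed by each entry's position in the policy's related objects list. RelatedObject, relatedObjsOfKind, and singleIndex are simplified, hypothetical versions written for this example, not the controller's actual code:

```go
package main

// RelatedObject is a simplified, hypothetical stand-in for an entry in the
// policy's status.relatedObjects list.
type RelatedObject struct {
	Kind, Name, Compliant, Reason string
}

// relatedObjsOfKind returns the entries of the given kind, keyed by their
// index in the original slice so callers can update them in place.
func relatedObjsOfKind(objs []RelatedObject, kind string) map[int]RelatedObject {
	found := make(map[int]RelatedObject)
	for i, o := range objs {
		if o.Kind == kind {
			found[i] = o
		}
	}
	return found
}

// singleIndex returns the slice index of the only entry in the map, or false
// when the map does not hold exactly one entry.
func singleIndex(found map[int]RelatedObject) (int, bool) {
	if len(found) != 1 {
		return 0, false
	}
	for i := range found {
		return i, true
	}
	return 0, false // not reached when len(found) == 1; required by the compiler
}
```

Even with exactly one ClusterServiceVersion in the map, its key is its position in the full list, so it is 1 when a CatalogSource occupies index 0.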


updateStatus(policy, updatedCond("Subscription"), updatedObj(mergedSub))

return mergedSub, nil, true, nil
Member

Should policy.Status.SubscriptionInterventionTime be set to empty after the update is successful and before returning?

Member Author

Originally I had this apply the intervention once, then wait another 30 seconds before trying again. What I found in testing was that about 5% of the time, OLM would then add a different condition, remove our update, and (I think) confuse itself... I still don't know exactly what was happening there, but the new implementation immediately re-applies the intervention in that case, as long as 10 seconds haven't passed. That requires not clearing the timestamp until those 10 seconds are up.

Member Author

The weirdest thing, looking back, is that after OLM got into that situation of confusing itself, the controller would re-apply the intervention 30 seconds later, but OLM seemed to do the same weird thing consistently this time (not a 5% chance like I'd expect). It wasn't just bad timing, because I tried a different grace period and it still happened.
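
The timing rule described in this thread could be sketched roughly as follows. This is only an illustration under assumptions: it treats the stored intervention time as the scheduled moment of the first intervention (the debug dumps in this PR show subscriptionInterventionTime 30 seconds after the SubscriptionCompliant transition), and the names decide, reapplyWindow, and the action constants are hypothetical, not taken from the controller:

```go
package main

import "time"

// reapplyWindow is the 10-second period mentioned in the review thread during
// which the intervention may be re-applied and the timestamp is kept.
const reapplyWindow = 10 * time.Second

// action is a hypothetical result type for this sketch.
type action int

const (
	actionNone  action = iota // no intervention is scheduled
	actionWait                // scheduled, but the grace period has not elapsed
	actionApply               // inside the window: apply or re-apply the intervention
	actionClear               // window is over: the timestamp can be cleared
)

// exampleIntervention mirrors the subscriptionInterventionTime from the first
// debug dump in this PR and is only used for illustration.
var exampleIntervention = time.Date(2024, time.June, 4, 21, 56, 35, 0, time.UTC)

// decide maps the current time onto one of the phases above.
func decide(interventionTime, now time.Time, window time.Duration) action {
	switch {
	case interventionTime.IsZero():
		return actionNone
	case now.Before(interventionTime):
		return actionWait
	case now.Before(interventionTime.Add(window)):
		return actionApply
	default:
		return actionClear
	}
}
```

The point of keeping the timestamp through the window is that a reconcile a few seconds after the first intervention still lands in actionApply, so a reverted update can be re-applied without waiting out another grace period.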

mprahl previously approved these changes Jun 5, 2024
@JustinKuli
Member Author

After the changes suggested in the review, the failure was a repeat of one of the previous "flakes":

[FAIL] Testing OperatorPolicy Testing templates in an OperatorPolicy [It] Should update the subscription after the configmap is updated [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3232

• [FAILED] [70.415 seconds]
Testing OperatorPolicy Testing templates in an OperatorPolicy [It] Should update the subscription after the configmap is updated [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3228

  Timeline >>
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:3232 @ 06/05/24 17:39:10.763
  Debug info for failure.
  policy JSON: {
    "apiVersion": "policy.open-cluster-management.io/v1beta1",
    "kind": "OperatorPolicy",
    "metadata": {
      "annotations": {
        "policy.open-cluster-management.io/parent-policy-compliance-db-id": "124",
        "policy.open-cluster-management.io/policy-compliance-db-id": "64"
      },
      "creationTimestamp": "2024-06-05T17:38:02Z",
      "generation": 2,
      "name": "oppol-with-templates",
      "namespace": "managed",
      "ownerReferences": [
        {
          "apiVersion": "policy.open-cluster-management.io/v1",
          "kind": "Policy",
          "name": "parent-policy",
          "uid": "a8c66038-7e2a-49d0-9026-f121c68993bc"
        }
      ],
      "resourceVersion": "11428",
      "uid": "cc160742-28a2-435a-b0d8-d70541653621"
    },
    "spec": {
      "complianceType": "musthave",
      "operatorGroup": {
        "name": "scoped-operator-group",
        "namespace": "operator-policy-testns",
        "targetNamespaces": "{{ (fromConfigMap \"operator-policy-testns\" \"op-config\" \"namespaces\") | toLiteral }}"
      },
      "remediationAction": "enforce",
      "removalBehavior": {
        "clusterServiceVersions": "Delete",
        "customResourceDefinitions": "Keep",
        "operatorGroups": "DeleteIfUnused",
        "subscriptions": "Delete"
      },
      "severity": "medium",
      "subscription": {
        "channel": "{{ (lookup \"v1\" \"ConfigMap\" \"operator-policy-testns\" \"op-config\").data.channel }}",
        "name": "project-quay",
        "namespace": "operator-policy-testns",
        "source": "operatorhubio-catalog",
        "sourceNamespace": "olm",
        "startingCSV": "quay-operator.v3.10.0"
      },
      "upgradeApproval": "Automatic"
    },
    "status": {
      "compliant": "NonCompliant",
      "conditions": [
        {
          "lastTransitionTime": "2024-06-05T17:38:02Z",
          "message": "CatalogSource was found",
          "reason": "CatalogSourcesFound",
          "status": "False",
          "type": "CatalogSourcesUnhealthy"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:02Z",
          "message": "the ClusterServiceVersion required by the policy was not found",
          "reason": "ClusterServiceVersionMissing",
          "status": "False",
          "type": "ClusterServiceVersionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:10Z",
          "message": "NonCompliant; the policy spec is valid, the OperatorGroup matches what is required by the policy, constraints not satisfiable: refer to the Subscription for more details, there are no relevant InstallPlans in the namespace, the ClusterServiceVersion required by the policy was not found, no CRDs were found for the operator, there are no relevant deployments because the ClusterServiceVersion is missing, CatalogSource was found",
          "reason": "NonCompliant",
          "status": "False",
          "type": "Compliant"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:02Z",
          "message": "no CRDs were found for the operator",
          "reason": "RelevantCRDNotFound",
          "status": "True",
          "type": "CustomResourceDefinitionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:02Z",
          "message": "there are no relevant deployments because the ClusterServiceVersion is missing",
          "reason": "NoRelevantDeployments",
          "status": "True",
          "type": "DeploymentCompliant"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:02Z",
          "message": "there are no relevant InstallPlans in the namespace",
          "reason": "NoInstallPlansFound",
          "status": "True",
          "type": "InstallPlanCompliant"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:02Z",
          "message": "the OperatorGroup matches what is required by the policy",
          "reason": "OperatorGroupMatches",
          "status": "True",
          "type": "OperatorGroupCompliant"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:10Z",
          "message": "constraints not satisfiable: refer to the Subscription for more details",
          "reason": "ConstraintsNotSatisfiable",
          "status": "False",
          "type": "SubscriptionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-05T17:38:02Z",
          "message": "the policy spec is valid",
          "reason": "PolicyValidated",
          "status": "True",
          "type": "ValidPolicySpec"
        }
      ],
      "relatedObjects": [
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "CatalogSource",
            "metadata": {
              "name": "operatorhubio-catalog",
              "namespace": "olm"
            }
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "NonCompliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "ClusterServiceVersion",
            "metadata": {
              "name": "project-quay",
              "namespace": "operator-policy-testns"
            }
          },
          "reason": "Resource not found but should exist"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "apiextensions.k8s.io/v1",
            "kind": "CustomResourceDefinition",
            "metadata": {
              "name": "-"
            }
          },
          "reason": "No relevant CustomResourceDefinitions found"
        },
        {
          "compliant": "UnknownCompliancy",
          "object": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
              "name": "-"
            }
          },
          "reason": "No relevant deployments found"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "InstallPlan",
            "metadata": {
              "name": "-",
              "namespace": "operator-policy-testns"
            }
          },
          "reason": "There are no relevant InstallPlans in this namespace"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1",
            "kind": "OperatorGroup",
            "metadata": {
              "name": "scoped-operator-group",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "4cc6f8a8-06e2-424e-8243-c9be60af3f24"
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "NonCompliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "Subscription",
            "metadata": {
              "name": "project-quay",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "3818180e-6dd1-4d37-ad03-4def2880f588"
          },
          "reason": "ConstraintsNotSatisfiable"
        }
      ],
      "resolvedSubscriptionLabel": "project-quay.operator-policy-testns",
      "subscriptionInterventionTime": "2024-06-05T17:38:40Z"
    }
  }
  wanted related objects: [{Object:{Kind:Subscription APIVersion:operators.coreos.com/v1alpha1 Metadata:{Name:project-quay Namespace:operator-policy-testns}} Compliant:Compliant Reason:Resource found as expected Properties:<nil>}]
  wanted condition: {Type:SubscriptionCompliant Status:True ObservedGeneration:0 LastTransitionTime:0001-01-01 00:00:00 +0000 UTC Reason:SubscriptionMatches Message:the Subscription matches what is required by the policy}

It seems like when the test updates the subscription, it sometimes leads to a ConstraintsNotSatisfiable condition. The fix for that situation (from this PR) did not work: it looks like OLM removes the label from the CSV, so it is no longer included in the operator policy's RelatedObjects.

The new change may be more reliable. I'll re-test it a few times.

@JustinKuli
Member Author

First 2 runs passed completely. The third run had this during hosted mode tests, which is odd:

[FAIL] Testing OperatorPolicy Test CRD deletion delayed because of a finalizer [It] Initially behaves correctly as musthave [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2663

• [FAILED] [25.972 seconds]
Testing OperatorPolicy Test CRD deletion delayed because of a finalizer [It] Initially behaves correctly as musthave [supports-hosted]
/home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2647

  Timeline >>
  STEP: Creating the parent object @ 06/06/24 03:03:56.054
  STEP: Creating the child object with the owner reference @ 06/06/24 03:03:56.17
  STEP: Verifying the child object exists @ 06/06/24 03:03:56.175
  STEP: Waiting for a CRD to appear, which should indicate the operator is installing @ 06/06/24 03:03:56.244
  [FAILED] in [It] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2663 @ 06/06/24 03:04:05.33
  [FAILED] in [AfterAll] - /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/e2e_suite_test.go:198 @ 06/06/24 03:04:05.395
  Debug info for failure.
  policy JSON: {
    "apiVersion": "policy.open-cluster-management.io/v1beta1",
    "kind": "OperatorPolicy",
    "metadata": {
      "annotations": {
        "policy.open-cluster-management.io/parent-policy-compliance-db-id": "124",
        "policy.open-cluster-management.io/policy-compliance-db-id": "64"
      },
      "creationTimestamp": "2024-06-06T03:03:56Z",
      "generation": 2,
      "name": "oppol-mustnothave",
      "namespace": "managed",
      "ownerReferences": [
        {
          "apiVersion": "policy.open-cluster-management.io/v1",
          "kind": "Policy",
          "name": "parent-policy",
          "uid": "bc171e75-8e02-424e-bef2-9a81c321fb1a"
        }
      ],
      "resourceVersion": "14945",
      "uid": "26356467-e6e0-423e-b8ef-43caf496a78a"
    },
    "spec": {
      "complianceConfig": {
        "catalogSourceUnhealthy": "Compliant",
        "deploymentsUnavailable": "NonCompliant",
        "upgradesAvailable": "Compliant"
      },
      "complianceType": "musthave",
      "remediationAction": "enforce",
      "removalBehavior": {
        "clusterServiceVersions": "Delete",
        "customResourceDefinitions": "Delete",
        "operatorGroups": "DeleteIfUnused",
        "subscriptions": "Delete"
      },
      "severity": "medium",
      "subscription": {
        "channel": "stable-3.10",
        "name": "project-quay",
        "namespace": "operator-policy-testns",
        "source": "operatorhubio-catalog",
        "sourceNamespace": "olm"
      },
      "upgradeApproval": "Automatic"
    },
    "status": {
      "compliant": "Compliant",
      "conditions": [
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "CatalogSource was found",
          "reason": "CatalogSourcesFound",
          "status": "False",
          "type": "CatalogSourcesUnhealthy"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:03Z",
          "message": "ClusterServiceVersion (quay-operator.v3.10.5) - install strategy completed with no errors",
          "reason": "InstallSucceeded",
          "status": "True",
          "type": "ClusterServiceVersionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:04Z",
          "message": "Compliant; the policy spec is valid, the OperatorGroup matches what is required by the policy, the Subscription matches what is required by the policy, no InstallPlans requiring approval were found, ClusterServiceVersion (quay-operator.v3.10.5) - install strategy completed with no errors, no CRDs were found for the operator, all operator Deployments have their minimum availability, CatalogSource was found",
          "reason": "Compliant",
          "status": "True",
          "type": "Compliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:04Z",
          "message": "no CRDs were found for the operator",
          "reason": "RelevantCRDNotFound",
          "status": "True",
          "type": "CustomResourceDefinitionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:03Z",
          "message": "all operator Deployments have their minimum availability",
          "reason": "DeploymentsAvailable",
          "status": "True",
          "type": "DeploymentCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:03Z",
          "message": "no InstallPlans requiring approval were found",
          "reason": "NoInstallPlansRequiringApproval",
          "status": "True",
          "type": "InstallPlanCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "the OperatorGroup matches what is required by the policy",
          "reason": "OperatorGroupMatches",
          "status": "True",
          "type": "OperatorGroupCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "the Subscription matches what is required by the policy",
          "reason": "SubscriptionMatches",
          "status": "True",
          "type": "SubscriptionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "the policy spec is valid",
          "reason": "PolicyValidated",
          "status": "True",
          "type": "ValidPolicySpec"
        }
      ],
      "relatedObjects": [
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "CatalogSource",
            "metadata": {
              "name": "operatorhubio-catalog",
              "namespace": "olm"
            }
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "ClusterServiceVersion",
            "metadata": {
              "name": "quay-operator.v3.10.5",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "uid": "a0a06fd6-d6ea-434a-835f-82e2d3e8b8b1"
          },
          "reason": "InstallSucceeded"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "apiextensions.k8s.io/v1",
            "kind": "CustomResourceDefinition",
            "metadata": {
              "name": "-"
            }
          },
          "reason": "No relevant CustomResourceDefinitions found"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
              "name": "quay-operator-tng",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "uid": "ac6adbef-cc5e-4f17-8746-168b0785c89b"
          },
          "reason": "Deployment Available"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "InstallPlan",
            "metadata": {
              "name": "install-nhv48",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "uid": "da65b36e-de75-47b6-8e0c-fe39d6f3d9bc"
          },
          "reason": "The InstallPlan is Complete"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1",
            "kind": "OperatorGroup",
            "metadata": {
              "name": "operator-policy-testns-gk9dl",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "4870c535-3442-4cd6-836c-14403c7557aa"
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "Subscription",
            "metadata": {
              "name": "project-quay",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "88f9886d-6b59-421c-bc74-c9d412921fef"
          },
          "reason": "Resource found as expected"
        }
      ],
      "resolvedSubscriptionLabel": "project-quay.operator-policy-testns"
    }
  }
  wanted related objects: [{Object:{Kind:CustomResourceDefinition APIVersion:apiextensions.k8s.io/v1 Metadata:{Name:quayregistries.quay.redhat.com Namespace:}} Compliant:Compliant Reason:Resource found as expected Properties:<nil>}]
  wanted condition: {Type:CustomResourceDefinitionCompliant Status:True ObservedGeneration:0 LastTransitionTime:0001-01-01 00:00:00 +0000 UTC Reason:RelevantCRDFound Message:there are CRDs present for the operator}

  Debug info for failure.
  policy JSON: {
    "apiVersion": "policy.open-cluster-management.io/v1beta1",
    "kind": "OperatorPolicy",
    "metadata": {
      "annotations": {
        "policy.open-cluster-management.io/parent-policy-compliance-db-id": "124",
        "policy.open-cluster-management.io/policy-compliance-db-id": "64"
      },
      "creationTimestamp": "[2024](https://github.com/open-cluster-management-io/config-policy-controller/actions/runs/9391860634/job/25871282234?pr=258#step:12:2025)-06-06T03:03:56Z",
      "generation": 2,
      "name": "oppol-mustnothave",
      "namespace": "managed",
      "ownerReferences": [
        {
          "apiVersion": "policy.open-cluster-management.io/v1",
          "kind": "Policy",
          "name": "parent-policy",
          "uid": "bc171e75-8e02-424e-bef2-9a81c321fb1a"
        }
      ],
      "resourceVersion": "14942",
      "uid": "26356467-e6e0-423e-b8ef-43caf496a78a"
    },
    "spec": {
      "complianceConfig": {
        "catalogSourceUnhealthy": "Compliant",
        "deploymentsUnavailable": "NonCompliant",
        "upgradesAvailable": "Compliant"
      },
      "complianceType": "musthave",
      "remediationAction": "enforce",
      "removalBehavior": {
        "clusterServiceVersions": "Delete",
        "customResourceDefinitions": "Delete",
        "operatorGroups": "DeleteIfUnused",
        "subscriptions": "Delete"
      },
      "severity": "medium",
      "subscription": {
        "channel": "stable-3.10",
        "name": "project-quay",
        "namespace": "operator-policy-testns",
        "source": "operatorhubio-catalog",
        "sourceNamespace": "olm"
      },
      "upgradeApproval": "Automatic"
    },
    "status": {
      "compliant": "Compliant",
      "conditions": [
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "CatalogSource was found",
          "reason": "CatalogSourcesFound",
          "status": "False",
          "type": "CatalogSourcesUnhealthy"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:03Z",
          "message": "ClusterServiceVersion (quay-operator.v3.10.5) - install strategy completed with no errors",
          "reason": "InstallSucceeded",
          "status": "True",
          "type": "ClusterServiceVersionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:03Z",
          "message": "Compliant; the policy spec is valid, the OperatorGroup matches what is required by the policy, the Subscription matches what is required by the policy, no InstallPlans requiring approval were found, ClusterServiceVersion (quay-operator.v3.10.5) - install strategy completed with no errors, there are CRDs present for the operator, all operator Deployments have their minimum availability, CatalogSource was found",
          "reason": "Compliant",
          "status": "True",
          "type": "Compliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:02Z",
          "message": "there are CRDs present for the operator",
          "reason": "RelevantCRDFound",
          "status": "True",
          "type": "CustomResourceDefinitionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:03Z",
          "message": "all operator Deployments have their minimum availability",
          "reason": "DeploymentsAvailable",
          "status": "True",
          "type": "DeploymentCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:04:03Z",
          "message": "no InstallPlans requiring approval were found",
          "reason": "NoInstallPlansRequiringApproval",
          "status": "True",
          "type": "InstallPlanCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "the OperatorGroup matches what is required by the policy",
          "reason": "OperatorGroupMatches",
          "status": "True",
          "type": "OperatorGroupCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "the Subscription matches what is required by the policy",
          "reason": "SubscriptionMatches",
          "status": "True",
          "type": "SubscriptionCompliant"
        },
        {
          "lastTransitionTime": "2024-06-06T03:03:56Z",
          "message": "the policy spec is valid",
          "reason": "PolicyValidated",
          "status": "True",
          "type": "ValidPolicySpec"
        }
      ],
      "relatedObjects": [
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "CatalogSource",
            "metadata": {
              "name": "operatorhubio-catalog",
              "namespace": "olm"
            }
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "ClusterServiceVersion",
            "metadata": {
              "name": "quay-operator.v3.10.5",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "uid": "a0a06fd6-d6ea-434a-835f-82e2d3e8b8b1"
          },
          "reason": "InstallSucceeded"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "apiextensions.k8s.io/v1",
            "kind": "CustomResourceDefinition",
            "metadata": {
              "name": "quayregistries.quay.redhat.com"
            }
          },
          "properties": {
            "uid": "57380d9f-48b4-487d-8e4a-f3cf9b9e75d5"
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
              "name": "quay-operator-tng",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "uid": "ac6adbef-cc5e-4f17-8746-168b0785c89b"
          },
          "reason": "Deployment Available"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "InstallPlan",
            "metadata": {
              "name": "install-nhv48",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "uid": "da65b36e-de75-47b6-8e0c-fe39d6f3d9bc"
          },
          "reason": "The InstallPlan is Complete"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1",
            "kind": "OperatorGroup",
            "metadata": {
              "name": "operator-policy-testns-gk9dl",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "4870c535-3442-4cd6-836c-14403c7557aa"
          },
          "reason": "Resource found as expected"
        },
        {
          "compliant": "Compliant",
          "object": {
            "apiVersion": "operators.coreos.com/v1alpha1",
            "kind": "Subscription",
            "metadata": {
              "name": "project-quay",
              "namespace": "operator-policy-testns"
            }
          },
          "properties": {
            "createdByPolicy": true,
            "uid": "88f9886d-6b59-421c-bc74-c9d412921fef"
          },
          "reason": "Resource found as expected"
        }
      ],
      "resolvedSubscriptionLabel": "project-quay.operator-policy-testns"
    }
  }
  << Timeline

  [FAILED] Failed after 1.040s.
  The function passed to Consistently failed at /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:140 with:
  Expected
      <bool>: false
  to be true
  In [It] at: /home/runner/work/config-policy-controller/config-policy-controller/test/e2e/case38_install_operator_test.go:2663 @ 06/06/24 03:04:05.33

@@ -216,14 +221,16 @@ func (r *OperatorPolicyReconciler) Reconcile(ctx context.Context, req ctrl.Reque
errs = append(errs, err)
}

if !reflect.DeepEqual(policy.Status, originalStatus) {

Optional optimization:

Suggested change
- if !reflect.DeepEqual(policy.Status, originalStatus) {
+ if conditionChanged || !reflect.DeepEqual(policy.Status, originalStatus) {

mprahl
mprahl previously approved these changes Jun 6, 2024
@openshift-ci openshift-ci bot added the lgtm label Jun 6, 2024

mprahl commented Jun 6, 2024

@JustinKuli looks like you need to rebase. I'll approve it again afterwards.

Previously, the selector used to find OLM resources like InstallPlans
and CSVs did not exactly match what OLM would use when the name and
namespace of the operator were very long. It now matches the behavior
of OLM.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
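
For illustration only, here is a minimal sketch of how a selector key for a long name and namespace could be derived. It is not the code from this PR. It assumes the label key has the form `operators.coreos.com/<name>.<namespace>`, that the name part is cut to the 63 character Kubernetes limit for a label name, and that trailing `.`, `_`, and `-` characters are dropped after the cut. The helper name `operatorLabelKey` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// maxLabelNameLen is the Kubernetes limit on the name segment of a label key.
const maxLabelNameLen = 63

// operatorLabelKey is a hypothetical helper. It builds the label key that is
// assumed to tie OLM resources to a Subscription: "<name>.<namespace>" under
// the "operators.coreos.com/" prefix, truncated to fit in a label name.
func operatorLabelKey(subName, namespace string) string {
	name := subName + "." + namespace

	if len(name) > maxLabelNameLen {
		name = name[:maxLabelNameLen]
		// A label name must end with an alphanumeric character, so drop any
		// separator characters left dangling by the cut (assumed behavior).
		name = strings.TrimRight(name, "._-")
	}

	return "operators.coreos.com/" + name
}

func main() {
	// Short names pass through unchanged.
	fmt.Println(operatorLabelKey("project-quay", "operator-policy-testns"))

	// A long name is cut at 63 characters and the dangling "." is trimmed.
	fmt.Println(operatorLabelKey(strings.Repeat("a", 62), "b-ns"))
}
```

A selector built from the untruncated string would not match any resource when the combined name exceeds the limit, which is the kind of mismatch the commit message describes.
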
Some places were not using the logger provided in the Reconcile context,
which has some potentially helpful information. Logs were added to
record API calls like create/update/delete, and some debug logs were
added for potentially complex methods.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
Previously, the controller needed to parse and sort the message
associated with the ConstraintsNotSatisfiable condition on the
Subscription, because the order of certain parts of the message was
inconsistent. The intent was to not update the OperatorPolicy condition
when the actual situation on the cluster was unchanged.

In some situations, particularly when the subscription is deleted and
quickly recreated, the condition on the Subscription does not just have
parts in an inconsistent order: it can alternate between two different
clauses. To prevent constant updates to the OperatorPolicy status, it
now just reports a generic "conditions not satisfiable" condition, and
directs the user to the Subscription for more information. This also
reduces noise in the compliance events by removing the lists of operator
versions.

Refs:
 - https://issues.redhat.com/browse/ACM-11453

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
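
A rough sketch of the idea, not the controller's actual code: whatever the upstream message says, the policy condition carries a fixed message, so a comparison of the old and new status sees no change. The `condition` type, the `constraintsCondition` helper, the exact message text, and matching on the `ConstraintsNotSatisfiable` reason are all assumptions made for this example.

```go
package main

import "fmt"

// condition is a simplified stand-in for a Kubernetes status condition.
type condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// genericMsg is a hypothetical fixed message. It deliberately omits the
// upstream details so it cannot flip between variants.
const genericMsg = "constraints not satisfiable, refer to the Subscription for more details"

// constraintsCondition is a hypothetical helper. It returns the condition to
// record on the policy when the Subscription reports unsatisfiable
// constraints, and false when no such condition is present.
func constraintsCondition(subConds []condition) (condition, bool) {
	for _, c := range subConds {
		if c.Reason == "ConstraintsNotSatisfiable" && c.Status == "True" {
			return condition{
				Type:    "SubscriptionCompliant",
				Status:  "False",
				Reason:  "ConstraintsNotSatisfiable",
				Message: genericMsg,
			}, true
		}
	}

	return condition{}, false
}

func main() {
	// Two upstream variants of the same underlying problem (made-up text).
	first := []condition{{Reason: "ConstraintsNotSatisfiable", Status: "True", Message: "clause A, clause B"}}
	second := []condition{{Reason: "ConstraintsNotSatisfiable", Status: "True", Message: "clause C"}}

	a, _ := constraintsCondition(first)
	b, _ := constraintsCondition(second)

	// Both map to the same policy condition, so no status update is needed.
	fmt.Println(a == b)
}
```
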
In some situations when actions happen quickly, OLM can lose the
connection between the Subscription and the ClusterServiceVersion, which
leads to a "constraints not satisfiable" condition on the Subscription.
The exact situation we have occasionally seen in our tests is hard to
reproduce, but manually deleting and immediately recreating the
subscription causes a similar situation.

In this change, in those situations, the controller will intervene after
30 seconds by updating the Subscription status directly. The
implementation allows a 10 second window for the intervention, during
which the controller may update the status multiple times, to address a
case where OLM immediately overwrites the update. If the window is
missed, the controller may schedule another intervention. The delay is
intended to give OLM time to potentially resolve the situation on its
own.

Refs:
 - https://issues.redhat.com/browse/ACM-11453

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
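
The timing described above can be sketched as a small decision function. This is an illustration under assumptions, not the PR's implementation: the names `nextStep`, `interventionDelay`, and `interventionWindow` are hypothetical, and the input is taken to be the time elapsed since the intervention was scheduled.

```go
package main

import (
	"fmt"
	"time"
)

const (
	// interventionDelay gives OLM a chance to recover on its own.
	interventionDelay = 30 * time.Second
	// interventionWindow is how long status updates may be attempted.
	interventionWindow = 10 * time.Second
)

type action int

const (
	wait       action = iota // too early, check again later
	intervene                // inside the window, update the Subscription status
	reschedule               // window missed, start a new delay
)

var actionNames = []string{"wait", "intervene", "reschedule"}

func (a action) String() string { return actionNames[a] }

// nextStep is a hypothetical helper. Given the time elapsed since the
// intervention was scheduled, it returns what to do now and how long to wait
// before looking again (zero means no extra delay is requested).
func nextStep(elapsed time.Duration) (action, time.Duration) {
	switch {
	case elapsed < interventionDelay:
		return wait, interventionDelay - elapsed
	case elapsed < interventionDelay+interventionWindow:
		// Repeated calls inside the window all intervene, which covers the
		// case where OLM overwrites the first update.
		return intervene, 0
	default:
		return reschedule, interventionDelay
	}
}

func main() {
	for _, elapsed := range []time.Duration{5 * time.Second, 32 * time.Second, 45 * time.Second} {
		act, after := nextStep(elapsed)
		fmt.Println(elapsed, act, after)
	}
}
```

Keeping the decision in a pure function like this makes the window boundaries easy to test without a cluster.
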
@openshift-ci openshift-ci bot added the lgtm label Jun 6, 2024

openshift-ci bot commented Jun 6, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, mprahl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 4b0fd44 into open-cluster-management-io:main Jun 6, 2024
9 checks passed
JustinKuli added a commit to JustinKuli/governance-policy-addon-controller-1 that referenced this pull request Jun 7, 2024
Syncs changes from open-cluster-management-io/config-policy-controller#258

Refs:
 - https://issues.redhat.com/browse/ACM-11453

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
openshift-merge-bot bot pushed a commit to open-cluster-management-io/governance-policy-addon-controller that referenced this pull request Jun 7, 2024
Syncs changes from open-cluster-management-io/config-policy-controller#258

Refs:
 - https://issues.redhat.com/browse/ACM-11453

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
magic-mirror-bot bot pushed a commit to stolostron/governance-policy-addon-controller that referenced this pull request Jun 7, 2024
Syncs changes from open-cluster-management-io/config-policy-controller#258

Refs:
 - https://issues.redhat.com/browse/ACM-11453

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
(cherry picked from commit d89fa6dad5f88886ef1b602980fd27ecb6a71170)
@JustinKuli JustinKuli deleted the 11453-help-sub-find-csv branch July 25, 2024 13:40