Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MustNotHave mode for OperatorPolicy #222

Conversation

JustinKuli
Copy link
Member

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@JustinKuli JustinKuli marked this pull request as ready for review April 4, 2024 16:53
@openshift-ci openshift-ci bot requested review from dhaiducek and gparvin April 4, 2024 16:53
@JustinKuli JustinKuli requested review from mprahl, JeffeyL and zyjjay April 4, 2024 16:53
Also adjusts CSV reporting to handle when the CSV exists but the
subscription does not, by using a label that OLM adds to these objects.

The authorino operator was chosen because it has more than one CRD, but
not too many to make a concise test.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
When the subscription has a ConstraintsNotSatisfiable condition, the
exact content of the message is not always consistent. We have added
other things to sort the message better, but it looks like we missed
this test which was still assuming a certain part was first.

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@JustinKuli
Copy link
Member Author

Tests passed twice, I'm finally feeling good about this 😆

//+kubebuilder:default=DeleteIfUnused
//+kubebuilder:validation:Enum=Keep;Delete;DeleteIfUnused
// Specifies whether to delete the OperatorGroup; defaults to 'DeleteIfUnused' which
// will only delete the OperatorGroup if there is not another Subscription using it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So "Delete" means delete even though another sub uses this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it means "Delete, regardless of any consequences". Kind of like a mustnothave configuration policy.


//+kubebuilder:default=Delete
//+kubebuilder:validation:Enum=Keep;Delete
// Specifies whether to delete the ClusterServiceVersion; defaults to 'Delete'
Copy link
Contributor

@yiraeChristineKim yiraeChristineKim Apr 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add when this happens? maybe top of RemovalBehavior

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the CRD description for removalBehavior, it says "RemovalBehavior defines what resources will be removed by enforced mustnothave policies ..."

Is that what you meant? Or what's another place to put it?

// Kind APIServiceDefinitions
APIServiceDefinitions RemovalAction `json:"apiServiceDefinitions,omitempty"`

// Future?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?!!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😆 the whole feature was a TODO ...

This type / option was something I thought would be good during the design, but I haven't found anyone asking for it, or any examples of it yet, so I want to hold off on it until we can understand it better.

// Kind APIServiceDefinitions
APIServiceDefinitions RemovalAction `json:"apiServiceDefinitions,omitempty"`

// Future?
Copy link
Member

@mprahl mprahl Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just remove this commented out code

@yiraeChristineKim yiraeChristineKim self-requested a review April 8, 2024 15:20
return nil, updateStatus(policy, keptCond("ClusterServiceVersion"), relatedCSVs...), nil
}

for i := range csvList {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to have more than one CSV?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, there might be more than one CSV: I think if one fails it might stay around even while another one is created. However, it's not clear to me if/when OLM will remove the label from an old CSV. So I would say it's very likely there would only be one CSV in this list, but it should handle the case where there are more.

@@ -489,6 +560,17 @@ func subResFailedCond(subFailedCond operatorv1alpha1.SubscriptionCondition) meta
return cond
}

// irrelevantCond returns a Compliant condition, with a Reason like '____Irrelevant',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is NotApplicable a little better?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does sound better

@mprahl
Copy link
Member

mprahl commented Apr 8, 2024

@JustinKuli does the order of operations matter? To me, it seems like we should wait for the subscription to be fully deleted before deleting anything else so that we don't conflict with OLM. What do you think?

opPolName = "oppol-authorino"
)
BeforeAll(func() {
utils.Kubectl("delete", "crd", "--selector=olm.managed=true")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be more specific?

@JustinKuli
Copy link
Member Author

@JustinKuli does the order of operations matter? To me, it seems like we should wait for the subscription to be fully deleted before deleting anything else so that we don't conflict with OLM. What do you think?

@mprahl that could be nice. I think that even if stuff was deleted in the wrong order, eventually this would get to the correct state, but it could be a lot of churn on the cluster: for example if the CSV was deleted before the Subscription was removed, OLM would probably re-create the CSV. The policy would reconcile several times and (probably) eventually get to a stable status.

This also makes me think that we might want to validate some relationships between the removalBehavior options: if Subscriptions is set to Keep, than it should probably be an error to set ClusterServiceVersions to Delete, since that would cause a loop with OLM.

return metav1.Condition{
Type: condType(kind),
Status: metav1.ConditionTrue,
Reason: kind + "Excluded",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this?

Suggested change
Reason: kind + "Excluded",
Reason: kind + "NotPresent",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this. "Included" is also now "Present" and the deleted and kept conditions are updated so that their messages are shorter and avoid the "checked by the mustnothave" language.

Type: condType(kind),
Status: metav1.ConditionTrue,
Reason: kind + "Excluded",
Message: "the " + kind + " checked by the mustnothave policy was not found",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can get straight to the point. The user should know this is desired based on the compliance type they set.

Suggested change
Message: "the " + kind + " checked by the mustnothave policy was not found",
Message: "the " + kind + " is not present",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Enforced mustnohave policies will delete things based on the new
RemovalBehavior field. In inform mode, resources that would be removed
by an enforced policy will be causes for NonCompliance. This gives
control to users in order to prevent unintended side effects.

Refs:
 - https://issues.redhat.com/browse/ACM-9287

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
Copy link
Member

@mprahl mprahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold for other reviews and assuming that another PR will change deletes to be guarded by if a deletion timestamp is not set.

Copy link

openshift-ci bot commented Apr 8, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JustinKuli, mprahl

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@@ -600,13 +680,33 @@ func buildCSVCond(csv *operatorv1alpha1.ClusterServiceVersion) metav1.Condition
}
}

// noCSVCond is a NonCompliant condition with Reason 'RelevantCSVNotFound'
var noCSVCond = metav1.Condition{
Type: csvConditionType,
Status: metav1.ConditionFalse,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed something, but wouldn't this imply that the reason "RelevantCSVNotFound" is false here given the status of the condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditions and reasons can be confusing, and that's not limited to just our use of them. I think the status goes with the type, not the reason.

So I read them like <Type>? <Status>, because <Reason>. Extra info: <Message>

So in this case, "ClusterServiceVersionCompliant? False, because RelevantCSVNotFound. Extra info: A relevant installed ClusterServiceVersion could not be found"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, that makes sense.

//+kubebuilder:default=Keep
//+kubebuilder:validation:Enum=Keep;Delete
// Specifies whether to delete any CustomResourceDefinitions associated with the operator;
// defaults to 'Keep' because deleting them should be done deliberately
CRDs RemovalAction `json:"customResourceDefinitions,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do these CRDs refer to CRDs created by OLM as a prerequisite for installing the desired operator?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are the CRDs created by OLM, as defined in the PackageManifest/ClusterServiceVersion for the operator.

@JustinKuli
Copy link
Member Author

/unhold

@openshift-merge-bot openshift-merge-bot bot merged commit 1711136 into open-cluster-management-io:main Apr 10, 2024
9 checks passed
JustinKuli added a commit to stolostron/governance-policy-framework that referenced this pull request Apr 10, 2024
This was added to the compliance message in
open-cluster-management-io/config-policy-controller#222

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
openshift-merge-bot bot pushed a commit to stolostron/governance-policy-framework that referenced this pull request Apr 12, 2024
This was added to the compliance message in
open-cluster-management-io/config-policy-controller#222

Signed-off-by: Justin Kulikauskas <jkulikau@redhat.com>
@JustinKuli JustinKuli deleted the 9287-mustnothave branch April 15, 2024 14:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants