Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(catsrc) set status reason/message on incorrect polling interval #2447

Conversation

anik120
Copy link
Contributor

@anik120 anik120 commented Nov 12, 2021

This PR sets the status reason as InvalidIntervalError and a status message
if updateStrategy.RegistryPoll.ParsingError is set for the catsrc.

Description of the change:

Motivation for the change:

Reviewer Checklist

  • Implementation matches the proposed design, or proposal is updated to match implementation
  • Sufficient unit test coverage
  • Sufficient end-to-end test coverage
  • Docs updated or added to /doc
  • Commit messages sensible and descriptive

@anik120
Copy link
Contributor Author

anik120 commented Nov 12, 2021

/hold for operator-framework/api#169 to merge

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 12, 2021
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch from 8674db3 to 87b6978 Compare November 12, 2021 22:42
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 21, 2021
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch from 87b6978 to 3ee3bff Compare March 7, 2022 16:50
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 7, 2022
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch 3 times, most recently from d5c43b1 to 086b087 Compare March 7, 2022 17:01
if out.Spec.UpdateStrategy.RegistryPoll != nil {
if out.Spec.UpdateStrategy.RegistryPoll.ParsingError != "" {
out.SetError(v1alpha1.CatalogSourceIntervalInvalidError, fmt.Errorf(out.Spec.UpdateStrategy.RegistryPoll.ParsingError))
if _, err := o.client.OperatorsV1alpha1().CatalogSources(out.GetNamespace()).UpdateStatus(context.TODO(), out, metav1.UpdateOptions{}); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this UpdateStatus call required every catalogsource reconciliation loop if the ParsingError is not blank? Maybe we should only do the status update once? Is there any way to check if that status is set before attempting the update call?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check here before going through with the update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe someone can clarify, but I'm not sure that we want to always flip the status.reason to CatalogSourceIntervalInvalidError from another status in the case where a parsing error was found on the catalog. This would cause us to stomp our other catalog status updates and potentially constantly update the status.

I think what we want is to provide some kind of long-lived indication that the catalog did have a invalid interval at some point (when it was created), which was changed to the default, with a message indicating what happened. WDYT about using a condition on the status instead of updating the status.Reason directly?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exdx so

Good news: The way this is currently implemented, there's no chance of us stomping over a previously set reason. Because every single time we've called setError before this, it has been followed by return. So only when that error is resolved, do we move on to the next step, and the next, etc, until we reach the final segment where we check for the polling error and setError again if there is an error. Only this time, we don't return.

Bad news: It does make sense to set a condition for this actually now that you bring it up. BUT, when I set out to implement that, I realized that it'd need a change in api again. Here's a rough implementation anik120@e134397
This is definitely much more cleaner, but now that we've released an of/api version with the existing changes, do these changes translate to breaking changes? cc: @benluddy @joelanford

The silver lining is, with the current implementation, when the CatalogSource is edited to fix the error code, the reason is reset, so any future insertion (if any) of setError after these lines will have to worry about stomping over the message this PR is setting, but it seems unlikely we'll add anymore setErrors after these lines 🤷🏽

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying, since we are checking the polling error in this final sequence of events, it may not necessarily be overwritten. But if some other condition earlier in the chain is met, we update the status, and then when it's resolved we will update here. This particular status (invalid polling interval) doesn't feel like a level-based error, if that makes sense. I'm still a little worried about the potential for hotlooping

Do we ever clear the spec.UpdateStrategy.RegistryPoll.ParsingError field? Because it feels like strange behavior to always update the status to point to an invalid catalog error, even if the polling error has been resolved (by defaulting) and the catalog is no longer invalid.

Also, I believe we can simply set

syncError = fmt.Errorf(out.Spec.UpdateStrategy.RegistryPoll.ParsingError)
out.SetError(v1alpha1.CatalogSourceIntervalInvalidError, syncError)
return

in the code and not have to call UpdateStatus() directly, the control loop will update the catalog for us

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@exdx so the spec.UpdateStrategy.RegistryPoll.ParsingError does clear when we unmarshall the incoming object here. I added a test to verify that https://github.com/operator-framework/operator-lifecycle-manager/pull/2447/files#diff-65a3cf2eeb27e70487e6159bd540e1d057e3bc79bea13e4c874a147d6f147469R1195

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing if this test passes that'll take care of the hotlooping concerns too.

@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch from 086b087 to a30a87c Compare March 8, 2022 16:29
@anik120
Copy link
Contributor Author

anik120 commented Mar 8, 2022

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 8, 2022
@openshift-ci openshift-ci bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 16, 2022
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch from a30a87c to 53d8405 Compare April 6, 2022 17:15
@openshift-ci openshift-ci bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 6, 2022
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch from 53d8405 to c300081 Compare April 6, 2022 17:16
test/e2e/catalog_e2e_test.go Outdated Show resolved Hide resolved
@exdx
Copy link
Member

exdx commented Apr 6, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 6, 2022
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch from c300081 to 0df128b Compare April 6, 2022 18:05
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Apr 6, 2022
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch 2 times, most recently from 5702424 to 003982c Compare April 6, 2022 18:48
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch 2 times, most recently from 9a9dfa5 to 07ef80d Compare April 7, 2022 16:41
This PR sets the status reason as InvalidIntervalError and a status message
if updateStrategy.RegistryPoll.ParsingError is set for the catsrc.

Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
Signed-off-by: Anik Bhattacharjee <anikbhattacharya93@gmail.com>
@anik120 anik120 force-pushed the catsrc-incorrect-polling-interval branch from 07ef80d to 03c2e6b Compare April 7, 2022 16:46
@exdx
Copy link
Member

exdx commented Apr 7, 2022

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 7, 2022
Copy link
Member

@dinhxuanvu dinhxuanvu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@openshift-ci
Copy link

openshift-ci bot commented Apr 7, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: anik120, dinhxuanvu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 7, 2022
@openshift-merge-robot openshift-merge-robot merged commit 63d8e3b into operator-framework:master Apr 7, 2022
@perdasilva perdasilva mentioned this pull request Apr 8, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants