-
-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore irrelevant handlers' states in superseding causes (e.g. deletion-during-creation, update-during-resume, etc) #606
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This was referenced Dec 9, 2020
5328e69
to
16cbb77
Compare
nolar
added a commit
that referenced
this pull request
Dec 9, 2020
**This is a backported fix from #606 for 0.28.x hotfix.** This is important only for the states restored from a storage with handlers mentioned: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if a deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers, and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a roll back, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case (#601): ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation, and the beginning of the deletion — each with its own numbers. ``` Sleeping was interrupted by new changes, 3.3 seconds left. Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: * Creation/update/resume → deletion (when marked for deletion). * Deletion → creation/update/resume (when unmarked for deletion). Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
c93b7c0
to
484cfc1
Compare
nolar
added a commit
that referenced
this pull request
Dec 9, 2020
**This is a backported fix from #606 for 0.28.x hotfix.** This is important only for the states restored from a storage with handlers mentioned: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if a deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers, and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a roll back, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case (#601): ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation, and the beginning of the deletion — each with its own numbers. ``` Sleeping was interrupted by new changes, 3.3 seconds left. Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: creation/update/resume → deletion (when marked for deletion). Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself. A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
484cfc1
to
3712675
Compare
This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case: ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers. ``` Creation is in progress: … Sleeping for 4.866814 seconds for the delayed handlers. … Sleeping was interrupted by new changes, 3.3 seconds left. Creation is superseded by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: * creation/update/resume → deletion (when marked for deletion, the pending handlers are abandoned). * resume → update (when changed; the resuming handlers are mixed into the updating process). Other transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself. A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
3712675
to
a416f68
Compare
fixes #601 |
This was referenced Mar 30, 2021
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.
This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for
.done
&.delays
— these ones must be calculated only for the handlers currently in scope.For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed.
Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states.
For compatibility, if the purpose is not stored (
None
), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back toNone
by the old versions of Kopf. This is fine.An example case (#601):
Here, on the object deletion, the state will be restored with 4 handlers:
create_fn1
&create_fn2
&delete_fn1
&delete_fn2
. Sincecreate_fn2
was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the.done
from achievingTrue
— while it is irrelevant anymore, and onlydelete_fn1
&delete_fn2
should be considered. However, all four should be purged from annotations when done.Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers.
Note that only some transitions are possible due to external reasons, not all of them:
Other transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
A transition back from deletion to creation/update/resume is impossible in practice but would work otherwise.