-
-
Notifications
You must be signed in to change notification settings - Fork 166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ignore irrelevant handlers in processing state calculations (backported for 0.28.x) #607
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
nolar
added a commit
that referenced
this pull request
Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.** This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case (#601): ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers. ``` Creation is in progress: … Sleeping for 4.866814 seconds for the delayed handlers. … Sleeping was interrupted by new changes, 3.3 seconds left. Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: * Creation/update/resume → deletion (when marked for deletion). * Deletion → creation/update/resume (when unmarked for deletion). Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
ad4b5a5
to
f909be6
Compare
nolar
added a commit
that referenced
this pull request
Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.** This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case (#601): ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers. ``` Creation is in progress: … Sleeping for 4.866814 seconds for the delayed handlers. … Sleeping was interrupted by new changes, 3.3 seconds left. Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: * Creation/update/resume → deletion (when marked for deletion). * Deletion → creation/update/resume (when unmarked for deletion). Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
nolar
added a commit
that referenced
this pull request
Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.** This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case (#601): ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers. ``` Creation is in progress: … Sleeping for 4.866814 seconds for the delayed handlers. … Sleeping was interrupted by new changes, 3.3 seconds left. Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: * Creation/update/resume → deletion (when marked for deletion). * Deletion → creation/update/resume (when unmarked for deletion). Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
nolar
added a commit
that referenced
this pull request
Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.** This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case (#601): ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers. ``` Creation is in progress: … Sleeping for 4.866814 seconds for the delayed handlers. … Sleeping was interrupted by new changes, 3.3 seconds left. Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: creation/update/resume → deletion (when marked for deletion). Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself. A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
f909be6
to
c350e9c
Compare
nolar
added a commit
that referenced
this pull request
Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.** This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case (#601): ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers. ``` Creation is in progress: … Sleeping for 4.866814 seconds for the delayed handlers. … Sleeping was interrupted by new changes, 3.3 seconds left. Creation is superseded by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: * creation/update/resume → deletion (when marked for deletion). * resume → update (when changed; the resuming handlers are mixed into the updating process). Other transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself. A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
c350e9c
to
d898c70
Compare
This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope. For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed. Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states. For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine. --- An example case: ```python import kopf @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn1(**_): pass @kopf.on.create('zalando.org', 'v1', 'kopfexamples') def create_fn2(**_): raise kopf.TemporaryError("temporary failure", delay=5) @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn1(**_): pass @kopf.on.delete('zalando.org', 'v1', 'kopfexamples') def delete_fn2(**_): pass ``` ``` kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml ``` Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done. Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers. ``` Creation is in progress: … Sleeping for 4.866814 seconds for the delayed handlers. … Sleeping was interrupted by new changes, 3.3 seconds left. Creation is superseded by deletion: 1 succeeded; 0 failed; 1 left. Deletion is in progress: … … Deletion is processed: 2 succeeded; 0 failed. ``` --- Note that only some transitions are possible due to external reasons, not all of them: * creation/update/resume → deletion (when marked for deletion, the pending handlers are abandoned). * resume → update (when changed; the resuming handlers are mixed into the updating process). Other transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself. A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
d898c70
to
a84617a
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A backport of #606 to 0.28.x maintenance branch — minimally sufficient to fix the issue #601, with some extra improvements removed (e.g. logging) to avoid changing the existing behaviour in a hotfix release.