Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore irrelevant handlers in processing state calculations (backported for 0.28.x) #607

Merged
merged 1 commit into from
Dec 10, 2020

Conversation

nolar
Copy link
Owner

@nolar nolar commented Dec 9, 2020

A backport of #606 to 0.28.x maintenance branch — minimally sufficient to fix the issue #601, with some extra improvements removed (e.g. logging) to avoid changing the existing behaviour in a hotfix release.

@nolar nolar added the bug Something isn't working label Dec 9, 2020
@nolar nolar changed the title Classify handler- and processing states by purpose Ignore irrelevant handlers in processing state calculations (backported for 0.28.x) Dec 9, 2020
nolar added a commit that referenced this pull request Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.**

This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope.

For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed.

Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states.

For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine.

---

An example case (#601):

```python
import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn1(**_):
    pass

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn2(**_):
    raise kopf.TemporaryError("temporary failure", delay=5)

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn1(**_):
    pass

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn2(**_):
    pass
```

```
kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml
```

Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done.

Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers.

```
Creation is in progress: …
Sleeping for 4.866814 seconds for the delayed handlers.
…
Sleeping was interrupted by new changes, 3.3 seconds left.
Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left.
Deletion is in progress: …
…
Deletion is processed: 2 succeeded; 0 failed.
```

---

Note that only some transitions are possible due to external reasons, not all of them:

* Creation/update/resume → deletion (when marked for deletion).
* Deletion → creation/update/resume (when unmarked for deletion).

Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
@nolar nolar force-pushed the irrelevant-handlers-backported branch from ad4b5a5 to f909be6 Compare December 9, 2020 17:45
nolar added a commit that referenced this pull request Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.**

This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope.

For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed.

Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states.

For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine.

---

An example case (#601):

```python
import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn1(**_):
    pass

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn2(**_):
    raise kopf.TemporaryError("temporary failure", delay=5)

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn1(**_):
    pass

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn2(**_):
    pass
```

```
kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml
```

Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done.

Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers.

```
Creation is in progress: …
Sleeping for 4.866814 seconds for the delayed handlers.
…
Sleeping was interrupted by new changes, 3.3 seconds left.
Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left.
Deletion is in progress: …
…
Deletion is processed: 2 succeeded; 0 failed.
```

---

Note that only some transitions are possible due to external reasons, not all of them:

* Creation/update/resume → deletion (when marked for deletion).
* Deletion → creation/update/resume (when unmarked for deletion).

Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
nolar added a commit that referenced this pull request Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.**

This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope.

For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed.

Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states.

For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine.

---

An example case (#601):

```python
import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn1(**_):
    pass

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn2(**_):
    raise kopf.TemporaryError("temporary failure", delay=5)

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn1(**_):
    pass

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn2(**_):
    pass
```

```
kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml
```

Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done.

Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers.

```
Creation is in progress: …
Sleeping for 4.866814 seconds for the delayed handlers.
…
Sleeping was interrupted by new changes, 3.3 seconds left.
Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left.
Deletion is in progress: …
…
Deletion is processed: 2 succeeded; 0 failed.
```

---

Note that only some transitions are possible due to external reasons, not all of them:

* Creation/update/resume → deletion (when marked for deletion).
* Deletion → creation/update/resume (when unmarked for deletion).

Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.
nolar added a commit that referenced this pull request Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.**

This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope.

For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed.

Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states.

For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine.

---

An example case (#601):

```python
import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn1(**_):
    pass

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn2(**_):
    raise kopf.TemporaryError("temporary failure", delay=5)

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn1(**_):
    pass

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn2(**_):
    pass
```

```
kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml
```

Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done.

Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers.

```
Creation is in progress: …
Sleeping for 4.866814 seconds for the delayed handlers.
…
Sleeping was interrupted by new changes, 3.3 seconds left.
Creation is interrupted by deletion: 1 succeeded; 0 failed; 1 left.
Deletion is in progress: …
…
Deletion is processed: 2 succeeded; 0 failed.
```

---

Note that only some transitions are possible due to external reasons, not all of them: creation/update/resume → deletion (when marked for deletion).

Sudden transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.

A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
@nolar nolar force-pushed the irrelevant-handlers-backported branch from f909be6 to c350e9c Compare December 9, 2020 19:41
nolar added a commit that referenced this pull request Dec 9, 2020
**This is the proper fix for #601 on the main branch. The 0.28.x hotfix is in #607.**

This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope.

For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed.

Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states.

For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine.

---

An example case (#601):

```python
import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn1(**_):
    pass

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn2(**_):
    raise kopf.TemporaryError("temporary failure", delay=5)

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn1(**_):
    pass

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn2(**_):
    pass
```

```
kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml
```

Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done.

Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers.

```
Creation is in progress: …
Sleeping for 4.866814 seconds for the delayed handlers.
…
Sleeping was interrupted by new changes, 3.3 seconds left.
Creation is superseded by deletion: 1 succeeded; 0 failed; 1 left.
Deletion is in progress: …
…
Deletion is processed: 2 succeeded; 0 failed.
```

---

Note that only some transitions are possible due to external reasons, not all of them:

* creation/update/resume → deletion (when marked for deletion).
* resume → update (when changed; the resuming handlers are mixed into the updating process).

Other transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.

A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
@nolar nolar force-pushed the irrelevant-handlers-backported branch from c350e9c to d898c70 Compare December 10, 2020 20:01
This fix is important only for the states restored from storage with specific handlers referenced: those handlers should be taken into account when purging the state (#557), but not for `.done` & `.delays` — these ones must be calculated only for the handlers currently in scope.

For log messages with counts of succeeded and failed handlers (#540), only the handlers relevant to the current cause should be counted. However, if there are irrelevant handlers, it means that one cause was interrupted (intercepted, overridden) by another: e.g., if deletion has happened while a creation/update/resume was being processed.

Technically, now, without the fix, this merges two unrelated causes and their handlers, since, from Kopf's point of view, the irrelevant handlers look the same as the relevant but filtered out handlers (e.g. by labels). To work around this, we introduce "purposes" of the handlers and persist them in the handlers' states.

For compatibility, if the purpose is not stored (`None`), treat it as a catch-all value, thus reproducing the same behaviour as before the fix. This will only be used if the operator is upgraded in the middle of the processing (e.g. due to temporary errors or intentional delays). All new handling cycles will be stored with the purpose. In case of a rollback, the purpose will be ignored, but also overwritten back to `None` by the old versions of Kopf. This is fine.

---

An example case:

```python
import kopf

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn1(**_):
    pass

@kopf.on.create('zalando.org', 'v1', 'kopfexamples')
def create_fn2(**_):
    raise kopf.TemporaryError("temporary failure", delay=5)

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn1(**_):
    pass

@kopf.on.delete('zalando.org', 'v1', 'kopfexamples')
def delete_fn2(**_):
    pass
```

```
kubectl create -f examples/obj.yaml && sleep 2 && kubectl delete -f examples/obj.yaml
```

Here, on the object deletion, the state will be restored with 4 handlers: `create_fn1` & `create_fn2`& `delete_fn1` & `delete_fn2`. Since `create_fn2` was delayed and not finished, then, without the fix, it would yield its own delay and will prevent the `.done` from achieving `True` — while it is irrelevant anymore, and only `delete_fn1` & `delete_fn2` should be considered. However, all four should be purged from annotations when done.

Now, with this fix, the logs will report interruption of creation and the beginning of deletion — each with its own numbers.

```
Creation is in progress: …
Sleeping for 4.866814 seconds for the delayed handlers.
…
Sleeping was interrupted by new changes, 3.3 seconds left.
Creation is superseded by deletion: 1 succeeded; 0 failed; 1 left.
Deletion is in progress: …
…
Deletion is processed: 2 succeeded; 0 failed.
```

---

Note that only some transitions are possible due to external reasons, not all of them:

* creation/update/resume → deletion (when marked for deletion, the pending handlers are abandoned).
* resume → update (when changed; the resuming handlers are mixed into the updating process).

Other transitions within the creation/update/resume group are impossible due to their nature, except when initiated by the framework itself.

A transition back from deletion to creation/update/resume is impossible in practice, but would work otherwise.
@nolar nolar force-pushed the irrelevant-handlers-backported branch from d898c70 to a84617a Compare December 10, 2020 20:06
@nolar nolar marked this pull request as ready for review December 10, 2020 20:19
@nolar nolar merged commit c247b68 into release/0.28 Dec 10, 2020
@nolar nolar deleted the irrelevant-handlers-backported branch December 10, 2020 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant