Reset watch-streams for the freeze mode, relist on unfreezing #257

nolar · 2019-11-25T12:37:43Z

Redesign operator freezes to stop/resume watching instead of ignoring events.

Issue : #255, indirectly #51 #116

Background

The peerings and the freezes were made to prevent multiple operators handling the same resources in the same cluster, even if those operators run outside of the cluster. One of the use-cases was for the development/debugging of the operators on the developer's machine. But also the normal operations with 2+ operators, which would otherwise conflict with each other's changes.

Description

Before this PR: When an operator is frozen due to presence of another operator of the same or higher priority, it ignores the events, while continuing watching for the resource changes normally.

As a consequence of that approach, in some cases (see below), it ignores the listing/watching events on the startup, and does not react to the objects' changes/creations/deletions that happened before or during the initial freeze (of up to 60 seconds after the startup). Only the new changes happening after the unfreeze are handled.

For example, when the previous operator's pod is SIGKILL'ed and restarted, the old operator's process does not remove itself from the peering resource (no chance given), so the new operator's process can see a ghost of the old operator's process for up to 60 seconds. And since it has the same priority, it goes to the frozen mode until then.

Such per-event ignoring also creates a lot of log noise for every ignored event.

With this PR, the "freeze-mode" is redesigned to a better approach:

Instead of ignoring the events but continuing to watch the resource, the watch streams are closed at the same moment as the freeze mode is turned on (and the watching connections are closed).

When the freeze mode is turned off, the watch-streams are reconnected by re-listing the resources, as if the operator has just started.

This ensures that no objects remain unnoticed for long times just because their changes happened during the "freeze mode" for false technical reasons (ghost operators' records in the peering). In the worst case, the reactions will be delayed until the ghost operators' records will expire (up to 60 seconds), but the reaction will happen normally after that. In most cases, however, it will lead to no-operation.

This also removes the log noise.

Types of Changes

New feature (non-breaking change which adds functionality)
Bug fix (non-breaking change which fixes an issue)

Review

List of tasks the reviewer must do to review the PR

Tests

zincr · 2019-11-25T12:37:54Z

🤖 zincr found 0 problems , 0 warnings

✅ Large Commits
✅ Approvals
✅ Specification
✅ Dependency Licensing

zincr · 2019-11-25T12:37:54Z

🤖 zincr found 1 problem , 1 warning

❌ Approvals
ℹ️ Large Commits
✅ Specification
✅ Dependency Licensing

Details on how to resolve are provided below

Approvals

All proposed changes must be reviewed by project maintainers before they can be merged

Not enough people have approved this pull request - please ensure that 1 additional user, who have not contributed to this pull request approve the changes.

✅ Approved by PR author @nolar
❌ 1 additional approval needed

Large Commits

Checks all commits for large additions to a single file. Large commits should be reviewed more carefully for potential copyright and licensing issues

This file contains a substantial change, please review to determine if the change comes from an external source and if there are any copyright or licensing issues to be aware of

ℹ️ kopf/structs/primitives.py had +100 lines of code changed in a single commit
Please review this file to determine the source of this change
ℹ️ tests/primitives/test_toggles.py had +100 lines of code changed in a single commit
Please review this file to determine the source of this change

nolar added the enhancement New feature or request label Nov 25, 2019

nolar force-pushed the freeze-resets branch from 506fdd0 to 1628d04 Compare December 8, 2019 12:37

Reset the watch-streams instead of suppressing events in freeze-mode

ec2c152

nolar force-pushed the freeze-resets branch from 1628d04 to ec2c152 Compare December 12, 2019 13:33

nolar marked this pull request as ready for review December 12, 2019 13:35

nolar requested a review from samurang87 as a code owner December 12, 2019 13:35

nolar added 2 commits December 16, 2019 15:16

Merge branch 'master' into freeze-resets

9d65642

Merge branch 'master' into freeze-resets

222d5f3

samurang87 approved these changes Dec 19, 2019

View reviewed changes

nolar merged commit d74a063 into zalando-incubator:master Dec 19, 2019

nolar deleted the freeze-resets branch December 19, 2019 13:45

kopf-archiver bot mentioned this pull request Aug 19, 2020

[PR] Reset watch-streams for the freeze mode, relist on unfreezing nolar/kopf#257

Closed

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reset watch-streams for the freeze mode, relist on unfreezing #257

Reset watch-streams for the freeze mode, relist on unfreezing #257

nolar commented Nov 25, 2019 •

edited

Loading

zincr bot commented Nov 25, 2019 •

edited

Loading

zincr bot commented Nov 25, 2019

Reset watch-streams for the freeze mode, relist on unfreezing #257

Reset watch-streams for the freeze mode, relist on unfreezing #257

Conversation

nolar commented Nov 25, 2019 • edited Loading

Background

Description

Types of Changes

Review

zincr bot commented Nov 25, 2019 • edited Loading

🤖 zincr found 0 problems , 0 warnings

zincr bot commented Nov 25, 2019

🤖 zincr found 1 problem , 1 warning

Approvals

Large Commits

nolar commented Nov 25, 2019 •

edited

Loading

zincr bot commented Nov 25, 2019 •

edited

Loading