Update observedGeneration in reconcileStatus flow instead of reconcileSpec flow #996

Merged
merged 6 commits into from
Feb 7, 2025

shreyas-s-rao
Contributor

@shreyas-s-rao shreyas-s-rao commented Feb 4, 2025

How to categorize this PR?

/area usability
/kind impediment

What this PR does / why we need it:
Follow-up to #987

This PR changes the reconciliation flow so that observedGeneration is updated in the reconcileStatus flow instead of the reconcileSpec flow.

Reason: Upon a spec reconciliation request (a spec change or the gardener.cloud/operation: reconcile annotation, depending on the reconcile strategy), observedGeneration was updated in the reconcileSpec flow, while the other status fields, such as conditions and the sts-related fields, were updated later in the reconcileStatus flow. A user who updates the spec and then waits for the changes to be reflected in the etcd cluster (which the conditions report accurately) could therefore see observedGeneration updated first, followed by a gap before the conditions were updated. During that gap the previous conditions still applied alongside the new observed generation, so a user watching the status could wrongly assume that the spec had been fully reconciled and that the pods were running with the latest changes.

To avoid this, observedGeneration will now be updated along with the rest of the status fields in a single status patch call, so that at any given point in time the user can check both status.observedGeneration and status.conditions to know whether the spec changes have been fully rolled out.
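A minimal sketch of the idea, not the code from this PR: the whole status is assembled in memory first, so that observedGeneration and the conditions can be written together in one patch. The `Condition` and `Status` types, the `buildStatus` function, and the `AllMembersReady` condition used in `main` are simplified, hypothetical stand-ins for illustration only.

```go
package main

import "fmt"

// Condition is a simplified, hypothetical stand-in for an Etcd status condition.
type Condition struct {
	Type   string
	Status string // "True", "False" or "Unknown"
}

// Status is a simplified, hypothetical stand-in for the Etcd status.
type Status struct {
	ObservedGeneration *int64
	Conditions         []Condition
}

// buildStatus assembles the complete next status so that it can be written
// with a single patch call. observedGeneration only advances when a spec
// reconciliation actually ran in this pass; otherwise the old value is kept.
func buildStatus(old Status, generation int64, specReconciled bool, conds []Condition) Status {
	next := Status{ObservedGeneration: old.ObservedGeneration, Conditions: conds}
	if specReconciled {
		g := generation
		next.ObservedGeneration = &g
	}
	return next
}

func main() {
	prev := int64(1)
	old := Status{ObservedGeneration: &prev}
	fresh := []Condition{{Type: "AllMembersReady", Status: "True"}}

	// Generation and conditions change together, never one before the other.
	next := buildStatus(old, 2, true, fresh)
	fmt.Println(*next.ObservedGeneration, next.Conditions[0].Status)
}
```

Because the new generation and the fresh conditions land in the same object, a watcher never observes one without the other.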

Additionally, the operation annotation is now conditionally removed only after both spec and status reconciliation have completed. Any failure in either step therefore results in a fresh spec reconciliation upon requeue instead of being missed. This ensures that observedGeneration gets correctly updated even if druid restarts in the middle of a reconciliation flow.
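The ordering described above can be sketched roughly as follows. This is an illustration under assumptions rather than druid's actual reconciler: the `reconcile` function, its step callbacks, and `errSimulated` are hypothetical names, and the annotation is modeled as a plain map entry.

```go
package main

import (
	"errors"
	"fmt"
)

const operationAnnotation = "gardener.cloud/operation"

// errSimulated is a hypothetical error used to simulate a failing step.
var errSimulated = errors.New("simulated failure")

// reconcile runs the spec step, then the status step, and removes the
// operation annotation only after both have succeeded. If either step fails,
// the annotation stays in place, so the requeued request triggers a fresh
// spec reconciliation.
func reconcile(annotations map[string]string, reconcileSpec, reconcileStatus func() error) error {
	if err := reconcileSpec(); err != nil {
		return fmt.Errorf("spec reconciliation: %w", err)
	}
	if err := reconcileStatus(); err != nil {
		return fmt.Errorf("status reconciliation: %w", err)
	}
	if annotations[operationAnnotation] == "reconcile" {
		delete(annotations, operationAnnotation)
	}
	return nil
}

func main() {
	ok := func() error { return nil }
	fail := func() error { return errSimulated }

	ann := map[string]string{operationAnnotation: "reconcile"}
	// First pass: status reconciliation fails, the annotation survives.
	fmt.Println(reconcile(ann, ok, fail), len(ann))
	// Requeue: both steps succeed, the annotation is removed.
	fmt.Println(reconcile(ann, ok, ok), len(ann))
}
```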

Which issue(s) this PR fixes:
Fixes #985

Special notes for your reviewer:
/invite @unmarshall
/assign @unmarshall
/cc @timuthy

Release note:

Updating `status.observedGeneration` and the optional removal of the `gardener.cloud/operation: reconcile` annotation on the Etcd resource are now executed after the reconciliation of the Etcd status, to depict the accurate state of the cluster at any given point in time. Users must wait for the `status.observedGeneration` field to be updated (and additionally for the removal of the `gardener.cloud/operation: reconcile` annotation if the CLI flag `enable-spec-auto-reconcile` is set to false) to confirm completion of reconciliation.
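A client-side check following this release note could look roughly like the sketch below. The `EtcdView` type and the `reconcileComplete` function are hypothetical and only carry the fields needed for the check; a real client would read them from the Etcd resource.

```go
package main

import "fmt"

const operationAnnotation = "gardener.cloud/operation"

// EtcdView is a hypothetical, trimmed-down view of an Etcd resource.
type EtcdView struct {
	Generation         int64
	ObservedGeneration *int64
	Annotations        map[string]string
}

// reconcileComplete reports whether the latest spec has been reconciled.
// autoReconcile mirrors the enable-spec-auto-reconcile CLI flag: when it is
// false, the reconcile annotation must also have been removed.
func reconcileComplete(e EtcdView, autoReconcile bool) bool {
	if e.ObservedGeneration == nil || *e.ObservedGeneration != e.Generation {
		return false
	}
	if !autoReconcile && e.Annotations[operationAnnotation] == "reconcile" {
		return false
	}
	return true
}

func main() {
	observed := int64(3)
	e := EtcdView{
		Generation:         3,
		ObservedGeneration: &observed,
		Annotations:        map[string]string{operationAnnotation: "reconcile"},
	}
	fmt.Println(reconcileComplete(e, false)) // annotation still present
	delete(e.Annotations, operationAnnotation)
	fmt.Println(reconcileComplete(e, false)) // generation matches, annotation gone
}
```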

@shreyas-s-rao shreyas-s-rao added this to the v0.27.0 milestone Feb 4, 2025
@shreyas-s-rao shreyas-s-rao requested a review from a team as a code owner February 4, 2025 17:08
@gardener-robot gardener-robot added the needs/review Needs review label Feb 4, 2025
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 4, 2025
@gardener-robot gardener-robot added area/usability Usability related kind/impediment Something that impedes developers, operators, users or others in their work size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Feb 4, 2025
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Feb 4, 2025
@shreyas-s-rao shreyas-s-rao added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed ok-to-test Indicates a non-member PR verified by an org member that is safe to test. labels Feb 4, 2025
@seshachalam-yv
Contributor

/assign

@gardener-robot gardener-robot added size/l Size of pull request is large (see gardener-robot robot/bots/size.py) needs/second-opinion Needs second review by someone else and removed size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Feb 5, 2025
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 5, 2025
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 5, 2025
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Feb 5, 2025
@shreyas-s-rao
Contributor Author

@seshachalam-yv thanks a lot for your inputs. As discussed, I have made the following changes to streamline the reconciliation flow. Changes made in 97282ed:

  1. Call canReconcileSpec only once, to ensure that only one event is emitted when the suspend-reconcile key is detected. Additionally, when a new Etcd resource is created with the operation annotation set, computing canReconcileSpec only at the beginning ensures that it returns true. If it were recalculated after observedGeneration is updated, it would return false and the operation annotation would never be removed.
  2. Use operatorContext to pass information such as the results of canReconcileSpec and shortCircuitSpecReconcile (both bool values), to avoid recalculating these within each of the other reconciliation steps, like reconcileStatus and removeOperationAnnotation.
  3. Add helper functions with tests, along with some missing tests for other existing helper functions.
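As a rough illustration of points 1 and 2, the sketch below computes a decision once, stores it in a string-keyed data map, and reads it back in a later step with a helper that surfaces errors instead of defaulting. The helper name `getBoolValueOrError` is loosely modeled on the `GetBoolValueOrError()` function named in the commit titles, but its signature here, the map type, and the key name are assumptions and may differ from the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// canReconcileSpecKey is a hypothetical key name used for illustration.
const canReconcileSpecKey = "canReconcileSpec"

// getBoolValueOrError looks up key in data and parses the value as a bool.
// A missing key or an unparsable value is returned as an error instead of
// being silently replaced by a default.
func getBoolValueOrError(data map[string]string, key string) (bool, error) {
	raw, ok := data[key]
	if !ok {
		return false, fmt.Errorf("key %q not found", key)
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("value %q for key %q is not a bool: %w", raw, key, err)
	}
	return v, nil
}

func main() {
	// Computed once at the start of the reconciliation and stored.
	data := map[string]string{canReconcileSpecKey: strconv.FormatBool(true)}

	// Later steps read the stored result instead of recomputing it.
	v, err := getBoolValueOrError(data, canReconcileSpecKey)
	fmt.Println(v, err)

	_, err = getBoolValueOrError(data, "missing")
	fmt.Println(err)
}
```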

internal/controller/etcd/reconcile_status.go (outdated, resolved)
internal/controller/etcd/reconciler.go (outdated, resolved)
internal/utils/miscellaneous.go (outdated, resolved)
internal/utils/miscellaneous.go (outdated, resolved)
internal/utils/miscellaneous_test.go (resolved)
@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Feb 6, 2025
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 6, 2025
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 6, 2025
…ata` and `GetBoolValueOrDefault()` function
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 6, 2025
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 6, 2025
…lt()` function to `GetBoolValueOrError()` to not swallow errors
@gardener-robot-ci-3 gardener-robot-ci-3 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Feb 7, 2025
… which runs `updateObservedGeneration()` and `removeOPerationAnnotation()`
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 7, 2025
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 7, 2025
Contributor

@unmarshall unmarshall left a comment

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/changes Needs (more) changes needs/review Needs review needs/second-opinion Needs second review by someone else labels Feb 7, 2025
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 7, 2025
@gardener-robot gardener-robot added needs/second-opinion Needs second review by someone else and removed reviewed/lgtm Has approval for merging labels Feb 7, 2025
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Feb 7, 2025
@shreyas-s-rao shreyas-s-rao merged commit bd06f2c into gardener:master Feb 7, 2025
13 checks passed
@shreyas-s-rao shreyas-s-rao deleted the fix/status-update branch February 7, 2025 10:20
Labels
area/usability Usability related kind/impediment Something that impedes developers, operators, users or others in their work needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/second-opinion Needs second review by someone else size/l Size of pull request is large (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clients potentially miss etcd spec roll outs
7 participants