Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

☂️-issue for Server-Side Apply in Gardener #4122

Closed
timebertt opened this issue May 27, 2021 · 5 comments
Closed

☂️-issue for Server-Side Apply in Gardener #4122

timebertt opened this issue May 27, 2021 · 5 comments
Labels
area/dev-productivity Developer productivity related (how to improve development) area/scalability Scalability related kind/enhancement Enhancement, improvement, extension kind/impediment Something that impedes developers, operators, users or others in their work lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@timebertt
Copy link
Member

How to categorize this issue?

/area dev-productivity scalability
/kind impediment

From #4083:

[...] Server-Side Apply. We identified many places, especially in the Gardenlet code, that will benefit from SSA compared to Json-Merge/Strategic-Merge patches that are used today. Less conflicts, less network transfers, less API calls, lower code complexity, etc.. Other components in the Gardener landscape, e.g. Etcd-Druid or provider extensions, will benefit from SSA to at least the same extent.

While investigating how and under which circumstances SSA can be leveraged in Gardener as part of #4027, we got aware of the following major problems that will most probably prevent us from leveraging SSA as originally intended. @timuthy and I agreed on summarizing them in an issue for future reference.

  • removing/unsetting fields in custom resources via apply that were set during a previous apply is buggy (ref "null" values are dropped from CustomResources kubernetes/kubernetes#101036, Field deletion doesn't work well in SSA kubernetes-sigs/structured-merge-diff#171). The bug was fixed in v1.21 and backported to v1.{18,19,20}, though we would heavily depend on it once gardener uses SSA to deploy/reconcile extension resources (e.g. Infrastructure) in the seed.
    • We probably don't want to add additional requirements for specific patch versions for the seed's k8s versions (e.g. require at least v1.18.19), so this effectively prevents us from properly using SSA for this use case.
  • switching from the typical CreateOrUpdate-logic to SSA in a controller doesn't seem to be straight-forward for resources that were created before SSA was enabled (resources that don't have field ownership tracked in metadata.managedFields yet).
    • The problem is, that the first apply will add the before-first-apply field manager, that owns all fields that were present before the apply. Once the new field manager (e.g. gardenlet) wants to remove a field, it will only give up ownership but the field will still not be removed, as it is still co-owned by the before-first-apply manager.
    • In order to workaround this issue and take over sole ownership of all fields during the migration, we need to take ownership of at least one field before the first apply
    • e.g. take ownership of timestamp annotation:
    $ k patch infra infra --type=merge -p '{"metadata":{"managedFields": [{"apiVersion":"extensions.gardener.cloud/v1alpha1","fieldsType":"FieldsV1","fieldsV1":{"f:metadata":{"f:annotations":{"f:gardener.cloud/timestamp":{}}}},"manager":"gardenlet","operation":"Update","time":"2021-05-27T13:03:49Z"}]}}'
    
    • and then apply the full spec, which will make the new manager owner of all spec fields

There are also other minor problems with SSA, that won't hinder us from using it, but are worth keeping in mind:

@timebertt timebertt added the kind/enhancement Enhancement, improvement, extension label May 27, 2021
@gardener-robot gardener-robot added area/dev-productivity Developer productivity related (how to improve development) area/scalability Scalability related kind/impediment Something that impedes developers, operators, users or others in their work labels May 27, 2021
@timebertt
Copy link
Member Author

So far there's no decision whether to drop the idea of using SSA in gardener or pursue it further. However, the list above shows that we have to be very conscious about when we can properly leverage it.
We still want to understand the before-first-apply manager and the reasoning behind it, so maybe this provides us with some more arguments for taking a decision here.

timebertt added a commit to timebertt/gardener that referenced this issue May 31, 2021
For the reasons explained in gardener#4057, MergePatchOrCreate is not safe to use
without reading objects from the API server / cache first. Ideally we would
switch to server-side apply to simplify such operations on controlled objects
(e.g. extension resources), though we cannot leverage it for now because of
several issues described in gardener#4122.
rfranzke pushed a commit that referenced this issue Jun 1, 2021
* Cleanup unused/dead code

* Make EXPECTPatch easier to debug

* Refactor extensions helper funcs

This prefactoring performs several changes in the extension helper funcs:
- the funcs directly work on the passed in objects instead of creating
  new objects via separately passed newObjFuncs. Passed objects are
  expected to be filled with the latest state the controller/component
  applied/observed/retrieved, but at least namespace and name.
  This way, controllers can keep working on a single in-memory object
  instead of retrieving the same object over and over again.
- WaitUntilExtensionObjectReady additionally checks that the added
  timestamp annotation is present on the object to prevent false early
  exits when reading stale data (e.g. from a cached client).
- makes naming more consistent and clearer

This provides the following benefits:
- makes the funcs ready to be used with cached clients, as they are able
  to properly handle stale data now
- controllers/components automatically have the latest state of the
  objects present without retrieving them again from the API server.
  This will ease further refactorings and allows to safely use
  patches in a lot of cases instead of CreateOrUpdate (+RetryOnConflict).
- the funcs are easier to use and comprehend: only the desired object
  needs to be passed, no need to pass newObjFuncs, namespace and name
- the funcs are more similar to other helper funcs (e.g. CreateOrUpdate)
  and the mechanisms employed in e.g. c-r clients
- no need to pass a client.Object to postReadyFuncs which needs to be
  casted again

* Make BackupBucket controller ready for cached client

* Make BackupEntry controller/component ready for cached client

* Make ContainerRuntime component ready for cached client

* Make ControlPlane component ready for cached client

* Make Extension component ready for cached client

* Make Infrastructure component ready for cached client

* Make Network component ready for cached client

* Make OperatingSystemConfig component ready for cached client

* Make Worker component ready for cached client

* Fix failing unit test

* Address review feedback

* Switch from MergePatchOrCreate to GetAndCreateOrMergePatch

For the reasons explained in #4057, MergePatchOrCreate is not safe to use
without reading objects from the API server / cache first. Ideally we would
switch to server-side apply to simplify such operations on controlled objects
(e.g. extension resources), though we cannot leverage it for now because of
several issues described in #4122.

* Remove *PatchOrCreate funcs

As described in #4057, *PatchOrCreate is easy to use incorrectly.
Remove *PatchOrCreate funcs in order to minimize the risk of misuse.
There might still be good use cases for this mechanism, though they
are limited. If there is such a good use case, we can use it also
without a helper func.
@timebertt
Copy link
Member Author

There is an upstream issue about the migration from CSA/CreateOrUpdate to SSA: kubernetes/kubernetes#99003

@gardener-robot gardener-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 29, 2021
@gardener-ci-robot
Copy link
Contributor

The Gardener project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed
    You can:
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close

/lifecycle rotten

@gardener-prow gardener-prow bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 30, 2022
krgostev pushed a commit to krgostev/gardener that referenced this issue Apr 21, 2022
* Cleanup unused/dead code

* Make EXPECTPatch easier to debug

* Refactor extensions helper funcs

This prefactoring performs several changes in the extension helper funcs:
- the funcs directly work on the passed in objects instead of creating
  new objects via separately passed newObjFuncs. Passed objects are
  expected to be filled with the latest state the controller/component
  applied/observed/retrieved, but at least namespace and name.
  This way, controllers can keep working on a single in-memory object
  instead of retrieving the same object over and over again.
- WaitUntilExtensionObjectReady additionally checks that the added
  timestamp annotation is present on the object to prevent false early
  exits when reading stale data (e.g. from a cached client).
- makes naming more consistent and clearer

This provides the following benefits:
- makes the funcs ready to be used with cached clients, as they are able
  to properly handle stale data now
- controllers/components automatically have the latest state of the
  objects present without retrieving them again from the API server.
  This will ease further refactorings and allows to safely use
  patches in a lot of cases instead of CreateOrUpdate (+RetryOnConflict).
- the funcs are easier to use and comprehend: only the desired object
  needs to be passed, no need to pass newObjFuncs, namespace and name
- the funcs are more similar to other helper funcs (e.g. CreateOrUpdate)
  and the mechanisms employed in e.g. c-r clients
- no need to pass a client.Object to postReadyFuncs which needs to be
  casted again

* Make BackupBucket controller ready for cached client

* Make BackupEntry controller/component ready for cached client

* Make ContainerRuntime component ready for cached client

* Make ControlPlane component ready for cached client

* Make Extension component ready for cached client

* Make Infrastructure component ready for cached client

* Make Network component ready for cached client

* Make OperatingSystemConfig component ready for cached client

* Make Worker component ready for cached client

* Fix failing unit test

* Address review feedback

* Switch from MergePatchOrCreate to GetAndCreateOrMergePatch

For the reasons explained in gardener#4057, MergePatchOrCreate is not safe to use
without reading objects from the API server / cache first. Ideally we would
switch to server-side apply to simplify such operations on controlled objects
(e.g. extension resources), though we cannot leverage it for now because of
several issues described in gardener#4122.

* Remove *PatchOrCreate funcs

As described in gardener#4057, *PatchOrCreate is easy to use incorrectly.
Remove *PatchOrCreate funcs in order to minimize the risk of misuse.
There might still be good use cases for this mechanism, though they
are limited. If there is such a good use case, we can use it also
without a helper func.
@gardener-ci-robot
Copy link
Contributor

The Gardener project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten

/close

@gardener-prow
Copy link
Contributor

gardener-prow bot commented Apr 29, 2022

@gardener-ci-robot: Closing this issue.

In response to this:

The Gardener project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gardener-prow gardener-prow bot closed this as completed Apr 29, 2022
krgostev pushed a commit to krgostev/gardener that referenced this issue Jul 5, 2022
* Cleanup unused/dead code

* Make EXPECTPatch easier to debug

* Refactor extensions helper funcs

This prefactoring performs several changes in the extension helper funcs:
- the funcs directly work on the passed in objects instead of creating
  new objects via separately passed newObjFuncs. Passed objects are
  expected to be filled with the latest state the controller/component
  applied/observed/retrieved, but at least namespace and name.
  This way, controllers can keep working on a single in-memory object
  instead of retrieving the same object over and over again.
- WaitUntilExtensionObjectReady additionally checks that the added
  timestamp annotation is present on the object to prevent false early
  exits when reading stale data (e.g. from a cached client).
- makes naming more consistent and clearer

This provides the following benefits:
- makes the funcs ready to be used with cached clients, as they are able
  to properly handle stale data now
- controllers/components automatically have the latest state of the
  objects present without retrieving them again from the API server.
  This will ease further refactorings and allows to safely use
  patches in a lot of cases instead of CreateOrUpdate (+RetryOnConflict).
- the funcs are easier to use and comprehend: only the desired object
  needs to be passed, no need to pass newObjFuncs, namespace and name
- the funcs are more similar to other helper funcs (e.g. CreateOrUpdate)
  and the mechanisms employed in e.g. c-r clients
- no need to pass a client.Object to postReadyFuncs which needs to be
  casted again

* Make BackupBucket controller ready for cached client

* Make BackupEntry controller/component ready for cached client

* Make ContainerRuntime component ready for cached client

* Make ControlPlane component ready for cached client

* Make Extension component ready for cached client

* Make Infrastructure component ready for cached client

* Make Network component ready for cached client

* Make OperatingSystemConfig component ready for cached client

* Make Worker component ready for cached client

* Fix failing unit test

* Address review feedback

* Switch from MergePatchOrCreate to GetAndCreateOrMergePatch

For the reasons explained in gardener#4057, MergePatchOrCreate is not safe to use
without reading objects from the API server / cache first. Ideally we would
switch to server-side apply to simplify such operations on controlled objects
(e.g. extension resources), though we cannot leverage it for now because of
several issues described in gardener#4122.

* Remove *PatchOrCreate funcs

As described in gardener#4057, *PatchOrCreate is easy to use incorrectly.
Remove *PatchOrCreate funcs in order to minimize the risk of misuse.
There might still be good use cases for this mechanism, though they
are limited. If there is such a good use case, we can use it also
without a helper func.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/dev-productivity Developer productivity related (how to improve development) area/scalability Scalability related kind/enhancement Enhancement, improvement, extension kind/impediment Something that impedes developers, operators, users or others in their work lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.
Projects
None yet
Development

No branches or pull requests

3 participants