
namespace scoped soft finalizers #114

Open
philbrookes opened this issue Jul 14, 2022 · 9 comments
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/feature Categorizes issue or PR as related to a new feature.

Comments

@philbrookes

philbrookes commented Jul 14, 2022

Is your feature request related to a problem? Please describe.
While investigating workload migration using the advanced scheduling introduced in 0.5.0, I found that soft finalizers (available in 0.6.0) let me schedule the removal of resources from the losing cluster, enabling a graceful migration with no downtime.

However, on further thought we've realised that the pods run by the deployment could easily rely on resources that we are unaware of (e.g. a secret, or a CR for a database operator). Although the deployment, service and ingress still migrate gracefully, the pod itself will crash when the resources it relies on are deleted ungracefully from the losing cluster.

Describe the solution you'd like
An implementation of the soft finalizers at a namespace level.

The namespace is the unit of currency for the advanced scheduling feature, and when a namespace is rescheduled all workload resources inside it need to move together. It therefore seems that the most common use-case of soft finalizers will be to gracefully move ALL of the resources within a namespace to a new workload cluster. Namespace-scoped soft finalizers would let us do this gracefully, instead of via the individual resources.
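As a rough sketch of what this could look like (the annotation key format and all names/values below are assumptions made for illustration, not kcp's actual API), the soft finalizer would live on the Namespace, keyed per sync target:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical annotation key format: one soft finalizer per sync target,
	// set on the Namespace rather than on each individual resource.
	ns := corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Name: "developers-app",
			Annotations: map[string]string{
				// e.g. the GLBC holds back removal from the "loc1" sync target
				// until DNS has been cut over to loc2.
				"finalizers.workload.kcp.dev/loc1-sync-target": "glbc.dev/dns-migration",
			},
		},
	}
	fmt.Printf("namespace %s soft finalizers: %v\n", ns.Name, ns.Annotations)
}
```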

Describe alternatives you've considered
The user could be educated to set the resources being relied on as owners of the deployment, thereby causing the workload cluster to avoid tidying up those resources until the deployment is removed.
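For illustration, a minimal sketch of how that ownership could be expressed on the deployment's metadata (the resource names and UID here are invented for the example):

```go
package main

import (
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

func main() {
	// Illustrative only: in practice the Secret's UID comes from the live
	// object on the cluster, not a hard-coded string.
	secretUID := types.UID("3c1f7c4e-1111-2222-3333-444444444444")

	deploy := appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "my-app",
			Namespace: "developers-app",
			// The relied-upon Secret is recorded as an owner of the
			// Deployment, as described in the alternative above.
			OwnerReferences: []metav1.OwnerReference{{
				APIVersion: "v1",
				Kind:       "Secret",
				Name:       "my-app-credentials",
				UID:        secretUID,
			}},
		},
	}
	fmt.Printf("%s owners: %+v\n", deploy.Name, deploy.OwnerReferences)
}
```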

Additional context

Cluster Finalizer (Soft Finalizer): https://github.com/kcp-dev/kcp/blob/4d74a085a82affafba7f6d91818d0f0c6953e1d4/pkg/apis/workload/v1alpha1/types.go#L46-L56

@philbrookes
Author

ping @jmprusi @maleck13

@sttts sttts added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 19, 2022
@ncdc
Member

ncdc commented Jul 19, 2022

@philbrookes @jmprusi could you please provide a reference to what a "soft finalizer" is?

@ncdc ncdc changed the title namespace scoped softfinalizers namespace scoped soft finalizers Jul 19, 2022
@sttts
Member

sttts commented Jul 19, 2022

@sttts sttts added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 19, 2022
@jmprusi
Member

jmprusi commented Jul 19, 2022

@ncdc updated the issue with additional context on the Soft finalizers.

@philbrookes
Author

philbrookes commented Aug 10, 2022

== example scenario

  • Admin creates some synctargets and assigns them to various locations (loc1, loc2)
  • Admin employs the GLBC to manage DNS for the applications.
  • Admin creates a "developers" workspace for developers in their org and creates a placement there which uses loc1
  • Developer uses the "developers" workspace
  • Developer creates an application using ingress, service and deployments
  • Developer also creates deployments for some custom operators, which allow the creation of AWS resources such as Redis, MySQL, Postgres and S3 instances
  • Developer creates several AWS resources using the CRs for these operators.
  • Admin changes the placement to use loc2
  • The GLBC ensures that the Ingresses, services, and deployments are deleted from the loc1 synctargets gracefully to ensure that there is no downtime
  • As the GLBC is not aware of every type of CRD that might be created or need to be migrated, the resources created for the custom operators are deleted immediately
  • The pods in the loc1 synctarget are still trying to run (as the workload has not completed migration yet) but are crashlooping because the resources created for the AWS operators have been deleted too early.

@philbrookes
Author

@philbrookes to set up a call to discuss potential implementations.

@davidfestal
Member

I'm interested in attending the call.

@philbrookes
Author

philbrookes commented Aug 18, 2022

Discussed and came to a decision:
The "soft finalizers" (or "sync finalizers") will be added to namespaces. The syncer's last check before deleting an object will be whether the object's namespace carries any non-empty sync finalizers related to that sync target; if it does, the deletion is held back.

@philbrookes will look to contribute this to KCP
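A minimal sketch of that deletion guard, assuming the sync finalizers end up as per-sync-target annotations on the Namespace (the key format and all names are assumptions for illustration; the actual shape would be settled in the kcp contribution):

```go
package main

import (
	"fmt"
	"strings"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical key format for namespace-level sync finalizers.
const syncFinalizerPrefix = "finalizers.workload.kcp.dev/"

// namespaceBlocksDeletion reports whether the object's namespace carries a
// non-empty sync finalizer for the given sync target, in which case the
// syncer should hold off on deleting the object.
func namespaceBlocksDeletion(ns *corev1.Namespace, syncTarget string) bool {
	value, ok := ns.Annotations[syncFinalizerPrefix+syncTarget]
	return ok && strings.TrimSpace(value) != ""
}

func main() {
	ns := &corev1.Namespace{
		ObjectMeta: metav1.ObjectMeta{
			Name: "developers-app",
			Annotations: map[string]string{
				syncFinalizerPrefix + "loc1-sync-target": "glbc.dev/dns-migration",
			},
		},
	}

	// Syncer-side check before deleting any object in this namespace from
	// the "loc1-sync-target" sync target.
	if namespaceBlocksDeletion(ns, "loc1-sync-target") {
		fmt.Println("sync finalizer present: deferring deletion")
	} else {
		fmt.Println("no sync finalizer: safe to delete")
	}
}
```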

@mjudeikis
Contributor

/transfer-issue contrib-tmc

@kcp-ci-bot kcp-ci-bot transferred this issue from kcp-dev/kcp Nov 23, 2023