
Consider stabilizing the API and moving to v1 and beyond #2327

Closed
nathanperkins opened this issue May 15, 2023 · 15 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. kind/support Categorizes issue or PR as a support question. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

nathanperkins commented May 15, 2023

tl;dr: I would like controller-runtime to consider whether the API is stable enough to lock in as v1, with any further breaking changes bumping the major version to v2 and beyond.

controller-runtime is a widely used module that is fundamental to a lot of projects worldwide. Our team alone maintains dozens of controller-runtime reconcilers and uses the client.Client almost exclusively. We have been so appreciative of this project for enabling us to create controllers with consistent patterns across many use cases.

We have run into problems in the past (and seemingly will again in the future) where breaking changes to the API (mostly function signatures) have made the upgrade story difficult.

One recent example is when context.Context was added to the signature of MapFunc. There are also proposals to add error to MapFunc (edit: that would be added as a new option, so it is not breaking).
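
To make the shape of that change concrete, here is a minimal sketch of a mapping function written against the newer signature (the handler.MapFunc shape is recalled from around the v0.14 → v0.15 transition, so treat the details as approximate):

```go
package mapping

import (
	"context"

	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/reconcile"
)

// Pre-change shape (roughly): func(obj client.Object) []reconcile.Request
// The post-change shape adds a context as the first parameter, so any code
// written against the old handler.MapFunc no longer satisfies the new type.
func mapToRequests(ctx context.Context, obj client.Object) []reconcile.Request {
	return []reconcile.Request{{
		NamespacedName: types.NamespacedName{
			Namespace: obj.GetNamespace(),
			Name:      obj.GetName(),
		},
	}}
}
```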

These changes are huge improvements for issues we have had using MapFuncs, but they have the unfortunate side effect of breaking upgrades: when we import modules that use an older controller-runtime with the old function signature, it is impossible for us to upgrade until those dependencies upgrade as well.

Some of our projects are well behind on controller-runtime upgrades simply because the tangled dependency upgrade story makes it too difficult.

Go provides the v2+ module mechanism, which allows breaking API changes to be released as major version bumps and lets downstream projects control their own upgrade story. Locking in the current API as v1 and making breaking changes in v2+ would be of great benefit to downstream consumers.
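
For illustration, semantic import versioning would let two major versions coexist in a single build; the /v2 module path below is purely hypothetical, since controller-runtime has never published one:

```
module example.com/my-operator // hypothetical downstream module

go 1.20

require (
	// Hypothetical version numbers: a dependency of ours could keep using the
	// locked-in v1 line while our own code moves to a breaking v2 line.
	sigs.k8s.io/controller-runtime v1.0.0
	sigs.k8s.io/controller-runtime/v2 v2.0.0
)
```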

The main benefit is a smoother upgrade for downstream projects when the API changes. Another benefit is that it opens up the project to make breaking changes in a manageable way, so it would actually enable API changes that have been put off because of concerns about breaking upgrades.

BTW, I don't think the API necessarily needs to be 100% stable to start using v1+. It's perfectly fine to bump major versions relatively frequently and there's no requirement to backport fixes to old major versions.

nathanperkins (Author) commented May 15, 2023

There is a tool which can assist with checking for incompatible APIs and suggesting a new major version bump.

https://pkg.go.dev/golang.org/x/exp/cmd/gorelease

This could be added to CI to warn about breaking changes and confirm that they are intended, and it would ensure that the major version is bumped accordingly.
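
As a rough sketch of what that check could look like (flag names per the gorelease documentation; the version numbers here are placeholders):

```sh
# Compare the current module against a previous release tag and report
# incompatible API changes plus the suggested next version.
go run golang.org/x/exp/cmd/gorelease@latest -base=v0.14.0 -version=v0.15.0
```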

alvaroaleman (Member) commented May 16, 2023

Hi @nathanperkins,

At this point, there are no plans to do that. There are a couple of reasons for that:

  • The various k8s.io dependencies we pull in deliberately decided to stay on 0.X versions and we are doing the same. We usually bump them once every minor release on our end and breaking changes in there might end up surfacing through to controller-runtime users
  • There is still a lot of work to do in controller-runtime and a lot of that work will end up entailing breaking changes, like for example enhancing the cache, providing better support for multi-cluster setups or even the example of the fallible handlers you mentioned
  • The project is very short-staffed. As a maintainer, I don't even get around to all the changes I personally need, not to speak of changes that I don't. We have many feature requests and very few people actually doing any of the work to make those happen. Starting to maintain multiple releases would make that situation worse
  • I personally think that even for consumers the whole module v1+ system isn't great, because it means that updating a dependency with breaking changes requires changing all import paths rather than just a go.mod entry. The only case where this cannot be avoided is big monorepos, and even there you can use multiple submodules to be able to use multiple versions of a dependency

nathanperkins (Author) commented May 16, 2023

Thank you for the response! :)

The various k8s.io dependencies we pull in deliberately decided to stay on 0.X versions and we are doing the same. We usually bump them once every minor release on our end and breaking changes in there might end up surfacing through to controller-runtime users

Doesn't the API surface of controller-runtime generally hide the API surface of its dependencies? Other than the API types themselves (which generally maintain backwards compatibility), I would think the usage of k8s.io is mostly internal. If not, maybe the controller-runtime APIs could be adjusted to make them more internal?

There is still a lot of work to do in controller-runtime and a lot of that work will end up entailing breaking changes, like for example enhancing the cache (#2261), providing better support for multi-cluster setups (#2207) or even the example of the fallible handlers you mentioned (#1996)

The API doesn't have to be finalized or 100% stable before moving to v1+. It could be that the releases we consider "minor" right now (which actually contain breaking API changes) would become major version changes. The only difference is that downstream projects would have to update the module path in go.mod and their import paths, which is easy to do with tooling.

The project is very short-staffed. As a maintainer, I don't even get around to all the changes I personally need, not to speak of changes that I don't. We have many feature requests and very few people actually doing any of the work to make those happen. Starting to maintain multiple releases would make that situation worse.

That's totally fair and I certainly don't want to impose extra work that isn't helpful to others, especially if they can't get the things they want done. If the project is willing to do this, I could try to get bandwidth to help.

Just as it is right now, I don't think there is an obligation to maintain multiple releases. The major version bump would happen where you normally perform a minor version bump, for releases that make a breaking API change. The project could keep supporting releases the same way it does right now.

From my perspective, implementing this means bumping to v1 in the next release and implementing a CI job which can show breaking API changes and suggest that a major version bump is necessary.

I personally think that even for consumers the whole module v1+ system isn't great, because it means that updating a dependency with breaking changes requires changing all import paths rather than just a go.mod entry. The only case where this cannot be avoided is big monorepos, and even there you can use multiple submodules to be able to use multiple versions of a dependency.

Updating all the import paths is unfortunate, but it's very easy to automate with tooling. The status quo is much more difficult: a project can have its dependencies broken by an automated update that should be safe.

It's not just a problem for monorepos. When my project imports controller-runtime v0.x and also imports some other project which uses v0.x-1, and there are breaking changes between the two, it's not something that's possible for us to resolve without coordinating with the other project. They have to fix it or we have to fork it and fix it ourselves.
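
A sketch of that diamond in go.mod terms, with made-up module names (only one copy of a v0 module can exist in a build, so Go's minimal version selection picks the higher one for everything):

```
module example.com/my-operator // hypothetical

go 1.20

require (
	example.com/other-project v1.2.0       // hypothetical dep built against controller-runtime v0.14.x
	sigs.k8s.io/controller-runtime v0.15.0 // the version we want to move to
)

// The whole build resolves controller-runtime to v0.15.0, so
// example.com/other-project stops compiling wherever it relies on
// pre-0.15 signatures until it is updated upstream (or forked).
```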

We have multiple projects in separate repos and we have to coordinate controller-runtime upgrades in lock step whenever there are breaking API changes (which aren't obvious because they are hidden behind minor version bumps).

Let me know what you think. If you don't think this would or could ever be implemented, feel free to close it.

sbueringer (Member) commented May 17, 2023

The various k8s.io dependencies we pull in deliberately decided to stay on 0.X versions and we are doing the same. We usually bump them once every minor release on our end and breaking changes in there might end up surfacing through to controller-runtime users

Doesn't the API surface of controller-runtime generally hide the API surface of its dependencies? Other than the API types themselves (which generally maintain backwards compatibility), I would think the usage of k8s.io is mostly internal. If not, maybe the controller-runtime APIs could be adjusted to make them more internal?

I think even if our usage of k8s.io is just internal and not surfaced to users of controller-runtime we have the problem that controller-runtime would have to be compatible with / compile with various versions of k8s.io/* dependencies.

Some examples:

  • We had an issue where controller-runtime v0.14.x (which depends on client-go 0.26) was not compatible with client-go 0.27: 🐛 Add cross-version compatibility with client-go 1.27 #2223
    • In this case it could be fixed in a way that controller-runtime v0.15.x is (probably) compatible with client-go 0.27 and 0.26
  • There was a change in k8s.io/apimachinery 0.27 which deprecated wait utils and introduced new ones: kubernetes/apimachinery@831cf05
    • While we are not surfacing those to controller-runtime users, we use them internally, so CR has to compile against the k8s.io/apimachinery version in the build. In this case the old funcs were only deprecated, not removed outright, but this still limits the k8s.io/apimachinery versions we can be compatible with. We started using the new utils, which means k8s.io/apimachinery 0.27+ will be required for CR v0.15 (a sketch of the old vs. new utils is shown below).
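
For illustration only, this is the flavor of the wait-package change being described (the exact call sites inside CR may differ; the function names are from the apimachinery wait package as I recall them):

```go
package waiting

import (
	"context"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// Pre-0.27 style, now deprecated (still compiles, but is flagged):
//
//	wait.PollImmediate(time.Second, time.Minute, func() (bool, error) { ... })
//
// Context-aware replacement that code moving to apimachinery 0.27+ adopts:
func waitForReady(ctx context.Context, check func(context.Context) (bool, error)) error {
	return wait.PollUntilContextTimeout(ctx, time.Second, time.Minute, true, check)
}
```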

I think even if we use v1/v2/... in CR, we run into problems if various CR versions are only compatible with specific k8s.io/apimachinery, k8s.io/client-go, ... versions. As a user of v1/v2/..., you have to pick one specific k8s.io/client-go, k8s.io/apimachinery, ... version, and all CR versions you're using would have to be compatible with that version.

So even assuming we don't expose those dependencies to our users I'm not sure if it's possible to make the CR internal usage independent of specific minor versions of those dependencies.

troy0820 (Member) commented Sep 6, 2023

/kind feature support

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. kind/support Categorizes issue or PR as a support question. labels Sep 6, 2023
nathanperkins (Author) commented Nov 27, 2023

This issue is a lot more serious with the webhook.Validator change in 0.15. Our module using controller-runtime 0.15 can't import APIs built against 0.14 or earlier that implement webhook.Validator on their types.

Breaking changes are usually less of a problem because compilation only includes the imported module code that is actually used, but API types are the most common thing that needs to be imported.

This essentially fragments the community across the breaking-change boundary: modules on <=0.14 and >=0.15 cannot import each other.
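
To make the break concrete, here is roughly what an API type looks like on the newer side of the boundary (the method shapes are recalled from the v0.14 → v0.15 webhook.Validator change, so treat them as approximate):

```go
package v1alpha1 // hypothetical API package; Widget is a stand-in type

import (
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/webhook/admission"
)

// Widget stands in for any API type implementing webhook.Validator on itself.
type Widget struct{}

// Before v0.15 the validator methods returned only an error, e.g.:
//
//	func (w *Widget) ValidateCreate() error
//
// From v0.15 an admission.Warnings value was added to every return, so a type
// compiled against one side of the boundary cannot satisfy the other side.
func (w *Widget) ValidateCreate() (admission.Warnings, error)                   { return nil, nil }
func (w *Widget) ValidateUpdate(old runtime.Object) (admission.Warnings, error) { return nil, nil }
func (w *Widget) ValidateDelete() (admission.Warnings, error)                   { return nil, nil }
```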

#2596

liubog2008 commented:

I feel this project does not break compatibility accidentally, but it just does not care about compatibility between two versions.
For #2596, adding a new interface would have been more reasonable, but a bad way was chosen to make this change.

troy0820 (Member) commented Dec 1, 2023

I feel this project does not break compatibility accidentally, but it just does not care about compatibility between two versions.
For #2596, adding a new interface would have been more reasonable, but a bad way was chosen to make this change.

PR #2014 is where the webhook validator interface change was introduced; it explains why the discussion settled on introducing a breaking change rather than supporting both.

This issue is what brought the feature to that conclusion: #1896 (comment)

I don't disagree that this may be difficult to track, and that migrating to newer versions of controller-runtime while taking into account all the changes is an exercise left to the consumer. I believe controller-runtime does a great job of backporting important things to older versions to support n-2. However, dependencies around the use of this library will be the lowest common denominator when you rely on things that also import controller-runtime.

e.g., using cluster-api (which also uses controller-runtime) will put you at whatever version it is using when you build something that relies on its packages.

alvaroaleman (Member) commented:

Just to clarify this once again, the really important point is in here: #2327 (comment)

I think even if our usage of k8s.io is just internal and not surfaced to users of controller-runtime we have the problem that controller-runtime would have to be compatible with / compile with various versions of k8s.io/* dependencies.

For as long as the upstream k8s libs don't use major versions, it would be pointless for controller-runtime to do so, as it will only ever be compatible with one version of the upstream k8s libs. So it is not possible for us to release major versions that can be imported simultaneously, regardless of what we do or do not want, and releasing major versions without that property is pointless. Hence, there really is nothing to consider here.

liubog2008 commented:

I got it; the root cause is that some packages of k8s.io are also not stable enough.

sbueringer (Member) commented:

@nathanperkins In Cluster API we recently dropped all dependencies on CR in our API packages: kubernetes-sigs/cluster-api#9011

Note: This is not possible today for API versions that implement our conversion interfaces (but these are usually not the latest API version)

k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 4, 2024
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Apr 3, 2024
k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to the k8s-triage-robot's /close not-planned command above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot closed this as not planned (won't fix, can't repro, duplicate, stale) on May 3, 2024