Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use internal versions of Tekton CRDs in reconcilers #5504

Open
lbernick opened this issue Sep 15, 2022 · 23 comments
Open

Use internal versions of Tekton CRDs in reconcilers #5504

lbernick opened this issue Sep 15, 2022 · 23 comments
Assignees
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@lbernick
Copy link
Member

Currently, we use the "enable-api-fields" feature flag to allow users to opt into features at different stability levels, as described in tep-0033.

We have discussed moving towards maintaining multiple versions of CRDs in the future (for example, a v1 version with stable features and a v2alpha1 version with stable and alpha features), but the main reason we don't do this is because it creates a large maintenance burden for contributors.

This issue can be used to brainstorm ways to make it easier to maintain multiple API versions of our CRDs.
Some ideas that have already come up:

  • move as much code as possible out of the apis package, such as validation functions, so that it doesn't need to be maintained in multiple places
  • create a CRD version that's stored but not served, and all other CRD versions are converted to this one. this would be the version used in the reconciler.
  • encouraging users to use the dynamic client instead of the tekton versioned client. however, we'd still need to support all versions of the CRD, so I don't think this is really easier from a contributor effort perspective.
  • some kind of tooling to be able to easily view the diff between pipeline version packages?
@lbernick lbernick changed the title Making it easier to create multiple API versions of Tekton CRDs Improve API versioning Oct 11, 2022
@lbernick
Copy link
Member Author

@dibyom has written up a proposal for a new API versioning strategy: https://docs.google.com/document/d/15vD5vbK5UKCDFMoOqOwEc_NYACthtCZJlTJiiEEoljY

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 9, 2023
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 8, 2023
@tekton-robot
Copy link
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lbernick lbernick reopened this Mar 10, 2023
@lbernick lbernick removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Mar 10, 2023
@lbernick
Copy link
Member Author

Looking at the scheduler-plugins repo, they've found a way to register "internal versions" of the types their controller operates on (example: https://github.com/kubernetes-sigs/scheduler-plugins/blob/7da686c01cf3bc5d8fe852334698468e046bb000/apis/config/register.go#L26), and then all the versioned types are automatically converted into the unversioned internal type.

@dibyom I know you originally suggested something like this and it seems like it's actually possible which is very cool! We'd have to change our storage version over to this "unversioned" type, but once we do this, I would guess we wouldn't have to do it again (e.g. if we want to create a v2 api). @tektoncd/core-maintainers thoughts on this?

FYI @JeromeJu

@dibyom
Copy link
Member

dibyom commented Mar 10, 2023

If this is possible with knative/pkg without too much trouble then I'd vote yes. It would tremendously simplify the conversions between versions (and also allow us to prototype much faster with new experimental apiVersions!)

@JeromeJu
Copy link
Member

JeromeJu commented Mar 10, 2023

Looks very cool indeed! thanks @lbernick 🆒

It would tremendously simplify the conversions between versions

could we elaborate how this is going to be simplify workload for conversion? or are we referring to the future work of implementing the. conversions? since this felt to me a shift of the storage version that's been converting to

another question in mind is how are CRDs/ features deprecated yet not-removed shall be handled ie. ClusterTask? would it be an implicit requiring this to happen after those being removed?

@lbernick
Copy link
Member Author

could we elaborate how this is going to be simplify workload for conversion? or are we referring to the future work of implementing the. conversions? since this felt to me a shift of the storage version that's been converting to

My guess is we will probably have the "internal" type be the same as the v1 types we've defined, and reuse the existing conversion functions we have (although we might just need to update their signatures).

another question in mind is how are CRDs/ features deprecated yet not-removed shall be handled ie. ClusterTask? would it be an implicit requiring this to happen after those being removed?

We might be able to do this independently for each CRD (not sure). For ClusterTask, we can probably just leave that in v1beta1 since we won't be adding new versions.

We might have to experiment a bit to see how this will work; I could be wrong about these points.

@vdemeester
Copy link
Member

vdemeester commented Mar 13, 2023

Super nice, but quick note on this. Because this would be the storage version, we would to have to be very careful about the change we do in there 👼🏼, and this might lead to a "relatively" big object (go struct I mean) because we are bringing with us a lot of "deprecated" fields that are there for "backward compatibilities". But I do agree, we need to experiment a bit with it 👼🏼

We might be able to do this independently for each CRD (not sure). For ClusterTask, we can probably just leave that in v1beta1 since we won't be adding new versions.

Yes, we can definitely do this on a CRD basis.

@dibyom
Copy link
Member

dibyom commented Mar 14, 2023

yeah we definitely need to experiment to see the viability of the is approach before deciding if we can actually do this.

@lbernick
Copy link
Member Author

lbernick commented May 8, 2023

I decided to look into how k8s handles these problems in a bit more detail.

They have a small number of served versions of the API, and the storage version is the latest stable version. They have an internal version of their API, which is not serialized, that their code operates on.

Defaulting is done when a versioned CRD is created or updated, but validation is done on the internal version of the API rather than the versioned versions. Conversion code is mostly autogenerated.

If they need to add a new unstable feature, they backport it to the stored version as well, aka the latest stable version, gating it behind the feature flag. This is so that objects can successfully round-trip between API versions. They have done round-tripping in the past via annotations, the way we do currently, but found it to be too error-prone (slack convo), so they decided to add the unstable fields to stable apis instead. They tombstone fields but never remove them. I asked why they don't make the internal version the storage version, and the answer is that this is strictly more work than what they do currently, and is just another version to handle (since the latest stable version contains all the fields anyway).

Here's what I think we should do:

  • Swap over to v1 as our storage version, which we are doing anyway (Release v1 versions of Task, TaskRun, Pipeline, and PipelineRun CRDs #5541), and have the reconciler code operate on v1 structs.
  • Create an internal version of Task and swap our reconciler code over to use the internal version, leaving v1 as the stored version. (I'm suggesting swapping reconciler code to v1 first and then swapping to the internal version because the v1 swap is more urgent and will allow us to deprecate/stop maintaining v1beta1, while we can take more time with the internal version swap since it can have the same schema as v1.)
  • If that goes well, we can swap reconciler code for the other CRDs

Optional follow-ups:

  • Decide if we want to use the internal CRD versions as stored versions instead of v1. (I did some quick experimentation and verified that the storage version of a CRD doesn't need to be served.)
  • If conversion via annotations becomes a problem, consider whether we want to add unstable fields to stable apis instead, the way k8s does. For now, conversion via annotations seems to be working mostly fine, unless annotations get too large.
  • Look into autogenerating conversion code.
  • Consider whether we want to do validation on the internal api instead of the versioned apis.

@dibyom
Copy link
Member

dibyom commented May 17, 2023

(I did some quick experimentation and verified that the storage version of a CRD doesn't need to be served.)

Awesome! Out of curiosity, what does the served field signify then?

@vdemeester
Copy link
Member

@dibyom served means it's "visible" to clients / users, and thus kubectl. If it's not served, you, as a "client" cannot query it.

@dibyom
Copy link
Member

dibyom commented May 25, 2023

@dibyom served means it's "visible" to clients / users, and thus kubectl. If it's not served, you, as a "client" cannot query it.

But the aren't the clients/informers used in the reconciler also considered "clients" or users? How does the reconciler get to fetch/update/watch for changes but not other clients?

@lbernick lbernick changed the title Improve API versioning Use internal versions of Tekton CRDs in reconcilers Jun 13, 2023
@Yongxuanzhang
Copy link
Member

/assign

@Yongxuanzhang
Copy link
Member

Yongxuanzhang commented Jul 25, 2023

Hi @tektoncd/core-maintainers, I want to update our proposal of approaching this issue:

  1. Pull in conversion-gen,
  2. Copy v1 Task to pkg/apis/pipeline and generate conversion code between internal version and v1beta1, v1.
  3. Replace reconciler code to use internal task instead of v1 task
  4. Update v1 and v1beta1 validation functions to convert it to internal version and then call internal Task's validation

For more details and alternative discussions, please take a look at this doc
We're looking forward to any suggestions and feedback. Thanks!

@vdemeester
Copy link
Member

vdemeester commented Jul 25, 2023

@Yongxuanzhang essentially, the proposal is to

  • no new version in CRD (aka internal version is just "in-code")
  • put all "logic" into the internal object (by logic I mean, validation, defaulting …)
  • make reconciler work on internal object
    • but initially fetches whatever is stored (today v1) and directly converting to internal…
    • … work based of internal and …
    • … convert back to v1 when we mutate the object (update status mainly)
    • *we would just have to "wrap" reconcilerkind and some other function to go from v1-> internal or internal->v1
  • use conversion-gen to generate conversion from versioned api to an internal object, so that we don't have to maintan that code ourselves

Am I right ?

@Yongxuanzhang
Copy link
Member

@Yongxuanzhang essentially, the proposal is to

  • no new version in CRD (aka internal version is just "in-code")

  • put all "logic" into the internal object (by logic I mean, validation, defaulting …)

  • make reconciler work on internal object

    • but initially fetches whatever is stored (today v1) and directly converting to internal…
    • … work based of internal and …
    • … convert back to v1 when we mutate the object (update status mainly)
    • *we would just have to "wrap" reconcilerkind and some other function to go from v1-> internal or internal->v1
  • use conversion-gen to generate conversion from versioned api to an internal object, so that we don't have to maintan that code ourselves

Am I right ?

Yes. That's right, but I think we don't do defaulting for internal version.

@vdemeester
Copy link
Member

Yes. That's right, but I think we don't do defaulting for internal version.

Alright, I think it could be nice (we need to validate it works well) 👍🏼.

@Yongxuanzhang
Copy link
Member

Hi @tektoncd/catalog-maintainers, @tektoncd/catlin-collaborators. Just want to raise your attention and see if there are any concerns regarding the internal version proposal?
The proposal should not have any user facing changes, the impact for Pipeline developers would be:

  1. We need to add changes to both internal version and external version (e.g. v1)
  2. We need to implement conversions between internal version and external versions, in this proposal we use conversion-gen, but custom conversions may be needed in the future.

I will document all of the impacts into developer doc when implementing this feature.

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 6, 2023
@Yongxuanzhang Yongxuanzhang removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 6, 2023
@vdemeester
Copy link
Member

/lifecycle frozen

@tekton-robot tekton-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
Status: Done
Status: In Progress
Development

No branches or pull requests

6 participants