Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Per Feature Feature Flags #5632

Closed
Tracked by #6966
dibyom opened this issue Oct 11, 2022 · 14 comments
Closed
Tracked by #6966

Per Feature Feature Flags #5632

dibyom opened this issue Oct 11, 2022 · 14 comments
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@dibyom
Copy link
Member

dibyom commented Oct 11, 2022

Feature request

TEP-033 proposed an enable-api-fields flag for features that require an API field. This allows us to group and enable/disable all "alpha"/"beta" features together.

An alternative approach is having a way to enable/disable each feature by via its own flag (such as the approach taken by Kubernetes feature gates)

This issue is for tracking experimentation for per feature feature gates.

Use case

  1. A user would like to enable a specific alpha feature but not any other alpha features
@dibyom dibyom added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 11, 2022
@dibyom
Copy link
Member Author

dibyom commented Oct 11, 2022

One though I just had - per TEP-033, fields that are behind an alpha/beta flag can be gradually transitioned from alpha to beta to GA. For features that are not controlled by a field, TEP-033 says they should have their own flag. But I do not think it is possible to move them gradually from alpha to beta to GA and instead they must either be enabled or disabled within a particular API version

@Yongxuanzhang
Copy link
Member

comment for tracking this issue

@Yongxuanzhang
Copy link
Member

I encounter this issue when implementing trusted resources in #5581
We have a feature flag ResourceVerificationMode to let users choose from skip, enforce and warn. So if it is set to skip the feature will not be enabled.
This is an alpha feature and it may not be ideal to let users enable 2 flags (ResourceVerificationMode and enable-api-fields).

@dibyom
Copy link
Member Author

dibyom commented Oct 20, 2022

@chuangw6 also mentioned the same thing for his ConfigSource status field proposal

@chuangw6
Copy link
Member

We have a feature flag ResourceVerificationMode to let users choose from skip, enforce and warn. So if it is set to skip the feature will not be enabled.

Thanks @dibyom. This is a good idea.
#5670 is the implementation that adds a dedicated feature flag to control the provenance field in taskrun/pipelinerun status.

@afrittoli
Copy link
Member

To give extra background, the reason we decided to have a single flag instead of per-feature ones is the amount of testing that that would require, because while features are (often) clearly separated from one another at the API level, they may be more intertwined in the controller code - so enabling one feature out of 10 may behave differently from enabling all 10.

The alternative to feature flags (whether per feature or just one) would be to create v2alphaN APIs and v2betaN APIs.
Multiple features would be included in each alpha and beta version, similar to the way features are grouped under one flag today.

About k8s feature flags, I may be wrong but I think they mainly relate to back-end behaviour, even though there is no clear distinction documented like for Tekton.

If we decide to have feature-specific flags for all features, we shall document the test strategy. I believe each feature should then come with a dedicated set of e2e tests (like today) to be executed in an environment with only that feature enabled.
We could replace the current alpha test with an env where all features in alpha state are enabled.

@dibyom
Copy link
Member Author

dibyom commented Nov 4, 2022

To give extra background, the reason we decided to have a single flag instead of per-feature ones is the amount of testing that that would require, because while features are (often) clearly separated from one another at the API level, they may be more intertwined in the controller code - so enabling one feature out of 10 may behave differently from enabling all 10.

I agree, the testing story is complicated but I don't think that by itself should be why we should only have a single flag. The most common use case for alpha features is to try out a single alpha feature, not all of them altogether.

Feature flags implementations being intertwined is not great either - wherever possible a feature flag should only control the behavior of that specific feature and nothing else. I agree this can be tricky - K8s checks for this sort of thing in their Production Readiness Reviews - we could do something similar and have some guidelines to specifically check for this when adding a new feature gate.

From a testing standpoint I think a reasonable option is to test will all feature flags on and all of them off (like we do today). We can document this and users who turn on only a subset of the flags can be responsible for their own testing.

The alternative to feature flags (whether per feature or just one) would be to create v2alphaN APIs and v2betaN APIs.
Multiple features would be included in each alpha and beta version, similar to the way features are grouped under one flag today.

In theory yes, but I think the complexity involved in maintaining N active API versions is far greater than just having feature flags (e.g. there can only be one api version that is stored so that version has to be able to represent info from all N API versions)

About k8s feature flags, I may be wrong but I think they mainly relate to back-end behaviour, even though there is no clear distinction documented like for Tekton.

I think many of the feature gates are for backend features but there also user facing ones such as the PodSecurity one that gates the new PodSecurityAdmission feature which replaces PodSecurityPolicies

If we decide to have feature-specific flags for all features, we shall document the test strategy. I believe each feature should then come with a dedicated set of e2e tests (like today) to be executed in an environment with only that feature enabled.

💯 . I'm not sure if it is realistic to test every combination of flags but we can definitely clearly state what we test and users who use a different subset of flags can be responsible for their own testing. That being said, I do think we should do our best to make sure a feature gate is only scoped to that particular feature.

We could replace the current alpha test with an env where all features in alpha state are enabled.

Agreed!

@tekton-robot
Copy link
Collaborator

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 2, 2023
@tekton-robot
Copy link
Collaborator

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle rotten

Send feedback to tektoncd/plumbing.

@tekton-robot tekton-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Mar 4, 2023
@tekton-robot
Copy link
Collaborator

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

@tekton-robot
Copy link
Collaborator

@tekton-robot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen with a justification.
Mark the issue as fresh with /remove-lifecycle rotten with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/close

Send feedback to tektoncd/plumbing.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@dibyom dibyom reopened this May 3, 2023
@dibyom
Copy link
Member Author

dibyom commented May 3, 2023

Reopening this in light of #6592 - we should consider whether per feature flags can help prevent issues such as #6607

@dibyom dibyom removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label May 3, 2023
@lbernick lbernick added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label May 4, 2023
@lbernick
Copy link
Member

lbernick commented May 8, 2023

I wanted to share this doc with some thoughts from a k8s maintainer on some of the issues they've run into w/ per-feature flags. A lot of it is related to rollbacks which I think we've been less concerned about than k8s has. To make rollbacks easier, they drop unsupported fields rather than rejecting CRDs containing them, as we do. I still think we should implement per-feature flags, but this doc could help us understand what issues we might run into.

The issues are basically:

  • adding version validation in implementation in addition to apiserver operations (this has caused problems for us as well with "enable-api-fields", see Versioned validation of referenced Pipelines/Tasks #6616)
  • concerns about rolling back from a more stable implementation of a feature to a less stable implementation of a feature
  • different versions of k8s or values of feature flags between apiserver and node (not relevant for us)
  • minimal feedback on alpha features

@JeromeJu
Copy link
Member

Looks like we can also close this with the completion of #7090 at v0.53.0 milestone.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
Status: Done
Development

No branches or pull requests

7 participants