Skip to content

Conversation

@HirazawaUi
Copy link
Contributor

@HirazawaUi HirazawaUi commented Aug 26, 2024

  • One-line PR description: promote feature gate to beta
  • Issue link:
  • Other comments:

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 26, 2024
@k8s-ci-robot k8s-ci-robot requested a review from dchen1107 August 26, 2024 15:42
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Aug 26, 2024
@k8s-ci-robot k8s-ci-robot requested a review from mrunalp August 26, 2024 15:42
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 26, 2024
@HirazawaUi HirazawaUi force-pushed the promote-4396-to-beta branch from 26ff8ac to 1090b25 Compare August 26, 2024 15:44
@HirazawaUi
Copy link
Contributor Author

/cc @thockin @jpbetz

@k8s-ci-robot k8s-ci-robot requested review from jpbetz and thockin August 26, 2024 15:44
@HirazawaUi
Copy link
Contributor Author

/cc @BenTheElder

###### How can a rollout or rollback fail? Can it impact already running workloads?

When a feature gate is closed, already running workloads are not affected in any way, but update fields for workload will cause the workload to fail.
When a feature gate is disabled, already running workloads are not affected in any way, but update fields for workload will cause the workload to fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What fields would change in an update for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any update to the workload's fields results in failure because pods need to be recreate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed this in the KEP but I don't understand how that is related.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we update any field of workloads that use an environment variable with relaxed validation when disabling feature gate, the workload might fail to recreate pods or ReplicaSets because it cannot pass the validation logic of the Apiserver.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So you are saying that if you have expanded environment variable support and then you downgrade it could be possible to start failing validation.

Your phrase above seems to be more general than what you expanded here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is, unfortunately, true of any feature on pod - if you enable, use in Deployment, disable, then the ".. or already in use" logic doesn't matter much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used more friendly words to describe it :)

@thockin thockin changed the title KEP-4396: Promote to beta KEP-4369: Promote to beta Sep 10, 2024
@thockin thockin changed the title KEP-4369: Promote to beta KEP-4369: Promote to beta (allow ~all ASCII characters in env vars) Sep 10, 2024
@thockin
Copy link
Member

thockin commented Sep 16, 2024

Thanks!

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 16, 2024
@thockin thockin assigned jpbetz and SergeyKanzhelev and unassigned thockin Sep 16, 2024
@thockin
Copy link
Member

thockin commented Sep 16, 2024

This really needs a sig-node approver

@HirazawaUi
Copy link
Contributor Author

/cc @kubernetes/sig-node-leads Could anyone review and approve it?

@SergeyKanzhelev
Copy link
Member

/cc @kubernetes/sig-node-leads Could anyone review and approve it?

We will discuss KEPs and assign approvers at today's meeting that starts in a few minutes

@kannon92
Copy link
Contributor

/assign @mrunalp

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2024
Copy link
Member

@SergeyKanzhelev SergeyKanzhelev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2024
@HirazawaUi
Copy link
Contributor Author

/cc @jpbetz Could you review for it as PRR reviewer? It has already been approved by sig-node.

Copy link
Contributor

@jpbetz jpbetz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments about ratcheting validation for downgrade/rollback. LGTM for PRR once those are addressed.

###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?

If close the feature gate, already running workloads will not be affected in any way,
If disable the feature gate, already running workloads will not be affected in any way,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also updated the above "Downgrade" section ("users need to reset their environment variables for special characters to normal characters."). I was expecting it to say something like "After downgrade, environment variables containing special characters will continue to work as expected, but any writes to resources to add or change environment variables must set the environment variable names to only use normal characters."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified as suggested.

###### How can a rollout or rollback fail? Can it impact already running workloads?

When a feature gate is closed, already running workloads are not affected in any way, but update fields for workload will cause the workload to fail.
When the feature gate is disabled, workloads that are already running will not be affected. However, if user update the workloads, they may fail to recreate pods or ReplicaSets due to failing the Apiserver's validation logic, which could cause the workloads to fail.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a environment variable has special characters, and the cluster is rolled back, do updates that change other fields (but that do not modify the fields containing environment variables) fail? I am expecting updates like this to be allowed.

The trick here is usually to modify the validation rules to compare the old value with the new value for updates, and allow the less restrictive rule (use of special characters in this situation) so long as the field value is not changing. This way, only controllers that are actually changing the environment fields are at risk of being impacted by a rollback.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified as suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for my hindsight. I suddenly realized that we can't achieve this. Modifying any controller will delete the old pods and create new ones, and we can't retrieve the spec of the old pods during pod creation validation. So, after disabling the feature gate, we won't be able to update other fields of the workload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I follow. That limitation makes sense to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for confirming! @HirazawaUi I think what is implemented should be OK than


```
kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.spec.containers[].env[]?.name | test("^[a-zA-Z_][a-zA-Z0-9_]*$") | not) | [.metadata.namespace, .metadata.name, .spec.containers[].env[]?.name] | @tsv'
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including a command!

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2024
@HirazawaUi
Copy link
Contributor Author

ping @jpbetz

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 29, 2024
@jpbetz
Copy link
Contributor

jpbetz commented Sep 29, 2024

/approve
For PRR

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: HirazawaUi, jpbetz, SergeyKanzhelev, thockin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@HirazawaUi
Copy link
Contributor Author

@SergeyKanzhelev Could you lgtm for it again?

@thockin
Copy link
Member

thockin commented Sep 29, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit e85182b into kubernetes:master Sep 29, 2024
@k8s-ci-robot k8s-ci-robot added this to the v1.32 milestone Sep 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

Development

Successfully merging this pull request may close these issues.

7 participants