@@ -265,8 +265,9 @@ Until then, we will cover all the scenerios with e2e tests
265265
266266#### Alpha -> Beta Graduation
267267
268- * Addresses feedback from alpha testers
269268* Sufficient E2E and unit testing
269+ * Adding [Windows node level test](https://github.com/kubernetes/kubernetes/pull/129938), which will include the graceful shutdown case.
270+ * [Enabling the test in CAPZ cluster](https://github.com/kubernetes-sigs/windows-testing/pull/506)
270271
271272#### Beta -> GA Graduation
272273
292293This section must be completed when targeting alpha to a release.
293294-->
294295
295- ###### How can this feature be enabled / disabled in a live cluster?
296+ * ** How can this feature be enabled / disabled in a live cluster?**
296297
297298- [X] Feature gate (also fill in values in ` kep.yaml ` )
298299 - Feature gate name: ` WindowsGracefulNodeShutdown `
@@ -301,58 +302,55 @@ This section must be completed when targeting alpha to a release.
301302 - Describe the mechanism:
302303 - Will enabling / disabling the feature require downtime of the control
303304 plane?
304- No
305+ - No
305306 - Will enabling / disabling the feature require downtime or reprovisioning
306307 of a node?
307- yes (will require restart of kubelet)
308+ - yes (will require restart of kubelet)
308309
309- ###### Does enabling the feature change any default behavior?
310+ * ** Does enabling the feature change any default behavior?**
310311
311- The main behavior change is that during a node shutdown, pods running on the
312+ * The main behavior change is that during a node shutdown, pods running on the
312313node will be terminated gracefully.
313314
314- ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
315+ * ** Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?**
315316
316- Yes, the feature can be disabled by either disabling the feature gate, or
317+ * Yes, the feature can be disabled by either disabling the feature gate, or
317318 setting ` kubeletConfig.ShutdownGracePeriod ` to 0 seconds.
318319
319- ###### What happens if we reenable the feature if it was previously rolled back?
320+ * ** What happens if we reenable the feature if it was previously rolled back?**
320321
321- Kubelet will attempt to perform graceful termination of pods during a
322- node shutdown.
322+ * Kubelet will attempt to perform graceful termination of pods during a
323+ node shutdown.
323324
324- ###### Are there any tests for feature enablement/disablement?
325+ * ** Are there any tests for feature enablement/disablement?**
325326
326- The e2e framework does not currently support enabling or disabling feature
327- gates.
328- We have e2e tests to cover the feature when it is enabled and some predefined
329- setting.
330- Will add node level integration tests when the node level test framework is available for Windows node
327+ * The e2e framework does not currently support enabling or disabling feature
328+ gates. We have e2e tests that cover the feature when it is enabled with some predefined
329+ settings. We will add node-level integration tests when the node-level test framework is
330+ available for Windows nodes.
331331
332332### Rollout, Upgrade and Rollback Planning
333333
334334<!--
335335This section must be completed when targeting beta to a release.
336336-->
337337
338- ###### How can a rollout or rollback fail? Can it impact already running workloads?
338+ * ** How can a rollout or rollback fail? Can it impact already running workloads?**
339339
340- It wil not impact running workloads during rollout/rollback.
340+ * It will not impact running workloads during rollout/rollback.
341341
342- ###### What specific metrics should inform a rollback?
342+ * ** What specific metrics should inform a rollback?**
343343
344- n/a
345-
346- The failure of the roll out will behave like disbling this feature, operators can check the kubelet log to get more specific info.
344+ * A failed rollout behaves as if this feature were disabled; operators can check the kubelet log for more specific information.
347345ex: ` The windows node graceful shutdown has not been enabled, the reasons are xxx `
348346
349- ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
347+ * ** Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
350348
351- This is basically how all features work so upgrade and downgrade apply as normal.
349+ * The feature is controlled through the kubelet config, so updating the kubelet config should enable/disable the feature; upgrade/downgrade is N/A.
352350
353- ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
351+ * ** Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?**
354352
355- No
353+ * No
356354
357355### Monitoring Requirements
358356
@@ -363,11 +361,11 @@ For GA, this section is required: approvers should be able to confirm the
363361previous answers based on experience in the field.
364362-->
365363
366- ###### How can an operator determine if the feature is in use by workloads?
364+ * ** How can an operator determine if the feature is in use by workloads?**
367365
368- Check if the feature gate and kubelet config settings are enabled on a node.
366+ * Check if the feature gate and kubelet config settings are enabled on a node.
369367
370- ###### How can someone using this feature know that it is working for their instance?
368+ * ** How can someone using this feature know that it is working for their instance?**
371369
372370- [ ] Events
373371 - Event Reason:
@@ -377,36 +375,36 @@ Check if the feature gate and kubelet config settings are enabled on a node.
377375- [X] Other (treat as last resort)
378376 - Details: Pod.Status.Message, Pod.Status.Reason
379377
380- ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
378+ * ** What are the reasonable SLOs (Service Level Objectives) for the enhancement?**
381379
382- n/a
380+ * n/a
383381
384- ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
382+ * ** What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**
385383
386384<!--
387385Pick one more of these and delete the rest.
388386-->
389387
390- - [ ] Metrics
391- - Metric name:
388+ - [x] Metrics
389+ - Metric name: GracefulShutdownStartTime, GracefulShutdownEndTime
392390 - [ Optional] Aggregation method:
393- - Components exposing the metric:
394- - [X ] Other (treat as last resort)
391+ - Components exposing the metric: Kubelet
392+ - [x] Other (treat as last resort)
395393 - Details: The operator can get the service health information from the logs
396394
397- ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
395+ * ** Are there any missing metrics that would be useful to have to improve observability of this feature?**
398396
399- n/a
397+ * n/a
400398
401399### Dependencies
402400
403401<!--
404402This section must be completed when targeting beta to a release.
405403-->
406404
407- ###### Does this feature depend on any specific services running in the cluster?
405+ * ** Does this feature depend on any specific services running in the cluster?**
408406
409- No, this feature doesn't depend on any specific services running the cluster.
407+ * No, this feature doesn't depend on any specific services running in the cluster.
410408
411409### Scalability
412410
@@ -420,33 +418,33 @@ For GA, this section is required: approvers should be able to confirm the
420418previous answers based on experience in the field.
421419-->
422420
423- ###### Will enabling / using this feature result in any new API calls?
421+ * ** Will enabling / using this feature result in any new API calls?**
424422
425- No
423+ * No
426424
427- ###### Will enabling / using this feature result in introducing new API types?
425+ * ** Will enabling / using this feature result in introducing new API types?**
428426
429- No
427+ * No
430428
431- ###### Will enabling / using this feature result in any new calls to the cloud provider?
429+ * ** Will enabling / using this feature result in any new calls to the cloud provider?**
432430
433- No
431+ * No
434432
435- ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
433+ * ** Will enabling / using this feature result in increasing size or count of the existing API objects?**
436434
437- No
435+ * No
438436
439- ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
437+ * ** Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?**
440438
441- No
439+ * No
442440
443- ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
441+ * ** Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?**
444442
445- No
443+ * No
446444
447- ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
445+ * ** Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?**
448446
449- No
447+ * No
450448
451449### Troubleshooting
452450
@@ -461,17 +459,17 @@ splitting it into a dedicated `Playbook` document (potentially with some monitor
461459details). For now, we leave it here.
462460-->
463461
464- ###### How does this feature react if the API server and/or etcd is unavailable?
462+ * ** How does this feature react if the API server and/or etcd is unavailable?**
465463
466- The feature does not depend on the API server / etcd.
464+ * The feature does not depend on the API server / etcd.
467465
468- ###### What are other known failure modes?
466+ * ** What are other known failure modes?**
469467
470- n/a
468+ * n/a
471469
472- ###### What steps should be taken if SLOs are not being met to determine the problem?
470+ * ** What steps should be taken if SLOs are not being met to determine the problem?**
473471
474- n/a
472+ * n/a
475473
476474## Implementation History
477475