Replicas update one-by-one regardless of rolling update parameter maxUnavailable #315
Comments
Yes, enabling maxUnavailable is a default requirement for lws, but as you mentioned, this has been blocked on kubernetes/enhancements#961 for a long time.
We rely on the StatefulSet's reconciliation logic, so I personally think this is infeasible without touching the StatefulSet.
@kerthcet can we update the docs to clearly indicate this dependency?
Thanks Kante. By "polyfill" I mean: after the leader StatefulSet is reconciled and its partition is correctly set, leaderworkerset_controller can clone the StatefulSet controller logic to loop through the replicas and delete any stale pods. LWS can do a nil check on MaxUnavailable in the leader StatefulSet spec to see whether the feature is enabled, then apply or skip the polyfill accordingly. This way LWS MaxUnavailable support won't be blocked by the StatefulSet feature state.
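A rough sketch of what that cloned loop could look like, using simplified stand-in types rather than the real k8s.io/api structs (the names `Pod` and `staleToDelete` here are hypothetical, not actual LWS code):

```go
package main

import "fmt"

// Pod is a simplified stand-in for a leader pod's observed state.
type Pod struct {
	Ordinal  int
	Revision string
	Ready    bool
}

// staleToDelete mimics the StatefulSet rolling-update loop: walk the
// replicas from the highest ordinal down and collect stale pods to
// delete, never letting the count of unavailable pods exceed
// maxUnavailable.
func staleToDelete(pods []Pod, updatedRev string, maxUnavailable int) []int {
	unavailable := 0
	for _, p := range pods {
		if !p.Ready {
			unavailable++
		}
	}
	var victims []int
	for i := len(pods) - 1; i >= 0; i-- {
		p := pods[i]
		if p.Revision != updatedRev && p.Ready && unavailable < maxUnavailable {
			victims = append(victims, p.Ordinal)
			unavailable++
		}
	}
	return victims
}

func main() {
	pods := []Pod{
		{0, "v1", true}, {1, "v1", true}, {2, "v1", true},
		{3, "v2", true}, {4, "v2", false},
	}
	// Pod 4 is already unready, so with maxUnavailable=2 only one more
	// stale pod (ordinal 2) can be deleted this pass.
	fmt.Println(staleToDelete(pods, "v2", 2)) // prints "[2]"
}
```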
Yes, it's doable by copy-pasting the code, but I don't think we'll go this way; instead we should try to push the feature to beta and eventually GA. The feature is blocked mostly because there are no volunteers, but if we see more demand for this, we may reconsider. Thanks anyway, I'll update the doc first, but let's keep this issue open.
/remove-kind bug
Make this a feature instead.
What happened:
As the subject says, when maxUnavailable is set to 2 or more, the rolling update still refreshes replicas one by one.
What you expected to happen:
The rolling update should honor the maxUnavailable parameter and work as documented.
How to reproduce it (as minimally and precisely as possible):
Preparation:
Change replicas from 2 to 5
Add the following rolling strategy, note the maxUnavailable parameter is set to 2
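The original strategy is not reproduced here; per the LeaderWorkerSet API, a rollout strategy of roughly this shape would express it (field names assume the lws v1 API; treat this as an illustrative sketch):

```yaml
rolloutStrategy:
  type: RollingUpdate
  rollingUpdateConfiguration:
    maxUnavailable: 2
```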
Steps to reproduce:
Anything else we need to know?:
The problem is that the rolling-update option maxUnavailable depends on a StatefulSet feature that is still in alpha (FEATURE STATE: Kubernetes v1.24 [alpha]), and alpha features are typically not enabled in cloud providers' production offerings.
The maxUnavailable support is critical for real-world use cases like large model deployment. For example, in a cluster with 80 model replicas where updating a single replica takes 20 minutes, a one-by-one rolling update takes over a day in total. But if maxUnavailable worked and were set to 20%, the total update time would drop to under two hours.
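The arithmetic behind those figures can be checked with a small sketch (`updateMinutes` is a hypothetical helper, not part of LWS):

```go
package main

import "fmt"

// updateMinutes estimates total rolling-update time (in minutes) for a
// serial one-by-one update versus batched updates, with maxUnavailable
// expressed as a percentage of replicas.
func updateMinutes(replicas, perReplica, maxUnavailablePct int) (serial, parallel int) {
	serial = replicas * perReplica
	batch := replicas * maxUnavailablePct / 100 // replicas updated per batch
	batches := (replicas + batch - 1) / batch   // ceiling division
	parallel = batches * perReplica
	return serial, parallel
}

func main() {
	serial, parallel := updateMinutes(80, 20, 20)
	// 80 replicas × 20 min serially vs. 5 batches of 16 in parallel.
	fmt.Printf("serial: %d min (~%.1f h), batched: %d min\n",
		serial, float64(serial)/60, parallel)
	// prints: serial: 1600 min (~26.7 h), batched: 100 min
}
```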
If the MaxUnavailableStatefulSet feature is not going GA soon, one idea to mitigate the problem is adding a polyfill: porting the related StatefulSet controller logic into the LWS controller. Does the idea seem reasonable to you?
Environment:
- Kubernetes version (`kubectl version`): v1.31.3
- LWS version (`git describe --tags --dirty --always`): v0.3.0-8-ga4c468e
- OS (`cat /etc/os-release`): Amazon Linux 2
- Kernel (`uname -a`): 5.10.218