Multi-zone / Max-in-flight #1306

andrew-edgar · 2021-04-28T15:48:50Z

Working on how multi-zone and max-in-flight support should be done in quarks-operator and quarks-statefulset.

@manno @jandubois @rj1 I have been testing a couple of things with regards to “max-in-flight”. There were “several” issues that will never get solved until K8S supports a similar functionality as “maxUnavailable” in deployments.

I wanted to give you an update on what tests and functionality I have done.

I have 2 pending PRs for mutli-zone support. (change the naming of instance_groups in multizone to use real zone name #1303 and Change the naming of stateful sets with multi zones to match the real zone name quarks-statefulset#65). The PRs changed from using “zX” in the stateful set to using “zonename“. The purpose of that was to enable our use case of having a single control plane and multiple “shard (minion)” clusters. We are planning to have each shard being a single “zone” but required to use the “multi-zone” functionality to ensure no conflicting cell_ids and naming across clusters. This functionality was working but I would prefer that you do NOT yet merge those PRs as my work on max-in-flight piggy backed on this change and did need some tweaking.
I have another change I’m planning to PR that uses topologySpreadContraints to distribute across “azs” inside a K8S cluster. This would mean we would “not” use the multi-zone config which created multiple stateful sets as this feature causes the pods to correctly spread across the zones without having to require multiple statefulsets. This piece is independant and I’m scoping it to be added to the statfulset ONLY if there is no “zone” for the instance group
now back to max-in-flight 🙂. This is a “hard” problem and we went about solving it by using “azs” in the instance_group definition. What we do now though is have “duplicate” azs. so the azs in the instance_group looks like this …

azs:
  - eu-de-1
  - eu-de-1
  - eu-de-1

With some tweaks to the “zones” support that exists now. I can cause 3 statefulsets to be created, for example for diego-cell I now get 3 statefulsets named …

diego-cell-eu-de-1-0
diego-cell-eu-de-1-1
diego-cell-eu-de-1-2

And if I have multiple instances (3) in each statefulset I now get pod naming like this …

diego-cell-eu-de-1-0-0
diego-cell-eu-de-1-0-1
diego-cell-eu-de-1-0-2
diego-cell-eu-de-1-1-0
diego-cell-eu-de-1-1-1
diego-cell-eu-de-1-1-2
diego-cell-eu-de-1-2-0
diego-cell-eu-de-1-2-1
diego-cell-eu-de-1-2-2

Leaving the thoughts of canaries behind. We now can see 3 “cells” being created/updated in parallel and the names are all different.

This is quite a conceptual change and I have not tested going from an old version of quarks to a new version using an update but on a “new” deploy this does look and work perfectly to allow our use case.

From a kubecf perspective we have in the helm chart the “multi-zone” feature which in theory when false will remove all “azs” from all instance_groups. I would like to have a hybrid model where “some” instance groups are multizone (to allow for max-in-flight) like “doppler” but some NOT multizone like diego-api (which would then use topologySpreadConstraints.

So long story short ... Should I submit the changes in PRs? What is the feeling from the dev team about merging these changes and will they be accepted.

All the code changes I have tested with have good "tests" developed to prove that they will do what is expected.

Thanks,

Andrew

The text was updated successfully, but these errors were encountered:

cf-gitbot · 2021-04-28T15:48:53Z

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/177950101

The labels on this github issue will be updated when the story is started.

andrew-edgar added the enhancement New feature or request label Apr 28, 2021

cf-gitbot added the unscheduled label Apr 28, 2021

andrew-edgar mentioned this issue May 25, 2021

change the naming of instance_groups in multizone to use real zone name #1303

Closed

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi-zone / Max-in-flight #1306

Multi-zone / Max-in-flight #1306

andrew-edgar commented Apr 28, 2021 •

edited

Loading

cf-gitbot commented Apr 28, 2021

Multi-zone / Max-in-flight #1306

Multi-zone / Max-in-flight #1306

Comments

andrew-edgar commented Apr 28, 2021 • edited Loading

cf-gitbot commented Apr 28, 2021

andrew-edgar commented Apr 28, 2021 •

edited

Loading