
Refactor nodes orchestration to rely on StatefulSets #1463

Merged
merged 43 commits into from
Aug 7, 2019

Conversation

sebgl
Contributor

@sebgl sebgl commented Aug 2, 2019

Merge the statefulset-refactoring feature branch into master.

Change the way Elasticsearch nodes are orchestrated to rely on StatefulSets instead of managing pods.

The implementation is not complete yet. In particular:

  • it ignores any change budget considerations
  • if a StatefulSet needs to be replaced, the entire new sset will be added (with all replicas) before the old one is shut down (budget ignored)
  • it does not support grow and shrink, but does rolling upgrades by default
  • it does not remove PVCs, which can be a problem when creating and deleting the same stack multiple times: data volumes from the first stack will be reused by the second one
  • it probably has some orchestration bugs :)
  • it is missing a bit of refactoring and unit testing

However, it seems to be good enough to spawn stacks, apply any mutation, and run E2E tests.

Relates #1299.

nkvoll and others added 25 commits July 5, 2019 16:15
This facilitates a move to StatefulSets, where the mounted secrets must be the same across
all the Pods in the same StatefulSet.
Store transport keys and certificates in a single shared secret.
…1205)

Build a set of StatefulSets matching the ES CRD NodeSpec, and reconcile them instead of reconciling pods.

This is a very first iteration.

Bootstrapping an ES cluster seems to work fine. Upgrading this cluster (and everything related to zen1/zen2) does not.

Main bits of this change:

- Require the name field in the NodeSpec, to derive the StatefulSet name
- Build a StatefulSet per NodeSpec, along with a headless service
- Label the StatefulSet with a hash of its template, for comparison purposes
- Create a single Config secret per StatefulSet (instead of per pod)
- Label the pod template with a hash of the config (before zen2 initial master nodes), to rotate pods on config changes
- Reconcile StatefulSet, service and config in the default driver, commenting out most of the existing pod reconciliation logic

The code is left in a rather dirty state: I had to make a few changes deep down in functions that build the pod spec and config. These changes bubble up pretty high, to changes calculations and the default driver.
Most of the default driver code and the mutation package is probably doomed. I chose to keep it commented out instead of deleting it for now, since I think it may help in the short-term refactoring ("where does that logic live in the new code? here? OK, I can remove it."). Let me know if it deserves more cleanup.

To be refactored and improved in follow-up commits.
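The template-hash labeling idea above can be sketched as follows; this is a minimal illustration, and `templateHash` is a hypothetical helper, not the operator's actual code:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// templateHash returns a deterministic hash of a serialized pod template,
// usable as a label value: if the hash stored on the StatefulSet differs
// from the hash of the desired template, the sset needs an update.
// Hypothetical helper for illustration only.
func templateHash(serializedTemplate string) string {
	h := fnv.New32a()
	h.Write([]byte(serializedTemplate)) // Write on a hash never returns an error
	return fmt.Sprint(h.Sum32())
}

func main() {
	current := templateHash(`{"image":"elasticsearch:7.2.0"}`)
	desired := templateHash(`{"image":"elasticsearch:7.3.0"}`)
	fmt.Println(current != desired) // true: template changed, reconcile needed
}
```

Comparing label hashes avoids doing a deep semantic diff of two StatefulSet specs on every reconciliation.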
Add support for scaling a StatefulSet up (just update sset replicas) or
down (migrate data away, then remove nodes when ready). This also adds
support for renaming a StatefulSet, by creating the new sset, then
slowly removing the existing one with data migration.

It ignores any zen1/zen2 consideration for now (works fine with zen2 in most
cases).
It ignores any change budget consideration for now.
Unit tests and E2E tests are missing, but a refactoring is to be
expected to handle zen1/zen2 and the changeBudget. I'd prefer to wait
for this refactoring to happen before dealing with large unit testing.
Consider this still a work in progress.

The PR also modifies the way pods and ssets are watched by the Elasticsearch controller.
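The up/down decision described above can be sketched roughly like this; `scaleDecision` and its return values are hypothetical names for illustration, not the operator's actual API:

```go
package main

import "fmt"

// scaleDecision says what to do with a StatefulSet given its actual vs
// desired replica counts. Scaling up is just a replicas bump; scaling
// down first migrates data away from the nodes to remove.
// Hypothetical sketch of the logic described above.
func scaleDecision(actual, desired int32) string {
	switch {
	case desired > actual:
		return "scale-up" // just update sset replicas
	case desired < actual:
		return "migrate-then-scale-down" // migrate data away, then remove nodes
	default:
		return "noop"
	}
}

func main() {
	fmt.Println(scaleDecision(3, 5)) // scale-up
	fmt.Println(scaleDecision(5, 3)) // migrate-then-scale-down
	fmt.Println(scaleDecision(3, 3)) // noop
}
```

Renaming an sset composes the two cases: the new sset is scaled up while the old one is slowly scaled down to zero.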
* Support for APM server configuration (#1181)

* Add a config section to the APM server configuration

* APM: Add support for keystore

* Factorize ElasticsearchAuthSettings

* Update dev setup doc + fix GKE bootstrap script (#1203)

* Update dev setup doc + fix GKE bootstrap script

* Update wording of container registry authentication

* Ensure disks removal after removing cluster in GKE (#1163)

* Update gke-cluster.sh

* Implement cleanup for unused disks in GCP

* Update Makefile

* Update CI jobs to do proper cleanup

* Normalize the raw config when creating canonical configs (#1208)

This aims at counteracting the difference between JSON-centric serialization and the use of YAML as the serialization format in the canonical config. Without normalization, numeric values
like 1 will differ when comparing configs, as JSON deserializes integer numbers to float64 and YAML to uint64.
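The normalization idea can be sketched with a JSON round-trip, which coerces every numeric value to float64 regardless of the parser it came from; this is a minimal sketch of the concept, not the operator's actual implementation:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalize round-trips a config map through JSON so that all numeric
// values end up as float64, whether they were produced by a YAML parser
// (uint64) or a JSON parser (float64). Sketch for illustration only.
func normalize(cfg map[string]interface{}) (map[string]interface{}, error) {
	raw, err := json.Marshal(cfg)
	if err != nil {
		return nil, err
	}
	var out map[string]interface{}
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	fromYAML := map[string]interface{}{"replicas": uint64(1)}  // YAML-style integer
	fromJSON := map[string]interface{}{"replicas": float64(1)} // JSON-style number
	a, _ := normalize(fromYAML)
	b, _ := normalize(fromJSON)
	fmt.Println(a["replicas"] == b["replicas"]) // true after normalization
}
```

Without the round-trip, `uint64(1) == float64(1)` compares as unequal interface values, so semantically identical configs would be flagged as different.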

* Homogenize logs (#1168)

* Don't run tests if only docs are changed (#1216)

* Update Jenkinsfile

* Simplify notOnlyDocs()

* Update Jenkinsfile

* Push snapshot ECK release on successful PR build (#1184)

* Update makefile's to support snapshots

* Add snapshot releases to Jenkins pipelines

* Cleanup

* Rename RELEASE to USE_ELASTIC_DOCKER_REGISTRY

* Update Jenkinsfile

* Add a note on EKS inbound traffic & validating webhook (#1211)

EKS users must explicitly allow inbound traffic from the k8s control
plane to the nodes on port 443 in order for the control plane to reach the
validating webhook.

Should help with #896.

* Update PodSpec with Hostname from PVC when re-using (#1204)

* Bind the Debug HTTP server to localhost by default (#1220)

* Run e2e tests against custom Docker image (#1135)

* Add implementation

* Update makefile's

* Update Makefile

* Rename Jenkisnfile

* Fix review comments

* Update e2e-custom.yml

* Update e2e-custom.yml

* Return deploy-all-in-one to normal

* Delete GKE cluster only if changes not in docs (#1223)

* Add operator version to resources (#1224)

* Warn if unsupported distribution (#1228)

The operator only works with the official ES distributions, which enable the security
features available with the basic (free), gold and platinum licenses, in order to ensure that
all clusters launched are secured by default.

A check is done in the prepare-fs script by looking for the existence of the
Elastic License. If it is not present, the script exits with a custom exit code.

The ES reconciliation loop then sends an event of type warning if it detects that
a prepare-fs init container terminated with this exit code.

* Document Elasticsearch update strategy change budget & groups (#1210)

Add documentation for the `updateStrategy` section of the Elasticsearch
spec.

It documents how (and why) `changeBudget` and `groups` are used by ECK,
and how both settings can be specified by the user.
Handle cluster rolling upgrades with StatefulSets.

Relies on manually updating the rollingUpdate.Partition index of a sset when we're ready to restart the ES node.
For now, it only handles one node restart at a time, mostly because we rely on cluster health green checks.
It also introduces 2 additional concepts:

* StatefulSet update expectations, based on UID + Generation: there are certain ES calls we don't want to make if our StatefulSet cache is out of date.
* "Lazy" ES client calls: some ES calls we make to handle the upgrade cannot rely on out-of-date information from observers. We need fresh information on the state of the cluster before e.g. re-enabling shard allocations. However, we don't want to perform these calls at every single reconciliation. I see this as a stop-gap implementation before we go on with a larger refactoring of sync/async ES calls.

This is still WIP since many things are still missing:

* changeBudget considerations.
* zen1/zen2 handling.
* Unit tests, everywhere. I still think a large portion of this code may be refactored in the short term to account for the change budget & zen1/zen2. Keeping unit tests for a later PR.
* E2E tests, kept for a later PR.
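The partition-driven, one-node-at-a-time mechanism can be sketched as follows; `nextPartition` is a hypothetical helper illustrating the idea, not the operator's actual code:

```go
package main

import "fmt"

// nextPartition returns the rollingUpdate.Partition value to set once the
// pod at the current partition ordinal has been upgraded and the cluster
// is green again. Pods with an ordinal >= Partition get the new template,
// so decrementing by one allows exactly one more pod to restart.
// Hypothetical helper for illustration only.
func nextPartition(current int32, clusterGreen bool) int32 {
	if !clusterGreen || current == 0 {
		return current // hold position, or rollout already complete
	}
	return current - 1
}

func main() {
	fmt.Println(nextPartition(3, true))  // 2: allow the next pod to restart
	fmt.Println(nextPartition(3, false)) // 3: wait for the cluster to be green
	fmt.Println(nextPartition(0, true))  // 0: all pods upgraded
}
```

Starting the partition at the replica count and walking it down to zero is what turns the StatefulSet's rolling update into a controlled, health-gated restart.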
Merge master - with conflicts fixed
Merge master into statefulset-refactoring
* Use the setvmmaxmapcount initcontainer by default in E2E tests (#1300)

Let's keep our default defaults :)

The setting is disabled explicitly for E2E tests where we enable a
restricted security context.

* Add docs for plugins, custom configuration files and secure settings (#1298)

* Allow license secret webhook to fail (#1301)

Webhooks on core k8s objects are just too debilitating if our
webhook service fails. This sets the failure policy for the secret
webhook to ignore, to strike a balance between UX (immediate feedback)
and keeping the user's k8s cluster in a working state. We also run
additional validation at the controller level, so this does not allow
circumventing our validation logic.

* Revert "Use the setvmmaxmapcount initcontainer by default in E2E tests (#1300)" (#1302)

This reverts commit fff1526.
This commit is breaking our E2E tests chain, which deploys a
PodSecurityPolicy by default. Any privileged init container will not
work.

I'll open an issue for a longer-term fix to properly handle this.

* Update quickstart (#1307)

* Update the name of the secret for the elastic user
* Bump the Elastic Stack version from 7.1.0 to 7.2.0

* Change Kibana readiness endpoint to return a 200 OK (#1309)

The previous endpoint returned HTTP code 302. While this is fine for
Kubernetes, some derived systems like GCP LoadBalancers mimic the
container readiness check for their own readiness check, and GCP
LoadBalancers only work with status 200.

It's not up to us to adapt GCP LoadBalancers to K8s, but this is a
fairly trivial fix.

* Fix pod_forwarder to support two part DNS names, adjust e2e http_client (#1297)

* Fix pod_forwarder to support two part DNS names, adjust e2e http_client url

* Revert removing .svc in e2e http_client

* [DOC] Resources management and volume claim template (#1252)

* Add resources and persistent volume templates documentation

* Ignore resources reconciled by older controllers (#1286)

* Document PodDisruptionBudget section of the ES spec (#1306)

* Document PodDisruptionBudget section of the ES spec

I suspect this might slightly change in the future depending on how we
handle the readiness check, so I'm keeping this doc minimal for now:

* what is a PDB, briefly (with a link)
* default PDB we apply
* how to set a different PDB
* how to disable the default PDB

* Move version out from Makefile (#1312)

* Add release note generation tool (#1314)

* no external dependencies
* inspects PRs by version label
* generates structured release notes in asciidoc grouped by type label

* Add console output to standalone apm sample (#1321)

* Update Quickstart to 0.9.0 (#1317)

* Update doc (#1319)

* Update persistent storage section
* Update kibana localhost url to use https
* Update k8s resources names in accessing-services doc
* Mention SSL browser warning
* Fix bulleted list

* Add CI job for nightly builds (#1248)

* Move version to a file

* Add CI implementation

* Update VERSION

* Depend on another PR for moving out version from Makefile

* Update Jenkinsfile

* Don't build and push operator image in bootstrap-gke (#1332)

We don't need to do that anymore, since we don't use an init container
based on the operator image.

* Remove Docker image publishing from devops-ci (#1339)

* Suppress output of certain commands from Makefile (#1342)

* Document how to disable TLS (#1341)

* Use new credentials for Docker registry (#1346)

* Workaround controller-runtime webhook upsert bug (#1337)

* Fix docs build on PR job (#1351)

* Fix docs build on PR job

* Cleanup workspace before doing other steps

* APM: remove "output" element and add elasticsearchRef (#1345)

* Don't rely on buggy metaObject Kind (#1324)

* Don't rely on buggy metaObject Kind

A bug in our client implementation may clear the object's Kind on
certain scenarios. See
kubernetes-sigs/controller-runtime#406.

Let's avoid that by fixing a constant Kind returned by a method call on
the resource.
Merge master into statefulset-refactoring
Change the way we setup zen1 and zen2 settings to adapt to
StatefulSets upscale and downscale.
Also improve StatefulSet downscale code, which is moved to its own file,
and add a few helper functions around pods and master nodes.

In a nutshell:

* set minimum_master_nodes (zen1) and initial_master_nodes (zen2) in the configuration of new nodes to create, corresponding to the expected set of nodes. The setting is automagically updated in the configuration of existing nodes, where it is ignored until those nodes restart.
* update minimum_master_nodes (zen1) and voting_config_exclusions (zen2) whenever we remove a node.

For now, we don't (yet) make any change on rolling upgrades, which are limited to a single node at a time at the moment. This only impacts sset scale up/down.
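For reference, the zen1 minimum_master_nodes value is a strict majority of master-eligible nodes, n/2 + 1. A tiny sketch of the computation (the helper name is hypothetical):

```go
package main

import "fmt"

// minimumMasterNodes computes the zen1 quorum for a given number of
// master-eligible nodes: a strict majority, n/2 + 1. Setting this
// correctly on scale up/down is what prevents split-brain in zen1.
func minimumMasterNodes(masterEligible int) int {
	return masterEligible/2 + 1
}

func main() {
	fmt.Println(minimumMasterNodes(3)) // 2
	fmt.Println(minimumMasterNodes(5)) // 3
}
```

zen2 (ES 7+) manages the voting configuration itself, which is why the downscale path only needs voting_config_exclusions instead.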
* Add mandatory NodeSpec name

* Rename apm samples resources to something unique

Otherwise, volumes of resources from other tests are reused for this
one. Should be solved once we remove PVCs in
#1288.
Refactor the main driver and ES pod creation logic, towards simplification of the codebase.

Main changes:

* StatefulSet and PodSpec creation logic lives in the NodeSpec package
* No more catch-all version/common.go. The only difference between versions for now is explicitly zen1 vs. zen2.
* No more pod package (with conflicting variable/package name)
* No more nested indirection from driver -> defaultDriver -> NewPodSpecParams -> podSpec(func, func, func, func) -> buildPodTemplateSpec(func, func, func, func)
* Default driver code attempts to be minimalistic (while still being a sequential list of function calls)
* No more "generic resources", since that was only used for the external service anyway
* Remove deprecated code in the mutation, pvc and pod packages (a lot of the comparison/CalculateChanges/PVC reuse/PodWithConfig/PodSpecContext mechanism we had before)


* Move supported version check from driver to version pkg

* Simplify driver creation

* Extract driver nodes reconciliation to its own file

* Move expectations to the common pkg and add unit tests

* Move configmap reconciliation to its own file

* Move DynamicEnvVars to defaults pkg and use it in more places

* Extract es env vars away from the version pkg

* Move keystore init container params away from the main driver

* Move sset build logic into the nodespec pkg

* Move readiness probe in the nodespec pkg

* Move some es pod consts into the nodespec pkg

* Refactor pod template build

* Remove concept of generic resources since used for a single service

* Remove deprecated pod creation + PVC reuse code

* Remove most of the mutation package (RIP)

* Remove the PVC package (RIP)

* Remove PodWithConfig and PodSpecContext (yay!)

* Get rid of the pod pkg (yay!)

* Fix supported versions initialization

* Add missing license header

* Remove deprecated pv reuse e2e test

* Remove a bunch of unused functions

* Make linter happy with appendAssign

* Fix downward env defaults in apm unit tests

* Move PodsByName to the utils pkg

* Fix typo

* Fix reference to the keystore initcontainer and add a unit test

* Remove dead code

* Address PR review
@sebgl sebgl mentioned this pull request Aug 2, 2019
@sebgl sebgl added >enhancement Enhancement of existing functionality >refactoring labels Aug 2, 2019
@artemnikitin
Member

@elasticmachine, run elasticsearch-ci/docs

sebgl added 4 commits August 5, 2019 09:43
To cleanup resources and avoid any PVC reuse for different E2E stacks,
let's remove PVCs at the end of E2E tests.
This should probably be done by the operator directly, see
#1288.
@sebgl
Contributor Author

sebgl commented Aug 5, 2019

@anyasabo I think the samples close to the code (config/samples) are updated already. Are you referring to examples in our documentation?

Contributor

@david-kow david-kow left a comment


Few comments and questions.

@sebgl sebgl merged commit ab310df into master Aug 7, 2019
@pebrc pebrc removed >enhancement Enhancement of existing functionality >refactoring labels Oct 9, 2019
@sebgl sebgl deleted the statefulset-refactoring branch February 12, 2020 08:46