Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OOT: Fixes drain timeout issues on retires #548

Merged

Conversation

prashanth26
Copy link
Contributor

What this PR does / why we need it:
The OOT implementation reverted the logic that was required to track time between drain retries. This PR fixes it. Refer previous code here - https://github.com/gardener/machine-controller-manager/blob/master/pkg/controller/machine.go#L741.

Which issue(s) this PR fixes:
Fixes #547

Special notes for your reviewer:
The PR also contains minor nits.

Release note:

OOT: Fixes drain timeout issues on retires

@prashanth26 prashanth26 requested review from ggaurav10 and a team as code owners November 1, 2020 10:10
@gardener-robot gardener-robot added needs/review Needs review size/s Size of pull request is small (see gardener-robot robot/bots/size.py) labels Nov 1, 2020
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 1, 2020
@prashanth26 prashanth26 added the kind/regression Bug that hit us already in the past and that is reappearing/requires a proper solution label Nov 1, 2020
@gardener-robot-ci-3 gardener-robot-ci-3 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 1, 2020
@prashanth26
Copy link
Contributor Author

/touch

Copy link
Member

@hardikdr hardikdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.
/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Nov 2, 2020
@prashanth26 prashanth26 merged commit 7096a47 into gardener:master Nov 2, 2020
@prashanth26 prashanth26 deleted the regression/drain-failure-timeout branch May 2, 2021 07:44
einfachnuralex added a commit to stackitcloud/machine-controller-manager that referenced this pull request Aug 20, 2021
* Prepare Next Dev Cycle v0.35.0-dev

* Wait also for CSI PVs during serial eviction

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* add servergroup option

* [azure] Verify tags

Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>

* Allow deletion of machines where modify instance fails on deletion

* [OOT] Re-enabled support for bootstrap token injection

Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>

* Azure MachineSet: VM ScaleSet or AvailabilitySet

Add generic MachineSet configuration in AzureMachineClass.
The machineSet field can be used to configure either a VMO or an AvailabilitySet for a MachineClass.

The availabilitySets field will be kept for a while but is now deprecated in favour of the machineset field.
The Azure compute sdk is also updated to support VMSS/VMO.

* Introduce constant backoff while enqueuing machine objects on creation and deletion failures

Co-authored-by: Prashanth <prashanth@sap.com>

* Bugfix: Finalizers are added by default on MC

* Added event handlers for MachCls

Fixed minor typo

* Removed multiple machineClass adds when pointing to same class.kind

* Fix typo in documentation

The documentation was pointing to the wrong branch. This change fixes it.

* Improve documentation with FAQ

Co-authored-by: Prashanth <prashanth@sap.com>
Co-authored-by: Samarth Deyagond <samarth.deyagond@sap.com>

* Enable NetworkUnavailable as node condition for machine health (gardener#543)

Add NetworkUnavailable as node condition to check while the machine is unhealthy

* Removing excessive spaces in FAQ

ToC links in the "Basics" section were broken due to this.

* OOT: Fixes drain failure timeout issues (gardener#548)

* OpenStack driver support to create VMs in a given subnet.

* Adapt integration tests to handle possibly orphaned resources

* Fix the OpenAPI validation of the machine-deployment crd

* Reduced default drain timeout to 2hours

* Bugfix: maxEvictRetries derived from drainTimeout

* OOT: Enqueue machine only when conditions change

* Split retry into Short Medium & Long periods

* Delete the orphan resources

* Backoff on machine creation/deletion faliures.

- Introdcued CrashLoopBackOff on machine creation failure

* Use listers to get the machine-object during reconciliation

* Allow migration to continue when machineClass with same name as ProviderMachineClass is found (gardener#559)

* Release v0.35.0

* Prepare Next Dev Cycle v0.36.0-dev

* Fix Azure machine deletion if resource group is gone

* Add FAQ for paused field

* Set Machine Phase to Terminating & make drain calculations based on deletionTimestamp (gardener#564)

* Fixes issues with machines being force drained

The machines were being force drained during race conditions due to PR gardener#492.
This commit fixes that by
	- Make the drain calculation based on deletion timestamp
	- Reverted logic from PR gardener#492 to first set machine Termination phase

* Update pkg/util/time/time_test.go

Fixed typo

Co-authored-by: Hardik Dodiya <hardik.dodiya@sap.com>

* Update docker images to use latest gcr copy

  - Updated from golang:1.13.5 to eu.gcr.io/gardener-project/3rd/golang:1.15.5
  - Updated from alpine:3.11.2 to eu.gcr.io/gardener-project/3rd/alpine:3.12.1

* Update docs/FAQ.md

Co-authored-by: Dieter Guendisch <dieter.guendisch@sap.com>

* Prevent panic when machine template hash length is < 5

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* In-tree drivers work with credentials data map

They are no longer working with the whole secret because they don't need it. All of them are evaluating the data map only.

* Introduce `.spec.credentialsSecretRef` for in-tree machine classes

* Introduce `.spec.credentialsSecretRef` for out-of-tree machine class

* Support Gardener credentials data keys for Alicloud, AWS, Azure, GCP

* bump aws sdk version to v1.23.13

* Migration note for OOT providers (gardener#581)

* Updated Readme.md

* Release v0.36.0

* Prepare Next Dev Cycle v0.37.0-dev

* Update pull_request_template.md (gardener#582)

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* Correcting the variable substitution

* docforge manifest (gardener#587)

* Check for misconfigured PDBs on Node drain and set proper error description

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* Fix for issue gardener#583 with UT

* Release v0.37.0

* Prepare next Dev Cycle v0.38.0-dev

* add component-descriptor-trait to head-update job

* Update eu.gcr.io/gardener-project/cc/job-image-golang to 0.12.0

* Update eu.gcr.io/gardener-project/cc/job-image-golang to 0.10.0

* Update eu.gcr.io/gardener-project/cc/job-image-golang to 0.11.0

* Run update before fetching on CI image (gardener#596)

* Improved NIC creation and deletion logic

- NIC creation now, adopts any existing NICs with matching name
- NIC deletion confirms deletion by performing GET on deletion
- Improved creation logs to give more details
- Revendored additional libraries

* Release v0.38.0

* Prepare next Dev Cycle v0.39.0-dev

* Updated links to various MCM OOT providers

* Fixed minor typo

* Updated link for metal stack

* skip drain readOnlyFileSystem logic and test case added (gardener#605)

Co-authored-by: Samarth Deyagond <samarth.deyagond@sap.com>

* Improved log details to include node name and provider ID (gardener#607)

Co-authored-by: Prashanth <prashanth@sap.com>

* Fixed formatting directive error

* Fix panic when secretRef cannot be found (gardener#609)

It would assign nil to secretRef, and this would panic when trying
to log out the original namespace/name.

* Fixes issue adding on finalizers on existing mach (gardener#611)

* Release v0.39.0

* Feature openstack volume type & dualstack customs

Fix podNetworkCidrs

* add openstack volumeType for machine

Co-authored-by: Maximilian Geberl <maximilian.geberl@stackit.de>

updated VERSION after rebase

updated driver_openstack.go

changed order of applied services

added allowed address pairs

Change make to einfachnuralex

Split podnetwork and run update port n times

changed version and repository

added more v6 stuff

* VERSION v0.39.0-ske-1

Co-authored-by: gardener-robot-ci-3 <gardener.ci.user3@gmail.com>
Co-authored-by: ialidzhikov <i.alidjikov@gmail.com>
Co-authored-by: Hardik Dodiya <hardik.dodiya@sap.com>
Co-authored-by: Konstantinos Angelopoulos <konstantinos.angelopoulos@sap.com>
Co-authored-by: Andrey Klimentyev <andrey.klimentyev@flant.com>
Co-authored-by: prashanth26 <prashanth@sap.com>
Co-authored-by: dkistner <dominickistner@yahoo.de>
Co-authored-by: Samarth Deyagond <samarth.deyagond@sap.com>
Co-authored-by: Anthony Comtois <anthony.comtois@sky.uk>
Co-authored-by: Dmitry Shurupov <dmitry.shurupov@flant.com>
Co-authored-by: Thomas Buchner <thomas.buchner@sap.com>
Co-authored-by: Kistner, Dominic <dominic.kistner@sap.com>
Co-authored-by: Dieter Guendisch <dieter.guendisch@sap.com>
Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>
Co-authored-by: 郑佳金 <zhengjiajin@pingcap.com>
Co-authored-by: gardener-robot-ci-2 <gardener.ci.user2@gmail.com>
Co-authored-by: Samarth Deyagond <deyagondsamarth@gmail.com>
Co-authored-by: Georgi Pavlov <georgi.pavlov@sap.com>
Co-authored-by: Andreas Burger <andreas.burger@sap.com>
Co-authored-by: gardener-robot-ci-1 <gardener.ci.user@gmail.com>
Co-authored-by: Himanshu Sharma <79965161+himanshu-kun@users.noreply.github.com>
Co-authored-by: James Ravn <james@r-vn.org>
Co-authored-by: Manuel Ganter <manuel.ganter@hotmail.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/regression Bug that hit us already in the past and that is reappearing/requires a proper solution needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/s Size of pull request is small (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

OOT: Drain failure cases don't seem to respect drain timeouts
5 participants