Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow migration to continue when machineClass with same name as Provi… #559

Merged

Conversation

prashanth26
Copy link
Contributor

@prashanth26 prashanth26 commented Nov 16, 2020

…derMachineClass is found

What this PR does / why we need it:
I observed a rare occurrence (only 1 cluster until now, tried to look through the code couldn't understand how this could occur) of a machine class migration failing for a shoot where the new machine class was created with the expected data, however, the machine objects were referring to an old non-existing ProviderMachineClass. This PR fixes such race conditions.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
I have added unit tests to check for these cases as well.

Release note:

Allow migration to continue when ProviderMachineClass is missing but MachineClass with the same name as ProviderMachineClass is found. Updates Machine object references to the MachineClass.

@prashanth26 prashanth26 requested review from ggaurav10 and a team as code owners November 16, 2020 08:23
@gardener-robot gardener-robot added needs/review Needs review size/m Size of pull request is medium (see gardener-robot robot/bots/size.py) labels Nov 16, 2020
@gardener-robot-ci-2 gardener-robot-ci-2 added reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 16, 2020
@prashanth26
Copy link
Contributor Author

/priority blocker
/needs review

@gardener-robot gardener-robot added the priority/blocker Needs to be resolved now, because it breaks the service label Nov 17, 2020
Copy link
Member

@hardikdr hardikdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a minor comment, ignorable.

@gardener-robot gardener-robot removed the needs/review Needs review label Nov 17, 2020
@prashanth26 prashanth26 force-pushed the oot/allow-migration-with-same-mcname branch from 1fff9aa to f650ae0 Compare November 17, 2020 17:51
@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 17, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 17, 2020
@prashanth26 prashanth26 merged commit 6832b92 into gardener:master Nov 17, 2020
@prashanth26 prashanth26 deleted the oot/allow-migration-with-same-mcname branch May 2, 2021 07:44
einfachnuralex added a commit to stackitcloud/machine-controller-manager that referenced this pull request Aug 20, 2021
* Prepare Next Dev Cycle v0.35.0-dev

* Wait also for CSI PVs during serial eviction

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* add servergroup option

* [azure] Verify tags

Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>

* Allow deletion of machines where modify instance fails on deletion

* [OOT] Re-enabled support for bootstrap token injection

Signed-off-by: Andrey Klimentyev <andrey.klimentyev@flant.com>

* Azure MachineSet: VM ScaleSet or AvailabilitySet

Add generic MachineSet configuration in AzureMachineClass.
The machineSet field can be used to configure either a VMO or an AvailabilitySet for a MachineClass.

The availabilitySets field will be kept for a while but is now deprecated in favour of the machineset field.
The Azure compute sdk is also updated to support VMSS/VMO.

* Introduce constant backoff while enqueuing machine objects on creation and deletion failures

Co-authored-by: Prashanth <prashanth@sap.com>

* Bugfix: Finalizers are added by default on MC

* Added event handlers for MachCls

Fixed minor typo

* Removed multiple machineClass adds when pointing to same class.kind

* Fix typo in documentation

The documentation was pointing to the wrong branch. This change fixes it.

* Improve documentation with FAQ

Co-authored-by: Prashanth <prashanth@sap.com>
Co-authored-by: Samarth Deyagond <samarth.deyagond@sap.com>

* Enable NetworkUnavailable as node condition for machine health (gardener#543)

Add NetworkUnavailable as node condition to check while the machine is unhealthy

* Removing excessive spaces in FAQ

ToC links in the "Basics" section were broken due to this.

* OOT: Fixes drain failure timeout issues (gardener#548)

* OpenStack driver support to create VMs in a given subnet.

* Adapt integration tests to handle possibly orphaned resources

* Fix the OpenAPI validation of the machine-deployment crd

* Reduced default drain timeout to 2hours

* Bugfix: maxEvictRetries derived from drainTimeout

* OOT: Enqueue machine only when conditions change

* Split retry into Short Medium & Long periods

* Delete the orphan resources

* Backoff on machine creation/deletion faliures.

- Introdcued CrashLoopBackOff on machine creation failure

* Use listers to get the machine-object during reconciliation

* Allow migration to continue when machineClass with same name as ProviderMachineClass is found (gardener#559)

* Release v0.35.0

* Prepare Next Dev Cycle v0.36.0-dev

* Fix Azure machine deletion if resource group is gone

* Add FAQ for paused field

* Set Machine Phase to Terminating & make drain calculations based on deletionTimestamp (gardener#564)

* Fixes issues with machines being force drained

The machines were being force drained during race conditions due to PR gardener#492.
This commit fixes that by
	- Make the drain calculation based on deletion timestamp
	- Reverted logic from PR gardener#492 to first set machine Termination phase

* Update pkg/util/time/time_test.go

Fixed typo

Co-authored-by: Hardik Dodiya <hardik.dodiya@sap.com>

* Update docker images to use latest gcr copy

  - Updated from golang:1.13.5 to eu.gcr.io/gardener-project/3rd/golang:1.15.5
  - Updated from alpine:3.11.2 to eu.gcr.io/gardener-project/3rd/alpine:3.12.1

* Update docs/FAQ.md

Co-authored-by: Dieter Guendisch <dieter.guendisch@sap.com>

* Prevent panic when machine template hash length is < 5

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* In-tree drivers work with credentials data map

They are no longer working with the whole secret because they don't need it. All of them are evaluating the data map only.

* Introduce `.spec.credentialsSecretRef` for in-tree machine classes

* Introduce `.spec.credentialsSecretRef` for out-of-tree machine class

* Support Gardener credentials data keys for Alicloud, AWS, Azure, GCP

* bump aws sdk version to v1.23.13

* Migration note for OOT providers (gardener#581)

* Updated Readme.md

* Release v0.36.0

* Prepare Next Dev Cycle v0.37.0-dev

* Update pull_request_template.md (gardener#582)

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* Correcting the variable substitution

* docforge manifest (gardener#587)

* Check for misconfigured PDBs on Node drain and set proper error description

Signed-off-by: ialidzhikov <i.alidjikov@gmail.com>

* Fix for issue gardener#583 with UT

* Release v0.37.0

* Prepare next Dev Cycle v0.38.0-dev

* add component-descriptor-trait to head-update job

* Update eu.gcr.io/gardener-project/cc/job-image-golang to 0.12.0

* Update eu.gcr.io/gardener-project/cc/job-image-golang to 0.10.0

* Update eu.gcr.io/gardener-project/cc/job-image-golang to 0.11.0

* Run update before fetching on CI image (gardener#596)

* Improved NIC creation and deletion logic

- NIC creation now, adopts any existing NICs with matching name
- NIC deletion confirms deletion by performing GET on deletion
- Improved creation logs to give more details
- Revendored additional libraries

* Release v0.38.0

* Prepare next Dev Cycle v0.39.0-dev

* Updated links to various MCM OOT providers

* Fixed minor typo

* Updated link for metal stack

* skip drain readOnlyFileSystem logic and test case added (gardener#605)

Co-authored-by: Samarth Deyagond <samarth.deyagond@sap.com>

* Improved log details to include node name and provider ID (gardener#607)

Co-authored-by: Prashanth <prashanth@sap.com>

* Fixed formatting directive error

* Fix panic when secretRef cannot be found (gardener#609)

It would assign nil to secretRef, and this would panic when trying
to log out the original namespace/name.

* Fixes issue adding on finalizers on existing mach (gardener#611)

* Release v0.39.0

* Feature openstack volume type & dualstack customs

Fix podNetworkCidrs

* add openstack volumeType for machine

Co-authored-by: Maximilian Geberl <maximilian.geberl@stackit.de>

updated VERSION after rebase

updated driver_openstack.go

changed order of applied services

added allowed address pairs

Change make to einfachnuralex

Split podnetwork and run update port n times

changed version and repository

added more v6 stuff

* VERSION v0.39.0-ske-1

Co-authored-by: gardener-robot-ci-3 <gardener.ci.user3@gmail.com>
Co-authored-by: ialidzhikov <i.alidjikov@gmail.com>
Co-authored-by: Hardik Dodiya <hardik.dodiya@sap.com>
Co-authored-by: Konstantinos Angelopoulos <konstantinos.angelopoulos@sap.com>
Co-authored-by: Andrey Klimentyev <andrey.klimentyev@flant.com>
Co-authored-by: prashanth26 <prashanth@sap.com>
Co-authored-by: dkistner <dominickistner@yahoo.de>
Co-authored-by: Samarth Deyagond <samarth.deyagond@sap.com>
Co-authored-by: Anthony Comtois <anthony.comtois@sky.uk>
Co-authored-by: Dmitry Shurupov <dmitry.shurupov@flant.com>
Co-authored-by: Thomas Buchner <thomas.buchner@sap.com>
Co-authored-by: Kistner, Dominic <dominic.kistner@sap.com>
Co-authored-by: Dieter Guendisch <dieter.guendisch@sap.com>
Co-authored-by: Rafael Franzke <rafael.franzke@sap.com>
Co-authored-by: 郑佳金 <zhengjiajin@pingcap.com>
Co-authored-by: gardener-robot-ci-2 <gardener.ci.user2@gmail.com>
Co-authored-by: Samarth Deyagond <deyagondsamarth@gmail.com>
Co-authored-by: Georgi Pavlov <georgi.pavlov@sap.com>
Co-authored-by: Andreas Burger <andreas.burger@sap.com>
Co-authored-by: gardener-robot-ci-1 <gardener.ci.user@gmail.com>
Co-authored-by: Himanshu Sharma <79965161+himanshu-kun@users.noreply.github.com>
Co-authored-by: James Ravn <james@r-vn.org>
Co-authored-by: Manuel Ganter <manuel.ganter@hotmail.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) priority/blocker Needs to be resolved now, because it breaks the service size/m Size of pull request is medium (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants