Allow deletion to proceed in case of VM initialization error #928
@rishabh-11 what changes are required on the provider e.g. mcm-provider-aws to return the appropriate error ? |
No change required on the provider side. The problem was with the |
Once this PR merges, I'll make a patch release of MCM. The providers will need to be updated with this patch version of MCM |
What about this scenario: User is doing some experiment and changes the something like the I am positing that checking the VM status during deletion and blocking the deletion only on that may not be the best thing to do. Especially when these post-init checks are involved in healthchecks of the VM. Or maybe we can have 2 "versions" of the check depending on if we are deleting the VM. |
This case will be handled in this PR. Let's consider that the user is experimenting and changes something on the VM, causing the |
Consider the case for errors apart from |
Point raised by @kon-angelo feels right. Perhaps it shouldn't be blocked? (Who cares about health at this point?) Ideally, we should just go ahead with drain->deletion in all circumstances except for |
@rishabh-11 Maybe first, I know that the PR solves the issue for provider-aws - so in that regard it is a /lgtm. I still do not think that it is a nice experience to not be able to delete a VM for something silly like flipping a boolean flag on the VM. @elankath summarised perfectly: why do the full blown healthcheck in this case. You particularly only care if the machine exists or not.
Our implementations already have their own validations before doing a delete call.
How exactly ? In this case the provider does not know if it should do the "full-check". You could change the interface to note if the machine is in deletion if you want to go that route.
Take this as an anecdote, but for provider-openstack I do not implement I don't really see a case where the delete call itself cannot and should not handle this case. Either the delete would fail, or maybe the delete call must do a get check beforehand - but in either case everything necessary would be handled by the delete call. |
@kon-angelo I agree that But, the
The "check" here is a check of the providerSpec in the machine class. I meant that we should only check those fields in the providerSpec that are needed for the particular driver method to work and not the entire providerSpec for every call. |
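As a rough illustration of the per-method validation suggested above, here is a minimal Go sketch. It is not taken from any provider: `ProviderSpec`, its fields, and both validation functions are hypothetical names, and the choice of which fields a delete call needs is an assumption made only for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderSpec is a hypothetical, heavily simplified provider spec.
// Real provider specs carry many more fields.
type ProviderSpec struct {
	Region      string
	MachineType string
	ImageID     string
}

// validateForDelete checks only what a delete call would need
// (assumed here to be just the region used to locate the VM).
func validateForDelete(s ProviderSpec) error {
	if s.Region == "" {
		return errors.New("region is required")
	}
	return nil
}

// validateForCreate checks everything a create call would need,
// reusing the narrower delete-time check first.
func validateForCreate(s ProviderSpec) error {
	if err := validateForDelete(s); err != nil {
		return err
	}
	if s.MachineType == "" {
		return errors.New("machineType is required")
	}
	if s.ImageID == "" {
		return errors.New("imageID is required")
	}
	return nil
}

func main() {
	// A spec that is incomplete for creation but sufficient for deletion.
	partial := ProviderSpec{Region: "eu-west-1"}
	fmt.Println("create:", validateForCreate(partial))
	fmt.Println("delete:", validateForDelete(partial))
}
```

With this split, a spec that has become invalid for creation would not by itself prevent a delete call from going through.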
Sure. Technically without the
Maybe I misunderstood the point above. The issue with provider-aws currently is not that the spec validation does not match - what happened with gcp and the "static" spec checking. The machine is indeed "unhealthy" because the VM in the hyperscaler does not match its expected spec. The healthcheck is correct - but you don't need a "healthy" machine to move over to delete. There is no way from the Anyway, we can have this discussion offline. I think we all agree that we should merge the PR and go ahead with the fix and optimisations can come later. |
/lgtm
The idea behind |
Test Log Before Fix.
|
Test Log Post Fix
|
Test Post-Fix with Kubelet Crash Simulation (checking that
|
@rishabh-11 Tests complete. Please merge and release whenever ready. |
…r#928) * allow deletion to proceed in case of VM initialization error * omit tool binaries * set_makefile_env: added CONTROL_NAMESPACE, LEADER_ELECT --------- Co-authored-by: elankath <tarun.ramakrishna.elankath@sap.com>
What this PR does / why we need it:
This PR fixes the `triggerDeletionFlow`, specifically the `getVMStatus` function, to allow deletion of uninitialized VMs to proceed.
Which issue(s) this PR fixes:
Fixes #926
Special notes for your reviewer:
Release note: