BUG 1875598: Ensure the Virtual Machine provider state is set to Unknown when Failed #696

JoelSpeed · 2020-09-08T11:44:27Z

This ensures that if the Machine goes into a Failed state and the provider has already set the providerStatus to include an instanceState or vmState, we override that value with Unknown.

This ensures consistency between the providerStatus and the instance state annotation.

Ref:

The OpenStack and Baremetal providers do not have an equivalent field.
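The override described above can be sketched as follows. This is a minimal illustration, not the code from this PR: it assumes the providerStatus has already been converted to a generic map, and the function name `overrideProviderState` and the `instanceId` key are hypothetical. Only the field names `instanceState` and `vmState` and the value `Unknown` come from the description.

```go
package main

import "fmt"

// unknownState is the value written over any provider-reported state.
const unknownState = "Unknown"

// overrideProviderState (hypothetical name) sets instanceState or vmState to
// "Unknown" when the field is present. It reports whether either field was found.
// Statuses without these fields are left untouched.
func overrideProviderState(providerStatus map[string]interface{}) bool {
	found := false
	for _, field := range []string{"instanceState", "vmState"} {
		if _, ok := providerStatus[field]; ok {
			providerStatus[field] = unknownState
			found = true
		}
	}
	return found
}

func main() {
	// A status that carries an instanceState field (keys are illustrative).
	withState := map[string]interface{}{"instanceId": "i-123", "instanceState": "running"}
	// A status with no equivalent field.
	withoutState := map[string]interface{}{"instanceId": "abc"}

	fmt.Println(overrideProviderState(withState), withState["instanceState"])
	fmt.Println(overrideProviderState(withoutState), len(withoutState))
}
```

Only overwriting a field that already exists means a provider with no equivalent field never gains one.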

openshift-ci-robot · 2020-09-08T11:44:35Z

@JoelSpeed: This pull request references Bugzilla bug 1875598, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.6.0) matches configured target release for branch (4.6.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

BUG 1875598: Ensure the Virtual Machine provider state is set to Unknown when Failed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

enxebre · 2020-09-10T08:36:57Z

pkg/controller/machine/controller.go

 		baseToPatch := client.MergeFrom(machine.DeepCopy())
+
+		if phase == phaseFailed {


Shouldn't this reuse the if block on line 429?

If we used the block above, the change would happen before baseToPatch is created, so the patching mechanism would treat the change as already present and leave it out of the patch. This modification has to come after baseToPatch is created, whereas the other one has to come before it because it modifies the spec.

I added the line on 438 to try to make this more obvious.
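The ordering concern can be shown with a toy model. This sketch does not use controller-runtime: the `machine` type, `deepCopy`, and `statusPatch` are hypothetical stand-ins for the real object, `DeepCopy`, and the diff that a merge patch computes against its base snapshot.

```go
package main

import "fmt"

// machine is a toy stand-in for the Machine object.
type machine struct {
	Status map[string]string
}

// deepCopy returns an independent copy, like the snapshot taken for baseToPatch.
func (m machine) deepCopy() machine {
	c := machine{Status: map[string]string{}}
	for k, v := range m.Status {
		c.Status[k] = v
	}
	return c
}

// statusPatch returns the status keys that differ between base and current.
// It is a simplified model of a merge patch: only differences are sent.
func statusPatch(base, current machine) map[string]string {
	patch := map[string]string{}
	for k, v := range current.Status {
		if old, ok := base.Status[k]; !ok || old != v {
			patch[k] = v
		}
	}
	return patch
}

func main() {
	// Mutate first, snapshot second: the snapshot already contains the change,
	// so the computed patch is empty and nothing would be sent.
	early := machine{Status: map[string]string{"instanceState": "running"}}
	early.Status["instanceState"] = "Unknown"
	base := early.deepCopy()
	fmt.Println(len(statusPatch(base, early)))

	// Snapshot first, mutate second: the change shows up in the patch.
	late := machine{Status: map[string]string{"instanceState": "running"}}
	base = late.deepCopy()
	late.Status["instanceState"] = "Unknown"
	fmt.Println(statusPatch(base, late))
}
```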

enxebre · 2020-09-10T08:40:00Z

pkg/controller/machine/controller.go

 		baseToPatch := client.MergeFrom(machine.DeepCopy())
+
+		if phase == phaseFailed {
+			if err := r.patchFailedMachineProviderStatusState(machine); err != nil {


This is called patchFailedMachineProviderStatusState, but it isn't actually patching, just setting, right?
We should be consistent with patchFailedMachineInstanceAnnotation: either let patchFailedMachineProviderStatusState create a baseToPatch and send its own patch request, or rename both funcs and let them just set.

I agree the naming is wrong, but I think the instance annotation function needs to stay a patching function; otherwise we have the issue of not actually updating the resource. The annotation needs a client.Patch, but the rest of the changes need a client.Status().Patch because of the status subresource.

I think the preferable route here is to rename only the new one so it doesn't include the word "patch", and leave the logic as is.
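Why the two changes need different calls can be sketched with a toy store. This is an assumption-laden model, not the Kubernetes client: `store`, `patch`, and `patchStatus` are hypothetical stand-ins for an API server with the status subresource enabled, where a write to the main resource ignores status and a write to the status endpoint persists only status. The annotation key is illustrative.

```go
package main

import "fmt"

// object is a toy resource with metadata annotations and a status.
type object struct {
	Annotations map[string]string
	Status      map[string]string
}

// store is a toy stand-in for an API server with a status subresource.
type store struct{ saved object }

// patch models a write to the main resource: status changes are ignored.
func (s *store) patch(o object) { s.saved.Annotations = copyMap(o.Annotations) }

// patchStatus models a write to the status subresource: only status is persisted.
func (s *store) patchStatus(o object) { s.saved.Status = copyMap(o.Status) }

// copyMap returns an independent copy of in.
func copyMap(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	s := &store{}
	o := object{
		Annotations: map[string]string{"example/instance-state": "Unknown"},
		Status:      map[string]string{"phase": "Failed"},
	}

	// The main-resource write saves the annotation but drops the status.
	s.patch(o)
	fmt.Println(s.saved.Annotations["example/instance-state"], len(s.saved.Status))

	// The status write is needed to persist the status fields.
	s.patchStatus(o)
	fmt.Println(s.saved.Status["phase"])
}
```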

enxebre · 2020-09-10T10:32:14Z

/approve

openshift-ci-robot · 2020-09-10T10:32:31Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [enxebre]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

JoelSpeed · 2020-09-10T12:31:55Z

/retest

enxebre · 2020-09-11T08:52:49Z

Machine API operator deployment should maintains spec after validating webhook configuration change and preserve caBundle

/test e2e-aws-operator

JoelSpeed · 2020-09-11T09:37:07Z

/test e2e-aws-operator

JoelSpeed · 2020-09-11T10:34:09Z

/test e2e-aws-operator

Danil-Grigorev · 2020-09-16T08:51:00Z

/lgtm

openshift-bot · 2020-09-16T09:20:03Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T09:59:01Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T11:17:03Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T12:09:01Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T13:01:02Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T13:14:05Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T13:57:08Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T14:19:02Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T15:11:06Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T16:55:33Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-16T17:21:03Z

/retest

Please review the full test history for this PR and help us cut down flakes.

enxebre · 2020-09-21T12:19:00Z

/lgtm

openshift-bot · 2020-09-21T13:00:45Z

/retest

Please review the full test history for this PR and help us cut down flakes.

michaelgugino · 2020-09-21T13:03:03Z

pkg/controller/machine/controller.go

+	const instanceStateField = "instanceState"
+	const vmStateField = "vmState"
+
+	providerStatus, err := runtime.DefaultUnstructuredConverter.ToUnstructured(machine.Status.ProviderStatus)


Will this work correctly if ProviderStatus is nil?

There is a check on L#480 which prevents us getting this far if the providerStatus is nil, hence that scenario is avoided (and if the check is removed, the tests fail)

michaelgugino · 2020-09-21T13:03:27Z

pkg/controller/machine/controller.go

+	const vmStateField = "vmState"
+
+	providerStatus, err := runtime.DefaultUnstructuredConverter.ToUnstructured(machine.Status.ProviderStatus)
+	if err != nil {


Should this really prevent us from doing the rest?

michaelgugino · 2020-09-21T13:07:40Z

pkg/controller/machine/controller.go

+		if phase == phaseFailed {
+			if err := r.overrideFailedMachineProviderStatusState(machine); err != nil {
+				klog.Errorf("Failed to update machine provider status %q: %v", machine.GetName(), err)
+				return err


No need to fail the whole run here; we should log the error and continue processing. Nothing about this added behavior is needed for setting the subsequent status.

I've had a look through the kinds of errors this function could return, and I don't think we would ever actually see them in a running environment. They all seem to stem from programming errors (https://github.com/kubernetes/apimachinery/blob/94222d04a59075a01fddedd696037db9e61db6e9/pkg/runtime/converter.go#L404). Digging into this, an error could come up either if the object passed is nil (which is already being checked) or if the type of the object cannot be converted. In either case we should see the error during unit testing, so I think it is better to leave this as it is; otherwise we may miss our programming errors as we make changes to this in the future.

michaelgugino · 2020-09-21T13:08:00Z

/hold

JoelSpeed · 2020-09-21T14:47:47Z

/retest

JoelSpeed · 2020-09-22T09:13:13Z

/retest

JoelSpeed · 2020-09-23T11:07:46Z

/retest

michaelgugino · 2020-09-23T11:49:10Z

/hold cancel

JoelSpeed · 2020-09-23T12:56:05Z

/retest

openshift-bot · 2020-09-23T13:31:29Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-23T14:23:26Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-23T15:41:47Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-23T15:54:33Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-09-23T15:59:59Z

@JoelSpeed: The following test failed, say /retest to rerun all failed tests:

| Test name | Commit | Details | Rerun command |
| --- | --- | --- | --- |
| ci/prow/e2e-azure | `894b530` | link | `/test e2e-azure` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

openshift-bot · 2020-09-23T16:07:25Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-bot · 2020-09-23T16:20:56Z

/retest

Please review the full test history for this PR and help us cut down flakes.

openshift-ci-robot · 2020-09-23T16:51:40Z

@JoelSpeed: All pull requests linked via external trackers have merged:

openshift/machine-api-operator#696

Bugzilla bug 1875598 has been moved to the MODIFIED state.

In response to this:

BUG 1875598: Ensure the Virtual Machine provider state is set to Unknown when Failed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

openshift-ci-robot added bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Sep 8, 2020

openshift-ci-robot requested review from alexander-demicev and Danil-Grigorev September 8, 2020 11:44

enxebre reviewed Sep 10, 2020

JoelSpeed force-pushed the override-vm-state branch from 6bdbf76 to 00d6df0 Compare September 10, 2020 10:02

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 10, 2020

openshift-ci-robot assigned Danil-Grigorev Sep 16, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 16, 2020

openshift-ci-robot assigned enxebre Sep 21, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 21, 2020

michaelgugino reviewed Sep 21, 2020

michaelgugino suggested changes Sep 21, 2020

openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 21, 2020

openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2020

openshift-merge-robot merged commit ad45c86 into openshift:master Sep 23, 2020

JoelSpeed deleted the override-vm-state branch September 28, 2020 14:49
