Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG 1875598: Ensure the Virtual Machine provider state is set to Unknown when Failed #696

Merged
merged 2 commits into from
Sep 23, 2020

Conversation

JoelSpeed
Copy link
Contributor

This ensures that if the Machine goes into a Failed state, and the provider has already set the providerStatus to include an instanceState or vmState, that we override this with the value Unknown.

This ensures consistency between the providerStatus and the instance state annotation.

Ref:

The OpenStack and Baremetal providers do not have an equivalent field.

@openshift-ci-robot
Copy link
Contributor

@JoelSpeed: This pull request references Bugzilla bug 1875598, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

In response to this:

BUG 1875598: Ensure the Virtual Machine provider state is set to Unknown when Failed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. labels Sep 8, 2020
baseToPatch := client.MergeFrom(machine.DeepCopy())

if phase == phaseFailed {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this reuse the if block in line 429?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we use the block from above, then it's before the baseToPatch is created and as such, the patching mechanism thinks that the changes were there already. This modification has to come after baseToPatch is created, but the other one has to come before it, because it is modifying the spec.

I added the line on 438 to try and make this more obvious

baseToPatch := client.MergeFrom(machine.DeepCopy())

if phase == phaseFailed {
if err := r.patchFailedMachineProviderStatusState(machine); err != nil {
Copy link
Member

@enxebre enxebre Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is called patchFailedMachineProviderStatusState but is not actually patching but just setting right?
We should be consistent with a patchFailedMachineInstanceAnnotation and either let patchFailedMachineProviderStatusState to create a baseToPatch and do its patch request or rename both funcs and let them just set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the naming is wrong, but I think the Instance annotation needs to stay as a patching function otherwise we have the issue of not actually updating the resource. The annotation needs to send a client.Patch but the rest of the changes need a client.Status().Patch because of the subresource.

I think the preferable route here is to just rename the new one not to include the patching word and leave the logic as is

This ensures that if the Machine goes into a Failed state, and the 
provider has already set the providerStatus to include an instanceState 
or vmState, that we override this with the value `Unknown`.

This ensures consistency between the providerStatus and the instance 
state annotation.
@enxebre
Copy link
Member

enxebre commented Sep 10, 2020

/approve

@openshift-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: enxebre

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 10, 2020
@JoelSpeed
Copy link
Contributor Author

/retest

@enxebre
Copy link
Member

enxebre commented Sep 11, 2020

Machine API operator deployment should maintains spec after validating webhook configuration change and preserve caBundle

/test e2e-aws-operator

@JoelSpeed
Copy link
Contributor Author

/test e2e-aws-operator

1 similar comment
@JoelSpeed
Copy link
Contributor Author

/test e2e-aws-operator

@Danil-Grigorev
Copy link
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 16, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

10 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@enxebre
Copy link
Member

enxebre commented Sep 21, 2020

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 21, 2020
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

const instanceStateField = "instanceState"
const vmStateField = "vmState"

providerStatus, err := runtime.DefaultUnstructuredConverter.ToUnstructured(machine.Status.ProviderStatus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this work correctly if ProviderStatus is nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a check on L#480 which prevents us getting this far if the providerStatus is nil, hence that scenario is avoided (and if the check is removed, the tests fail)

const vmStateField = "vmState"

providerStatus, err := runtime.DefaultUnstructuredConverter.ToUnstructured(machine.Status.ProviderStatus)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this really prevent us from doing the rest?

if phase == phaseFailed {
if err := r.overrideFailedMachineProviderStatusState(machine); err != nil {
klog.Errorf("Failed to update machine provider status %q: %v", machine.GetName(), err)
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to fail the whole run here, we should log the error and continue processing. Nothing about this added behavior for setting subsequent status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a look through the kinds of errors that we could get returned from this function and I don't think in a running environment we would ever actually see them. To me, they all seem to be that they would exist only from programming errors (https://github.com/kubernetes/apimachinery/blob/94222d04a59075a01fddedd696037db9e61db6e9/pkg/runtime/converter.go#L404), digging into this, an error could come up if either the object passed is nil (Which is already being checked), or if the type of object cannot be converted there is a possibility for an error. In either of these cases we should see this during unit testing so I think this is better to leave as it is, else we may miss our programming errors as we make changes to this in the future

@michaelgugino
Copy link
Contributor

/hold

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 21, 2020
@JoelSpeed
Copy link
Contributor Author

/retest

2 similar comments
@JoelSpeed
Copy link
Contributor Author

/retest

@JoelSpeed
Copy link
Contributor Author

/retest

@michaelgugino
Copy link
Contributor

/hold cancel

@openshift-ci-robot openshift-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 23, 2020
@JoelSpeed
Copy link
Contributor Author

/retest

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

3 similar comments
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Copy link
Contributor

openshift-ci-robot commented Sep 23, 2020

@JoelSpeed: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-azure 894b530 link /test e2e-azure

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

1 similar comment
@openshift-bot
Copy link
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit ad45c86 into openshift:master Sep 23, 2020
@openshift-ci-robot
Copy link
Contributor

@JoelSpeed: All pull requests linked via external trackers have merged:

Bugzilla bug 1875598 has been moved to the MODIFIED state.

In response to this:

BUG 1875598: Ensure the Virtual Machine provider state is set to Unknown when Failed

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-low Referenced Bugzilla bug's severity is low for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants