Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wait for SSA Snapshot Success #125

Merged
merged 3 commits into from
Sep 14, 2017

Conversation

jerryk55
Copy link
Member

We need to wait for success status from the SSA Snapshot
operations for both Managed and non-managed (blob) disks.
The Managed snapshot will use the Snapshot Service to wait
while the blob snapshot will use the Storage Account Service
to wait.

This is the story of the continuing saga of the BZs:
https://bugzilla.redhat.com/show_bug.cgi?id=1463780 (for a running non-managed disk VM)
and
https://bugzilla.redhat.com/show_bug.cgi?id=1475540 (for Managed Disk VMs)

This PR needs to be back ported to FINE and will be added to a hot fix covering both of these BZs.
Not needing to be back ported but still needed to be added are a PR to bump the manageiq-smartstate gem version to 1.4 and one in manageiq to use 1.4.

@roliveri please review for sanity (especially the length of the wait)
@bronaghs or @djberg96 please review and merge so we can put these blocker BZs to bed and push out the hot fix. Thanks all.

We need to wait for success status from the SSA Snapshot
operations for both Managed and non-managed (blob) disks.
The Managed snapshot will use the Snapshot Service to wait
while the blob snapshot will use the Storage Account Service
to wait.
@@ -142,7 +143,8 @@ def vm_create_evm_managed_snapshot(vm)
snap_svc.get(ssa_snap_name, resource_group)
rescue ::Azure::Armrest::NotFoundException, ::Azure::Armrest::ResourceNotFoundException => err
begin
snap_svc.create(ssa_snap_name, resource_group, snap_options)
response = snap_svc.create(ssa_snap_name, resource_group, snap_options)
raise "Maximum snapshot wait time exceeded" unless snap_svc.wait(response.response_headers, SSA_SNAPSHOT_WAIT_TIME) == "Succeeded"
Copy link
Collaborator

@djberg96 djberg96 Sep 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd change this to =~ /^succe/i insted of == "Succeeded"because, depending on the operation, azure returns "Success" or "Succeeded", and I don't remember where and/or why.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@djberg96 cool - thanks for the heads up.

@@ -158,6 +160,7 @@ def vm_create_evm_blob_snapshot(vm)
_log.debug("vm=[#{vm.name}] creating SSA snapshot for #{vm.blob_uri}")
begin
snapshot_info = vm.storage_acct.create_blob_snapshot(vm.container, vm.blob, vm.key)
raise "Maximum snapshot wait time exceeded" unless vm.storage_acct_service.wait(snapshot_info, SSA_SNAPSHOT_WAIT_TIME) == "Succeeded"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

Changing the response checking string parsing per @djberg96 comments.
@djberg96
Copy link
Collaborator

👍

@djberg96
Copy link
Collaborator

@bronaghs Jerry is going to change the timeout handling.

Rather than timeout after 30 minutes as we did previously, have the wait timeout after one minute,
but loop indefinitely until it succeeds or aborts.  Instead the timeout will occur at the job level.
@jerryk55
Copy link
Member Author

After offline discussion with @roliveri we changed the wait processing to allow the Job to control timeout rather than the wait.

@miq-bot
Copy link
Member

miq-bot commented Sep 13, 2017

Checked commits jerryk55/manageiq-providers-azure@5c7a6bf~...b4d2c73 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
1 file checked, 0 offenses detected
Everything looks fine. 🏆

@jerryk55
Copy link
Member Author

@miq-bot add_label fine/yes

@bronaghs
Copy link

@roliveri - can you approve?

snap_svc.create(ssa_snap_name, resource_group, snap_options)
response = snap_svc.create(ssa_snap_name, resource_group, snap_options)
# wait a minute at a time, allowing the Job Timeout to handle long-running snapshots here
next until snap_svc.wait(response.response_headers) =~ /^succe/i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably can be merged for now, so we can get the fix out. But this condition is too specific. If it doesn't complete successfully, we'll have to wait for the job to time out. The test should be more along the lines of:

next if snap_svc.wait(response.response_headers) =~ /^In progress/i

Not sure of the specific "in progress" state.

Same comment for the similar check below.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I discussed that with @djberg96 on last Wednesday and he was of the opinion that there would be an Exception raised in the case of an unsuccessful completion and therefore we wouldn't hit the test regardless.

@bronaghs bronaghs merged commit 9217377 into ManageIQ:master Sep 14, 2017
@bronaghs bronaghs added this to the Sprint 69 Ending Sep 18, 2017 milestone Sep 14, 2017
@simaishi
Copy link
Contributor

Fine backport details:

$ git log -1
commit 26ddc4fb61cfdaa1fcb697c47b23da2216ed3b02
Author: Bronagh Sorota <bsorota@redhat.com>
Date:   Thu Sep 14 16:18:51 2017 -0400

    Merge pull request #125 from jerryk55/wait_for_snapshot_success
    
    Wait for SSA Snapshot Success
    (cherry picked from commit 9217377b32db9ad7251b30cba442ff8a0e0fb305)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967
    https://bugzilla.redhat.com/show_bug.cgi?id=1491310

Copy link
Member

@roliveri roliveri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve the changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants