Wait for SSA Snapshot Success #125

jerryk55 · 2017-09-13T14:45:16Z

We need to wait for success status from the SSA Snapshot
operations for both Managed and non-managed (blob) disks.
The Managed snapshot will use the Snapshot Service to wait
while the blob snapshot will use the Storage Account Service
to wait.

This is the story of the continuing saga of the BZs:
https://bugzilla.redhat.com/show_bug.cgi?id=1463780 (for a running non-managed disk VM)
and
https://bugzilla.redhat.com/show_bug.cgi?id=1475540 (for Managed Disk VMs)

This PR needs to be back ported to FINE and will be added to a hot fix covering both of these BZs.
Not needing to be back ported but still needed to be added are a PR to bump the manageiq-smartstate gem version to 1.4 and one in manageiq to use 1.4.

@roliveri please review for sanity (especially the length of the wait)
@bronaghs or @djberg96 please review and merge so we can put these blocker BZs to bed and push out the hot fix. Thanks all.

We need to wait for success status from the SSA Snapshot operations for both Managed and non-managed (blob) disks. The Managed snapshot will use the Snapshot Service to wait while the blob snapshot will use the Storage Account Service to wait.

djberg96 · 2017-09-13T14:58:11Z

app/models/manageiq/providers/azure/cloud_manager.rb

@@ -142,7 +143,8 @@ def vm_create_evm_managed_snapshot(vm)
      snap_svc.get(ssa_snap_name, resource_group)
    rescue ::Azure::Armrest::NotFoundException, ::Azure::Armrest::ResourceNotFoundException => err
      begin
-        snap_svc.create(ssa_snap_name, resource_group, snap_options)
+        response = snap_svc.create(ssa_snap_name, resource_group, snap_options)
+        raise "Maximum snapshot wait time exceeded" unless snap_svc.wait(response.response_headers, SSA_SNAPSHOT_WAIT_TIME) == "Succeeded"


I'd change this to =~ /^succe/i insted of == "Succeeded"because, depending on the operation, azure returns "Success" or "Succeeded", and I don't remember where and/or why.

@djberg96 cool - thanks for the heads up.

djberg96 · 2017-09-13T14:58:49Z

app/models/manageiq/providers/azure/cloud_manager.rb

@@ -158,6 +160,7 @@ def vm_create_evm_blob_snapshot(vm)
    _log.debug("vm=[#{vm.name}] creating SSA snapshot for #{vm.blob_uri}")
    begin
      snapshot_info = vm.storage_acct.create_blob_snapshot(vm.container, vm.blob, vm.key)
+      raise "Maximum snapshot wait time exceeded" unless vm.storage_acct_service.wait(snapshot_info, SSA_SNAPSHOT_WAIT_TIME) == "Succeeded"


Same, same.

@djberg96

Changing the response checking string parsing per @djberg96 comments.

djberg96 · 2017-09-13T15:14:15Z

👍

djberg96 · 2017-09-13T15:37:28Z

@bronaghs Jerry is going to change the timeout handling.

Rather than timeout after 30 minutes as we did previously, have the wait timeout after one minute, but loop indefinitely until it succeeds or aborts. Instead the timeout will occur at the job level.

jerryk55 · 2017-09-13T15:58:06Z

After offline discussion with @roliveri we changed the wait processing to allow the Job to control timeout rather than the wait.

miq-bot · 2017-09-13T15:58:22Z

Checked commits jerryk55/manageiq-providers-azure@5c7a6bf~...b4d2c73 with ruby 2.2.6, rubocop 0.47.1, and haml-lint 0.20.0
1 file checked, 0 offenses detected
Everything looks fine. 🏆

jerryk55 · 2017-09-13T17:33:41Z

@miq-bot add_label fine/yes

bronaghs · 2017-09-14T15:35:33Z

@roliveri - can you approve?

roliveri · 2017-09-14T18:09:45Z

app/models/manageiq/providers/azure/cloud_manager.rb

-        snap_svc.create(ssa_snap_name, resource_group, snap_options)
+        response = snap_svc.create(ssa_snap_name, resource_group, snap_options)
+        # wait a minute at a time, allowing the Job Timeout to handle long-running snapshots here
+        next until snap_svc.wait(response.response_headers) =~ /^succe/i


This probably can be merged for now, so we can get the fix out. But this condition is too specific. If it doesn't complete successfully, we'll have to wait for the job to time out. The test should be more along the lines of:

next if snap_svc.wait(response.response_headers) =~ /^In progress/i

Not sure of the specific "in progress" state.

Same comment for the similar check below.

So I discussed that with @djberg96 on last Wednesday and he was of the opinion that there would be an Exception raised in the case of an unsuccessful completion and therefore we wouldn't hit the test regardless.

Wait for SSA Snapshot Success (cherry picked from commit 9217377) https://bugzilla.redhat.com/show_bug.cgi?id=1488967 https://bugzilla.redhat.com/show_bug.cgi?id=1491310

simaishi · 2017-09-14T20:34:23Z

Fine backport details:

$ git log -1
commit 26ddc4fb61cfdaa1fcb697c47b23da2216ed3b02
Author: Bronagh Sorota <bsorota@redhat.com>
Date:   Thu Sep 14 16:18:51 2017 -0400

    Merge pull request #125 from jerryk55/wait_for_snapshot_success
    
    Wait for SSA Snapshot Success
    (cherry picked from commit 9217377b32db9ad7251b30cba442ff8a0e0fb305)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1488967
    https://bugzilla.redhat.com/show_bug.cgi?id=1491310

roliveri

I approve the changes.

Wait for SSA Snapshot Success

5c7a6bf

We need to wait for success status from the SSA Snapshot operations for both Managed and non-managed (blob) disks. The Managed snapshot will use the Snapshot Service to wait while the blob snapshot will use the Storage Account Service to wait.

djberg96 reviewed Sep 13, 2017

View reviewed changes

Change success string parsing per review comments

33b4f1d

Changing the response checking string parsing per @djberg96 comments.

bronaghs approved these changes Sep 13, 2017

View reviewed changes

Let Wait Loop Until Success or Job Timeout

b4d2c73

Rather than timeout after 30 minutes as we did previously, have the wait timeout after one minute, but loop indefinitely until it succeeds or aborts. Instead the timeout will occur at the job level.

miq-bot added the fine/yes label Sep 13, 2017

roliveri reviewed Sep 14, 2017

View reviewed changes

bronaghs merged commit 9217377 into ManageIQ:master Sep 14, 2017

bronaghs added this to the Sprint 69 Ending Sep 18, 2017 milestone Sep 14, 2017

simaishi added fine/backported and removed fine/yes labels Sep 14, 2017

roliveri reviewed Sep 15, 2017

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wait for SSA Snapshot Success #125

Wait for SSA Snapshot Success #125

jerryk55 commented Sep 13, 2017

djberg96 Sep 13, 2017 •

edited

Loading

jerryk55 Sep 13, 2017

djberg96 Sep 13, 2017

jerryk55 Sep 13, 2017

djberg96 commented Sep 13, 2017

djberg96 commented Sep 13, 2017

jerryk55 commented Sep 13, 2017

miq-bot commented Sep 13, 2017

jerryk55 commented Sep 13, 2017

bronaghs commented Sep 14, 2017

roliveri Sep 14, 2017

jerryk55 Sep 18, 2017

simaishi commented Sep 14, 2017

roliveri left a comment

Wait for SSA Snapshot Success #125

Wait for SSA Snapshot Success #125

Conversation

jerryk55 commented Sep 13, 2017

djberg96 Sep 13, 2017 • edited Loading

Choose a reason for hiding this comment

jerryk55 Sep 13, 2017

Choose a reason for hiding this comment

djberg96 Sep 13, 2017

Choose a reason for hiding this comment

jerryk55 Sep 13, 2017

Choose a reason for hiding this comment

djberg96 commented Sep 13, 2017

djberg96 commented Sep 13, 2017

jerryk55 commented Sep 13, 2017

miq-bot commented Sep 13, 2017

jerryk55 commented Sep 13, 2017

bronaghs commented Sep 14, 2017

roliveri Sep 14, 2017

Choose a reason for hiding this comment

jerryk55 Sep 18, 2017

Choose a reason for hiding this comment

simaishi commented Sep 14, 2017

roliveri left a comment

Choose a reason for hiding this comment

djberg96 Sep 13, 2017 •

edited

Loading