This repository has been archived by the owner on Nov 1, 2023. It is now read-only.

A node instance may not exist by the time we attempt to update scale in protection #1712

Closed
tevoinea opened this issue Mar 16, 2022 · 0 comments · Fixed by #1719
Labels
bug Something isn't working

Comments

@tevoinea
Member

Information

  • Onefuzz version: 5.2.0
  • OS: Windows

Provide detailed reproduction steps (if any)

The update_scale_in_protection function performs two steps:

  1. Check whether the instance exists:
    instance_vm = compute_client.virtual_machine_scale_set_vms.get(
  2. Update the scale-in protection policy:
    compute_client.virtual_machine_scale_set_vms.begin_update(

Between those two steps, the node can disappear in two ways:

  1. The node did not have scale-in protection and was automatically scaled in by Azure.
  2. The node was deleted by some other means.
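
The gap between the two calls can be reproduced with a small in-memory stand-in. This is only a sketch: FakeScaleSet, check_then_update, and the `between` hook are hypothetical names that do not model the Azure SDK. They only show that a successful existence check guarantees nothing about the update that follows.

```python
class FakeScaleSet:
    """Hypothetical in-memory stand-in for a scale set (not the Azure SDK)."""

    def __init__(self, instance_ids):
        # Maps instance id -> whether scale-in protection is enabled.
        self.protected = {instance_id: False for instance_id in instance_ids}

    def exists(self, instance_id):
        return instance_id in self.protected

    def delete(self, instance_id):
        # Simulates Azure scaling the node in, or any other deletion.
        self.protected.pop(instance_id, None)

    def set_protection(self, instance_id, protect):
        if instance_id not in self.protected:
            raise LookupError(f"instance {instance_id} is not active")
        self.protected[instance_id] = protect


def check_then_update(scaleset, instance_id, protect, between=lambda: None):
    """Check first, then update; `between` runs in the gap between the two."""
    if not scaleset.exists(instance_id):
        return False
    between()  # anything can happen here, including the node being removed
    scaleset.set_protection(instance_id, protect)
    return True
```

Passing `between=lambda: scaleset.delete("109")` makes the update raise even though the existence check succeeded a moment earlier.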

Expected result

What is the expected result of the above steps?

The 2nd step (updating the protection policy) needs to handle the possibility that the node no longer exists.

If the action is to remove scale-in protection, we should simply log that the node no longer exists, since that was the desired outcome anyway.

If the action is to enable scale-in protection, we should raise an error.
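
One possible shape for the behavior described above is sketched here. It is not the fix that was merged. InstanceUpdateError is a hypothetical stand-in for the HttpResponseError shown in the stack trace, apply_scale_in_protection and its `update` callable are made-up names wrapping the begin_update call, and matching on the message text is an assumption taken from the error reported in this issue.

```python
import logging
from typing import Callable

# Message fragment copied from the error reported in this issue.
NOT_ACTIVE_FRAGMENT = "is not an active Virtual Machine Scale Set VM instanceId"


class InstanceUpdateError(Exception):
    """Hypothetical stand-in for the SDK error raised by the update call."""


def apply_scale_in_protection(
    update: Callable[[bool], None], instance_id: str, protect: bool
) -> bool:
    """Return True if the policy was applied, False if the node was already gone.

    `update` is a hypothetical callable that performs the actual policy update.
    """
    try:
        update(protect)
        return True
    except InstanceUpdateError as err:
        if NOT_ACTIVE_FRAGMENT not in str(err):
            raise  # unrelated failure: do not hide it
        if protect:
            raise  # enabling protection on a missing node is a real error
        # Removing protection from a missing node: nothing left to do.
        logging.info(
            "instance %s no longer exists; skipping protection removal",
            instance_id,
        )
        return False
```

Returning a boolean instead of raising lets a caller such as release_scale_in_protection continue with the reimage path when the node has already been removed.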

Actual result

What is the actual result of the above steps?

Exception while executing function: Functions.timer_workers Result: Failure
Exception: HttpResponseError: (InvalidParameter) The provided instanceId 109 is not an active Virtual Machine Scale Set VM instanceId.
Code: InvalidParameter
Message: The provided instanceId 109 is not an active Virtual Machine Scale Set VM instanceId.
Target: instanceIds
Stack:   File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 402, in _handle__invocation_request
    call_result = await self._loop.run_in_executor(
  File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 611, in _run_sync_func
    return ExtensionManager.get_sync_invocation_wrapper(context,
  File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/extension.py", line 215, in _raw_invocation_wrapper
    result = function(**args)
  File "/home/site/wwwroot/timer_workers/__init__.py", line 62, in main
    process_scaleset(scaleset)
  File "/home/site/wwwroot/timer_workers/__init__.py", line 24, in process_scaleset
    if scaleset.cleanup_nodes():
  File "/home/site/wwwroot/onefuzzlib/workers/scalesets.py", line 335, in cleanup_nodes
    Node.reimage_long_lived_nodes(self.scaleset_id)
  File "/home/site/wwwroot/onefuzzlib/workers/nodes.py", line 477, in reimage_long_lived_nodes
    node.to_reimage()
  File "/home/site/wwwroot/onefuzzlib/workers/nodes.py", line 393, in to_reimage
    self.release_scale_in_protection()
  File "/home/site/wwwroot/onefuzzlib/workers/nodes.py", line 521, in release_scale_in_protection
    return update_scale_in_protection(
  File "/home/site/wwwroot/onefuzzlib/azure/creds.py", line 232, in decorated
    return func(*args, **kwargs)
  File "/home/site/wwwroot/onefuzzlib/azure/vmss.py", line 184, in update_scale_in_protection
    compute_client.virtual_machine_scale_set_vms.begin_update(
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure/core/tracing/decorator.py", line 83, in wrapper_use_tracer
    return func(*args, **kwargs)
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure/mgmt/compute/v2021_07_01/operations/_virtual_machine_scale_set_vms_operations.py", line 1059, in begin_update
    raw_result = self._update_initial(
  File "/home/site/wwwroot/.python_packages/lib/site-packages/azure/mgmt/compute/v2021_07_01/operations/_virtual_machine_scale_set_vms_operations.py", line 1000, in _update_initial
    raise HttpResponseError(response=response, error_format=ARMErrorFormat)
@tevoinea tevoinea added the bug Something isn't working label Mar 16, 2022
@ghost ghost added the Needs: triage label Mar 16, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Apr 27, 2022