Virtual machine creation fails with RetryableError #16928

dbergel · 2022-05-24T16:55:15Z

Is there an existing issue for this?

I have searched the existing issues

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.1.9

AzureRM Provider Version

3.0.2

Affected Resource(s)/Data Source(s)

azurerm_linux_virtual_machine
azurerm_windows_virtual_machine

Terraform Configuration Files

https://github.com/teradici/Azure_Deployments/tree/master/terraform-deployments/deployments/cas-mgr-load-balancer-one-ip-nat

Debug Output/Panic Output

2022-05-18T15:17:05.635Z [DEBUG] provider.terraform-provider-azurerm_v3.0.2_x5: AzureRM Response for https://management.azure.com/subscriptions/<redacted>/providers/Microsoft.Compute/locations/centralus/operations/90793864-6ba7-488d-b142-0b8128735630?p=7b61c3cd-cc9b-4d18-8ca3-a9c1b12efefd&api-version=2021-11-01: 
HTTP/2.0 200 OK
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Wed, 18 May 2022 15:17:05 GMT
Expires: -1
Pragma: no-cache
Server: Microsoft-HTTPAPI/2.0
Server: Microsoft-HTTPAPI/2.0
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: 83831b17-0254-51a9-b61d-b1a32674682d
X-Ms-Ratelimit-Remaining-Resource: Microsoft.Compute/GetOperation3Min;14975,Microsoft.Compute/GetOperation30Min;29895
X-Ms-Ratelimit-Remaining-Subscription-Reads: 11998
X-Ms-Request-Id: 903c4659-5b75-4522-a813-d9787a578144
X-Ms-Routing-Request-Id: WESTUS2:20220518T151705Z:fc57f1eb-6172-446a-8d0e-5976ae40a901

{
  "startTime": "2022-05-18T15:16:53.6761603+00:00",
  "endTime": "2022-05-18T15:16:57.7073471+00:00",
  "status": "Failed",
  "error": {
    "code": "RetryableError",
    "message": "A retryable error occurred."
  },
  "name": "90793864-6ba7-488d-b142-0b8128735630"
}: timestamp=2022-05-18T15:17:05.634Z

Expected Behaviour

Virtual machines provisioned successfully, retryable errors automatically retried.

Actual Behaviour

Workstation provisioning intermittently fails with a bare "retryable error" and no additional information. I am not able to reproduce it reliably.

  Error: waiting for creation of Linux Virtual Machine: (Name "dbj2m-scent-0" / Resource Group "cas-mgr-load-balancer-one-ip-nat-dbj2m"): Code="RetryableError" Message="A retryable error occurred."
  
    with module.centos-std-vm.azurerm_linux_virtual_machine.centos-std-vm["linux_std_0"],
    on ../../modules/centos-std-vm/main.tf line 50, in resource "azurerm_linux_virtual_machine" "centos-std-vm":
    50: resource "azurerm_linux_virtual_machine" "centos-std-vm" {
  
  
  Error: waiting for creation of Windows Virtual Machine: (Name "dbj2m-swin-0" / Resource Group "cas-mgr-load-balancer-one-ip-nat-dbj2m"): Code="RetryableError" Message="A retryable error occurred."
  
    with module.windows-std-vm.azurerm_windows_virtual_machine.windows-std-vm["windows_std_0"],
    on ../../modules/windows-std-vm/main.tf line 47, in resource "azurerm_windows_virtual_machine" "windows-std-vm":
    47: resource "azurerm_windows_virtual_machine" "windows-std-vm" {

Steps to Reproduce

terraform apply

Important Factoids

No response

References

No response


myc2h6o · 2022-05-26T04:29:08Z

Hi @dbergel, thanks for opening the issue! From the config and the error, I'm not able to identify the root cause, but the additional information in #8052 may help with troubleshooting. That issue involved a problem creating the VM/VMSS while the load balancer was updating the VNet. Would you be able to find additional details about the deployment failure in the Azure Portal?

ekristen · 2022-09-15T00:15:04Z

There are two issues at play here: why Azure is throwing an error, especially "A retryable error occurred", and how the Terraform provider handles that error.

Since this error is specifically stated to be retryable, it should not be treated as a fatal error. Instead, the provider should simply retry the API call.

I see this happen with the same resource, azurerm_linux_virtual_machine: it starts the creation, I get a "Still creating [10s elapsed]", then the error happens and Terraform exits. Since this specific error says it's retryable, I would suggest the provider simply retry whenever it encounters this error. We'd then get a "Still creating [20s elapsed]" and so on, until the VM is created, the internal timer (10-20 minutes) is hit, or a non-retryable error is encountered.

One additional note: this is a pretty common theme throughout the provider and it makes it very frustrating to use, which is not entirely on the provider itself as the Azure API is just terribly inconsistent, but retrying retryable errors instead of treating them as fatal would go a long way in improving the user experience of this provider.

alok0310 · 2022-09-29T19:44:32Z

This is really frustrating, as we only see this issue when using Azure, not AWS.
Does it have anything to do with the number of threads being used by terraform apply?

ekristen · 2022-09-29T20:07:47Z

@alok0310 I don't believe so. From what I can tell, this is partly due to the Azure API just being junky, but in my opinion the provider also plays a part: it does not retry errors that the Azure API marks as "retryable" and instead exits hard, as if a fatal error had occurred.

mtin · 2022-10-05T16:34:28Z

I also am experiencing these errors. Particularly annoying in pipelines as it exits with a hard error and fails deployment even though the message suggests to just retry (which also works, but has to start our pipelines from the very beginning). Same resource as mentioned above, azurerm_linux_virtual_machine...

eh-michael · 2022-11-14T18:45:33Z

Hello, I am also experiencing this error when deploying a VM with the azurerm_windows_virtual_machine resource. I'd appreciate any assistance or insight into this, and I'm happy to run any troubleshooting steps provided.

mathbab · 2023-06-10T13:00:40Z

Hello, is there any retry mechanism in place for this issue? I have observed the retryable error, and the health history in the Azure portal says:

"Unavailable: Resource health event (Unplanned) At Saturday, June 10, 2023 at 5:06:18 AM XXX, the Azure monitoring system received the following information regarding your Virtual machine: Your virtual machine is unavailable at the moment. Please check back in a few minutes for any updates we find on the source of the unavailability of this VM. No additional action is required from you at this time."

and the VM was already up by the time it was checked in the portal... so a retry from the provider is much needed.

mpo-me · 2023-08-08T09:35:23Z

Hello,

we experience this problem occasionally on the provisioning of a simple VM at Azure and it makes the azurerm provider very unpredictable and unstable. I also agree that the API is not being used correctly because the error code "RetryableError" communicates to the API consumer that it could and (in my opinion) should be retried.

We have therefore developed a complex logic (many wrapper scripts) around Terraform to manually detect and handle such errors (deleting and re-provisioning), but the whole thing is very messy and makes the use of Terraform absurd.

Is there any news on this topic?

TCDooM · 2023-08-10T07:38:25Z

same issue here, seems like the provider needs to implement a retry on retriable errors from Azure OR at least not fail without updating the state...

MvRoo · 2023-10-13T13:10:10Z

We were running into this as well while creating a VM using the azurerm_linux_virtual_machine resource. We found out through the Azure activity logs that, in our case, this was caused by parallel updates Terraform was making to the subnet that we were also deploying the VM into. We fixed this by explicitly waiting until the subnet changes were done, using depends_on in the VM resource.

Examples of the entries we found in the logs:
Cannot proceed with operation because resource /subscriptions/###/resourceGroups/test/providers/Microsoft.Network/virtualNetworks/test-vnet/subnets/main used by resource /subscriptions/####/resourceGroups/test/providers/Microsoft.Network/networkInterfaces/test-vm-nic is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is PutSubnetOperation.

Cannot proceed with operation because resource /subscriptions/####/resourceGroups/test/providers/Microsoft.Network/virtualNetworks/test-vnet/subnets/main used by resource /subscriptions/####/resourceGroups/test/providers/Microsoft.Network/loadBalancers/test-lb is not in Succeeded state. Resource is in Updating state and the last operation that updated/is updating the resource is PutSubnetOperation.
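
A minimal sketch of the depends_on workaround described above. Everything except the depends_on idea itself is an assumption for illustration: the resource names, the VM settings, and the choice of subnet-modifying resources (an NSG association and a NAT gateway association) are hypothetical and are taken to be defined elsewhere in the same configuration. List whichever resources in your own configuration trigger a subnet update.

```hcl
# Hypothetical example: names and values are placeholders, not taken from the
# configuration discussed in this issue.
resource "azurerm_linux_virtual_machine" "example" {
  name                  = "example-vm"
  resource_group_name   = azurerm_resource_group.example.name
  location              = azurerm_resource_group.example.location
  size                  = "Standard_B2s"
  admin_username        = "adminuser"
  network_interface_ids = [azurerm_network_interface.example.id]

  admin_ssh_key {
    username   = "adminuser"
    public_key = file("~/.ssh/id_rsa.pub")
  }

  os_disk {
    caching              = "ReadWrite"
    storage_account_type = "Standard_LRS"
  }

  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts"
    version   = "latest"
  }

  # Wait for every resource that updates the subnet to finish before the VM
  # is created, so the subnet is not in the Updating state when the VM's
  # network interface is attached.
  depends_on = [
    azurerm_subnet_network_security_group_association.example,
    azurerm_subnet_nat_gateway_association.example,
  ]
}
```

This only serializes operations inside a single Terraform run. It may not help if the RetryableError has a different cause than a subnet in the Updating state.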

damianvandoom · 2024-08-28T09:52:00Z

I want to add that this issue isn't unique to Terraform.

I deploy via Bicep and have encountered this issue several times.

magodo · 2024-11-26T04:10:24Z

Similar to #21293

dbergel added the bug label May 24, 2022

github-actions bot removed the bug label May 24, 2022

bryan-bar mentioned this issue Jan 6, 2023

Azure Virtual Machine Support EnterpriseDB/edb-terraform#16

Merged

rcskosir added the service/virtual-machine label Jun 8, 2023

catriona-m added the v/3.x label Jul 19, 2023
