Virtual machine creation fails with RetryableError #16928
Comments
Hi @dbergel, thanks for opening the issue! From the config and the error, I'm not able to identify the root cause, but the
There are two issues at play here: why Azure is throwing an error (specifically "A retryable error occurred"), and how the Terraform provider handles that error. Since this error is explicitly described as retryable, it should not be treated as fatal; the provider should simply retry the API call. I see this happen with the same resource, azurerm_linux_virtual_machine: creation starts, I get "Still creating... [10s elapsed]", then the error happens and Terraform exits. Since this specific error says it's retryable, I would suggest the provider retry whenever it encounters it, so we'd get "Still creating... [20s elapsed]" and so on, until the resource is created, the internal timer (10-20 minutes) is hit, OR a non-retryable error is encountered. One additional note: this is a pretty common theme throughout the provider and it makes it very frustrating to use. That is not entirely the provider's fault, as the Azure API is just terribly inconsistent, but retrying retryable errors instead of treating them as fatal would go a long way toward improving the user experience of this provider.
This is really frustrating, as we only see this issue with Azure, not AWS.
@alok0310 I don't believe so. From what I can tell, this is partly due to the Azure API just being junky, but in my opinion the provider is also at fault: it does not retry errors that the Azure API marks as "retryable" and instead exits hard as if a fatal error had occurred.
I am also experiencing these errors. They are particularly annoying in pipelines, because Terraform exits with a hard error and fails the deployment even though the message suggests simply retrying (which does work, but means restarting our pipelines from the very beginning). Same resource as mentioned above.
Hello, I am also experiencing this error when deploying a VM with the azurerm_windows_virtual_machine resource. I'd appreciate anyone's insight into this, and I'm happy to run any troubleshooting steps provided.
Hello, is there any retry mechanism in place for this issue? I have observed the RetryableError, and the resource health history in the Azure portal says: "Unavailable: Resource health event (Unplanned). At Saturday, June 10, 2023 at 5:06:18 AM XXX, the Azure monitoring system received the following information regarding your Virtual machine: Your virtual machine is unavailable at the moment. Please check back in a few minutes for any updates we find on the source of the unavailability of this VM. No additional action is required from you at this time." The VM was already up by the time I checked it in the portal, so a retry from the provider is much needed.
Hello, we occasionally experience this problem when provisioning a simple VM on Azure, and it makes the azurerm provider very unpredictable and unstable. I also agree that the API is not being used correctly, because the error code "RetryableError" tells the API consumer that the request could and (in my opinion) should be retried. We have therefore built complex logic (many wrapper scripts) around Terraform to detect and handle such errors manually (deleting and re-provisioning), but the whole thing is very messy and makes using Terraform absurd. Is there any news on this topic?
Same issue here. It seems like the provider needs to retry retryable errors from Azure, OR at least not fail without updating the state.
We were running into this as well while creating a VM with the azurerm_linux_virtual_machine resource. Through the Azure activity logs we found that, in our case, it was caused by parallel updates Terraform was making to the subnet we were also deploying the VM into. We fixed this by explicitly waiting until the subnet changes were done, using depends_on in the VM resource. Examples of the entries we found in the logs:
Similar to #21293
Terraform Version
1.1.9
AzureRM Provider Version
3.0.2
Affected Resource(s)/Data Source(s)
azurerm_linux_virtual_machine
azurerm_windows_virtual_machine
Terraform Configuration Files
Debug Output/Panic Output
Expected Behaviour
Virtual machines are provisioned successfully, and retryable errors are automatically retried.
Actual Behaviour
Workstation provisioning intermittently fails with a bare "retryable error" and no additional information. Not reproducible 100% of the time.
Steps to Reproduce
terraform apply
Important Factoids
No response
References
No response