Azure Linux Agent fails to install extensions in Linux VMSS without settings field in the extension block #23688
Comments
Just observed that a downside to the workaround is that terraform detects the |
👍 We see the same issue with Windows VMSS, terraform fails with the error:
The logs on the VM show:
We tried several different type handler versions and had the same issue. If I deploy the extension from the Portal, which creates an ARM template with the following, it works:
Which doesn't make sense, because this is the same config as via Terraform: we don't specify settings (which is the same thing as publicSettings). We also assumed that this might be caused by some API-level change. The workaround suggested by @Laffs2k5 doesn't work for Windows; it fails with the error below because it expects a principal ID to enable Intune management:
In our case they are Windows 2019 VMSS, and the issue occurs with versions 3.52 and 3.65.0 of the azurerm Terraform provider, using Terraform 1.3.2. |
Interesting that there's similar behavior with the Windows variant. Here is an updated workaround for the Linux VMSS without false positives: just add a dummy value to the settings, e.g. |
I can reproduce the issue with an Ubuntu 20.04 VMSS with Terraform 1.3.2 and version 3.65.0 of the azurerm provider. The workaround does not work for me in this case either. I am using the custom script, Azure Monitor Agent and Application Health extensions. I have also tried using only some of the extensions, but that didn't help: even with a single extension (the Azure Monitor Agent extension) it still fails with the goal state error. The only thing I can think of is to try ordering the extensions through explicit dependencies, as @Laffs2k5 has done. |
So for Linux I think the fix is in this PR Azure/WALinuxAgent#2957 |
@Laffs2k5 You have already pointed to the correct fix on the Agent. We started rolling out a workaround last Friday, but it will take a few weeks to reach all affected regions. As for workarounds, a safer one is to remove the deployAfterExtensions property. You can re-add it once we have this scenario working correctly. The issue occurred because the service side was ignoring deployAfterExtensions when the extension has no settings. This issue on the service was fixed recently, which ended up exposing this bug in the Agent. Removing deployAfterExtensions should not affect the functionality of your template, since it had been ignored on extensions with no settings until very recently anyway. I can link to this issue when we have a full fix on the service and the agent. Note that this is specific to the Linux Agent. What is the issue with the Windows Agent? I can relay it to the Windows team. |
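For readers applying the temporary workaround from Terraform, a rough sketch follows. It assumes that the service-side deployAfterExtensions property corresponds to the provision_after_extensions argument of the extension block in the azurerm provider; the thread itself does not confirm that mapping. The extension names, publishers and settings shown are illustrative only and are not taken from anyone's actual configuration in this issue. It is a fragment, not a complete resource.

```hcl
resource "azurerm_linux_virtual_machine_scale_set" "example" {
  # ... name, sku, image, network and other required arguments omitted ...

  # Hypothetical extension that carries settings; ordering constraints on
  # extensions like this one are not affected by the workaround.
  extension {
    name                 = "health"
    publisher            = "Microsoft.ManagedServices"
    type                 = "ApplicationHealthLinux"
    type_handler_version = "1.0"
    settings = jsonencode({
      protocol = "tcp"
      port     = 22
    })
  }

  # Extension declared without settings: remove the ordering constraint
  # here only, and re-add it once the fix has been rolled out.
  extension {
    name                 = "azure-ad-ssh-login"
    publisher            = "Microsoft.Azure.ActiveDirectory"
    type                 = "AADSSHLoginForLinux"
    type_handler_version = "1.0"

    # provision_after_extensions = ["health"]  # assumed equivalent of deployAfterExtensions
  }
}
```

Only the extension without settings needs the constraint removed, which matches the note that the property had been ignored on such extensions until recently.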
Thanks @narrieta - the issue with the Windows agent is very similar: the VMSS fails to provision with the error:
When we then look at the logs on the VM it says:
However, unlike with Linux VMSS, we cannot supply a value for the settings field, because on Windows this is used when the VM is a Windows 10 VM that is intended to be onboarded to Intune (see here). We raised a support ticket but are struggling to convince them that this is due to a change in the backend service and its interaction with the agent. |
@vijaytdh thank you. I relayed this to the Windows team. Could you share the ID of the support ticket? |
@narrieta great, thank you. The ticket TrackingID is 2310250050004606 |
@vijaytdh - I asked the Windows team to help with this ticket |
@vijaytdh I talked to the Windows team and, unfortunately, there is no workaround on the Agent side. The service side is rolling back the changes and that should alleviate the issue both on Linux and Windows. If you are looking for a temporary workaround, removing deployAfterExtensions should work. Note that this would need to be removed only on extensions with no settings. |
@narrieta thanks for following up on this and for the suggested workaround. It is good to hear that the service side change is being rolled back. |
@narrieta I checked, and for Windows VMSS I am not even using deployAfterExtensions (I did have it earlier, to see if forcing a strict ordering of the extensions would somehow work around the issue). Having said that, I just retried a deployment and it worked! 😃 So I think the rollback may have already happened (well, at least in East US2 and North Europe, the two regions I just tested with). |
Thanks for taking the time to submit this issue. It looks like this has been resolved as of Azure/WALinuxAgent#2957 on the Azure side. As such, I am going to mark this issue as closed. If that is not the case, please provide additional information including the version in which you are still experiencing this issue, thanks! |
For anyone stumbling over this issue: it's correct that a fix has been merged as stated. But at the time of writing (3 months after merge of the fix) an updated version of WALinuxAgent has still to be deployed. My understanding is that the fix is part of the |
@Laffs2k5 even though the page says that there hasn't been a new version released, I can confirm that the issue has been fixed. I had been in contact with Microsoft on my end and did some testing after the technical support rep confirmed the fix had been rolled out, and it's been working great ever since. No idea why the new version doesn't show in the releases though. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. |
Terraform Version
1.5.6
AzureRM Provider Version
3.77.0
Affected Resource(s)/Data Source(s)
azurerm_linux_virtual_machine_scale_set
Terraform Configuration Files
Debug Output/Panic Output
Expected Behaviour
The VMSS should enter running state, provision extensions and be healthy.
Actual Behaviour
The Azure Linux Agent (waagent) in each VM in the VMSS fails to install the extensions as configured, with the error as shown.
We have been running several Linux VMSS (Ubuntu 20.04) with the given extension configuration for well over a year now.
The VMSSs are turned off overnight and boot up each morning. Except today. There have been no changes to the Azure resources, the Terraform code or the provider version.
After a bit of digging around, it turns out that the Python code in waagent fails when an extension is declared without the settings field, as in this example for the extension named azure-ad-ssh-login.
The workaround turned out to be quite simple: add settings = jsonencode({}) to the extension declaration to help the Python code.
Not sure what has changed to cause this, either in the ARM API or in waagent?
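As a minimal sketch, the workaround might look like this inside the scale set resource. The publisher, type and handler version shown for azure-ad-ssh-login are assumptions based on the usual values for the AAD SSH login extension, since the original configuration is not reproduced here. The rest of the resource is omitted, so this is a fragment rather than a complete configuration.

```hcl
resource "azurerm_linux_virtual_machine_scale_set" "example" {
  # ... name, sku, image, network and other required arguments omitted ...

  extension {
    name                 = "azure-ad-ssh-login"
    publisher            = "Microsoft.Azure.ActiveDirectory" # assumed value
    type                 = "AADSSHLoginForLinux"             # assumed value
    type_handler_version = "1.0"                             # assumed value

    # Workaround: declare settings explicitly as an empty JSON object so
    # that the agent receives a settings entry for this extension.
    settings = jsonencode({})
  }
}
```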
The current waagent release 2.9.1.1 is from April 2023. Not sure about
Creating this issue to have someone look into whether the settings field of the extension block of azurerm_linux_virtual_machine_scale_set (and possibly the Windows variant) should be made required. Or maybe the provider should default to adding an empty JSON object when the settings field is not declared?
Steps to Reproduce
terraform apply
tail -f /var/log/waagent.log
Important Factoids
No response
References
The python stack trace from waagent points to code here: https://github.com/Azure/WALinuxAgent/blob/v2.9.1.1/azurelinuxagent/common/protocol/extensions_goal_state_from_vm_settings.py#L499