Intermittent Error: Unable to locate Storage Account when RAGRS/RAZGRS account kind is used #15048

Open
andrey-dubnik opened this issue Jan 20, 2022 · 60 comments
Assignees
Labels
bug service/storage upstream/microsoft/waiting-on-service-team This label is applicable when waiting on the Microsoft Service Team v/2.x (legacy) v/3.x

Comments

@andrey-dubnik
Contributor

andrey-dubnik commented Jan 20, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

2.29.0

Affected Resource(s)

azurerm_storage_account: terraform plan intermittently fails for an already created resource with Error: Unable to locate Storage Account

Terraform Configuration Files

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_storage_account" "example" {
  name                     = "storageaccountname"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "RAZGRS"
}

Expected Behaviour

There should be no error for either RA or non-RA accounts.

Actual Behaviour

terraform plan intermittently fails for the already created resource with Error: Unable to locate Storage Account

Steps to Reproduce

  1. terraform apply
  2. terraform plan

Important Factoids

Only affecting RA storage accounts

@magodo
Collaborator

magodo commented Jan 21, 2022

@andrey-moor I've applied the configuration you provided above (there is a typo in account_replication_type, which should be RAGZRS rather than RAZGRS). Then I ran terraform plan in a loop until it returned an error. After a number of iterations, the issue has not appeared.
So would you please follow the Terraform debug guide and provide the debug log here? This is most likely caused by an API issue, and we will need the "request id" and other context info before contacting Azure support.

@andrey-dubnik
Contributor Author

I forgot to mention that the issue was reproducible only via GH Actions and was intermittent. It may be down to Azure portal replication etc., as GH hosts runners in multiple locations and the first available runner is selected for the pipeline.

East US (eastus)
East US 2 (eastus2)
West US 2 (westus2)
Central US (centralus)
South Central US (southcentralus)

Similar behaviour affects Key Vault, but much less frequently.

Let me see if I can reproduce the issue and capture the debug data via a standalone pipeline. We have switched to ZGRS accounts to unblock the delivery, so the original config is not available.

@varnav
Contributor

varnav commented Jan 24, 2022

I'm seeing this on creation:

Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: Service returned an error. Status=404 Code="StorageAccountNotFound" Message="The storage account testsa was not found."

with 2.93.1 with ZGRS type.

Related: #5299 (comment)

@andrey-dubnik
Contributor Author

andrey-dubnik commented Feb 4, 2022

@magodo sorry, it took me a while. I have captured the trace debug of the problem.

@andrey-dubnik
Contributor Author

I suspect it is related to this call, which comes back empty: https://docs.microsoft.com/en-us/rest/api/storagerp/storage-accounts/list

2022-02-03T16:14:30.610Z [DEBUG] provider.terraform-provider-azurerm_v2.94.0_x5: AzureRM Response for https://management.azure.com/subscriptions/264f194d-71cc-41d9-9bf4-c4f5456285e8/providers/Microsoft.Storage/storageAccounts?api-version=2021-04-01: 
HTTP/2.0 200 OK
Cache-Control: no-cache
Content-Type: application/json; charset=utf-8
Date: Thu, 03 Feb 2022 16:14:30 GMT
Expires: -1
Pragma: no-cache
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Ms-Correlation-Request-Id: fc3a2733-bb7a-9fdc-09e6-d9510d90bf10
X-Ms-Request-Id: 612cc49f-6921-4522-9a25-c7eaa4e7b3b2
X-Ms-Routing-Request-Id: EASTUS2:20220203T161430Z:612cc49f-6921-4522-9a25-c7eaa4e7b3b2

{"value":[]}: timestamp=2022-02-03T16:14:30.609Z

@andrey-dubnik
Contributor Author

It seems the workaround is to create a tag on the storage account (via the portal). After that the issue goes away and the above API returns the value. Moreover, it started to return values for all the storage accounts in that sub which were having the issue...

@andrey-dubnik
Contributor Author

The workaround came from #11059, as we've seen a similar but less permanent issue with Key Vault. The root cause is highly likely to be the same for both issues.

@magodo
Collaborator

magodo commented Feb 7, 2022

@andrey-dubnik Just to be sure, the LIST call for the SA on your test sub always returns an empty list, even after waiting for a long while? Is there any other SA in that sub?

@andrey-dubnik
Contributor Author

This is correct. Terraform was able to obtain the keys and all the data for the account, but the list API returned blank, hence the "account can't be found" error.

There was another account in the sub which was having the issue originally, so in total there are 2 accounts in there. After tagging one of them, the second account also appeared in the API call.

@magodo magodo added the upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR label Feb 7, 2022
@mariussm
Contributor

mariussm commented Mar 2, 2022

I can add that tagging a storage account worked on my side as well. My storage account was created like this:

resource "azurerm_storage_account" "sa" {
    name                     = var.storage_account_name
    location                 = var.location
    account_tier             = "Standard"
    account_replication_type = "GRS"
    resource_group_name      = azurerm_resource_group.rg.name
}

///It worked 1 out of 8 times, or something, before adding a manual tag. Now it seems stable. ///

Edit: Nope, it is not stable at all with the tag thing either.

@roehrijn

roehrijn commented Mar 3, 2022

This issue is also causing sleepless nights on our side here. What I can say so far: most likely this issue is caused by a race condition hitting some sort of ARM API limit. We're experiencing this on a Terraform project with around 100 resources (1 storage account, 1 key vault, a lot of role assignments to them) and I'm not able to reproduce the issue in a smaller project. However, calling terraform apply -parallelism=1 seems to help.

Furthermore:
We're experiencing the very same behaviour on Key Vaults as well (it switches randomly between Storage Account and Key Vault, whichever Terraform tries to access first). Here is a log example:

    2022-03-03T06:57:41.0852887Z 2022-03-03T06:57:40.565Z [INFO]  Starting apply for azurerm_key_vault_secret.githubtoken
    2022-03-03T06:57:41.0853464Z 2022-03-03T06:57:40.565Z [DEBUG] azurerm_key_vault_secret.githubtoken: applying the planned Update change
    2022-03-03T06:57:41.0854255Z 2022-03-03T06:57:40.567Z [INFO]  provider.terraform-provider-azurerm_v2.98.0_x5: preparing arguments for AzureRM KeyVault Secret update.: timestamp=2022-03-03T06:57:40.566Z
    2022-03-03T06:57:41.0854946Z 2022-03-03T06:57:40.567Z [DEBUG] provider.terraform-provider-azurerm_v2.98.0_x5: AzureRM Request: 
    2022-03-03T06:57:41.0855810Z GET /subscriptions/***/resources?%24filter=resourceType+eq+%27Microsoft.KeyVault%2Fvaults%27+and+name+eq+%27net-prod-839977%27&%24top=5&api-version=2020-06-01 HTTP/1.1
    2022-03-03T06:57:41.0856269Z Host: management.azure.com
    2022-03-03T06:57:41.0857164Z User-Agent: Go/go1.17.5 (amd64-linux) go-autorest/v14.2.1 Azure-SDK-For-Go/v61.4.0 resources/2020-06-01 HashiCorp Terraform/1.1.6 (+https://www.terraform.io) Terraform Plugin SDK/2.10.1 terraform-provider-azurerm/2.98.0 pid-222c6c49-1b0a-5959-a213-6608f9eb8820
    2022-03-03T06:57:41.0857908Z X-Ms-Correlation-Request-Id: b01b3074-24a4-9613-dd3f-c0a93338159e
    2022-03-03T06:57:41.0858375Z Accept-Encoding: gzip: timestamp=2022-03-03T06:57:40.566Z
    2022-03-03T06:57:41.0859118Z 2022-03-03T06:57:40.635Z [ERROR] vertex "azurerm_key_vault_secret.githubtoken" error: Unable to determine the Resource ID for the Key Vault at URL "https://net-prod-839977.vault.azure.net/"
    2022-03-03T06:57:41.0860401Z 2022-03-03T06:57:40.635Z [DEBUG] provider.terraform-provider-azurerm_v2.98.0_x5: AzureRM Response for https://management.azure.com/subscriptions/***/resources?%24filter=resourceType+eq+%27Microsoft.KeyVault%2Fvaults%27+and+name+eq+%27net-prod-839977%27&%24top=5&api-version=2020-06-01: 
    2022-03-03T06:57:41.0861012Z HTTP/2.0 200 OK
    2022-03-03T06:57:41.0861306Z Cache-Control: no-cache
    2022-03-03T06:57:41.0861671Z Content-Type: application/json; charset=utf-8
    2022-03-03T06:57:41.0861971Z Date: Thu, 03 Mar 2022 06:57:39 GMT
    2022-03-03T06:57:41.0862330Z Expires: -1
    2022-03-03T06:57:41.0862605Z Pragma: no-cache
    2022-03-03T06:57:41.0863019Z Strict-Transport-Security: max-age=31536000; includeSubDomains
    2022-03-03T06:57:41.0863386Z Vary: Accept-Encoding
    2022-03-03T06:57:41.0863725Z X-Content-Type-Options: nosniff
    2022-03-03T06:57:41.0864181Z X-Ms-Correlation-Request-Id: b01b3074-24a4-9613-dd3f-c0a93338159e
    2022-03-03T06:57:41.0864686Z X-Ms-Ratelimit-Remaining-Subscription-Reads: 11997
    2022-03-03T06:57:41.0865152Z X-Ms-Request-Id: f6f7065a-d272-43ec-9bf3-526a820189ac
    2022-03-03T06:57:41.0865654Z X-Ms-Routing-Request-Id: WESTUS3:20220303T065740Z:f6f7065a-d272-43ec-9bf3-526a820189ac
    2022-03-03T06:57:41.0865936Z 
    2022-03-03T06:57:41.0866144Z {"value":[]}: timestamp=2022-03-03T06:57:40.634Z

@mariussm
Contributor

mariussm commented Mar 3, 2022

@roehrijn , I have tried with parallelism 1 now, and it does not resolve any issue when it comes to storage accounts at least. Maybe that workaround only works for key vaults?

@roehrijn

roehrijn commented Mar 7, 2022

Hi @mariussm, it also works for storage accounts in my environment. However, as I wrote, this is unfortunately likely to be some sort of race condition in rate limiting. That's why I think parallelism=1 is not a 100% fix/workaround. I hope MS is going to address this soon.

@damoodamoo
Contributor

@roehrijn / @andrey-dubnik - we've experienced this issue over the past month or so. We found that a call to the /resources?%24filter=resourceType+eq+%27Microsoft.KeyVault% endpoint returned different results depending on the region that serviced that ARM API call. Essentially, if the region in the X-Ms-Routing-Request-Id value was the same as the location of the resource, the call returned the key vault, but if not it would sporadically return an empty array. This had a downstream effect on the rest of the TF deployment, as it treated the resource as missing. It would be interesting for you to check the X-Ms-Routing-Request-Id of your calls to see if there's a correlation between successful and failed ones.

We have an open case with the ARM API team, and so far they've confirmed that it's an issue with the ARM API cross-region cache not being updated quickly enough.

In terms of fixes / workarounds - they're currently a bit limited:

  • Recommendation from the API team is to use the Azure Resource Graph for list operations, which has a 1 minute SLO for replication. Potentially a pretty big change to the provider code.
  • Wait for the replication issues to die down + backend to improve
  • Find some way of manually targeting the ARM API at the location that contains the resource

cc: @stuartleeks

@andrey-dubnik
Contributor Author

Using tags as a workaround worked so far and the portal cache was replicated. There were no re-occurrences of the issue since tagging.

Since this is within the provider-API scope, there is no way to influence it externally. If this is replication lag and not a permanent issue, then adding retry logic would probably help mitigate it: the worst case would be a wait of about 1 minute (the SLO) as opposed to an error, which is already good enough.

If tagging permanently fixes the issue, maybe the API team can use this in the replication fix...
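
Nobody in this thread reports a user-facing retry setting, so the following is only a configuration-side sketch of the "wait out the replication lag" idea, not something anyone above has confirmed. It assumes the time_sleep resource from the hashicorp/time provider, a hypothetical dependent azurerm_storage_container, and an arbitrary 60s delay loosely based on the 1 minute figure quoted above. It can only delay dependents at creation time and would not help when plan fails to find an already existing account.

```hcl
terraform {
  required_providers {
    time = {
      source = "hashicorp/time"
    }
  }
}

# Assumption: pause after the storage account is created so that
# dependents are not evaluated while the ARM cache may still be stale.
resource "time_sleep" "wait_for_arm_replication" {
  depends_on      = [azurerm_storage_account.example]
  create_duration = "60s" # arbitrary value, not a documented guarantee
}

# Hypothetical dependent resource, shown only to illustrate the ordering.
resource "azurerm_storage_container" "example" {
  name                 = "content"
  storage_account_name = azurerm_storage_account.example.name

  depends_on = [time_sleep.wait_for_arm_replication]
}
```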

@292368

292368 commented May 18, 2022

Hello,

I have also experienced the same intermittent issue, i.e. Unable to locate Storage Account, for a GRS S account.
The root cause of this issue is an ARM cache sync issue. Contact Microsoft to resync the storage account.

It has been working for me since the ARM cache was resynced.

Thanks
Tushar P

@paweltucholskiigt

paweltucholskiigt commented Jun 17, 2022

I got this issue as well.
az storage account list
was returning [].
Adding a tag fixed the problem, so this does indeed look like a stale cache problem.

@imduchy
Contributor

imduchy commented Jul 1, 2022

We're experiencing this problem in GitHub actions too.

Run hashicorp/setup-terraform@v1.3.2

provider registry.terraform.io/hashicorp/azurerm v3.10.0

@AvtsVivek

AvtsVivek commented Jul 4, 2022

I am experiencing the same problem on my local machine. Here is the repo.

I run these commands and I encounter this specifically when I run the destroy commands.

terraform plan -destroy -out main.destroy.tfplan

terraform apply main.destroy.tfplan

│ Error: Unable to locate Storage Account "staticwebsiteprfpim"!

The following shows plan command.

image

I get that error when I run the command for the first time. When I re-run the same command the second time, things run fine.

The same happens with the apply command.

@paalders

paalders commented Jul 7, 2022

When using GH Actions in combination with AzureRM and account_replication_type LRS you have the same problem. When I run the same terraform apply locally I don't have any issues. Seems related to GH actions.

azurerm version 3.12.0

resource "azurerm_storage_account" "redacted" {
  name                     = local.storage_account_name
  resource_group_name      = azurerm_resource_group.redacted.name
  location                 = azurerm_resource_group.redacted.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

image

@ekristen

ekristen commented Aug 4, 2022

Using azurerm 3.16.0, storage account type LRS, this happens fairly frequently.

@magodo
Collaborator

magodo commented Aug 5, 2022

@ekristen and @paalders could you please help confirm that the storage account list API (i.e. /subscriptions/{subscriptionId}/providers/Microsoft.Storage/storageAccounts) doesn't return the expected storage account, while the ARG API does? If that is the case, we might consider migrating to use ARG instead.

@AndrzejK-Atende

AndrzejK-Atende commented Aug 12, 2022

Same problem in my case:

Terraform v.1.2.6
azurerm: v.3.17.0

It does not matter whether the SA is LRS or ZRS; 7 times out of 10 I get the error: "Unable to locate Storage Account".

@MLKiiwy

MLKiiwy commented Aug 13, 2022

Same issue for me on GitHub Actions + Terraform Cloud, also with the LRS replication type.
The storage account cannot be found.

I can see it on the Azure dashboard, and on my local machine the command line tool returns the correct list.

Terraform v.1.2.6
azurerm: v.3.17.0

@tkekkone

tkekkone commented Feb 8, 2024

For me this kept happening even when I deployed a completely new set of resources to a new rg. After deploying the storage accounts, the plan stage failed because it did not find the storage account that had just been deployed. az storage account list also showed empty results. When I ran az login and az account set --subscription... again, it found the storage account and plan worked.

@Uqqasha

Uqqasha commented Mar 1, 2024

Still facing this issue intermittently on Storage Account Standard_LRS StorageV2.

Terraform version: 1.7.4
Azure provider: 3.91.0

Error on terraform apply
Error: retrieving queue properties for Storage Account (Subscription: "xxx" Resource Group Name: "xxx" Storage Account Name: "xxx"): queues.Client#GetServiceProperties: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthenticationFailed" Message="Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.\ with azurerm_storage_account.st, on xxx line 17, in resource "azurerm_storage_account" "st": 17: resource "azurerm_storage_account" "st" {

Error on terraform destroy
╷
│ Error: unable to locate "Storage Account (Subscription: \"xxxx\"\nResource Group Name: \"xxx\"\nStorage Account Name: \"xxx\")"
│
│   with azurerm_storage_account.st,
│   on xxx line 17, in resource "azurerm_storage_account" "st":
│   17: resource "azurerm_storage_account" "st" {
│

@alexivanov-danone

I got this issue today, while deploying new resources into a clean subscription.
Not sure what this is about; I was creating 158 new resources, of which 3 were storage accounts.

@c4milo

This comment was marked as duplicate.

@hrda81

hrda81 commented Mar 12, 2024

Same issue for me on two storage accounts with an account replication type of "GRS"

Terraform version: 1.7.5
Azure provider: 3.92.0

@jobeehichoya

This comment was marked as duplicate.

@juanforero-ckw

Same issue with type of "LRS"

Terraform version: 1.7.5
Azure provider: 3.95.0

@rcskosir rcskosir added v/3.x upstream/microsoft/waiting-on-service-team This label is applicable when waiting on the Microsoft Service Team and removed upstream/microsoft Indicates that there's an upstream issue blocking this issue/PR labels Mar 14, 2024
@musabmirza-amperon

This comment was marked as off-topic.

@musabmirza-amperon

This comment was marked as off-topic.

@alxndr13

This comment was marked as duplicate.

@pabloopez

This comment was marked as duplicate.

@scorpion35

scorpion35 commented Apr 11, 2024

Same here.

Terraform v1.8.0

.\.terraform\providers\registry.terraform.io\hashicorp\azurerm\3.98.0\windows_386\terraform-provider-azurerm_v3.98.0_x5.exe

It worked in one run, but fails the majority of the time.

resource "azurerm_storage_account" "storage-account" {
  resource_group_name = "storage-rg"
  location            = "eastus"
  name                = "stgacestus3839"

  account_tier             = "Standard"
  account_replication_type = "LRS"
  min_tls_version          = "TLS1_2"
}

The PowerShell equivalent works fine, though:

Get-AzStorageAccount -ResourceGroupName "storage-rg"

# this returns the storage account properly

Seems the unreleased version 3.99.0 has some fixes around storage account? https://github.com/hashicorp/terraform-provider-azurerm/blob/main/CHANGELOG.md#3990-unreleased

Anyone know when this could be released?

@scorpion35

Womp womp. Looks like 3.99.0 was released last night, but it doesn't fix the issue. Still seeing

Planning failed. Terraform encountered an error while generating this plan.

╷
│ Error: unable to locate "Storage Account (Subscription: \"*****\"\nResource Group Name: \"*****\"\nStorage Account Name: \"****\")"
terraform init -upgrade

# Upgraded to terraform-provider-azurerm_v3.99.0_x5.exe (from 3.98.0)

@jkroepke
Contributor

We got this error with 3.99 today.

But in our case the cause was that we had set up Azure AD authentication for the Storage Account (default_to_oauth_authentication = true) but had forgotten to set storage_use_azuread at the provider level.
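
For anyone checking the same thing, a minimal sketch of the two settings mentioned above is shown here. The resource name, resource group and location are hypothetical placeholders, and this only covers the misconfiguration described in this comment; it is not presented as a fix for the intermittent ARM cache problem discussed elsewhere in the thread.

```hcl
provider "azurerm" {
  features {}

  # Provider-level switch: use Azure AD rather than shared-key auth
  # when talking to the storage data plane.
  storage_use_azuread = true
}

resource "azurerm_storage_account" "example" {
  name                     = "examplestorageacct" # hypothetical
  resource_group_name      = "example-resources"  # hypothetical
  location                 = "West Europe"        # hypothetical
  account_tier             = "Standard"
  account_replication_type = "LRS"

  # Account-level setting referred to in the comment above.
  default_to_oauth_authentication = true
}
```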

@c4milo
Contributor

c4milo commented May 9, 2024

I got this using v3.102.0 today. It is really frustrating.

❯ terraform version
Terraform v1.7.5
on darwin_arm64
+ provider registry.terraform.io/hashicorp/azurerm v3.102.0

@scorpion35

scorpion35 commented May 9, 2024

Lots of people have been running into this issue, and for a long time. Does anyone know how to summon this repo's gods?

I tried to take a look at the code base, but it's in Go and I don't know that language yet.

@manicminer sorry to ping you directly; I saw your name as approver on the latest closed PR. Can you help summon this repo's gods, please?

@ekristen

ekristen commented May 9, 2024

So ultimately this is an Azure problem. Their API sucks. They decided it's better to be fast than accurate and have taken "eventually consistent" to heart: it will be eventually consistent, but that could take 30 seconds or 10 hours.

HOWEVER, I believe the Terraform provider could be better!! It could stub state, it could avoid erroring on a 404 Not Found, or it could try harder to wait, knowing that the Azure API is terrible.

I'm a firm believer that unless it's a direct 409 conflict error, state should be stubbed with Azure because of its predictable conflicting naming conventions. I also think that, given its eventual consistency, the provider must try harder, wait longer, or stub state for a subsequent run and not error on things like this.

@tombuildsstuff tombuildsstuff self-assigned this May 23, 2024
@gctrevino

I encountered this issue a few days ago when upgrading AzureRM from 2.90 to 3.107. I have tried the tag workaround and it does nothing for my problem. I am testing the change in a sandbox environment; in the dev environment everything is working perfectly.

Sandbox environment

#Can do a fresh deploy, but if I try to update, it fails consistently with Error: unable to locate Storage Account
AzureDevOps Pipeline
Terraform 1.0.2
AzureRM 3.107.0

Dev environment

AzureDevOps Pipeline
Terraform 1.0.2
AzureRM 2.90.0

storage block

resource "azurerm_storage_account" "learning_materials_storage_account" {
  name                            = "sa${var.location_abb}lm${var.project_key}${var.env}"
  resource_group_name             = var.rg_name
  location                        = var.location
  account_tier                    = "Standard"
  account_replication_type        = "RAGRS"
  enable_https_traffic_only       = true
  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = "true"

  tags = var.tags
}

@gctrevino

gctrevino commented Jun 10, 2024

I am not saying that this is the solution, but I noticed that when I ran az storage account list without specifying the subscription, it returned an empty list []. This left me thinking that maybe it is the same for azurerm, especially because I use 3 different subscriptions.

So I explicitly added the tenant and subscription IDs to the azurerm provider setup, and it seems to be working for me now. It would be good if someone else could test this on their side to see whether it consistently fixes the issue.

provider "azurerm" {
  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id

  features {...}
}

Update:
I went to AzureRM 3.68.0 with Terraform 1.8.5 and have not encountered this error since. If I go up to 3.104.0 or higher, I get the error fairly frequently.
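
For others who want to try the same downgrade, pinning the provider could look roughly like the sketch below. The exact version is simply the one reported in this comment; it is one person's observation, not a confirmed fix, and other commenters report the error on both older and newer releases.

```hcl
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # Exact pin to the version reported as working in this comment.
      version = "3.68.0"
    }
  }
}
```

After changing the constraint, terraform init -upgrade (as used earlier in the thread) is needed for Terraform to select the pinned version.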

@chapmanc

Just to add a note: we are seeing this with azurerm at 3.98.0.

Can we get an update, @tombuildsstuff? It's been a couple of months since the last one :).

@manicminer
Contributor

In the next provider release, we'll be updating the SDK used for storage accounts to use hashicorp/go-azure-sdk. Once that's been released, and we've addressed any potential regressions, we can take another look at this issue.

@samuel-baena-stenn

Any updates? Still facing this today

@cglendenning

👋 - Just hit this (again) today. Any update 🙏 ?
