Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage - allow azurerm_storage_account to be used in Data Plane restrictive environments #27818

Merged
merged 8 commits into from
Nov 5, 2024

Conversation

jackofallops
Copy link
Member

@jackofallops jackofallops commented Oct 30, 2024

Community Note

  • Please vote on this PR by adding a 👍 reaction to the original PR to help the community and maintainers prioritise for review
  • Please do not leave comments along the lines of "+1", "me too" or "any updates", they generate extra noise for PR followers and do not help prioritize for review

Description

This PR introduces functionality to allow the azurerm_storage_account resource to be deployed in situations that prohibit data plane access, such as Terraform agents that are behind restrictive Enterprise Firewalls.

To preserve the ability to configure Data Plane only sections of the configuration, 2 new resources are introduced, these are azurerm_storage_account_queue_properties and azurerm_storage_account_static_website. These are intended to supersede the (now deprecated) queue_properties and static_website blocks on the azurerm_storage_account resource, these deprecated blocks are planned to be removed in v5.0 of the provider.

Also introduced is the data_plane_available Provider Feature flag. This allows the provider to be globally configured to not use the data plane APIs at all for azurerm_storage_account. The caveats are that the availability guarantees for the Data endpoints being ready are not available, and the queue_properties and static_website blocks become incompatible configuration, and will error on plan and apply. To allow these blocks to still be configured, The 2 new sub-resources mentioned above can allow users to configure these items post-creation of the account, and if necessary, after the creation of Private Endpoints. See below for example usage of the flag in conjunction with PE's:

provider "azurerm" {
  features {
    storage {
        data_plane_available = false 
    }
  }
}

resource "azurerm_storage_account" "nodataplane" {
  provider                 = azurerm.no_data_plane
  name                     = "account1"
  resource_group_name      = azurerm_resource_group.example
  location                 = "westeurope"
  account_tier             = "Standard"
  account_kind             = "StorageV2"
  account_replication_type = "LRS"
  min_tls_version          = "TLS1_2"
}

resource "azurerm_private_endpoint" "static_website" {
#  network specific config...
}

resource "azurerm_storage_account_static_website" "test" {
  storage_account_id = azurerm_storage_account.test.id
  error_404_document = "sadpanda_2.html"
  index_document     = "index_2.html"
  depends_on = [ azurerm_private_endpoint.static_website ]
}

PR Checklist

  • I have followed the guidelines in our Contributing Documentation.
  • I have checked to ensure there aren't other open Pull Requests for the same update/change.
  • I have checked if my changes close any open issues. If so please include appropriate closing keywords below.
  • I have updated/added Documentation as required written in a helpful and kind way to assist users that may be unfamiliar with the resource / data source.
  • I have used a meaningful PR title to help maintainers and other users understand this change and help prevent duplicate work.
    For example: “resource_name_here - description of change e.g. adding property new_property_name_here

Changes to existing Resource / Data Source

  • I have added an explanation of what my changes do and why I'd like you to include them (This may be covered by linking to an issue above, but may benefit from additional explanation).
  • I have written new tests for my resource or datasource changes & updated any relevent documentation.
  • I have successfully run tests with my changes locally. If not, please provide details on testing challenges that prevented you running the tests.

Testing

  • My submission includes Test coverage as described in the Contribution Guide and the tests pass. (if this is not possible for any reason, please include details of why you did or could not add test coverage)

Change Log

Below please provide what should go into the changelog (if anything) conforming to the Changelog Format documented here.

  • azurerm_storage_account - can now be created and managed if Data Plane endpoints are blocked by a firewall
  • New Resource: azurerm_storage_account_queue_properties
  • New Resource: azurerm_storage_account_static_website
  • New Provider Feature - storage data_plane_available feature flag

This is a (please select all that apply):

  • Bug Fix
  • New Feature (ie adding a service, resource, or data source)
  • Enhancement
  • Breaking Change

Depends on #27733
Closes #27750
Closes #27310
Closes #10765

Comment on lines +39 to +42
var e pollers.PollingDroppedConnectionError
if errors.As(err, &e) {
return nil, err
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other two availability pollers return the PollingDroppedConnectionError based on whether resp.HttpResponse == nil. Can we not use the same pattern for checking here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the API call GetServiceProperties does not return the same struct, and does not contain HttpResponse, it instead returns the details and an error type. We check that error type here via errors.As() to take the appropriate action.

Copy link
Member

@stephybun stephybun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jackofallops, provided the tests in both 4.0 and 5.0 mode pass this looks good to me 🫎

@jackofallops
Copy link
Member Author

Test results
v4 mode:
image

v5 mode:
image

@jackofallops jackofallops merged commit 6f0ca28 into main Nov 5, 2024
31 of 32 checks passed
@jackofallops jackofallops deleted the b/storage_account_dataplane_rework branch November 5, 2024 09:37
@github-actions github-actions bot added this to the v4.9.0 milestone Nov 5, 2024
@joakimlemb
Copy link

joakimlemb commented Nov 12, 2024

This only works for storage accounts created or imported after "data_plane_available = false" is set, for existing storage accounts (even if the TF code doesn't include any queue_properties{}) it will need to be removed from the state file and reimported.

If this is working as designed, a note about this limitation in the https://registry.terraform.io/providers/hashicorp/azurerm/4.9.0/docs/resources/storage_account docs would be helpful.

@bethforsyth
Copy link

bethforsyth commented Nov 12, 2024

(I'd left a comment on #27750, but commenting here too) - needing to re-import existing storage accounts is extremely non ideal for managing long lived resources - the more manual fiddling with terraform state you do, the easier it is to break something, and we use terraform to manage some fairly critical infratructure. If it is possible to make this upgradable with no special state manipulation (maybe warnings about any queue_properties or static_website blocks that would be lost if not migrated to the new types) that would be vastly preferable. 😄

@jackofallops
Copy link
Member Author

Hi @joakimlemb

You are correct, this was intended to be for newly created accounts, however, it is not necessary to set for existing accounts as the API calls to the Data Plane endpoints failing for reasons relating to network connection availability will now not fail the Terraform operation (create/read/update). This tolerance does come at the expense of deployment time, in that we have to perform retries to help protect against temporary failures, but ultimately fire-walled deployments unable to reach Data Plane URLs will still succeed. The data_plane_available feature flag is intended for use where the queue_properties and static_website blocks are never going to be used on the resource (even post creation after any Private Endpoints are available) to bypass these availability checks altogether and provide a similar experience to that which will be in place after 5.0 ships and these properties/blocks are removed altogether in favour of their new replacement resources. There should be no need to remove/re-import to make use of this functionality.

I can see that the storage feature block is missing from the features documentation, so we'll get that added with an explanation as above. I'll take a look at making the feature flag compatible with existing storage accounts too, but I suspect that may potentially end up with a Diff that would need to be added to ignore_changes since there may already be values present in the state.

Hope this helps!

@joakimlemb
Copy link

Hope this helps!

It does, thank you for the detailed explanation.

@bethforsyth
Copy link

@jackofallops Thanks for the clarification on in intended behaviour! To make sure I understand - you're saying that taking 4.9.0 should allow deployment of storage accounts with no change to definitions other than the provider version? Should that also apply to terraform plan? I've just attempted to test this out, and I'm still seeing errors on terraform plan operations unfortunately.

Resource Group Name: "<rg name>"
Storage Account Name: "<storage account name>"): executing request: Get "https://<storage account name>.blob.core.windows.net/?comp=properties&restype=service": dial tcp: lookup <storage account name>.blob.core.windows.net on 127.0.0.53:53: no such host```

@jackofallops
Copy link
Member Author

@jackofallops Thanks for the clarification on in intended behaviour! To make sure I understand - you're saying that taking 4.9.0 should allow deployment of storage accounts with no change to definitions other than the provider version? Should that also apply to terraform plan? I've just attempted to test this out, and I'm still seeing errors on terraform plan operations unfortunately.

Resource Group Name: "<rg name>"
Storage Account Name: "<storage account name>"): executing request: Get "https://<storage account name>.blob.core.windows.net/?comp=properties&restype=service": dial tcp: lookup <storage account name>.blob.core.windows.net on 127.0.0.53:53: no such host```

Hi @bethforsyth - Apologies for the delay in response, I didn't get a notification for some reason!

Correct, in every scenario (customer and theoretical) this was tested, this was the case. I see from the error you provided that you're using a link-local address for DNS resolution? Do you have a local Hosts file configured (or local DNS resolver) on your Terraform agent that has some bespoke configuration for responding to the Data Plane endpoint domains? (e.g. *.blob.core.windows.net) If so, can you test without this in place?

(in the meantime, I'll check why I didn't get notified of your mention, and I'll keep popping back to this issue to check for replies. Sorry again about that!)

@gesnaud
Copy link

gesnaud commented Dec 10, 2024

Hello @jackofallops !

I tried to use the data_plane_available feature without success

  • I created a storage account with data_plane_available = false and with public_network_access_enabled = false
  • Then I change the code to add a container to the created storage account
  • It failed with a 403
  • I destroyed everything
  • I created a storage account with data_plane_available = true and with public_network_access_enabled = false
  • Then I change the code to add a container to the created storage account
  • It failed agauin with a 403.

Shoud I have misunderstood the purpose? OR normally with data_plane_available = true I should have been able to create a container?

Thanks for your reply!

@gesnaud
Copy link

gesnaud commented Dec 11, 2024

Hi @joakimlemb !

Thanks for the tips... But, how to know which management features is concerned?

@jackofallops , I tried another time by

  1. Create ressource group + storage account + container with the following configuration:
  • Trying proivider version 3.117.0
  • data_plane_available = false
  • A network rule allowing one of the both public ip I have access to.
  • The host from where I execute the code does not use the public IP authorized on the storage account.
  • I expect, thanks to the feature, that my code will be able to update the storage account even if it is not in the white list.
  1. I update the code by adding the creation of a blob .
  • I have a 403....
  1. Rollback on blob creation, but modify code to add an IP in white list
  • Still have a 403...
  1. I upgrade provider to 4.9.0:
  • Still have a 403...

please @jackofallops , teach me how to use your feature!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment