add replace-provider capability #145

mdb · 2023-08-10T12:26:30Z

This addresses issue #144 by providing a replace-provider action.

Note that, unlike other state actions, the replace-provider action does not
support a -state-out:

terraform state replace-provider \
  -state=terraform.tfstate \
  -state-out=out.tfstate \
  registry.terraform.io/hashicorp/local registy.mdb.io/hashicorp/local

Error parsing command-line flags: flag provided but not defined:
-state-out

tfmigrate/state_replace_provider_action_test.go

mdb · 2023-08-16T00:06:48Z

tfmigrate/state_replace_provider_action_test.go

+
+	if !strings.Contains(planErr.Error(), expected) {
+		t.Fatalf("expected migrator plan error to include: %s; received: %s", expected, err)
+	}


☝️ There may be a better way to author this test. In this case, because we've replaced the legitimate registry.terraform.io/hashicorp/local provider with an illegitimate registry.tfmigrate.io/hashicorp/local provider, plan is expected to error, but we can verify that it does so with the expected illegitimate provider replacement, in effect validating that replace-provider did its job.

Hmmm, this test is quite tricky. I'll investigate if there is any way to test for a happy path.

I couldn't find the best solution. Let me share possible options:

(1) Use a registry proxy such as https://github.com/jasonwbarnett/terraform-registry-proxy
Pros: A general solution for all providers
Cons: Require additional setup for testing

(2) Use any untrusted fork of hashicorp/null and rename to it
Pros: Easy to use
Cons: Potential supply chain attack risk

(3) Fork hashicorp/null by myself
Pros: Trust to use
Cons: Publishing a forked provider to the Terraform Registry takes a lot of work. I don't have experience in developing providers.

(4) Use a real-world example of renaming a namespace for verified providers such as hashicorp/github to integrations/github
Pros: Trust to use
Cons: Need a valid credential
The data source did not result in a plan diff even with namespace renaming, so a resource is needed. However, we do not want to create a real resource for testing. I tried to import a public repository, but it requires a valid GitHub token even a public one.

(5) Embed a legacy v0.12.x tfstate example as a test fixture and force push to remote in advance
Pros: No additional dependency
Cons: As I mentioned in #144, the test may break in the future because the behavior is undefined when an old state file format is loaded with a new version of terraform state commands.

As a reasonable compromise, I am leaning towards option (5) and am trying it out, but when I load a legacy tfstate created in v0.12.31 with Terraform v1.5.5, the terraform init fails.
I'm wondering if I need to run state replace-provider before terraform init. Does it work as intended in your upgrade scenario?

a05728d678f2:/work/tmp/pr-145/dir5# cat main.tf resource "null_resource" "foo" {} a05728d678f2:/work/tmp/pr-145/dir5# terraform -v Terraform v1.5.5 on linux_arm64 Your version of Terraform is out of date! The latest version is 1.5.6. You can update by downloading from https://www.terraform.io/downloads.html a05728d678f2:/work/tmp/pr-145/dir5# terraform init && terraform apply -auto-approve a05728d678f2:/work/tmp/pr-145/dir5# terraform state list null_resource.foo a05728d678f2:/work/tmp/pr-145/dir5# terraform providers Providers required by configuration: . └── provider[registry.terraform.io/hashicorp/null] Providers required by state: provider[registry.terraform.io/hashicorp/null]

a05728d678f2:/work/tmp/pr-145/dir5# cat legacy.tfstate { "version": 4, "terraform_version": "0.12.31", "serial": 1, "lineage": "3cb3461c-0eab-a7b2-e2e0-57a93c3cc2a1", "outputs": {}, "resources": [ { "mode": "managed", "type": "null_resource", "name": "foo", "provider": "provider.null", "instances": [ { "schema_version": 0, "attributes": { "id": "4107759474886503934", "triggers": null } } ] } ] } a05728d678f2:/work/tmp/pr-145/dir5# terraform state push -force legacy.tfstate a05728d678f2:/work/tmp/pr-145/dir5# terraform providers Providers required by configuration: . └── provider[registry.terraform.io/hashicorp/null] Providers required by state: provider[registry.terraform.io/-/null] a05728d678f2:/work/tmp/pr-145/dir5# terraform init Initializing the backend... ╷ │ Error: Invalid legacy provider address │ │ This configuration or its associated state refers to the unqualified provider "null". │ │ You must complete the Terraform 0.13 upgrade process before upgrading to later versions. ╵

a05728d678f2:/work/tmp/pr-145/dir5# terraform state replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null Terraform will perform the following actions: ~ Updating provider: - registry.terraform.io/-/null + registry.terraform.io/hashicorp/null Changing 1 resources: null_resource.foo Do you want to make these changes? Only 'yes' will be accepted to continue. Enter a value: yes Successfully replaced provider for 1 resources. a05728d678f2:/work/tmp/pr-145/dir5# terraform providers Providers required by configuration: . └── provider[registry.terraform.io/hashicorp/null] Providers required by state: provider[registry.terraform.io/hashicorp/null] a05728d678f2:/work/tmp/pr-145/dir5# terraform init Initializing the backend... Initializing provider plugins... - Reusing previous version of hashicorp/null from the dependency lock file - Using previously-installed hashicorp/null v3.2.1 Terraform has been successfully initialized! You may now begin working with Terraform. Try running "terraform plan" to see any changes that are required for your infrastructure. All Terraform commands should now work. If you ever set or change modules or backend configuration for Terraform, rerun this command to reinitialize your working directory. If you forget, other commands will detect it and remind you to do so if necessary. a05728d678f2:/work/tmp/pr-145/dir5# terraform plan null_resource.foo: Refreshing state... [id=4107759474886503934] No changes. Your infrastructure matches the configuration. Terraform has compared your real infrastructure against your configuration and found no differences, so no changes are needed.

I'm wondering if I need to run state replace-provider before terraform init. Does it work as intended in your upgrade scenario?

I don't believe you can run terraform state replace-provider before first running terraform init. Otherwise, you'll run into an Error loading the state error:

terraform state replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null Error loading the state: Backend initialization required, please run "terraform init": Reason: Initial configuration of th e requested backend "s3"

In my experience, when using a Terraform version >= v0.13.0 to run terraform state replace-provider against a legacy v0.12.31 tfstate, it's necessary to...

First run terraform init; this will fail with the Error: Invalid legacy provider address error you mention above. Despite the failure, it's still necessary.

Run terraform state replace-provider

I suggest the acceptance test confirm this upgrade scenario by embedding the legacy v0.12.x tfstate as a test fixture and force push to remote in advance.

@minamijoyo Thanks for the suggestion. I gave this a shot (see 4ecad8d). I have mixed feelings about some aspects of the implementation, so please let me know if you have other concerns and feedback.

Shouldn't we need to fix OverrideBackendToLocal as well as setupWorkDir? The error occurs during rollback but does not throw the error. It works in unintended balance, but it is not so user-friendly.

6d3015a26631:/work/tmp/pr-145/dir11# cat test.hcl migration "state" "test" { dir = "." actions = [ "replace-provider registry.terraform.io/-/null registry.terraform.io/hashicorp/null", ] } 6d3015a26631:/work/tmp/pr-145/dir11# tfmigrate plan test.hcl 2023/08/28 14:52:35 [INFO] [runner] load migration file: test.hcl 2023/08/28 14:52:35 [INFO] [migrator] start state migrator plan 2023/08/28 14:52:35 [INFO] [migrator@.] terraform version: 1.5.5 2023/08/28 14:52:35 [INFO] [migrator@.] initialize work dir 2023/08/28 14:52:35 [INFO] [migrator@.] ignoring error 'Error: Invalid legacy provider address' initilizing work dir; the error is expected when using Terraform >= 0.13 with a legacy Terraform state 2023/08/28 14:52:35 [INFO] [migrator@.] get the current remote state 2023/08/28 14:52:35 [INFO] [migrator@.] override backend to local 2023/08/28 14:52:35 [INFO] [executor@.] create an override file 2023/08/28 14:52:35 [INFO] [migrator@.] creating local workspace folder in: terraform.tfstate.d/default 2023/08/28 14:52:35 [INFO] [executor@.] switch backend to local 2023/08/28 14:52:35 [INFO] [migrator@.] compute a new state 2023/08/28 14:52:36 [INFO] [migrator@.] check diffs 2023/08/28 14:52:36 [INFO] [executor@.] remove the override file 2023/08/28 14:52:36 [INFO] [executor@.] remove the workspace state folder 2023/08/28 14:52:36 [INFO] [executor@.] switch back to remote 2023/08/28 14:52:36 [ERROR] [executor@.] failed to switch back to remote: failed to run command (exited 1): terraform init -input=false -no-color -reconfigure stdout: Initializing the backend... Successfully configured the backend "s3"! Terraform will automatically use this backend unless the backend configuration changes. stderr: Error: Invalid legacy provider address This configuration or its associated state refers to the unqualified provider "null". You must complete the Terraform 0.13 upgrade process before upgrading to later versions. 2023/08/28 14:52:36 [ERROR] [executor@.] please re-run terraform init -reconfigure 2023/08/28 14:52:36 [INFO] [migrator] state migrator plan success! 6d3015a26631:/work/tmp/pr-145/dir11# echo $? 0

My intuition was that the error would also occur when switching the backend to local, but I wonder why it doesn't.
Switching to local with a direct aws s3 cp of a v0.12.31 tfstate doesn't cause an error, but with terraform v1.5.5, After state push, switching to local also gives an error.
Rollback to remote throws the error in both cases. I don't know why 🤷‍♂️

Anyway, it is better to fix the OverrideBackendToLocal function to suppress the legacy state error in both the switch and rollback phases, isn't it?

👍 Good call! I took a stab at implementing your suggestion in 37e2834. How's that look?

This addresses issue #144 by providing a `replace-provider` action. Note that, unlike other `state` actions, the `replace-provider` action does not support a `-state-out`: ``` terraform state replace-provider \ -state=terraform.tfstate \ -state-out=out.tfstate \ registry.terraform.io/hashicorp/local registy.mdb.io/hashicorp/local Error parsing command-line flags: flag provided but not defined: -state-out ``` Signed-off-by: Mike Ball <mikedball@gmail.com>

mdb · 2023-08-17T14:51:56Z

@minamijoyo Addressing issue #145 and adding replace-provider functionality turned out to be a bit more quirky than I anticipated, but this first pass implementation is ready for your review. Thanks!

minamijoyo · 2023-08-22T13:49:31Z

@mdb Thanks for working on this! As you said, it's more complicated than I expected, and there are some points where I need to check the Terraform CLI behavior, so please give me some time to review it. Instead of returning the review results to you in a summary, I will comment on them in order as I study the areas in my spare time.

tfexec/terraform.go

tfexec/terraform_providers_test.go

tfexec/terraform_state_replace_provider.go

tfexec/terraform_state_replace_provider_test.go

tfmigrate/state_action.go

tfmigrate/state_replace_provider_action.go

This corrects a typo, ensuring that StateReplaceProviderAction implements StateAction.

StateReplaceProviderAction#StateUpdate now correctly documents that the method updates a provider address in TF state.

This seeks to add additional unit tests surrounding scenarios where... * `terraform version` succeeds, but returns an unsupported version * `terraform version` exits nonzero

mdb · 2023-08-24T14:59:26Z

@mdb Thanks for working on this! As you said, it's more complicated than I expected, and there are some points where I need to check the Terraform CLI behavior, so please give me some time to review it. Instead of returning the review results to you in a summary, I will comment on them in order as I study the areas in my spare time.

Thanks @minamijoyo - I believe I've addressed the concerns you've expressed so far. I'll stay tuned for any further feedback!

* use force-pushed TF 0.12.31 state file in `state replace-provider` acceptance tests, [as suggested in code review](#145 (comment)) * allow `terraform init` to fail -- as expected -- in instances where a non-legacy Terraform is used against a legacy TF state for the purposes of `state replace-provider`-ing

Makefile

docker-compose.yml

mdb · 2023-08-28T01:33:37Z

tfmigrate/migrator.go

-		return nil, nil, err
+		acceptable := "Error: Invalid legacy provider address"
+		if supportsStateReplaceProvider && strings.Contains(err.Error(), acceptable) {
+			log.Printf("[INFO] [migrator@%s] ignoring error '%s' initilizing work dir; the error is expected when using Terraform %s with a legacy Terraform state\n", tf.Dir(), acceptable, constraints)


☝️ Could/should this logic be improved? For example, perhaps the Error: Invalid legacy provider address should only be permitted during state replace-provider actions, but not other actions? Doing so would require passing the action(s) to setupWorkDir, though. Do you have thoughts, @minamijoyo ?

I think it is safer to narrow down the special error-handling conditions.
Rather than passing actions directly to the setupWorkDir function, why not pass it as a bool argument, such as ignoreLegacyState?

@minamijoyo Yep; we can pass a bool flag if you prefer. I opted not to, as that arguably redundantly distributes the need to correctly set this bool value across all invocations of the function (rather than consolidated the logic to a single place, within the function). However, perhaps you're right that doing so is best in this case. Stay tuned :)

If logic is duplicated, we can define some helper functions.

@minamijoyo I believe I've implemented your suggestion in 67ed675. Does that seem acceptable?

@minamijoyo

allow `terraform init` to fail -- as expected -- in instances where a non-legacy Terraform is used against a legacy TF state for the purposes of `state replace-provider`-ing _also_ when invoking `terraform init` from within `OverrideBackendToLocal`. This addresses the _second_ `init` error seen in the logs: ``` 2023/08/28 14:52:35 [INFO] [runner] load migration file: test.hcl 2023/08/28 14:52:35 [INFO] [migrator] start state migrator plan 2023/08/28 14:52:35 [INFO] [migrator@.] terraform version: 1.5.5 2023/08/28 14:52:35 [INFO] [migrator@.] initialize work dir 2023/08/28 14:52:35 [INFO] [migrator@.] ignoring error 'Error: Invalid legacy provider address' initilizing work dir; the error is expected when using Terraform >= 0.13 with a legacy Terraform state 2023/08/28 14:52:35 [INFO] [migrator@.] get the current remote state 2023/08/28 14:52:35 [INFO] [migrator@.] override backend to local 2023/08/28 14:52:35 [INFO] [executor@.] create an override file 2023/08/28 14:52:35 [INFO] [migrator@.] creating local workspace folder in: terraform.tfstate.d/default 2023/08/28 14:52:35 [INFO] [executor@.] switch backend to local 2023/08/28 14:52:35 [INFO] [migrator@.] compute a new state 2023/08/28 14:52:36 [INFO] [migrator@.] check diffs 2023/08/28 14:52:36 [INFO] [executor@.] remove the override file 2023/08/28 14:52:36 [INFO] [executor@.] remove the workspace state folder 2023/08/28 14:52:36 [INFO] [executor@.] switch back to remote 2023/08/28 14:52:36 [ERROR] [executor@.] failed to switch back to remote: failed to run command (exited 1): terraform init -input=false -no-color -reconfigure stdout: Initializing the backend... Successfully configured the backend "s3"! Terraform will automatically use this backend unless the backend configuration changes. stderr: Error: Invalid legacy provider address This configuration or its associated state refers to the unqualified provider "null". You must complete the Terraform 0.13 upgrade process before upgrading to later versions. 2023/08/28 14:52:36 [ERROR] [executor@.] please re-run terraform init -reconfigure 2023/08/28 14:52:36 [INFO] [migrator] state migrator plan success! ``` This seeks to address feedback from @minamijoyo: #145 (comment) Signed-off-by: Mike Ball <mikedball@gmail.com>

@minamijoyo

This seeks to further reduce the acceptance of `Error: Invalid legacy provider address` errors to scenarios where tfmigrate is performing `state replace-provider`. This is inspired by code review feedback from @minamijoyo here: #145 (comment)

minamijoyo · 2023-09-04T09:48:16Z

tfmigrate/migrator.go

@@ -60,7 +70,7 @@ func setupWorkDir(ctx context.Context, tf tfexec.TerraformCLI, workspace string,
 	}
 	// override backend to local
 	log.Printf("[INFO] [migrator@%s] override backend to local\n", tf.Dir())
-	switchBackToRemoteFunc, err := tf.OverrideBackendToLocal(ctx, "_tfmigrate_override.tf", workspace, isBackendTerraformCloud, backendConfig)
+	switchBackToRemoteFunc, err := tf.OverrideBackendToLocal(ctx, "_tfmigrate_override.tf", workspace, isBackendTerraformCloud, backendConfig, supportsStateReplaceProvider)


@mdb Thank you for working on this!
I think we should pass ignoreLegacyStateInitErr instead of supportsStateReplaceProvider to OverrideBackendToLocal. Because the supportStateReplaceProvider depends on the Terraform version; on the other hand, the ignoreLegacyStateInitErr depends on actions. The ignoreLegacyStateInitErr is more strict to check.

@minamijoyo That sounds reasonable - I pushed a change doing so!

@minamijoyo

Per feedback from @minamijoyo... > Because the supportStateReplaceProvider depends on the Terraform version; on the other hand, the ignoreLegacyStateInitErr depends on actions. The ignoreLegacyStateInitErr is more strict to check. #145 (comment)

minamijoyo

Thank you for your great work! LGTM 👍

mdb · 2023-09-04T13:52:58Z

@minamijoyo Thanks so much for the patient reviews and great feedback!

minamijoyo · 2023-09-04T14:12:12Z

@mdb Thank you for your contribution! I've shipped it in v0.3.15. 🚀

mdb changed the title ~~WIP: add replace-provider capability~~ add replace-provider capability Aug 15, 2023

mdb marked this pull request as ready for review August 15, 2023 10:56

mdb commented Aug 15, 2023

View reviewed changes

tfmigrate/state_replace_provider_action_test.go Outdated Show resolved Hide resolved

mdb commented Aug 16, 2023

View reviewed changes

minamijoyo reviewed Aug 22, 2023

View reviewed changes

tfexec/terraform.go Show resolved Hide resolved

minamijoyo reviewed Aug 22, 2023

View reviewed changes

tfexec/terraform_providers_test.go Outdated Show resolved Hide resolved