Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failed to marshal state to json error when upgrading from 1.5.1 to 1.6.0 #256

Closed
AgustinBettati opened this issue Dec 22, 2023 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@AgustinBettati
Copy link

AgustinBettati commented Dec 22, 2023

terraform-plugin-testing version

1.6.0

Relevant provider source code

Migration test that uses an old version 1.11.0 of our provider and then runs a plan operation using latest build.

func TestAccClusterRSCluster_withDefaultBiConnectorAndAdvancedConfiguration_maintainsBackwardCompatibility(t *testing.T) {
	var (
		...
	)

	resource.ParallelTest(t, resource.TestCase{
		PreCheck:     func() { acc.PreCheckBasic(t) },
		CheckDestroy: acc.CheckClusterDestroy,
		Steps: []resource.TestStep{
			{
				ExternalProviders: map[string]resource.ExternalProvider{
					"mongodbatlas": {
						VersionConstraint: "1.11.0",
						Source:            "mongodb/mongodbatlas",
					},
				},
				Config:            config,
				Check: resource.ComposeTestCheckFunc(
					testAccCheckMongoDBAtlasClusterExists(resourceName, &cluster),
					testAccCheckMongoDBAtlasClusterAttributes(&cluster, name),
					resource.TestCheckResourceAttrSet(resourceName, "project_id"),
					resource.TestCheckResourceAttr(resourceName, "name", name),
				),
			},
			{
				ProtoV6ProviderFactories: acc.TestAccProviderV6Factories,
				Config:                   config,
				ConfigPlanChecks: resource.ConfigPlanChecks{
					PostApplyPreRefresh: []plancheck.PlanCheck{
						acc.DebugPlan(),
					},
				},
				PlanOnly: true,
			},
		},
	})
}

Relevant attribute definition from version 1.11.0, which was removed in version 1.12.0

.....,
"bi_connector": {
        Type:          schema.TypeMap,
        Optional:      true,
        Deprecated:    "use bi_connector_config instead",
        ConflictsWith: []string{"bi_connector_config"},
        Elem: &schema.Schema{
	        Type: schema.TypeString,
        },
},

Terraform Configuration Files

Note that bi_connector attribute is not defined.

resource "mongodbatlas_cluster" "test" {
	project_id                   = mongodbatlas_project.cluster_project.id
	name                         = %[3]q
	disk_size_gb                 = 100
        cluster_type = "REPLICASET"
        replication_specs {
            num_shards = 1
            regions_config {
	        region_name     = "EU_CENTRAL_1"
                electable_nodes = 3
                priority        = 7
                read_only_nodes = 0
            }
        }
	cloud_backup                 = %[4]t
	pit_enabled                  = %[4]t
	retain_backups_enabled       = true
	auto_scaling_disk_gb_enabled = %[5]t
	provider_name               = "AWS"
	provider_instance_size_name = "M30"
}

Expected Behavior

Test is running successfully with version 1.5.1 of terraform-plugin-testing.

When running this scenario manually no issues are encountered:

  • Resource is created with version 1.11.0 and as a result having "bi_connector": null defined in the state.
  • An empty plan is returned when running with the latest version of our provider (which does not have this attribute defined in the schema).

Actual Behavior

2023-12-21T22:45:14.929+0100 [DEBUG] sdk.helper_resource: Stopping providers: test_terraform_path=/Users/agustin.bettati/bin/terraform test_working_directory=/var/folders/d6/zqxkvmcs15gb5xtn30rnxz780000gp/T/plugintest3689099347 test_step_number=2 test_name=TestAccClusterRSCluster_withDefaultBiConnectorAndAdvancedConfiguration_maintainsBackwardCompatibility
2023-12-21T22:45:14.929+0100 [ERROR] sdk.helper_resource: Error retrieving state, there may be dangling resources: test_working_directory=/var/folders/d6/zqxkvmcs15gb5xtn30rnxz780000gp/T/plugintest3689099347
  error=
  | exit status 1
  | Failed to marshal state to json: unsupported attribute "bi_connector"
   test_step_number=2 test_name=TestAccClusterRSCluster_withDefaultBiConnectorAndAdvancedConfiguration_maintainsBackwardCompatibility test_terraform_path=/Users/agustin.bettati/bin/terraform
--- FAIL: TestAccClusterRSCluster_withDefaultBiConnectorAndAdvancedConfiguration_maintainsBackwardCompatibility (473.86s)

To provide more context, bi_connector is an attribute that was removed in version 1.12.0, meaning it is defined in the state as null when the resource is created (not defined in the config) with version 1.11.0, and then when using the latest version (higher than 1.12.0) as the attribute is unknown it would seem to cause an error in the step 2.

Steps to Reproduce

  1. Two different versions of same provider must be available, and the latest version must have a removed an optional attribute definition.
  2. Define a migration test using version 1.6.0 of terraform-plugin-testing in which state is created having the removed attribute defined with null value, and then run a plan with latest version.

References

@AgustinBettati AgustinBettati added the bug Something isn't working label Dec 22, 2023
@bflad
Copy link
Contributor

bflad commented Dec 26, 2023

Hi @AgustinBettati 👋 Thank you for reporting this issue and sorry you ran into trouble here. A lot of the maintainers are out this week for an end of year break, however let's see if we cannot get you unstuck.

What I suspect is happening is the combination of:

  • The automatic TestCase-level destroy relies on parsing the final state after all test steps before calling terraform destroy (source of the error)
  • terraform-plugin-testing v1.6.0 removed explicit calls to the terraform refresh command to better match practitioner workflows (e.g. calling plan with its default -refresh=true instead)
  • The terraform refresh command updates state to match the current provider schema
  • The second TestStep with PlanOnly: true implicitly included terraform refresh before, but after v1.6.0 the state is no longer automatically updated with the current provider schema before the destroy begins

In general, PlanOnly: true (plan without apply) is going to be problematic in a final TestStep when the underlying provider schema changed between TestStep. The hashicorp/aws provider maintainers ran into similar issues with state upgrades for a similar reason and opted to fix their tests to remove PlanOnly: true (additional context in #237). I would also suggest the same remediation of removing PlanOnly: true in tests like these since you do really need the state to be updated as part of that TestStep since the provider schema changed. You can still verify that there are no expected plan differences via ConfigPlanChecks and plancheck.ExpectEmptyPlan() which is methodology we intend to support going forward.

Could you let us know how removing PlanOnly: true goes for tests like these and roughly how many tests this issue affected out of your test suite? As mentioned in #237, there might be room to rollback the refresh changes or potentially we could try to detect when the provider setup changed to explicitly call refresh before the final destroy. In either case though, we would likely be making the testing logic do the exact same setup in the next major version of terraform-plugin-testing, so its probably best to update your testing code anyways.

Apologies again for the hassle and please reach out if you have any questions or issues. Thank you.

@bflad bflad self-assigned this Dec 26, 2023
@bflad bflad added the waiting-response Issues or pull requests waiting for an external response label Dec 26, 2023
bflad added a commit that referenced this issue Dec 26, 2023
Reference: #223
Reference: #237
Reference: #256

Post-release update to the v1.6.0 CHANGELOG to capture now-known situations that would affect existing provider testing. This update ensures developers upgrading to v1.6.0 will receive the latest guidance and the guidance will remain for this version after future releases. If approved and merged, the v1.6.0 GitHub Release description will also be updated with the same additional content.
@AgustinBettati
Copy link
Author

AgustinBettati commented Dec 27, 2023

Thank you for the response, the suggestion provided did in fact work.

how many tests this issue affected out of your test suite?

Only one test was impacted, but I imagine this also depends on the provider version defined in the migration test.

I was wondering if it would also be worth updating the Migration Test Documentation as it currently makes use of PlanOnly. Would you suggest modifying all our existing migrations tests to use plancheck.ExpectEmptyPlan()?

@github-actions github-actions bot removed the waiting-response Issues or pull requests waiting for an external response label Dec 27, 2023
bflad added a commit that referenced this issue Dec 28, 2023
Reference: #223
Reference: #237
Reference: #256

Post-release update to the v1.6.0 CHANGELOG to capture now-known situations that would affect existing provider testing. This update ensures developers upgrading to v1.6.0 will receive the latest guidance and the guidance will remain for this version after future releases. If approved and merged, the v1.6.0 GitHub Release description will also be updated with the same additional content.
@bflad
Copy link
Contributor

bflad commented Dec 28, 2023

@AgustinBettati thank you so much for the response.

I was wondering if it would also be worth updating the Migration Test Documentation as it currently makes use of PlanOnly.

That is a great suggestion. I will submit that change over in terraform-plugin-framework as that is where that particular documentation source is located.

Would you suggest modifying all our existing migrations tests to use plancheck.ExpectEmptyPlan()?

Without PlanOnly: true, I think you will want/need that additional testing assertion. PlanOnly: true would skip the test step logic that allowed plan differences to be applied by instead going right to its non-empty plan checking. Now that test step will allow any plan and apply it, then do the non-empty plan checking, those plans will likely be empty since any differences would already be applied.

bflad added a commit to hashicorp/terraform-plugin-framework that referenced this issue Dec 28, 2023
bflad added a commit that referenced this issue Dec 28, 2023
Reference: #237
Reference: #256

Technically, this testing change is always safe to make, but this calls out the currently known situations it will cause errors to not over-burden developers during their migration.
@AgustinBettati
Copy link
Author

Appreciate the doc updates, this brings more clarity. Happy to close this issue if nothing else is pending.

bflad added a commit that referenced this issue Jan 2, 2024
Reference: #237
Reference: #256

Technically, this testing change is always safe to make, but this calls out the currently known situations it will cause errors to not over-burden developers during their migration.
bflad added a commit to hashicorp/terraform-plugin-framework that referenced this issue Jan 2, 2024
@bflad
Copy link
Contributor

bflad commented Jan 9, 2024

Thanks, @AgustinBettati!

@bflad bflad closed this as completed Jan 9, 2024
Copy link

github-actions bot commented Feb 9, 2024

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants