Incorrect order of destroying resources - Flaky/inconsistent behavior #23635
Hi @cvbarros! In recent releases we have made adjustments to the handling of dependencies during destroy. Could you give this a try with the latest release of Terraform and see if that improves the behavior? Looking at your trace log (thanks!) we can see the dependency graph Terraform built for the destroy step:
It does indeed seem that Terraform hasn't represented the dependency from teamcity_vcs_root_git.git_test to teamcity_project.vcs_root_project. This sort of missing dependency during destroy does seem consistent with the behavior fixed by the recent changes, one of which you've referenced from your issue here and which was included for the first time in Terraform 0.12.14.
Hi @teamterraform, thanks for looking into it. However, the provider is built against the latest release (v0.12.17), as seen here:
Also, I was looking at porting the provider to https://github.com/hashicorp/terraform-plugin-sdk, and these latest fixes weren't applied there, so I held that off for now.
Edit: another thing worth mentioning - there is another situation where the graph is not built correctly, although it involves a more complicated scenario, like #18408. Test on:
Depicted by:
When updating to the new configuration, which removes template2 and adds a new resource and dependency (to update the build config), Terraform tries to destroy template2 prior to updating the dependent resource, the build config. Updated config graph:
In order to work around that in my tests, I've kept the resource in the configuration during the update. If helpful, I can also provide diagnostic output on this case - but I've reported the first one as it is way simpler to reproduce/understand.
Hi @cvbarros, thanks for the extra info. The dependencies are solely handled by core, so the important part here is what is running on the cli. A provider should not be built with terraform 0.12.17, and must use the terraform-plugin-sdk. The changes that you linked cannot be applied to that project, because they are not compatible with the legacy state and acceptance test formats.
That said, you seem to only have failures here during acceptance tests. Is this appearing as a regression when using terraform 0.12.17 (which is not unexpected)? Is this also happening when applying and destroying directly from the cli? If this is reproducible from the cli, the trace output from that execution would be more helpful.
If this is only present when using the terraform-plugin-sdk, there's probably not much we can do other than work around the issue until the work on a new testing framework has progressed enough to take over the testing facilities for the plugin sdk.
Hi James, thanks a bunch for taking a look at it!
This helps a lot; I'll migrate to terraform-plugin-sdk and see if this helps. I'll get back to you with results. In case this is a regression, I'll revert back to TF
I'll run tests using these configurations from the CLI as well and report back the results.
I think the cut is post 0.12.7. The Terraform Plugin docs only list a date, September 2019, which coincides roughly with 0.12.8.
Provider built against Core v0.12.7
I've completed a set of tests here regarding the 2nd use case (the one with the graph), as per my comment above. Given the configuration on the first apply:
provider "teamcity" {
address = var.teamcity_url
username = var.teamcity_username
password = var.teamcity_password
version = "~> 0.5.3"
}
resource "teamcity_project" "build_config_project_test" {
name = "build_config_project_test"
}
resource "teamcity_build_config" "build_configuration_test" {
name = "build config test"
project_id = teamcity_project.build_config_project_test.id
templates = [ teamcity_build_config.build_configuration_template1.id, teamcity_build_config.build_configuration_template2.id ]
}
resource "teamcity_build_config" "build_configuration_template1" {
name = "build template 1"
is_template = true
project_id = teamcity_project.build_config_project_test.id
}
resource "teamcity_build_config" "build_configuration_template2" {
name = "build template 2"
is_template = true
project_id = teamcity_project.build_config_project_test.id
}
Apply works as expected. When updating it to the next configuration, which removes template2 and adds template3 in its place:
provider "teamcity" {
address = var.teamcity_url
username = var.teamcity_username
password = var.teamcity_password
version = "~> 0.5.3"
}
resource "teamcity_project" "build_config_project_test" {
name = "build_config_project_test"
}
resource "teamcity_build_config" "build_configuration_test" {
name = "build config test"
project_id = teamcity_project.build_config_project_test.id
templates = [ teamcity_build_config.build_configuration_template1.id, teamcity_build_config.build_configuration_template3.id ]
}
resource "teamcity_build_config" "build_configuration_template1" {
name = "build template 1"
is_template = true
project_id = teamcity_project.build_config_project_test.id
}
resource "teamcity_build_config" "build_configuration_template3" {
name = "build template 2"
is_template = true
project_id = teamcity_project.build_config_project_test.id
}
Apply results in error.
Output of Terraform state at that point: I didn't gather the traces building against this version.
I've also tested building the provider against the terraform-plugin-sdk. Would it be the case that we need to keep the resource there and do a 2-phase change?
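For illustration, here is one way such a 2-phase change could look against the configurations above. This is a sketch of the workaround as I read it, not an official recommendation, and it assumes the rest of the configuration (the project and template1) stays unchanged.

# Phase 1: point the build config at template3 but keep template2 in the
# configuration, so the template being dropped is not destroyed in the same
# apply that updates the build config.
resource "teamcity_build_config" "build_configuration_test" {
  name       = "build config test"
  project_id = teamcity_project.build_config_project_test.id
  templates  = [
    teamcity_build_config.build_configuration_template1.id,
    teamcity_build_config.build_configuration_template3.id
  ]
}

resource "teamcity_build_config" "build_configuration_template2" {
  name        = "build template 2"
  is_template = true
  project_id  = teamcity_project.build_config_project_test.id
}

resource "teamcity_build_config" "build_configuration_template3" {
  name        = "build template 3"
  is_template = true
  project_id  = teamcity_project.build_config_project_test.id
}

# Phase 2: after the first apply succeeds, delete the
# build_configuration_template2 block and apply again, so its destroy happens
# only once nothing references it.

Phase 2 is a separate apply; that second apply is what makes this a 2-step update rather than a single change.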
Thanks for the updates @cvbarros! Terraform wasn't ever able to precisely track the destroy+update dependencies, so any correct destroy order was often accidental. The order now is well defined, but it seems it's not what is needed in this situation. I'll take a closer look here to see if this is a situation that can't be directly modeled by terraform, or just that the update-destroy order has been set incorrectly.
Some notes on this while it's fresh. The graph is connecting
This is the "natural" order, as it's equivalent to how it would be updated if
That second point brings up an interesting situation, because this type of resource (of which it is not the only one in its class) cannot work if the dependency is being replaced. That would require 2 independent updates to
Offhand I can think of a couple of ways to model this, but none are great. There could be a separate
If the resource is inexpensive and not heavily referenced, requiring replacement when
As for fixing the issue as reported here: in this partial case of an update and a destroy, we may be able to conditionally reverse the edge, provided there are no cycles introduced; meaning there's no associated create node for the destroy, the dependent is only a resource update, and there are no intermediary dependencies. That's not optimal for a couple of reasons, but mostly it makes the graph harder to understand for both maintainers and users. Having the flow go from
I don't have a decision on the situation yet, but that should at least document what we're dealing with.
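As a purely illustrative sketch of the "separate resource" idea mentioned above: an attachment-style resource could tie the build config to a template, so that detaching becomes its own destroy step that necessarily happens before the template itself is destroyed. The resource name and schema below are invented for illustration and do not exist in the provider.

# Hypothetical attachment resource: it depends on both the build config and the
# template, so Terraform would destroy the attachment (detaching the template)
# before destroying the template it references.
resource "teamcity_build_config_template_attachment" "template2" {
  build_config_id = teamcity_build_config.build_configuration_test.id
  template_id     = teamcity_build_config.build_configuration_template2.id
}

This mirrors a common pattern elsewhere in the Terraform ecosystem, where association resources exist precisely so that attach/detach operations get their own lifecycle.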
I just remembered the other approach: making the resource create_before_destroy. This has never been an option for this situation, because
Thanks for the swift response! As per the workaround suggestions:
Requiring a 2-step update is also a fine workaround, I can document it. However, it feels clunky from a usability perspective - like a limitation of the tool. I don't know, it just feels counter-intuitive IMO.
This could be a nice approach. I'll try creating something like a
I didn't understand if this would be a suggested workaround or a future plan for avoiding it (that would require implementation) - I tested it with
Sorry about the braindump there, there was a lot of extra info that was more for documentation of the issue than anything. After giving it some thought, I think the create_before_destroy approach is the way to go here.
This takes a little effort in the provider resource, as you need to ensure that multiple instances can coexist during the update. If the resource has a field that must be unique, providers often generate a unique identifier for the resource, or append one to a configured prefix. You then just need to document that the resource must be create_before_destroy.
Though as you've found out, this doesn't work for the simple destroy case where the resource is removed from the configuration, because the information only exists in the config. This I believe is something we can remedy, and provide a general solution rather than core trying to guess when the order needs to be reversed. In the meantime, if this is a common situation for you, I would suggest documenting a 2-step update as the workaround.
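A minimal sketch of what that could look like on the user side, assuming the provider and resource can tolerate two instances coexisting briefly; the name handling described in the comment is jbardin's suggestion above, not something this block implements by itself.

# Illustrative only: create_before_destroy asks Terraform to create the
# replacement (and update anything that references it) before destroying the
# old instance. The resource must tolerate both instances existing at once,
# e.g. by generating a unique identifier or appending a suffix to the
# configured name, as described above.
resource "teamcity_build_config" "build_configuration_template2" {
  name        = "build template 2"
  is_template = true
  project_id  = teamcity_project.build_config_project_test.id

  lifecycle {
    create_before_destroy = true
  }
}

As noted above, though, this does not help when the resource is simply removed from the configuration, since that information only exists in the configuration.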
I've confirmed that the change is possible, and the order is enforced in the way that we want for this type of resource. However, because of the way this is interacting with
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
Terraform Version
Terraform Configuration Files
Debug Output
https://gist.github.com/cvbarros/cd508eaeb50c2a310fc2c1057ddc3832
Relevant extract:
Crash Output
Expected Behavior
Resource teamcity_project.vcs_root_project is a dependency of resource teamcity_vcs_root_git.git_test. Thus, the destroy should happen in the sequence teamcity_vcs_root_git.git_test -> teamcity_project.vcs_root_project.
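For context, a minimal sketch of the configuration shape implied by that dependency; the project_id reference is what creates the dependency, and any other arguments of teamcity_vcs_root_git are omitted rather than guessed.

# Sketch only: the VCS root references the project, so Terraform should destroy
# teamcity_vcs_root_git.git_test before teamcity_project.vcs_root_project.
resource "teamcity_project" "vcs_root_project" {
  name = "vcs_root_project"
}

resource "teamcity_vcs_root_git" "git_test" {
  name       = "git_test"
  project_id = teamcity_project.vcs_root_project.id
  # (remaining arguments omitted)
}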
Actual Behavior
teamcity_project.vcs_root_project got deleted before the destroy for teamcity_vcs_root_git.git_test got called. For this resource, when the project is deleted, the VCS Root is also removed. Then when Terraform calls destroy for teamcity_vcs_root_git.git_test, a 404 from the upstream API occurs.
Steps to Reproduce
The issue happens when running an "Import" acceptance test for the provider, both locally and on CI. It was first detected on CI.
Relevant test case:
Full Source
Additional Context
There seems to be some sort of race condition happening, as these failures are intermittent. Below is a gist containing traces, where the same test was run with no changes (environment, code) and succeeds:
https://gist.github.com/cvbarros/97bd088ae8084c89d340fde2e9db54ea
Relevant extract:
References