
depends_on always triggers data source read #11806

Closed
queeno opened this issue Feb 9, 2017 · 30 comments · Fixed by #24904
Labels: bug, config, core, v0.9, v0.10, v0.11, v0.12
Milestone: v0.13.0

Comments

@queeno

queeno commented Feb 9, 2017

Hi there,

Terraform Version

Terraform v0.8.4

Affected Resource(s)

  • data.template_file

If this issue appears to affect multiple resources, it may be an issue with Terraform's core, so please mention this.

Terraform Configuration Files

data "template_file" "hello" {
    template = "template"
    depends_on = ["null_resource.hello"]
}

resource "null_resource" "hello" {}

Debug Output

https://gist.github.com/queeno/f198c93760f7c60e5102e7fd5873ad1f

Expected Behavior

terraform plan shouldn't re-read the data source.

Actual Behavior

terraform plan re-reads the data source.

❯ terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but
will not be persisted to local or remote state storage.

null_resource.hello: Refreshing state... (ID: 3983604783330107355)

The Terraform execution plan has been generated and is shown below.
Resources are shown in alphabetical order for quick scanning. Green resources
will be created (or destroyed and then created if an existing resource
exists), yellow resources are being changed in-place, and red resources
will be destroyed. Cyan entries are data sources to be read.

Note: You didn't specify an "-out" parameter to save this plan, so when
"apply" is called, Terraform can't guarantee this is what will execute.

<= data.template_file.hello
    rendered: "<computed>"
    template: "template"


Plan: 0 to add, 0 to change, 0 to destroy.

Steps to Reproduce

terraform plan

Important Factoids

When using depends_on in template_file, terraform plan always seems to re-read the data source. If the data source is used by an instance's user-data, terraform plans to change the instance's user-data. terraform apply, however, doesn't produce any change.

If depends_on is not used, then the data source is not re-read.
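For illustration, the user-data scenario described above looks roughly like this (a hypothetical sketch, not the reporter's actual configuration; resource and attribute names are made up):

resource "null_resource" "hello" {}

data "template_file" "cloud_config" {
  template   = "my cloud-config template"
  depends_on = ["null_resource.hello"]
}

resource "google_compute_instance" "myvm" {
  # ... other required arguments elided ...

  metadata {
    # because the data source above is re-read on every plan due to
    # depends_on, this value shows as "<computed>" and the plan reports
    # a change to user-data, even though apply changes nothing
    user-data = "${data.template_file.cloud_config.rendered}"
  }
}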

@apparentlymart
Contributor

Hi @queeno! Thanks for this issue.

I seem to remember this being done intentionally to fix a bug a while ago. depends_on presents a rather awkward situation for data sources, because they are defined as being refreshed early when they contain no computed data, but depends_on means Terraform can't tell exactly what they might be depending on.

There is possibly something more refined we could do here to make this support more cases, but unfortunately I think right now this is working as designed and we don't really have a better strategy in mind.

@queeno
Author

queeno commented Feb 9, 2017

Hi @apparentlymart

Thanks for your very quick response!

That makes sense and wouldn't be too worrying on its own. However, as I mentioned earlier, when the data source is used in an instance's user-data, terraform plan constantly shows the user-data changing.

terraform apply, as expected, doesn't change anything.

If you could do anything to fix this behaviour, it would be hugely appreciated 👍 😄

Have a look:


<= module.tw_instance.data.template_file.cloud-config.0
    rendered:                "<computed>"
    template:                "(my template)"
    vars.%:                  "2"
    vars.etcd_discovery_url: "https://discovery.etcd.io/ddd"
    vars.region:             "europe-west-1"

<= module.tw_instance.data.template_file.cloud-config.1
    rendered:                "<computed>"
    template:                "(my template)"
    vars.%:                  "2"
    vars.etcd_discovery_url: "https://discovery.etcd.io/ddd"
    vars.region:             "europe-west-1"

<= module.tw_instance.data.template_file.cloud-config.2
    rendered:                "<computed>"
    template:                "(my template)"
    vars.%:                  "2"
    vars.etcd_discovery_url: "https://discovery.etcd.io/ddd"
    vars.region:             "europe-west-1"

~ module.tw_instance.google_compute_instance.myvm.0
    metadata.%:         "" => "<computed>"
    metadata.sshKeys:   "(my key)" => ""
    metadata.user-data: "(my template)" => ""

~ module.tw_instance.google_compute_instance.myvm.1
    metadata.%:         "" => "<computed>"
    metadata.sshKeys:   "(my key)" => ""
    metadata.user-data: "(my template)" => ""

~ module.tw_instance.google_compute_instance.myvm.2
    metadata.%:         "" => "<computed>"
    metadata.sshKeys:   "(my key)" => ""
    metadata.user-data: "(my template)" => ""

Plan: 0 to add, 3 to change, 0 to destroy.

@apparentlymart
Contributor

@queeno as a workaround I suggest that you add a triggers map to your null_resource with some fixed value inside and then interpolate it into an unused value in the vars block on the template_file.

That should then make Terraform see the dependency via the interpolation, allowing you to remove the depends_on and bypass this.

That is assuming that wasn't just a contrived example for the bug report... If it was, I'm sure it's possible to adapt this workaround to whatever resources you are really using... just interpolate anything from the resource you want to depend on into a template variable. Most resources have a reasonable id attribute you can interpolate to achieve this.
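A minimal sketch of that workaround, applied to the contrived example from the original report (the attribute names here are illustrative):

resource "null_resource" "hello" {
  triggers = {
    # any fixed value works; it exists only so it can be interpolated below
    marker = "static"
  }
}

data "template_file" "hello" {
  template = "template"

  vars = {
    # unused variable whose only purpose is to reference the null_resource,
    # creating an implicit dependency that replaces the explicit depends_on
    null_resource_marker = "${null_resource.hello.triggers.marker}"
  }
}

Because the data source now interpolates an attribute of the null_resource, Terraform can see the dependency and can also see that the referenced value is already known, so it no longer pessimistically re-reads the data source on every plan.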

@queeno
Author

queeno commented Feb 9, 2017

Hi @apparentlymart

Great advice! I followed your suggestion, removed the explicit dependency, and had it all working as intended. I hope this issue can also help others in the same situation while you fix the actual bug in Terraform.

Yes, sorry it was a contrived example for the bug report. This is the actual code I am working on:

https://github.com/queeno/infra-problem/blob/master/terraform/modules/tw-instance/main.tf#L24

Thanks again for your help! 👍

@mitchellh
Contributor

Data sources are always refreshed during refresh. However, if the upstream things a data source depends on haven't changed and their values are available, we should not re-read the data source or show it in the plan. Definitely a bug in my view. Thanks.

@jbardin
Member

jbardin commented Feb 10, 2017

I agree. The change that @apparentlymart is referring to is #10670 which is only intended to prevent early evaluation when there is an explicit dependency.

I think this can be made to work with depends_on.

@apparentlymart
Contributor

Perhaps a suitable compromise would be this:

If a data block has a depends_on, ignore it during the refresh walk and then during the plan walk look up the nodes being depended on and generate the data source refresh diff only if at least one of them has any sort of diff in the plan so far.

Perhaps this is trickier than it seems though, if e.g. the dependencies are indirect via nodes that don't themselves generate diffs (modules, for example).

@queeno
Author

queeno commented Feb 11, 2017

I'm just throwing this out there, but why would the behaviour of an implicit dependency be different from that of an explicit one?

In the previous example, if I reference the null_resource implicitly, by adding a variable in the vars section of the data resource, this doesn't produce a refresh diff.

When I explicitly set the dependency using the depends_on attribute, then I do see the refresh diff. Please note that the null_resource is never triggered and never changes.

@apparentlymart
Contributor

@queeno the problem is that with an interpolation Terraform can tell the difference between the value not being available yet and it being available in the state from a previous run. With depends_on it cannot, because there is no specific value check -- Terraform knows that something about the dependency affects the outcome of the datasource, but it can't tell what, so it just pessimistically assumes that we must always process the dependency resource first, because that's the safest and most conservative behavior.

My proposed compromise tries to get around that by using the presence of any diff on the dependency as a signal that the data source should be re-run, thus allowing us to mimic the convergent behavior of an interpolation-dependency where it'll trigger the read only when there's a create or update (of any attribute) on the things it depends on.

@mitchellh
Contributor

@apparentlymart It has to be present in the diff because we need it to be present in the diff for Apply to do anything to downstream nodes that may depend on it. If we don't put the data source in the diff, then its outputs won't be computed, which means downstream normal resources won't end up in the diff, and so on...

@johnrengelman
Contributor

I'm also running into this currently because I was trying to use datasources to provide a module<->module dependency in a test framework. I'm setting up a test for a module which has some prereqs but I want it all deployed as 1 project in the test framework.

When I add a depends_on to a data source, subsequent plans always show changes to be made.

@apparentlymart
Contributor

Sorry for the long silence here, everyone!

We've been looking at this issue again recently, and I've written up #17034 as a proposal for one way to address it. This proposal builds on the discussion above, and attempts to also deal with some other similar quirks with implicit dependencies from data resources.

It'll take some more prototyping to see if that proposal is workable, since there are undoubtedly some subtleties that we didn't consider yet. We won't be able to work on this immediately due to other work in progress, but we do intend to address this.

@swetli

swetli commented Dec 10, 2018

I had a similar issue with the local_file data source. In general, I had a null_resource which created a file foo, and a data source that read the contents of this file in Terraform. Whenever I added depends_on to the local_file data source, it always showed its content as computed. To work around that, I used this:

resource "null_resource" "create_file" {

}

data "null_data_source" "file" {
inputs = {
data = "${file("foo")}"
null = "${format(null_resource.create_file.id)}"
}
}

Later we can reference the data by: ${data.null_data_source.file.outputs["data"]}

thefirstofthe300 added a commit to thefirstofthe300/terraform-google-project-factory that referenced this issue Feb 28, 2019
The default service account data resource currently uses a depends_on
flag added to prevent a race condition in
terraform-google-modules#141

Due to the way that Terraform refreshes data resources, Terraform thinks
that the data resource has changed when in actuality it hasn't:
hashicorp/terraform#11806 (comment)

By changing to use a null data resource that interpolates the default
service account email, the data resource will only change when the project
number does.
@trebidav

Terraform v0.11.13

I am having the same issue as mentioned here. Unfortunately, it looks like there is no workaround for my use case at the moment. Or... any ideas?

resource "aws_ecs_task_definition" "application" {
  ...
}

data "aws_ecs_task_definition" "application" {
  task_definition = "${aws_ecs_task_definition.application.family}"
  depends_on = ["aws_ecs_task_definition.application"]
}

resource "aws_ecs_service" "application" {
  task_definition = "${aws_ecs_task_definition.application.family}:${max("${aws_ecs_task_definition.application.revision}", "${data.aws_ecs_task_definition.application.revision}")}"
  ...
}

Plan:

 <= module.celery.data.aws_ecs_task_definition.application
      id:              <computed>
      family:          <computed>
      network_mode:    <computed>
      revision:        <computed>
      status:          <computed>
      task_definition: "task_name"
      task_role_arn:   <computed>

  ~ module.celery.aws_ecs_service.application
      task_definition: "task_name:5" => "${aws_ecs_task_definition.application.family}:${max(\"${aws_ecs_task_definition.application.revision}\", \"${data.aws_ecs_task_definition.application.revision}\")}"

Plan: 0 to add, 1 to change, 0 to destroy.

Output:

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

@soumitmishra

+1

@apparentlymart
Contributor

@trebidav if you're reading that task definition immediately after creating it just to get the revision value from it, I'd suggest looking to see if that same attribute is exported from the aws_ecs_task_definition resource type, and if not to open a feature request for the AWS provider for it to be. There should rarely be any reason to both create something and then read it with a data source in the same module.

With that said, the depends_on is redundant in your configuration in any case. You can safely remove depends_on from the data block without changing behavior, because the reference in the task_definition argument already implies that same dependency.
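Concretely, the data block from the snippet above would then become just (a sketch of the same configuration with the redundant line dropped):

data "aws_ecs_task_definition" "application" {
  # the interpolation below already implies a dependency on
  # aws_ecs_task_definition.application, so no depends_on is needed
  task_definition = "${aws_ecs_task_definition.application.family}"
}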


To everyone else: leaving "+1" or 👍 comments here doesn't do anything except create noise for those who are watching this issue for updates. If you want to vote for this issue, please leave a 👍 reaction on the original comment on this issue (not this comment!), since then we can query that as an input for prioritization.

@sanchetanparmar

I am having the same issue with the latest version. I tried with a null_resource as well, but after the null resource it gets stuck on the data source with a "could not find resource" error. My code is here.

hashibot added the config, v0.10, v0.11, v0.12, and v0.9 labels on Aug 29, 2019
@masterjg

masterjg commented Oct 28, 2019

I do not know for sure if this is related, but when I do not set depends_on on the aws_network_interfaces data source, Terraform doesn't find anything:

data "aws_network_interfaces" "lb" {
  filter {
    name = "description"
    values = [
      "ELB net/${aws_lb.ec2_service.name}/*"
    ]
  }
  filter {
    name = "vpc-id"
    values = [
      var.vpc_id
    ]
  }
  filter {
    name = "status"
    values = [
      "in-use"
    ]
  }
  filter {
    name = "attachment.status"
    values = [
      "attached"
    ]
  }
}

Since I am referring to aws_lb.ec2_service.name, it should automatically wait for the aws_lb resource, but for some reason it doesn't... However, if I add depends_on, it waits for the resource but triggers updates to dependent resources on every apply...

@uhinze

uhinze commented Jan 22, 2020

Just stumbled over this and just wanted to share my (hacky but not overwhelmingly complex) solution.

I basically take the ID of the null_resource and put it into some attribute of the data source, reducing it to length 0 so that it doesn't affect the actual attribute value. Like so:

resource "null_resource" "foo" {}
data "kubernetes_secret" "bar" {
  metadata {
    name      = "baz${replace(null_resource.foo.id, "/.*/", "")}"
  }
}

This should work with any data source, I think.

@evgenibi

Same issue here,

I'm trying to get the aws_route_tables data with a specific filter, and Terraform reads it before creating the resources, resulting in:

The "count" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the count depends

@hakro

hakro commented Apr 22, 2020

I have a similar issue too.
I need to create an AWS Secret, then trigger a lambda invocation using data.aws_lambda_invocation.my_lambda.

But the Lambda needs to be invoked only when the secret has been created, so I added depends_on = [aws_secretsmanager_secret_version.my_secret]
Otherwise the lambda invocation would fail, since it would reference a secret that doesn't exist yet.
But adding the depends_on makes the data source show up in the plan:

(screenshot: terraform plan output showing the data.aws_lambda_invocation read)

Even though there is nothing to create, change, or destroy.
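One possible way to get the ordering without depends_on, following the interpolation workaround suggested earlier in this thread, is sketched below. The resource names and the input payload are illustrative assumptions, not the poster's actual code; the idea is simply to reference an attribute of the secret version inside the invocation's input so the dependency becomes implicit:

resource "aws_secretsmanager_secret_version" "my_secret" {
  secret_id     = aws_secretsmanager_secret.my_secret.id
  secret_string = var.my_secret_value
}

data "aws_lambda_invocation" "my_lambda" {
  function_name = "my-function"

  # referencing the secret version here creates an implicit dependency,
  # so the Lambda is only invoked once the secret version exists and the
  # explicit depends_on can be removed
  input = jsonencode({
    secret_version_id = aws_secretsmanager_secret_version.my_secret.version_id
  })
}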

@hfgbarrigas

hfgbarrigas commented May 8, 2020

Stumbled on something similar to this one unfortunately.

resource "time_rotating" "rsa1" {
  rotation_minutes = 1
}

resource "null_resource" "create-local-file" {
  //the file name will be the rotation timestamp
  triggers = {
    timestamp = time_rotating.rsa1.unix
  }
}

data "template_file" "loca-file-contents" {
  template = file("${path.module}/${time_rotating.rsa1.unix}")
  vars = {
    id = null_resource.rsa1.id
  }
}

Basically, I want to create a local file with a fixed rotation. On the first run everything goes smoothly because the file gets created, but on subsequent runs the file doesn't get created because the rotation period has not been reached, so the data source for the file contents shouldn't be evaluated, since its dependencies have no changes. Correct?

FIRST RUN: (screenshot of terraform output)

SECOND RUN (with file function and not removing the files): (screenshot of terraform output)

THIRD RUN (with file function and removing the files): (screenshot of terraform output)

It appears that, although the dependencies have no changes, Terraform still refreshes the data source, which depends on a file that will never exist until there are changes.

@apparentlymart
Contributor

Hi @hfgbarrigas,

The behavior you saw there is as intended, because the file function is for reading files that exist statically as part of the configuration, not for files that are generated dynamically during a Terraform run. Terraform reads the file proactively during initial configuration decoding so that it can use the result as part of static validation.

Although I'd recommend avoiding generating local files on disk if you can, in unusual situations where you can't avoid it you can read the contents of a file using the local_file data source instead, which (because it's a data source, rather than an intrinsic function) will take its action during the graph walk, not during initial configuration loading.

resource "time_rotating" "rsa1" {
  rotation_minutes = 1
}

resource "null_resource" "create_template_file" {
  triggers = {
    filename = time_rotating.rsa1.unix
  }
  provisioner "local-exec" {
    # I added this because I assume you're intending to run
    # a local command to generate this file. You can refer
    # to self.triggers.filename in the provisioner configuration
    # to get the filename populated above.
  }
}

data "local_file" "generated_template" {
  filename = "${path.module}/${null_resource.create_template_file.triggers.filename}"
}

data "template_file" "result" {
  template = data.local_file.generated_template.content
  vars = {
    id = null_resource.create_template_file.id
  }
}

If you have any follow-up questions about that, please feel free to create an issue in the community forum and I can follow up with you there.

@hfgbarrigas

Hi @apparentlymart, you're right regarding the function; I completely missed that documentation... However, I had tried the snippet you shared and had the same issue when it comes to refreshing state and local files. Terraform seems to need the local file when refreshing state, even though the dependencies have no changes. Is this intended? Here's a snippet to reproduce:

data "local_file" "generated_template" {
  filename = "${path.module}/${null_resource.file.triggers.timestamp}"
}

data "template_file" "result" {
  template = data.local_file.generated_template.content
  vars = {
    id = null_resource.file.id
  }
}

resource "null_resource" "file" {
  provisioner "local-exec" {
    command = "echo test > $PATH/$NAME"
    environment = {
      PATH = path.module
      NAME = self.triggers.timestamp
    }
  }
  triggers = {
    timestamp = time_rotating.test.unix
  }
}

resource "time_rotating" "test" {
  rotation_minutes = 1
}

output "test" {
  value = data.template_file.result.rendered
}

The first apply is OK, the second apply (with the file present) is still OK, but the third apply (without the file present) is not OK. All of these applies were run within a minute to avoid triggering the time rotation.

@danieldreier danieldreier added this to the v0.13.0 milestone May 21, 2020
@antoniogomezalvarado

antoniogomezalvarado commented May 24, 2020

I know this is closed, but I'm trying my luck here with all the great minds out there. I'm trying to set the DNS record of an EMR master instance once it has finished creating, with the following:

data aws_instance "hive_emr_master" {

  count = "${aws_emr_cluster.hive_cluster.count == 1 ? 1 : 0}"

  filter {
    name   = "tag:aws:elasticmapreduce:instance-group-role"
    values = ["MASTER"]
  }

  filter {
    name   = "tag:Service"
    values = ["Hive"]
  }
  
  filter {
    name   = "tag:Env"
    values = ["${var.tier}"]
  }
}

resource "aws_route53_record" "hive_master_dns_record" {
  zone_id = "${var.resources["rds.hive.metastore.route53.zone.id"]}"
  name    = "SOME_NAME"
  type    = "A"
  ttl     = "300"
  records = ["${data.aws_instance.hive_emr_master.private_ip}"]

  depends_on = [
    "data.aws_instance.hive_emr_master",
    "aws_emr_cluster.hive_cluster"
  ]
}

However, the depends_on, as you've already encountered, keeps changing the DNS record on every plan/apply. Is there a way to trigger the data block once the cluster has finished creating (not using depends_on, obviously)? The above works only for sequential runs, which means I have to trigger apply once again for this to work.

Thanks in advance for your time!
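A sketch of how the interpolation-based workaround discussed earlier in this thread might be applied to this EMR case, under the assumption that EMR tags its instances with the cluster ID in the aws:elasticmapreduce:job-flow-id tag (the filter names and values below are assumptions, not tested configuration):

data "aws_instance" "hive_emr_master" {
  filter {
    # referencing the cluster's id creates an implicit dependency on the
    # EMR cluster, so the explicit depends_on is no longer needed
    name   = "tag:aws:elasticmapreduce:job-flow-id"
    values = ["${aws_emr_cluster.hive_cluster.id}"]
  }

  filter {
    name   = "tag:aws:elasticmapreduce:instance-group-role"
    values = ["MASTER"]
  }
}

On the first run the cluster ID is unknown until apply, so the read is deferred, but on subsequent runs the data source should only be re-read when the cluster itself changes.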

@ghost

ghost commented Jun 20, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Jun 20, 2020