Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent order of environment variables in aws_ecs_task_definition #3035

Closed
s-maj opened this issue Jan 17, 2018 · 25 comments · Fixed by #11463
Closed

Inconsistent order of environment variables in aws_ecs_task_definition #3035

s-maj opened this issue Jan 17, 2018 · 25 comments · Fixed by #11463
Assignees
Labels
bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service.
Milestone

Comments

@s-maj
Copy link
Contributor

s-maj commented Jan 17, 2018

Folks,

It seems that order of environment variables is not being preserved. Also, mountPoints and volumesFrom are being showed in the plan, even when they're not defined in TF.

Large number of moving parts like environmental variables make working with big task definitions rather unpleasant :)

Terraform Version

Terraform v0.11.2

  • provider.aws v1.7.0

Affected Resource(s)

aws_ecs_task_definition

Steps to Reproduce

resource "aws_ecs_task_definition" "service2" {
  family                = "service"
  container_definitions = "${file("service.json")}"
}

service.json:

[{
	"cpu": 10,
	"environment": [
        {
          "name": "APP_VERSION",
          "value": "dev-26758"
        },
        {
          "name": "DATADOG_ENV",
          "value": "ee1-llll-pppppppp"
        },
        {
          "name": "DATADOG_PATCH_MODULES",
          "value": "celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true"
        },
        {
          "name": "DATADOG_SERVICE_NAME",
          "value": "translation-service"
        },
        {
          "name": "DATADOG_TRACE_AGENT_HOSTNAME",
          "value": "172.17.0.1"
        },
        {
          "name": "DATADOG_TRACE_AGENT_PORT",
          "value": "8126"
        },
        {
          "name": "DATADOG_TRACE_ENABLED",
          "value": "false"
        }
	],
	"essential": true,
	"image": "wordpress1",
	"memory": 500,
	"name": "wordpress",
	"portMappings": [{
			"containerPort": 80,
			"hostPort": 0,
			"protocol": "tcp"
		},
		{
			"containerPort": 81,
			"hostPort": 0,
			"protocol": "tcp"
		},
		{
			"containerPort": 82,
			"hostPort": 0,
			"protocol": "tcp"
		}
	]
}]

If I bump APP_VERSION from dev-26758 to dev-26760 I get:

-/+ aws_ecs_task_definition.service2 (new resource required)
      id:                    "service" => <computed> (forces new resource)
      arn:                   "arn:aws:ecs:eu-west-1:000000000000:task-definition/service:14" => <computed>
      container_definitions: "[{\"cpu\":10,\"environment\":[{\"name\":\"DATADOG_ENV\",\"value\":\"ee1-llll-pppppppp\"},{\"name\":\"DATADOG_TRACE_AGENT_HOSTNAME\",\"value\":\"172.17.0.1\"},{\"name\":\"DATADOG_SERVICE_NAME\",\"value\":\"translation-service\"},{\"name\":\"DATADOG_PATCH_MODULES\",\"value\":\"celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true\"},{\"name\":\"DATADOG_TRACE_AGENT_PORT\",\"value\":\"8126\"},{\"name\":\"DATADOG_TRACE_ENABLED\",\"value\":\"false\"},{\"name\":\"APP_VERSION\",\"value\":\"dev-26758\"}],\"essential\":true,\"image\":\"wordpress1\",\"memory\":500,\"mountPoints\":[],\"name\":\"wordpress\",\"portMappings\":[{\"containerPort\":80,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":81,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":82,\"hostPort\":0,\"protocol\":\"tcp\"}],\"volumesFrom\":[]}]" => "[{\"cpu\":10,\"environment\":[{\"name\":\"APP_VERSION\",\"value\":\"dev-26760\"},{\"name\":\"DATADOG_ENV\",\"value\":\"ee1-llll-pppppppp\"},{\"name\":\"DATADOG_PATCH_MODULES\",\"value\":\"celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true\"},{\"name\":\"DATADOG_SERVICE_NAME\",\"value\":\"translation-service\"},{\"name\":\"DATADOG_TRACE_AGENT_HOSTNAME\",\"value\":\"172.17.0.1\"},{\"name\":\"DATADOG_TRACE_AGENT_PORT\",\"value\":\"8126\"},{\"name\":\"DATADOG_TRACE_ENABLED\",\"value\":\"false\"}],\"essential\":true,\"image\":\"wordpress1\",\"memory\":500,\"name\":\"wordpress\",\"portMappings\":[{\"containerPort\":80,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":81,\"hostPort\":0,\"protocol\":\"tcp\"},{\"containerPort\":82,\"hostPort\":0,\"protocol\":\"tcp\"}]}]" (forces new resource)
      family:                "service" => "service"
      network_mode:          "" => <computed>
      revision:              "14" => <computed>

or in more human friendly form:

-/+ aws_ecs_task_definition.service2 (new resource required)
    id:                      "service" => "<computed>" (forces new resource)
    arn:                     "arn:aws:ecs:eu-west-1:000000000000:task-definition/service:14" => "<computed>"
    container_definitions:   [
                                {
                                  "cpu": 10,
                                  "environment": [
                                    {
                             +        "name": "APP_VERSION",
                             +        "value": "dev-26760"
                             +      },
                             +      {
                                      "name": "DATADOG_ENV",
                                      "value": "ee1-llll-pppppppp"
                                    },
                                    {
                             -        "name": "DATADOG_TRACE_AGENT_HOSTNAME",
                             -        "value": "172.17.0.1"
                             +        "name": "DATADOG_PATCH_MODULES",
                             +        "value": "celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true"
                                    },
                                    {
                                      "name": "DATADOG_SERVICE_NAME",
                                      "value": "translation-service"
                                    },
                                    {
                             -        "name": "DATADOG_PATCH_MODULES",
                             -        "value": "celery:true,elasticsearch:true,flask:true,httplib:true,mysql:true,redis:true,requests:true"
                             +        "name": "DATADOG_TRACE_AGENT_HOSTNAME",
                             +        "value": "172.17.0.1"
                                    },
                                    {
                                      "name": "DATADOG_TRACE_AGENT_PORT",
                                      "value": "8126"
                                    },
                                    {
                                      "name": "DATADOG_TRACE_ENABLED",
                                      "value": "false"
                             -      },
                             -      {
                             -        "name": "APP_VERSION",
                             -        "value": "dev-26758"
                                    }
                                  ],
                                  "essential": true,
                                  "image": "wordpress1",
                                  "memory": 500,
                             -    "mountPoints": [
                             -
                             -    ],
                                  "name": "wordpress",
                                  "portMappings": [
                                    {
                                      "containerPort": 80,
                                      "hostPort": 0,
                                    {
                                      "containerPort": 82,
                                      "hostPort": 0,
                                      "protocol": "tcp"
                                    }
                             -    ],
                             -    "volumesFrom": [
                             -
                                  ]
                                }
                              ] (forces new resource)
    network_mode:            "" => "<computed>"
    revision:                "14" => "<computed>"
@bflad bflad added the service/ecs Issues and PRs that pertain to the ecs service. label Jan 17, 2018
@i-ghost
Copy link

i-ghost commented Jan 19, 2018

It's possible to work around this by making sure your local task definition has the same order as stored on Amazon, which can be queried using the AWS cli (or just copy, paste, and sanitise the json that Terraform returns when planning). The order will remain the same until you add a new variable key-pair. Annoying, but it's the only way to avoid your plans getting junked.

@s-maj
Copy link
Contributor Author

s-maj commented Jan 19, 2018

There is nice a nice function https://github.com/terraform-providers/terraform-provider-aws/blob/fae04dfedfbd653d6a0bdbcc5d7c04f3d54e3048/aws/ecs_task_definition_equivalency.go#L56 used to reorder (diff suppression) task definition during diff but it's used only there. All values go raw to the state
https://github.com/terraform-providers/terraform-provider-aws/blob/fae04dfedfbd653d6a0bdbcc5d7c04f3d54e3048/aws/resource_aws_ecs_task_definition.go#L265
https://github.com/terraform-providers/terraform-provider-aws/blob/401dc017401659015fa0afdde5336b891c695fe6/aws/structure.go#L621

@radeksimko added task definition migration from sha to json (thanks million for that) plus further fixes. I was wondering if you could help us again to get this fixed :)

@radeksimko radeksimko added the bug Addresses a defect in current functionality. label Jan 25, 2018
@m13t
Copy link

m13t commented May 23, 2019

Hello,

Just wondering if there's any update on this? This is unfortunately causing our ECS Task Definition to be replaced every time despite zero changes.

The ordering once it gets to AWS certainly seems very random!

@devopsinfoltd
Copy link

Can we please have an update, we are having lots of issues with this bug @s-maj

@ctd
Copy link

ctd commented Jun 4, 2019

In the initial report, your example diff output shows:

  • values that have changed ("APP_VERSION")
  • values that have not changed, but appear to due to the ordering

Is this bug report related to the display when a value does change, or idempotence of subsequent plan/apply operations?

@m13t
Copy link

m13t commented Jun 4, 2019

@ctd, the original example isn’t great as some values have changed. However I can confirm that with TF 0.12 and latest AWS provider, the fact that AWS does not store the environment and secret variables in the order provided, results in Terraform wanting to change the order in the Terraform scripts.

@ctd
Copy link

ctd commented Jun 4, 2019

@m13t thanks for the clarification.

I'm new to this particular code, so please excuse any mistakes in my thinking.

From an initial skim, it looks like the local/remote JSON strings are unmarshaled, a few specific fields are normalised, then the data is marshaled to JSON for comparison. This happens inside the DiffSuppressFunc though, which might lead to some surprising diff output as the remarshalled JSON isn't used for display purposes.

Could you check your output for any other value changes between the local and remote task definition JSON values? I'll take a look at reproducing this issue on my end as well.

@m13t
Copy link

m13t commented Jun 4, 2019

@ctd, that is more or less my suspicion too. I’ll dig out the most complex task definition we have to see if there are any other fields that have this problem.

I guess if this was a native Terraform resource rather than being serialised to JSON, it would make use of the array item hash to ascertain whether or not it had just moved index rather than thinking it’s been removed and added.

@m13t
Copy link

m13t commented Jun 6, 2019

So it looks like there's also issues with volumes too:

      - volume {
          - name = "gocd_data" -> null

          - docker_volume_configuration {
              - autoprovision = false -> null
              - driver        = "local" -> null
              - driver_opts   = {
                  - "device" = ":/"
                  - "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  - "type"   = "nfs"
                } -> null
              - labels        = {} -> null
              - scope         = "task" -> null
            }
        }
      + volume {
          + name = "gocd_data"

          + docker_volume_configuration {
              + autoprovision = false
              + driver        = "local"
              + driver_opts   = {
                  + "device" = ":/"
                  + "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  + "type"   = "nfs"
                }
              + scope         = "task"
            }
        }
      - volume {
          - name = "gocd_home" -> null

          - docker_volume_configuration {
              - autoprovision = false -> null
              - driver        = "local" -> null
              - driver_opts   = {
                  - "device" = ":/"
                  - "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  - "type"   = "nfs"
                } -> null
              - labels        = {} -> null
              - scope         = "task" -> null
            }
        }
      + volume {
          + name = "gocd_home"

          + docker_volume_configuration {
              + autoprovision = false
              + driver        = "local"
              + driver_opts   = {
                  + "device" = ":/"
                  + "o"      = "addr=fs-xxxxxxxx.efs.eu-west-2.amazonaws.com,nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
                  + "type"   = "nfs"
                }
              + scope         = "task"
            }
        }

However, if there are no other changes the volumes don't show changes either. It appears that if there are other changes then the volumes get added to the diff also. i.e. if the container_definitions have changes, then so does volumes.

@ctd
Copy link

ctd commented Jun 10, 2019

I haven't been able to reproduce the idempotency issue using a very minimal ECS stub service (https://gist.github.com/ctd/bfba7b6b9e6c50d4cba0b4fe930597b2).

Are you able to confirm under what circumstances you're seeing a resource update/replacement - e.g. does running two subsequent apply operations result in an update on both? A full output log would be useful as well if you can provide one (obfuscated resource name/IDs are fine, of course).

@ctd
Copy link

ctd commented Jul 1, 2019

@m13t @s-maj

Just following up, is this still an issue for you? If so, can you provide any more details (such as in my last comment) to help me reproduce the problem?

@jonseymour
Copy link

Not saying this is the explanation for this case, but I have found similar problems arise if the terraform task definition contains two environment variables with the same name.

@tamsky
Copy link
Contributor

tamsky commented Jul 24, 2019

Suggestion:

Use the DEBUG output from the provider to determine the actual source of your diffs.

tl;dr summary:

Inconsistent order between environment objects is highly unlikely to be the cause of resource-level diffs.

diffs shown in plan output are not canonicalized.
Therefore, there can be visual differences in their element ordering.

my problem/fix summary:

I lacked default values for my healthcheck:s.
Adding the following values resolved my persistent diffs issue:

    interval: 30
    retries: 3
    timeout: 5

I'd suggest these missing values should not have caused diffs.
Perhaps there's an existing feature request?
If someone knows of one, please mention/link here.


Debug journal:

terraform: v0.11.14 with plugin.terraform-provider-aws_v2.20.0_x4:.

My first guess was that I had diffs because the value of this field is an opaque JSON string, and differences in object sort returned by the AWS API would be the cause of diffs.

Turns out, there's an entire go file dedicated to comparing ECS task definitions:

That looks good, and the tests look good; it appears to effectively test that un-ordered environment objects compare as "equal to" ordered environment objects.

The tested function is actually being called, so it's active code:

I'm now convinced object order is not my problem.

Now I want to see the canonicalized JSON and check it for diffs.

Checking TF_LOG=DEBUG output, I was able to see the canonicalized First: and Second: DEBUG values and was able to compare them for actual diffs.

No surprise, there were actual diffs.

My issue was due to not including any values for a few healthcheck: defaults which the AWS API happily inserts.

@nhooey
Copy link

nhooey commented Sep 5, 2019

Is there a plan to add a feature that lets the user say "Ignore the order of JSON lists for this particular JSON field value", so the AWS ENVIRONMENT JSON list doesn't get detected as a diff when AWS chooses a stupid, arbitrary order that is different from what was submitted by Terraform?

In my case, the ECS tasks are getting recreated every time because AWS re-orders JSON lists unpredictably, but the reordering is static.

It's hard to see what the course of action is from this ticket.

@tamsky
Copy link
Contributor

tamsky commented Sep 6, 2019

@nhooey In my last comment, I tried to explain that the order of ENVIRONMENT elements in terraform plan output is misleading.

The aws_provider is 100% properly canonicalizing all ENVIRONMENT variable lists before calculating diffs. I even link to the code where that happens.

At the same time, terraform plan output is not and does not canonicalize that struct before emitting it, giving an appearance of being the source of a diff, when, if only order is different, it is not a diff.

It is my strong opinion that there is something else in your ECS task resource's definition that is different.

Have you checked the provider's output under TF_LOG=DEBUG ?

@internetstaff
Copy link
Contributor

@tamsky is correct. However, for clarity, this should still be considered a bug since it renders a confusing diff - as evidenced by this thread. :)

@nhooey
Copy link

nhooey commented Sep 19, 2019

@tamsky, @internetstaff:

I haven't reviewed the debug output, but in my case I kept re-ordering the environment JSON list items in my resource "aws_ecs_task_definition" until running terraform plan said that the aws_ecs_task_definition didn't have to be replaced anymore.

All I changed was the ordering of the JSON list items. Once they were in some magic order, Terraform no longer wanted to replace the ECS task.

That seems like proof that Terraform is detecting and applying a diff when there conceptually isn't one.

@ctd
Copy link

ctd commented Sep 19, 2019

@nhooey Do you have any steps to reliably reproduce the issue you've described?

I either need a reliable way to reproduce (I've tried without success), or debug output of the issue occurring to have a better idea of what's going on.

As mentioned, there is a bug in the Terraform AWS Provider that may give deceptive output about the detected resource changes in a plan - at least until that is fixed we need to rely on the debugging output for determining if other issues exist.

@e-moshaya
Copy link

same issue here, ordering of the environment variables is different everytime

@remipichon
Copy link

remipichon commented Oct 22, 2019

It's far from ideal but with the help of terraform 0.12 you can re-order the environment variables to match the order given by AWS.

  1. create a Tf map with your env, in any order (because a map doesn't have an order)
locals {
  unordered_env  = {
      ENV_A = "${var.my_var}"
      ENV_1 = "value_for_1"
    }
}
  1. get the order from AWS, for a given task-definition run the following query
aws ecs describe-task-definition --task-definition $TASK_DEF | jq '[ .taskDefinition.containerDefinitions[0].environment[] | .name]'
  1. copy the result into
locals {
   aws_order = [ .... ]
}
  1. and the magic line which generate a list of maps in the right order
locals {
    environments_variables = [for order in local.aws_order : map(order, local.unordered_env[order])]
}
  1. then use that list to generate your task definition, the list looks like this
environments_variables = [
    {
      "ENV_A" = "."............
    },
    {
      "ENV_1" = "................."
    },
]
  1. (optional) it can be feed to the 0.11 old school template_file like this to to generate the json env list
data "template_file" "environments_variables" {
  template = " { \"name\" : \"$${name}\", \"value\": \"$${value}\" }"
  count    = length(var.environments_variables)
  vars = {
    value = "${element(values(var.environments_variables[count.index]), 0)}"
    name  = "${element(keys(var.environments_variables[count.index]), 0)}"
  }
}

data "template_file" "environments_variables_json" {
  template = "[  $${value}  ]"
  vars = {
    value = "${join(",", data.template_file.environments_variables.*.rendered)}"
  }
}

it doesn't work with 0.11 because the for each interpolation syntax is from 0.12 only

you need to re-run the aws cli command and update the aws_order list each time you add/remove an env

@piotrb
Copy link

piotrb commented Nov 21, 2019

This seems to have generally improved in the latest versions of the provider and TF itself .. but some improvements would be great ..

It doesn't seem like the ordering causes the task definitions to need to be updated any more ..

But when there is changes the diff is very nonsensical since it seems to be diffing against the raw order.

Could the provider not manage the order of the fields as they're coming from aws and when they're persisted to state? This would then just make the diff alphabetical and make the whole thing much more pleasant .. right now its a pain to actually analyze the changes in the variables ..

                  ~ environment       = [
                      - {
                          - name  = "AWS_REGION"
                          - value = "..."
                        },
                      - {
                          - name  = "PORT"
                          - value = "8001"
                        },
                      - {
                          - name  = "POD_SERVICE"
                          - value = "..."
                        },
                      - {
                          - name  = "POD_NAME"
                          - value = "..."
                        },
                        {
                            name  = "APP_NAME"
                            value = "..."
                        },
                      ~ {
                          ~ name  = "POD_ENVIRONMENT" -> "AWS_REGION"
                          ~ value = "..." -> "..."
                        },
                        {
                            name  = "LANG"
                            value = "en_US.UTF-8"
                        },
                      + {
                          + name  = "POD_ENVIRONMENT"
                          + value = "..."
                        },
                      + {
                          + name  = "POD_NAME"
                          + value = "..."
                        },
                      + {
                          + name  = "POD_SERVICE"
                          + value = "..."
                        },
                      + {
                          + name  = "PORT"
                          + value = "..."
                        },
                        {
                            name  = "SSM_PATHS"
                            value = "..."
                        },
                    ]

What was my actual change here? NONE .. I didn't add any env .. I made other changes on the task definition ...

@jbergknoff-rival
Copy link
Contributor

@piotrb yours is the only discussion I found of this problem. Just so you know, #11463 should fix that.

@bflad
Copy link
Contributor

bflad commented Jun 24, 2020

Improvements to this difference handling have been merged and will release with version 2.68.0 of the Terraform AWS Provider, later this week. Thanks to @jbergknoff-rival for the implementation. 👍

@ghost
Copy link

ghost commented Jun 26, 2020

This has been released in version 2.68.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!

@ghost
Copy link

ghost commented Jul 24, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Jul 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug Addresses a defect in current functionality. service/ecs Issues and PRs that pertain to the ecs service.
Projects
None yet