
Dependency ordering issue when aws security groups has multiple ingress CIDR blocks #670

Closed
spyrospph opened this issue Dec 15, 2014 · 36 comments
Labels
bug provider/aws waiting-response An issue/pull request is waiting for a response from the community

Comments

@spyrospph

Hi,

The code below causes terraform plan to constantly swap the order of the cidr_blocks:

resource "aws_security_group" "admins" {
    name        = "admins"
    vpc_id      = "${aws_vpc.some.id}"

    ingress {
        from_port   = 0
        to_port     = 65535
        protocol    = "tcp"
        cidr_blocks = ["10.0.0.1/32", "10.0.0.2/32", "10.0.0.3/32"]
    }

    ingress {
        from_port   = 0
        to_port     = 65535
        protocol    = "tcp"
        cidr_blocks = ["10.0.0.4/32", "10.0.0.5/32"]
    }
}

-S.

@armon
Member

armon commented Dec 15, 2014

I think this is because AWS reorders them when it returns. We should probably treat them as a set and not a list to fix this.

@svanharmelen
Contributor

@armon I tested this and that is exactly the case. But I was wondering whether it would make sense to start using a set in all places where the API could return a differently ordered list than the one in your config.

Personally, I was thinking of sorting the list before reading/writing it from/to the config/state. That would also prevent these kinds of issues... but I'm not sure whether that's a good approach?

/CC @mitchellh

@armon
Member

armon commented Dec 15, 2014

@svanharmelen Hmm, sorting may be a better call for a simple list of strings. My worry is that switching to a set is not backwards compatible.

@svanharmelen
Contributor

@armon the config would stay the same, and the state would be converted (the existing code already does this automatically) without any issues. But it would mean updating and tweaking every location where this change (moving from a list to a set) applies, which could be a little more work to test and work through.

Besides that, it would make the use cases for a list less clear, as you'd have to take this ordering issue into account when creating the resource schema/code.

So I think just sorting the list (only when the elements are ints or strings) would be a pretty nice and relatively clean solution...

@mitchellh
Contributor

I think conceptually a set is correct. A sorted string list "works" and we've done that before, but a set "just works" because it is correct (you don't do any extra work).

As a user, there will be a backwards incompatible issue here, and that is something to think about as well. I think what @armon brought up today is important: we need a way to be able to make schema changes like this and migrate to the new version. But, that is the topic of another issue.

Given we don't have such a system, I'm supportive of @svanharmelen's idea of just sorting the strings...

@svanharmelen
Contributor

I fully agree that conceptually a set is correct... So maybe we shouldn't go the route of sorting lists after all.

If your concern is only about backwards compatibility, I don't see that issue. I did some quick tests and think there will be no impact.

Let me do a quick PoC (probably tomorrow) with the use case of issue #657 to confirm my thoughts.

@svanharmelen
Contributor

@mitchellh @armon see PR #677 for an example which actually works flawlessly on an existing config and state file.

I tested this with this config:

resource "aws_route53_zone" "test" {
  name = "test.com"
}

resource "aws_route53_record" "flipflop" {
  zone_id = "${aws_route53_zone.test.zone_id}"
  name = "flipflop.${aws_route53_zone.test.name}"
  type = "A"
  ttl = "5"
  records = ["10.0.0.1","10.0.0.100"]
}

That config just keeps updating every time you apply, until you merge this PR; then it's done updating. After a terraform refresh or terraform plan it will automatically have updated the state file as well (that's due to the way sets are handled now).

So there is absolutely no backwards compatibility issue here, it's just a few lines of code that need to be changed in the providers having this issue...

@pmoust
Contributor

pmoust commented Dec 15, 2014

👍 let's go down that route.

@mitchellh
Contributor

Cool! Working on that PR then (commenting).

@svanharmelen
Contributor

Well... In the end this is really simple, but the fix will not be backwards compatible...

Since the AWS API merges all rules that have the same from_port, to_port and protocol into one single rule, I think the only solution is to change the hash function resourceAwsSecurityGroupIngressHash to:

func resourceAwsSecurityGroupIngressHash(v interface{}) int {
    var buf bytes.Buffer
    m := v.(map[string]interface{})
    buf.WriteString(fmt.Sprintf("%d-", m["from_port"].(int)))
    buf.WriteString(fmt.Sprintf("%d-", m["to_port"].(int)))
    buf.WriteString(fmt.Sprintf("%s-", m["protocol"].(string)))
    return hashcode.String(buf.String())
}

By doing this you are able to read back the info from AWS and match that info to your config. With the current hash function each rule can have the same values for from_port, to_port and protocol, but if the cidr_blocks are different they will be treated as different rules with their own hashes.

So if you then read back the info from AWS and get the info from those two rules back as one rule, there is no way TF can match that back to your config in a sane way.

But like I said, this would be a backwards-incompatible change for the aws_security_group resource, so I'm not sure if we want this? On the other hand, something needs to change, as the current implementation is just broken and needs to be fixed.

@mitchellh what do you think about this solution? Any other thoughts?

@pmoust
Contributor

pmoust commented Jan 16, 2015

That would mean that we'll only get a big diff on the tfstate, and all aws_security_group resources would need to be reapplied.

The breakage would be an issue if different Terraform CLI versions are used across an organisation, but even then the result would only be a ping-pong between the previous hash method and the suggested one, producing a plan that updates all aws_security_groups.

Let's not wait for 0.4.0 please

@svanharmelen
Contributor

Well... that... and if you do have multiple rules defined in a single resource with the same from_port, to_port and protocol, only one of those rules will actually be executed.

Once a certain hash is added to your set, the next time you try to add a value with the same hash it's ignored and not actually added to the set (which makes sense of course). So people having such a config will need to adjust their config by merging the separate rules into one rule per from_port / to_port / protocol combination.

@pmoust
Contributor

pmoust commented Jan 16, 2015

Woah, good point. And that explains some other bug, I will post it later on.

@svanharmelen
Contributor

@mitchellh would love to get your take on this one... Would be nice to be able to close this one, but I fail to see a solution without impact...

@brycekahle
Contributor

Couldn't you just split the return values from the AWS API into a resource having cidr_blocks and a resource having security_groups per from_port/to_port/protocol tuple? That would be backwards compatible too, since it doesn't require changing the hash function.

@svanharmelen
Contributor

@brycekahle not sure if I'm following you correctly. Could you elaborate and maybe give a small pseudo example? At first sight I don't see how you would do this without changing either the schema or the hash function.

@brycekahle
Contributor

@svanharmelen Since you can't create an ingress with both cidr_blocks and security_groups, it doesn't make sense to lump them together during the read function. I'm guessing you can create invalid EC2 API calls by doing this too since the expansion code doesn't take this into account.

@svanharmelen
Contributor

@brycekahle I think I get your point, but as far as I can tell that isn't related to the issue we need to solve here.

The problem here is that the AWS API merges all rules that have the same from_port, to_port and protocol into one single rule. But the schema of the aws_security_group resource allows you to specify multiple rules that have the same from_port, to_port and protocol but maybe different cidr_blocks and/or security_groups.

And since these (the cidr_blocks and/or security_groups) are part of the hash, TF will treat them as separate rules. So when you process two such rules and then read them back from AWS to get your current state, you will always get a diff.

So even if you make it so that you can only specify either cidr_blocks or security_groups in one rule, you would still be able to have multiple rules with the same from_port, to_port and protocol combo and so you would still have this issue...

@brycekahle
Contributor

@svanharmelen That all makes sense. I do think the current hash function is correct and doesn't need to be changed. I think the issue is the intermediate structures being generated from the config and from the EC2 API.

My understanding is the following would need to happen:

  1. ingress config rules can stay as-is, but when read from the config need to be grouped by from_port, to_port, and protocol. cidr_blocks and security_groups can be folded together (with sorting to prevent flip-flop), but not into a rule that contains both cidr_blocks and security_groups.
  2. Reading from the EC2 API needs to expand rules that contain both cidr_blocks and security_groups.

Perhaps there is something I am missing about how terraform internals work?

@svanharmelen
Contributor

@brycekahle I did have a quick look at an approach like that a while ago, but I couldn't figure out a sane way to do it in the current code base.

I think quite a few things would need to be changed/refactored in the core components of TF and I (personally) don't think that would be worth it as this is the only resource (AFAIK) that has this kind of issue.

So in my opinion this is just an error in the resource schema definition and/or hash function...

@brycekahle
Contributor

@svanharmelen I'll see if I can come up with a PR that only changes this resource provider.

@imkira

imkira commented Jan 30, 2015

This actually happened to me too, but when using self and security_groups.
I'm leaving the link for reference: #893

@catsby catsby self-assigned this May 7, 2015
@catsby catsby removed their assignment May 19, 2015
@tamsky
Contributor

tamsky commented May 19, 2015

Could the aws_security_group provider manage its own "private" metadata tags on each security_group object it works with?

Seems like that would allow the merged security_group replies from the AWS API to be separated correctly.

Pseudo-code for the tag value creation:

  tags { "tf_aws_security_group__from_to_proto_sg_cidr" = 
      base64($self.from, "__", $self.to, "__", $self.proto, "__", $self.security_groups, "__", $self.cidr)
  }

@seedub

seedub commented Sep 15, 2015

So is this fixed or what? I'm on 0.6.3 and I have the problem with AWS security groups constantly changing during an apply, due to the ordering issue.

@amontalban

+1

@wstaples

As of 0.6.6 this does not appear to be fixed. Is there something I can do to help?

@catsby
Contributor

catsby commented Oct 29, 2015

Hello all –

I'm following up here after some time, apologies for the silence here.

This block does not trigger constant plan changes:

    ingress {
        from_port   = 0
        to_port     = 65535
        protocol    = "tcp"
        cidr_blocks = ["10.0.0.1/32", "10.0.0.5/32", "10.0.0.3/32", "10.0.0.2/32", "10.0.0.4/32"]
    }

The ordering of the cidr_blocks shouldn't matter.

The problem in the original message is two distinct ingress blocks, with identical ports and protocols. AWS will group these on their side (though the console may show them separately in some situations). In the API, however, they get grouped into one "rule".

The short of it is that two ingress blocks with identical ports and protocols should be considered an invalid configuration, though I don't know that we have a way of detecting or preventing that.

Is there a specific need to have them in separate blocks?
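For reference, on the config side the fix suggested here is to merge same-port/protocol blocks into one. Applied to the config from the original report, that looks like this (a sketch; only the ingress blocks change):

```hcl
resource "aws_security_group" "admins" {
    name   = "admins"
    vpc_id = "${aws_vpc.some.id}"

    # One ingress block per from_port/to_port/protocol combination;
    # AWS merges the rules on its side anyway, so the config should match.
    ingress {
        from_port   = 0
        to_port     = 65535
        protocol    = "tcp"
        cidr_blocks = ["10.0.0.1/32", "10.0.0.2/32", "10.0.0.3/32",
                       "10.0.0.4/32", "10.0.0.5/32"]
    }
}
```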

@catsby catsby added the waiting-response An issue/pull request is waiting for a response from the community label Oct 29, 2015
@brikis98
Contributor

I'm on version 0.6.4 and seeing a similar issue. I have the following aws_security_group:

resource "aws_security_group" "web_server_security_group" {
  name = "web_server_security_group"
  description = "Inbound: HTTP, HTTPS, SSH. Outbound: All."

  ingress {
    from_port = 80
    to_port = 80
    protocol = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port = 443
    to_port = 443
    protocol = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port = 22
    to_port = 22
    protocol = "TCP"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

Every single time I run terraform plan or terraform apply, it shows a change to this security group:

> terraform plan

(...)

~ aws_security_group.web_server_security_group
    ingress.2214680975.cidr_blocks.#:     "1" => "0"
    ingress.2214680975.cidr_blocks.0:     "0.0.0.0/0" => ""
    ingress.2214680975.from_port:         "80" => "0"
    ingress.2214680975.protocol:          "tcp" => ""
    ingress.2214680975.security_groups.#: "0" => "0"
    ingress.2214680975.self:              "0" => "0"
    ingress.2214680975.to_port:           "80" => "0"
    ingress.2541437006.cidr_blocks.#:     "1" => "0"
    ingress.2541437006.cidr_blocks.0:     "0.0.0.0/0" => ""
    ingress.2541437006.from_port:         "22" => "0"
    ingress.2541437006.protocol:          "tcp" => ""
    ingress.2541437006.security_groups.#: "0" => "0"
    ingress.2541437006.self:              "0" => "0"
    ingress.2541437006.to_port:           "22" => "0"
    ingress.2617001939.cidr_blocks.#:     "1" => "0"
    ingress.2617001939.cidr_blocks.0:     "0.0.0.0/0" => ""
    ingress.2617001939.from_port:         "443" => "0"
    ingress.2617001939.protocol:          "tcp" => ""
    ingress.2617001939.security_groups.#: "0" => "0"
    ingress.2617001939.self:              "0" => "0"
    ingress.2617001939.to_port:           "443" => "0"
    ingress.3846162336.cidr_blocks.#:     "0" => "1"
    ingress.3846162336.cidr_blocks.0:     "" => "0.0.0.0/0"
    ingress.3846162336.from_port:         "" => "80"
    ingress.3846162336.protocol:          "" => "TCP"
    ingress.3846162336.security_groups.#: "0" => "0"
    ingress.3846162336.self:              "" => "0"
    ingress.3846162336.to_port:           "" => "80"
    ingress.4131774049.cidr_blocks.#:     "0" => "1"
    ingress.4131774049.cidr_blocks.0:     "" => "0.0.0.0/0"
    ingress.4131774049.from_port:         "" => "22"
    ingress.4131774049.protocol:          "" => "TCP"
    ingress.4131774049.security_groups.#: "0" => "0"
    ingress.4131774049.self:              "" => "0"
    ingress.4131774049.to_port:           "" => "22"
    ingress.4207073788.cidr_blocks.#:     "0" => "1"
    ingress.4207073788.cidr_blocks.0:     "" => "0.0.0.0/0"
    ingress.4207073788.from_port:         "" => "443"
    ingress.4207073788.protocol:          "" => "TCP"
    ingress.4207073788.security_groups.#: "0" => "0"
    ingress.4207073788.self:              "" => "0"
    ingress.4207073788.to_port:           "" => "443"

Is this an issue of having multiple ingress blocks with identical ports and protocols or something different?

@josh-padnick

@brikis98 I ran into this issue as well, but with a different Terraform resource. The following fix resolved it for me. Does this apply to your issue?

# EVERY TERRAFORM PLAN DECLARES A CHANGE
resource "aws_autoscaling_group" "nat" {
    name = "${var.vpc_name}-nat"
    launch_configuration = "${aws_launch_configuration.nat.name}"
    max_size = "${var.num_nat_instances}"
    min_size = "${var.num_nat_instances}"
    vpc_zone_identifier = ["${var.vpc_public_subnet_list}"]
}

# WORKS FINE
resource "aws_autoscaling_group" "nat" {
    name = "${var.vpc_name}-nat"
    launch_configuration = "${aws_launch_configuration.nat.name}"
    max_size = "${var.num_nat_instances}"
    min_size = "${var.num_nat_instances}"
    vpc_zone_identifier = ["${split(",",var.vpc_public_subnet_list)}"]
}

Note the difference in the vpc_zone_identifier. The vpc_public_subnet_list var is of the form subnet-123,subnet-456,subnet-789 and is passed around as a string.

@brikis98
Contributor

brikis98 commented Nov 3, 2015

@josh-padnick: No, I don't think that's the issue. The aws_security_group.web_server_security_group I'm defining doesn't use any lists itself, and when I reference web_server_security_group in other places, I refer directly to its id, so there is no list to split:

resource "aws_launch_configuration" "ecs_launch_configuration" {
  name = "ecs_launch_configuration"
  security_groups = ["${aws_security_group.web_server_security_group.id}"]
  # ...
}

@scalp42
Contributor

scalp42 commented Nov 3, 2015

This is related to #2843

There is an issue with "TCP" vs "tcp" @brikis98 as well, hopefully helps!

@brikis98
Contributor

brikis98 commented Nov 3, 2015

@scalp42: You're right, it was the "TCP" vs "tcp" thing. Thank you!

@catsby
Contributor

catsby commented Nov 5, 2015

I'm going to close this for now. This issue has expanded to mention 2 or 3 things, all of which I believe are resolved now. If not, please let me know or make a separate issue.

Thanks!

@catsby catsby closed this as completed Nov 5, 2015
@wr0ngway

wr0ngway commented Feb 1, 2016

I'm still seeing this in 0.6.9 with a security group that has multiple ingress rules (from self and the VPC CIDR) for the same port/protocol (setting up a "default" SG for the peered VPC so all traffic from the primary is allowed):

resource "aws_security_group" "secondary-default" {
  name = "secondary-default"
  vpc_id = "${aws_vpc.secondary.id}"

  ingress {
    from_port   = "0"
    to_port     = "0"
    protocol    = "-1"
    cidr_blocks = ["${aws_vpc.primary.cidr_block}"]
  }

  ingress {
    from_port   = "0"
    to_port     = "0"
    protocol    = "-1"
    self        = true
  }

  egress {
    from_port   = "0"
    to_port     = "0"
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

}

@wr0ngway

wr0ngway commented Feb 1, 2016

After digging some more I found #2843, and when I normalize my rule to have both self and the cidr, the diff comes back clean. The normalization rules should probably be added to the aws_security_group docs if this can't be fixed.
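For anyone hitting the same thing: the normalization described here means collapsing the two same-port/protocol blocks from the config above into one block carrying both self and the CIDR (a sketch based on that config):

```hcl
  # Replaces the two separate "-1" ingress blocks with a single block,
  # matching how AWS merges rules with identical ports and protocol.
  ingress {
    from_port   = "0"
    to_port     = "0"
    protocol    = "-1"
    self        = true
    cidr_blocks = ["${aws_vpc.primary.cidr_block}"]
  }
```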

@ghost

ghost commented Apr 28, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Apr 28, 2020