AWS ALB http/https listener creation/destruction unstable and caused errors for dependencies #2456

Closed
hashibot opened this issue Nov 28, 2017 · 12 comments · Fixed by #5167
Labels
bug Addresses a defect in current functionality. service/elbv2 Issues and PRs that pertain to the elbv2 service.
@hashibot

This issue was originally opened by @dohoangkhiem as hashicorp/terraform#16779. It was migrated here as a result of the provider split. The original body of the issue is below.


Hi there,

Recently we've found that the creation and destruction of our ALB http/https listeners (especially the https one) has become very unstable. It is very common, though not reproducible every time, for the first attempt to fail with the errors described below. On apply, the symptom is that the aws_alb_listener resource is created but its ARN is not recorded in state, which causes failures for dependent resources such as aws_alb_listener_rule. On terraform destroy, the listener is reported as destroyed but is somehow not completely gone, so the aws_alb_target_group deletion fails because the target group is still in use by the listener.

We don't get these errors every time, but they have been happening more and more often recently. A small test with just a few ALB-related resources, running apply and destroy several times in a row (like terraform destroy -force && terraform apply && terraform destroy -force && terraform apply), occasionally produces such errors. With our real production code, which is much more complex, the errors happen more often.

Here is the Terraform configuration for the test:

variable "domain_name" {
  default = "int.mytest.com"
}

variable "ssl_policy" {
  default = "ELBSecurityPolicy-2016-08"
}

data "aws_acm_certificate" "mgnl_certificate" {
  domain = "*.${var.domain_name}"
}

resource "aws_alb" "alb" {
  name = "khiem-test-alb"
  internal = false
  security_groups = ["sg-27cfa641"]
  subnets = ["subnet-d0aa1fb7", "subnet-c7c51e8e"]

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group" "author_target_group" {
  name = "khiem-author-target-group"
  port = 8080
  protocol = "HTTP"
  vpc_id   = "vpc-72d23715"

  health_check = {
    protocol = "HTTP"
    path = "/.healthcheck/"
    port = 8080
    healthy_threshold = 5
    unhealthy_threshold = 2
    timeout = 5
    interval = 30
    matcher = "200"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group_attachment" "author_target_group_att" {
  target_group_arn = "${aws_alb_target_group.author_target_group.arn}"
  target_id = "i-0285315cd59a13c17"
  port = 8080

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group" "public_target_group" {
  name = "khiem-public-target-group"
  port = 8080
  protocol = "HTTP"
  vpc_id   = "vpc-72d23715"

  health_check = {
    protocol = "HTTP"
    path = "/.healthcheck/"
    port = 8080
    healthy_threshold = 5
    unhealthy_threshold = 2
    timeout = 5
    interval = 30
    matcher = "200"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_target_group_attachment" "public_target_group_att" {
  target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
  target_id = "i-0285315cd59a13c17"
  port = 8080

  lifecycle {
    create_before_destroy = true
  }
}

# http listener
resource "aws_alb_listener" "alb_http_listener" {
  load_balancer_arn = "${aws_alb.alb.arn}"
  port = "80"
  protocol = "HTTP"

  default_action {
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
    type             = "forward"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# http listener rules
resource "aws_alb_listener_rule" "alb_http_public_rule" {
  listener_arn = "${aws_alb_listener.alb_http_listener.arn}"
  priority = 100

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["public-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_listener_rule" "alb_http_author_rule" {
  listener_arn = "${aws_alb_listener.alb_http_listener.arn}"
  priority = 99

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.author_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["author-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

# https listener
resource "aws_alb_listener" "alb_https_listener" {
  load_balancer_arn = "${aws_alb.alb.arn}"
  port = "443"
  protocol = "HTTPS"

  ssl_policy        = "${var.ssl_policy}"
  certificate_arn   = "${data.aws_acm_certificate.mgnl_certificate.arn}"

  default_action {
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
    type             = "forward"
  }

  lifecycle {
    create_before_destroy = true
  }
}

# https listener rules
resource "aws_alb_listener_rule" "alb_https_public_rule" {
  listener_arn = "${aws_alb_listener.alb_https_listener.arn}"
  priority = 100

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.public_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["public-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_alb_listener_rule" "alb_https_author_rule" {
  listener_arn = "${aws_alb_listener.alb_https_listener.arn}"
  priority = 99

  action {
    type = "forward"
    target_group_arn = "${aws_alb_target_group.author_target_group.arn}"
  }

  condition {
    field = "host-header"
    values = ["author-khiem.${var.domain_name}"]
  }

  lifecycle {
    create_before_destroy = true
  }
}

Apply Error

data.aws_acm_certificate.mgnl_certificate: Refreshing state...
aws_alb_target_group.author_target_group: Creating...
  arn:                                "" => "<computed>"
  arn_suffix:                         "" => "<computed>"
  deregistration_delay:               "" => "300"
  health_check.#:                     "" => "1"
  health_check.0.healthy_threshold:   "" => "5"
  health_check.0.interval:            "" => "30"
  health_check.0.matcher:             "" => "200"
  health_check.0.path:                "" => "/.healthcheck/"
  health_check.0.port:                "" => "8080"
  health_check.0.protocol:            "" => "HTTP"
  health_check.0.timeout:             "" => "5"
  health_check.0.unhealthy_threshold: "" => "2"
  name:                               "" => "khiem-author-target-group"
  port:                               "" => "8080"
  protocol:                           "" => "HTTP"
  stickiness.#:                       "" => "<computed>"
  target_type:                        "" => "instance"
  vpc_id:                             "" => "vpc-72d23715"
aws_alb_target_group.public_target_group: Creating...
  arn:                                "" => "<computed>"
  arn_suffix:                         "" => "<computed>"
  deregistration_delay:               "" => "300"
  health_check.#:                     "" => "1"
  health_check.0.healthy_threshold:   "" => "5"
  health_check.0.interval:            "" => "30"
  health_check.0.matcher:             "" => "200"
  health_check.0.path:                "" => "/.healthcheck/"
  health_check.0.port:                "" => "8080"
  health_check.0.protocol:            "" => "HTTP"
  health_check.0.timeout:             "" => "5"
  health_check.0.unhealthy_threshold: "" => "2"
  name:                               "" => "khiem-public-target-group"
  port:                               "" => "8080"
  protocol:                           "" => "HTTP"
  stickiness.#:                       "" => "<computed>"
  target_type:                        "" => "instance"
  vpc_id:                             "" => "vpc-72d23715"
aws_alb.alb: Creating...
  access_logs.#:              "" => "<computed>"
  arn:                        "" => "<computed>"
  arn_suffix:                 "" => "<computed>"
  dns_name:                   "" => "<computed>"
  enable_deletion_protection: "" => "false"
  idle_timeout:               "" => "60"
  internal:                   "" => "false"
  ip_address_type:            "" => "<computed>"
  load_balancer_type:         "" => "application"
  name:                       "" => "khiem-test-alb"
  security_groups.#:          "" => "1"
  security_groups.930362799:  "" => "sg-27cfa641"
  subnets.#:                  "" => "2"
  subnets.1419775440:         "" => "subnet-c7c51e8e"
  subnets.3706636568:         "" => "subnet-d0aa1fb7"
  vpc_id:                     "" => "<computed>"
  zone_id:                    "" => "<computed>"
aws_alb_target_group.public_target_group: Creation complete after 1s (ID: arn:aws:elasticloadbalancing:ap-southea...m-public-target-group/8c83c5482782160c)
aws_alb_target_group_attachment.public_target_group_att: Creating...
  port:             "" => "8080"
  target_group_arn: "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:targetgroup/khiem-public-target-group/8c83c5482782160c"
  target_id:        "" => "i-0285315cd59a13c17"
aws_alb_target_group.author_target_group: Creation complete after 1s (ID: arn:aws:elasticloadbalancing:ap-southea...m-author-target-group/e600a57f2882299b)
aws_alb_target_group_attachment.author_target_group_att: Creating...
  port:             "" => "8080"
  target_group_arn: "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:targetgroup/khiem-author-target-group/e600a57f2882299b"
  target_id:        "" => "i-0285315cd59a13c17"
aws_alb_target_group_attachment.public_target_group_att: Creation complete after 0s (ID: arn:aws:elasticloadbalancing:ap-southea...5482782160c-20171128161046303200000001)
aws_alb_target_group_attachment.author_target_group_att: Creation complete after 0s (ID: arn:aws:elasticloadbalancing:ap-southea...57f2882299b-20171128161046332800000002)
aws_alb.alb: Still creating... (10s elapsed)
aws_alb.alb: Still creating... (20s elapsed)
aws_alb.alb: Still creating... (30s elapsed)
aws_alb.alb: Still creating... (40s elapsed)
aws_alb.alb: Still creating... (50s elapsed)
aws_alb.alb: Still creating... (1m0s elapsed)
aws_alb.alb: Still creating... (1m10s elapsed)
aws_alb.alb: Still creating... (1m20s elapsed)
aws_alb.alb: Still creating... (1m30s elapsed)
aws_alb.alb: Still creating... (1m40s elapsed)
aws_alb.alb: Still creating... (1m50s elapsed)
aws_alb.alb: Still creating... (2m0s elapsed)
aws_alb.alb: Still creating... (2m10s elapsed)
aws_alb.alb: Still creating... (2m20s elapsed)
aws_alb.alb: Creation complete after 2m22s (ID: arn:aws:elasticloadbalancing:ap-southea...er/app/khiem-test-alb/3e1f7eaee507ffea)
aws_alb_listener.alb_http_listener: Creating...
  arn:                               "" => "<computed>"
  default_action.#:                  "" => "1"
  default_action.0.target_group_arn: "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:targetgroup/khiem-public-target-group/8c83c5482782160c"
  default_action.0.type:             "" => "forward"
  load_balancer_arn:                 "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:loadbalancer/app/khiem-test-alb/3e1f7eaee507ffea"
  port:                              "" => "80"
  protocol:                          "" => "HTTP"
  ssl_policy:                        "" => "<computed>"
aws_alb_listener.alb_https_listener: Creating...
  arn:                               "" => "<computed>"
  certificate_arn:                   "" => "arn:aws:acm:ap-southeast-1:218832052474:certificate/2819229d-6c29-4849-a476-b123f5b51f56"
  default_action.#:                  "" => "1"
  default_action.0.target_group_arn: "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:targetgroup/khiem-public-target-group/8c83c5482782160c"
  default_action.0.type:             "" => "forward"
  load_balancer_arn:                 "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:loadbalancer/app/khiem-test-alb/3e1f7eaee507ffea"
  port:                              "" => "443"
  protocol:                          "" => "HTTPS"
  ssl_policy:                        "" => "ELBSecurityPolicy-2016-08"
aws_alb_listener.alb_http_listener: Creation complete after 0s
aws_alb_listener.alb_https_listener: Creation complete after 0s (ID: arn:aws:elasticloadbalancing:ap-southea...-alb/3e1f7eaee507ffea/0d5df35cf6425343)
aws_alb_listener_rule.alb_https_author_rule: Creating...
  action.#:                      "" => "1"
  action.0.target_group_arn:     "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:targetgroup/khiem-author-target-group/e600a57f2882299b"
  action.0.type:                 "" => "forward"
  arn:                           "" => "<computed>"
  condition.#:                   "" => "1"
  condition.3686469405.field:    "" => "host-header"
  condition.3686469405.values.#: "" => "1"
  condition.3686469405.values.0: "" => "author-khiem.int.magnolia-now.com"
  listener_arn:                  "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:listener/app/khiem-test-alb/3e1f7eaee507ffea/0d5df35cf6425343"
  priority:                      "" => "99"
aws_alb_listener_rule.alb_https_public_rule: Creating...
  action.#:                     "" => "1"
  action.0.target_group_arn:    "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:targetgroup/khiem-public-target-group/8c83c5482782160c"
  action.0.type:                "" => "forward"
  arn:                          "" => "<computed>"
  condition.#:                  "" => "1"
  condition.590182385.field:    "" => "host-header"
  condition.590182385.values.#: "" => "1"
  condition.590182385.values.0: "" => "public-khiem.int.magnolia-now.com"
  listener_arn:                 "" => "arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:listener/app/khiem-test-alb/3e1f7eaee507ffea/0d5df35cf6425343"
  priority:                     "" => "100"
aws_alb_listener_rule.alb_https_author_rule: Creation complete after 0s (ID: arn:aws:elasticloadbalancing:ap-southea...ffea/0d5df35cf6425343/17350e80003a00f9)
aws_alb_listener_rule.alb_https_public_rule: Creation complete after 0s (ID: arn:aws:elasticloadbalancing:ap-southea...ffea/0d5df35cf6425343/0a204f8fe7701e0c)
Error applying plan:

2 error(s) occurred:

* aws_alb_listener_rule.alb_http_public_rule: Resource 'aws_alb_listener.alb_http_listener' does not have attribute 'arn' for variable 'aws_alb_listener.alb_http_listener.arn'
* aws_alb_listener_rule.alb_http_author_rule: Resource 'aws_alb_listener.alb_http_listener' does not have attribute 'arn' for variable 'aws_alb_listener.alb_http_listener.arn'

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

In this case, the listener has actually already been created in AWS.

Destroy Error (after a successful apply)

ubuntu@ip-172-31-29-175:/vagrant/provision/terraform-test/alb_listener$ terraform destroy -force && terraform apply && terraform destroy -force && terraform apply
aws_alb_target_group.public_target_group: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...m-public-target-group/f03543805196b4ee)
aws_alb_target_group.author_target_group: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...m-author-target-group/ecd5669c14c43e5d)
aws_alb.alb: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...er/app/khiem-test-alb/4a210b69d6ae0f76)
data.aws_acm_certificate.mgnl_certificate: Refreshing state...
aws_alb_target_group_attachment.author_target_group_att: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...69c14c43e5d-20171128153710934400000002)
aws_alb_target_group_attachment.public_target_group_att: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...3805196b4ee-20171128153710873100000001)
aws_alb_listener.alb_https_listener: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...-alb/4a210b69d6ae0f76/f668a85fcab2d5b7)
aws_alb_listener.alb_http_listener: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...-alb/4a210b69d6ae0f76/12f64559728eb05f)
aws_alb_listener_rule.alb_https_public_rule: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...0f76/f668a85fcab2d5b7/ec3e976e8691387b)
aws_alb_listener_rule.alb_https_author_rule: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...0f76/f668a85fcab2d5b7/a8eb533e8fb47bfb)
aws_alb_listener_rule.alb_http_public_rule: Refreshing state... (ID: arn:aws:elasticloadbalancing:ap-southea...0f76/12f64559728eb05f/14f5d14b2ded5572)
aws_alb_listener_rule.alb_https_author_rule: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...0f76/f668a85fcab2d5b7/a8eb533e8fb47bfb)
aws_alb_target_group_attachment.author_target_group_att: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...69c14c43e5d-20171128153710934400000002)
aws_alb_listener_rule.alb_http_public_rule: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...0f76/12f64559728eb05f/14f5d14b2ded5572)
aws_alb_listener_rule.alb_https_public_rule: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...0f76/f668a85fcab2d5b7/ec3e976e8691387b)
aws_alb_target_group_attachment.public_target_group_att: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...3805196b4ee-20171128153710873100000001)
aws_alb_listener_rule.alb_https_author_rule: Destruction complete after 0s
aws_alb_listener_rule.alb_https_public_rule: Destruction complete after 0s
aws_alb_listener.alb_https_listener: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...-alb/4a210b69d6ae0f76/f668a85fcab2d5b7)
aws_alb_target_group_attachment.author_target_group_att: Destruction complete after 0s
aws_alb_target_group.author_target_group: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...m-author-target-group/ecd5669c14c43e5d)
aws_alb_target_group_attachment.public_target_group_att: Destruction complete after 0s
aws_alb_listener_rule.alb_http_public_rule: Destruction complete after 0s
aws_alb_listener.alb_http_listener: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...-alb/4a210b69d6ae0f76/12f64559728eb05f)
aws_alb_listener.alb_http_listener: Destruction complete after 0s
aws_alb_listener.alb_https_listener: Destruction complete after 0s
aws_alb_target_group.public_target_group: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...m-public-target-group/f03543805196b4ee)
aws_alb.alb: Destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...er/app/khiem-test-alb/4a210b69d6ae0f76)
aws_alb_target_group.public_target_group: Destruction complete after 0s
aws_alb.alb: Still destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...er/app/khiem-test-alb/4a210b69d6ae0f76, 10s elapsed)
aws_alb.alb: Still destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...er/app/khiem-test-alb/4a210b69d6ae0f76, 20s elapsed)
aws_alb.alb: Still destroying... (ID: arn:aws:elasticloadbalancing:ap-southea...er/app/khiem-test-alb/4a210b69d6ae0f76, 30s elapsed)
aws_alb.alb: Destruction complete after 35s
Error applying plan:

1 error(s) occurred:

* aws_alb_target_group.author_target_group (destroy): 1 error(s) occurred:

* aws_alb_target_group.author_target_group: Error deleting Target Group: ResourceInUse: Target group 'arn:aws:elasticloadbalancing:ap-southeast-1:218832052474:targetgroup/khiem-author-target-group/ecd5669c14c43e5d' is currently in use by a listener or a rule
	status code: 400, request id: 022c5dfb-d455-11e7-b38c-557a182c4eef

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

In this case, when I try to delete the target group from the EC2 Console, there is no issue (as the listener has actually been deleted).

Terraform version is 0.10.7

@dohoangkhiem
Contributor

dohoangkhiem commented Dec 4, 2017

Any updates or info regarding this? It is really a blocking issue for us, as Terraform has become pathetically unreliable.

@dohoangkhiem
Contributor

dohoangkhiem commented Dec 14, 2017

Since we suspected that something was wrong in the communication between Terraform and the AWS API, we dug into the AWS provider source code and found the Read methods for the AWS ALB Listener and AWS ALB Listener Rule resources, shown below:

func resourceAwsAlbListenerRead(d *schema.ResourceData, meta interface{}) error {
	elbconn := meta.(*AWSClient).elbv2conn
	resp, err := elbconn.DescribeListeners(&elbv2.DescribeListenersInput{
		ListenerArns: []*string{aws.String(d.Id())},
	})
	if err != nil {
		if isListenerNotFound(err) {
			log.Printf("[WARN] DescribeListeners - removing %s from state", d.Id())
			d.SetId("")
			return nil
		}
		return errwrap.Wrapf("Error retrieving Listener: {{err}}", err)
	}
...
}

So when creating an ALB listener, after sending the create request, Terraform tries to read the newly created listener. If it cannot find it (this sounds weird, but it does happen, perhaps only recently), the listener resource is removed from the state, despite the fact that the listener was created successfully and exists in AWS.
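The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not provider code: `fakeAPI`, `naiveRead`, and `errNotFound` are hypothetical names, and the fake API is assumed to report "not found" for the first few lookups of a listener that already exists.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the API's "listener not found" error (hypothetical).
var errNotFound = errors.New("ListenerNotFound")

// fakeAPI simulates an eventually consistent describe call: the first
// `lag` lookups of an existing listener report "not found".
type fakeAPI struct {
	lag   int
	calls int
}

func (f *fakeAPI) describe(arn string) error {
	f.calls++
	if f.calls <= f.lag {
		return errNotFound
	}
	return nil
}

// naiveRead mirrors the shape of the Read function quoted above: any
// "not found" answer clears the ID, which drops the resource from state.
func naiveRead(api *fakeAPI, id string) (string, error) {
	if err := api.describe(id); err != nil {
		if errors.Is(err, errNotFound) {
			return "", nil // equivalent of d.SetId("")
		}
		return id, err
	}
	return id, nil
}

func main() {
	api := &fakeAPI{lag: 1}
	id, err := naiveRead(api, "arn:example:listener/1")
	fmt.Printf("id after first read: %q, err: %v\n", id, err)
}
```

With a lag of one call, the first read returns an empty ID and no error, which is the same silent loss of the ARN that breaks the dependent listener rules.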

The destroy case is much the same: Terraform calls the Read function of the ALB Listener Rule to refresh the resource:

func resourceAwsAlbListenerRuleRead(d *schema.ResourceData, meta interface{}) error {
	elbconn := meta.(*AWSClient).elbv2conn

	resp, err := elbconn.DescribeRules(&elbv2.DescribeRulesInput{
		RuleArns: []*string{aws.String(d.Id())},
	})

	if err != nil {
		if isRuleNotFound(err) {
			log.Printf("[WARN] DescribeRules - removing %s from state", d.Id())
			d.SetId("")
			return nil
		}
		return errwrap.Wrapf(fmt.Sprintf("Error retrieving Rules for listener %s: {{err}}", d.Id()), err)
	}

...

}

When it cannot read the rule, the rule is removed from the state (while it actually still exists in AWS), so later we get an error when Terraform tries to delete the target group that is still in use by the rule.

We tried a workaround: delaying the read after creation, or adding a retry. That helped us get past the issue:

func resourceAwsAlbListenerRead(d *schema.ResourceData, meta interface{}) error {
	elbconn := meta.(*AWSClient).elbv2conn
	time.Sleep(3000 * time.Millisecond)
	resp, err := elbconn.DescribeListeners(&elbv2.DescribeListenersInput{
		ListenerArns: []*string{aws.String(d.Id())},
	})
	if err != nil {
		if isListenerNotFound(err) {
			log.Printf("[WARN] DescribeListeners - removing %s from state", d.Id())
			d.SetId("")
			return nil
		}
		return errwrap.Wrapf("Error retrieving Listener: {{err}}", err)
	}
...
}

and

func resourceAwsAlbListenerRuleRead(d *schema.ResourceData, meta interface{}) error {
	elbconn := meta.(*AWSClient).elbv2conn

	resp, err := elbconn.DescribeRules(&elbv2.DescribeRulesInput{
		RuleArns: []*string{aws.String(d.Id())},
	})

	if err != nil {
		time.Sleep(2000 * time.Millisecond)
		resp, err = elbconn.DescribeRules(&elbv2.DescribeRulesInput{
			RuleArns: []*string{aws.String(d.Id())},
		})

		if err != nil {
			if isRuleNotFound(err) {
				log.Printf("[WARN] DescribeRules - removing %s from state", d.Id())
				d.SetId("")
				return nil
			}
			return errwrap.Wrapf(fmt.Sprintf("Error retrieving Rules for listener %s: {{err}}", d.Id()), err)
		}
	}
...
}
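The sleep-and-retry-once workaround above can be generalized into a small bounded retry helper. This is only a sketch under assumptions: `retryRead` and `errRuleNotFound` are hypothetical names, and unlike the workaround (which retries on any error) it retries only on a "not found" error.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRuleNotFound stands in for the API's "rule not found" error (hypothetical).
var errRuleNotFound = errors.New("RuleNotFound")

// retryRead calls describe up to `attempts` times, sleeping `delay` between
// tries. It keeps retrying only while the error is "not found"; success or
// any other error is returned immediately.
func retryRead(attempts int, delay time.Duration, describe func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		err = describe()
		if err == nil || !errors.Is(err, errRuleNotFound) {
			return err
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return err
}

func main() {
	calls := 0
	// Simulated describe call that only sees the rule on the third try.
	describe := func() error {
		calls++
		if calls < 3 {
			return errRuleNotFound
		}
		return nil
	}
	err := retryRead(5, 10*time.Millisecond, describe)
	fmt.Println("calls:", calls, "err:", err)
}
```

A bounded attempt count keeps a genuinely deleted resource from blocking the run forever, at the cost of picking a delay that may still be too short for slow propagation.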

@whereisaaron
Contributor

whereisaaron commented Jan 5, 2018

I get terraform destroy errors for NLB Target Groups, because terraform doesn't remove them from the Listeners first.

* module.nlb.aws_lb_target_group.nlb[0] (destroy): 1 error(s) occurred:

* aws_lb_target_group.nlb.0: Error deleting Target Group: ResourceInUse: Target group 'arn:aws:elasticloadbalancing:us-east-2:1234567890:targetgroup/NodePort-31001/1ef21a9ce6b9dda5' is currently in use by a listener or a rule
        status code: 400, request id: 1d147f4c-f24f-11e7-8b81-7128d9177daa

@whereisaaron
Contributor

When Terraform is destroying an NLB with associated EIPs, it unnecessarily dissociates the EIPs from the ENI and deletes the EIPs!!!

NLB ENIs are owned by the NLB, not by your account; the NLB manages them. Neither Terraform action is necessary or correct. This leads to spurious errors, e.g. where Terraform tries to dissociate an EIP from an ENI that has already been deleted due to the deletion of the NLB.

Terraform should just delete the NLB, wait for it to complete, then create the new one using the existing EIP resources.

@radeksimko radeksimko added the service/elbv2 Issues and PRs that pertain to the elbv2 service. label Jan 28, 2018
@runtheops

Same problem here.
Any Terraform activity around aws_alb_target_group with aws_lb_listener_rule resources can end up requiring manual cleanup.
The issue pops up randomly on resource destruction. depends_on and lifecycle do not help at all, since Terraform completely loses track of the LB listener rules: at that point they still exist in AWS but are no longer in tfstate.

# terraform show
data.aws_vpc.vpc
module.x.aws_alb_target_group.tg
# terraform destroy
Terraform will perform the following actions:

  - module.x.aws_alb_target_group.tg

Plan: 0 to add, 0 to change, 1 to destroy.
...
Error deleting Target Group: ResourceInUse: Target group ... is currently in use by a listener or a rule

@mildred
Contributor

mildred commented Jul 12, 2018

AWS resources can take a bit of time to appear in listings. We should not remove them from state before we have seen them for the first time after creation.

I believe I experienced the same issue with ECR repositories, though with a different error message. See #3910.

The solution is quite simple: instead of removing the listener from the tfstate when the read right after creation fails (because the creation takes time to propagate), retry the read until the resource is found. Of course, only retry on a not-found error if we are in the creation case.

The code is quite simple. Here is the snippet for ECR repositories:

	err := resource.Retry(d.Timeout(schema.TimeoutRead), func() *resource.RetryError {
		var err error
		out, err = conn.DescribeRepositories(input)
		if err != nil {
			if d.IsNewResource() && isAWSErr(err, ecr.ErrCodeRepositoryNotFoundException, "") {
				return resource.RetryableError(err)
			} else {
				return resource.NonRetryableError(err)
			}
		}
		return nil
	})
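The same decision rule can be tried out without the Terraform SDK. The sketch below is a standalone approximation, not the provider's implementation: `shouldRetry`, `readWithRetry`, and `errNotFound` are hypothetical names, and the polling loop only imitates what a timeout-bounded retry does.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errNotFound stands in for the API's "not found" error (hypothetical).
var errNotFound = errors.New("NotFound")

// shouldRetry encodes the rule from the snippet above: a "not found"
// answer is retried only for a resource created in this same run.
func shouldRetry(isNew bool, err error) bool {
	return isNew && errors.Is(err, errNotFound)
}

// readWithRetry polls describe until it succeeds, returns a non-retryable
// error, or the timeout elapses.
func readWithRetry(isNew bool, timeout, interval time.Duration, describe func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := describe()
		if err == nil || !shouldRetry(isNew, err) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for resource: %w", err)
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Simulated describe call that only sees the resource on the third try.
	describe := func() error {
		calls++
		if calls < 3 {
			return errNotFound
		}
		return nil
	}

	err := readWithRetry(true, time.Second, 10*time.Millisecond, describe)
	fmt.Println("new resource:", err, "calls:", calls)

	calls = 0
	err = readWithRetry(false, time.Second, 10*time.Millisecond, describe)
	fmt.Println("existing resource:", err, "calls:", calls)
}
```

For an existing resource the first "not found" is returned right away, so the caller can still drop it from state as before; only freshly created resources get the extra patience.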

@mildred
Contributor

mildred commented Jul 12, 2018

This should fix the listener instabilities...

diff --git a/aws/resource_aws_lb_listener.go b/aws/resource_aws_lb_listener.go
index 918d0525..133271b7 100644
--- a/aws/resource_aws_lb_listener.go
+++ b/aws/resource_aws_lb_listener.go
@@ -25,6 +25,10 @@ func resourceAwsLbListener() *schema.Resource {
                        State: schema.ImportStatePassthrough,
                },
 
+               Timeouts: &schema.ResourceTimeout{
+                       Read: schema.DefaultTimeout(10 * time.Minute),
+               },
+
                Schema: map[string]*schema.Schema{
                        "arn": {
                                Type:     schema.TypeString,
@@ -151,11 +155,26 @@ func resourceAwsLbListenerCreate(d *schema.ResourceData, meta interface{}) error
 func resourceAwsLbListenerRead(d *schema.ResourceData, meta interface{}) error {
        elbconn := meta.(*AWSClient).elbv2conn
 
-       resp, err := elbconn.DescribeListeners(&elbv2.DescribeListenersInput{
+       var resp *elbv2.DescribeListenersOutput
+       var request = &elbv2.DescribeListenersInput{
                ListenerArns: []*string{aws.String(d.Id())},
+       }
+
+       err := resource.Retry(d.Timeout(schema.TimeoutRead), func() *resource.RetryError {
+               var err error
+               resp, err = elbconn.DescribeListeners(request)
+               if err != nil {
+                       if d.IsNewResource() && isAWSErr(err, elbv2.ErrCodeListenerNotFoundException, "") {
+                               return resource.RetryableError(err)
+                       } else {
+                               return resource.NonRetryableError(err)
+                       }
+               }
+               return nil
        })
+
        if err != nil {
-               if isAWSErr(err, elbv2.ErrCodeListenerNotFoundException, "") {
+               if !d.IsNewResource() && isAWSErr(err, elbv2.ErrCodeListenerNotFoundException, "") {
                        log.Printf("[WARN] DescribeListeners - removing %s from state", d.Id())
                        d.SetId("")
                        return nil

mildred added a commit to squarescale/terraform-provider-aws that referenced this issue Jul 12, 2018
AWS resources can take some time to appear in listings. When we created
a listener before, it could happen that the creation succeeded but the
listing of the resource right after creation would return a resource not
found.

This can be normal on AWS where changes can take some time to propagate.
The correct behaviour in this case is to retry reading the resource
until we find it (because we know that it has been created
successfully).

We don't change the behaviour of resource reads that do not follow a
creation: there, a resource not found is still going to remove the
resource from the tfstate.

This should fix hashicorp#2456
@huseyinham

@mildred would that fix work only for a listener and not for a listener rule?

The main issue I face, also mentioned above, occurs during the destruction of a listener rule. Terraform believes it has deleted the listener rule and removes it from state, but in fact the rule remains. Then, when Terraform attempts to destroy the target group, it cannot, because the target group is still in use by a rule.
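One possible way to guard against this on the destroy side is to confirm that the rule is really gone before treating the deletion as complete. The sketch below is purely illustrative and is not what the provider does: `waitUntilGone` and `errRuleNotFound` are hypothetical names, and the describe function is simulated.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errRuleNotFound stands in for the API's "rule not found" error (hypothetical).
var errRuleNotFound = errors.New("RuleNotFound")

// waitUntilGone polls describe after a delete request and returns nil only
// once the API reports "not found". Any other error is returned as-is, and
// a rule that is still visible after the timeout is reported as an error.
func waitUntilGone(timeout, interval time.Duration, describe func() error) error {
	deadline := time.Now().Add(timeout)
	for {
		err := describe()
		if errors.Is(err, errRuleNotFound) {
			return nil // the rule is confirmed deleted
		}
		if err != nil {
			return err
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for rule deletion")
		}
		time.Sleep(interval)
	}
}

func main() {
	calls := 0
	// Simulated describe call: the rule stays visible for two lookups
	// after the delete request, then disappears.
	describe := func() error {
		calls++
		if calls < 3 {
			return nil
		}
		return errRuleNotFound
	}
	err := waitUntilGone(time.Second, 10*time.Millisecond, describe)
	fmt.Println("err:", err, "calls:", calls)
}
```

Only after such a wait returns nil would it be safe to move on to deleting the target group that the rule referenced.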

@ngortheone

It seems that #5490 is another manifestation of this problem.

@bflad bflad added this to the v1.40.0 milestone Oct 9, 2018
@bflad
Contributor

bflad commented Oct 9, 2018

The fix for the aws_lb_listener resource to retry reads for eventual consistency after creation has been merged and will release with version 1.40.0 of the AWS provider, likely middle of this week. 👍

@bflad
Contributor

bflad commented Oct 10, 2018

This has been released in version 1.40.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

eric-luminal pushed a commit to LuminalHQ/terraform-provider-aws that referenced this issue Feb 19, 2020
@ghost

ghost commented Apr 2, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Apr 2, 2020
9 participants