AWS ALB http/https listener creation/destruction unstable and caused errors for dependencies #2456
Comments
Any updates or info regarding this? It is a blocking issue for us, as Terraform has become pathetically unreliable. |
We suspected something was wrong with the communication between Terraform and the AWS API, so we dug into the AWS provider source code and looked at the Read methods for the AWS ALB Listener and AWS ALB Listener Rule resources, shown below
When creating an ALB Listener, Terraform sends the create request and then tries to read the created listener. IF it cannot find it (sounds weird, but somehow this happens, possibly only recently), the listener resource is removed from the state, despite the fact that the listener was created successfully and exists in AWS. Destroying is much the same: Terraform calls the read function of the ALB Listener Rule to refresh the resource
and when it cannot read the rule, the rule is removed from the state (while the rule actually still exists in AWS). Later we get an error when Terraform tries to delete the target group that is still being used by the rule. We tried a workaround of delaying the read after creation or adding a retry, and that helped us overcome this issue
and
|
I get terraform destroy errors for NLB Target Groups, because terraform doesn't remove them from the Listeners first.
|
When Terraform is destroying an NLB with associated EIPs, it unnecessarily dissociates the EIPs from the ENI and deletes the EIPs!!! NLB ENIs are owned by the NLB, not your account; the NLB manages them. Neither Terraform action is necessary or correct. This leads to spurious errors, e.g. where Terraform is trying to dissociate an EIP from an ENI that has already been deleted due to the deletion of the NLB. Terraform should just delete the NLB, wait for that to complete, then create the new one using the existing EIP resources. |
Same problem here.
|
AWS resources can take a bit of time to appear in listings. We should not remove them from state before we have seen them for the first time after creation. I believe I experienced the same issue with ECR repositories, leading to a different error message. See #3910

The solution is quite simple: instead of removing the listener from the tfstate when reading the resource after creation fails (because creation takes time), retry reading until the resource is found. Of course, only retry on a not-found resource if we are in the creation case. The code is quite simple; here is the snippet for ECR repositories:

    err := resource.Retry(d.Timeout(schema.TimeoutRead), func() *resource.RetryError {
        var err error
        out, err = conn.DescribeRepositories(input)
        if err != nil {
            if d.IsNewResource() && isAWSErr(err, ecr.ErrCodeRepositoryNotFoundException, "") {
                return resource.RetryableError(err)
            } else {
                return resource.NonRetryableError(err)
            }
        }
        return nil
    }) |
This should fix the listener instabilities...

diff --git a/aws/resource_aws_lb_listener.go b/aws/resource_aws_lb_listener.go
index 918d0525..133271b7 100644
--- a/aws/resource_aws_lb_listener.go
+++ b/aws/resource_aws_lb_listener.go
@@ -25,6 +25,10 @@ func resourceAwsLbListener() *schema.Resource {
State: schema.ImportStatePassthrough,
},
+ Timeouts: &schema.ResourceTimeout{
+ Read: schema.DefaultTimeout(10 * time.Minute),
+ },
+
Schema: map[string]*schema.Schema{
"arn": {
Type: schema.TypeString,
@@ -151,11 +155,26 @@ func resourceAwsLbListenerCreate(d *schema.ResourceData, meta interface{}) error
func resourceAwsLbListenerRead(d *schema.ResourceData, meta interface{}) error {
elbconn := meta.(*AWSClient).elbv2conn
- resp, err := elbconn.DescribeListeners(&elbv2.DescribeListenersInput{
+ var resp *elbv2.DescribeListenersOutput
+ var request = &elbv2.DescribeListenersInput{
ListenerArns: []*string{aws.String(d.Id())},
+ }
+
+ err := resource.Retry(d.Timeout(schema.TimeoutRead), func() *resource.RetryError {
+ var err error
+ resp, err = elbconn.DescribeListeners(request)
+ if err != nil {
+ if d.IsNewResource() && isAWSErr(err, elbv2.ErrCodeListenerNotFoundException, "") {
+ return resource.RetryableError(err)
+ } else {
+ return resource.NonRetryableError(err)
+ }
+ }
+ return nil
})
+
if err != nil {
- if isAWSErr(err, elbv2.ErrCodeListenerNotFoundException, "") {
+ if !d.IsNewResource() && isAWSErr(err, elbv2.ErrCodeListenerNotFoundException, "") {
log.Printf("[WARN] DescribeListeners - removing %s from state", d.Id())
d.SetId("")
return nil |
AWS resources can take some time to appear in listings. Previously, when we created a listener, the creation could succeed but listing the resource right after creation would return a resource not found. This can be normal on AWS, where changes can take some time to propagate. The correct behaviour in this case is to retry reading the resource until we find it (because we know that it has been created successfully). We don't change the behaviour for resource reads that do not follow a creation: there, a resource not found still removes the resource from the tfstate. This should fix hashicorp#2456
@mildred would that fix only work for a listener and not a listener rule? The main issue I face, also mentioned above, occurs during the destroy of a listener rule. Terraform believes it has deleted a listener rule and removes it from state, but in fact the rule remains. Then when Terraform attempts to destroy the target group, it cannot, as the target group is in use by a rule. |
It seems that this #5490 is another manifestation of this problem |
The fix for the |
This has been released in version 1.40.0 of the AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. |
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks! |
This issue was originally opened by @dohoangkhiem as hashicorp/terraform#16779. It was migrated here as a result of the provider split. The original body of the issue is below.
Hi there,
Recently we've found that the creation/destruction of our ALB http/https listeners (especially the https one) has become very unstable. It's very common (but not reproducible every time) that it fails the first time, with the errors described below. The symptom is either that the aws_alb_listener resource is created but its ARN is not recorded in state, which causes failures for dependent resources like aws_alb_listener_rule, or that it is destroyed during terraform destroy but is somehow not completely gone, so the aws_alb_target_group deletion fails (as the target group is in use by the listener).

We don't get these errors every time, but they have been happening increasingly often. A small test with just a few ALB-related resources, running apply and destroy continuously (like terraform destroy -force && terraform apply && terraform destroy -force && terraform apply), occasionally produces such errors. With our real production code, which is much more complex, the errors happen more often.

Here is the TF configuration for the test
Apply Error
In this case actually the Listener is already created in AWS.
Destroy Error (after a successful apply)
And in this case, when I try to delete the target group from the EC2 Console, there's no issue (as the listener is actually deleted).
Terraform version is 0.10.7