aws_kinesis_firehose_delivery_stream intermittently fails to be created or destroyed before timeout #22645
Comments
I have also experienced this issue before. If I remember correctly, when a timeout occurs while creating the resource for the first time, it is not written to state, and on subsequent runs the resource collides with the stream that has already been created. This is less than ideal because resources are actually being orphaned, which is why they have to be imported or manually deleted. I would agree that having the ability to increase the timeout for this resource would be the easiest and safest solution.
I am seeing this also with a Firehose stream going to Amazon OpenSearch Service and logging to S3. The timeout is happening 100% of the time for me. The Firehose stream never completes successfully. After about an hour it finally leaves the "creating" status and is left in an "inoperable" status with the error below. The only option is to delete it and try again. This seems like an AWS issue, but Terraform could at least handle the state better.
Through process of elimination I was able to resolve the issue that was causing my timeout. I was able to recreate the timeout in the AWS Console and, after creating a dozen data streams, narrowed the issue down to having only one subnet in the Elasticsearch config. Changing this in Terraform also fixed my timeout there: specifically, having two subnets instead of one.
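For reference, a minimal sketch of what that change looks like in the resource, assuming an Elasticsearch destination with a `vpc_config` block (all resource names and references here are hypothetical):

```hcl
resource "aws_kinesis_firehose_delivery_stream" "example" {
  name        = "example-stream"
  destination = "elasticsearch"

  # S3 backup configuration required for the elasticsearch destination.
  s3_configuration {
    role_arn   = aws_iam_role.firehose.arn
    bucket_arn = aws_s3_bucket.backup.arn
  }

  elasticsearch_configuration {
    domain_arn = aws_elasticsearch_domain.example.arn
    role_arn   = aws_iam_role.firehose.arn
    index_name = "example"

    vpc_config {
      role_arn = aws_iam_role.firehose.arn
      # Two subnets instead of one avoided the creation timeout in this case.
      subnet_ids         = [aws_subnet.a.id, aws_subnet.b.id]
      security_group_ids = [aws_security_group.firehose.id]
    }
  }
}
```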
This bug happens intermittently for me as well, and it appears to have nothing to do with VPC configuration. It seems to be related to the sheer number of Firehoses I have (the more I have, the longer the next one takes to provision). This seems to be a limitation of AWS, but is there no way to put retry logic in place?
Add timeout to kinesis firehose create

* r/aws_kinesis_firehose_delivery_stream: add timeouts (#22645)
* r/aws_firehose_delivery_stream: add configurable delete timeout
* r/aws_firehose_delivery_stream(docs): update timeouts
* r/aws_firehose_delivery_stream: add configurable update timeout
* chore: changelog

Co-authored-by: Jared Baker <jared.baker@hashicorp.com>
This functionality has been released in v4.55.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!
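For anyone landing here later, a minimal sketch of the released configuration, assuming provider v4.55.0 or newer (the resource names and durations below are placeholders to experiment with):

```hcl
resource "aws_kinesis_firehose_delivery_stream" "example" {
  name        = "example-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = aws_iam_role.firehose.arn
    bucket_arn = aws_s3_bucket.destination.arn
  }

  # Configurable operation timeouts added in v4.55.0; previously these
  # were hardcoded to 20 minutes per the discussion above.
  timeouts {
    create = "45m"
    update = "45m"
    delete = "45m"
  }
}
```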
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
Community Note
Terraform CLI and Terraform AWS Provider Version
Affected Resource(s)
Terraform Configuration Files
I don't believe there's anything specific about the content of these resources in our configuration that leads to this problem, but I could provide a more specific example if that would be useful.
Debug Output
Panic Output
Expected Behavior
Kinesis Firehose delivery stream is created successfully.
Actual Behavior
Intermittently, the Terraform apply fails with this error:
When this error has occurred, creation of the delivery stream in AWS has usually succeeded eventually. Since the Terraform run errors out without committing information for the resource to state, subsequent attempts to repeat the apply fail due to the resource already existing in AWS. For example:
The only methods to recover from this condition seem to be to either import the resource into Terraform state or delete the delivery stream manually (outside of Terraform) before repeating a Terraform apply to recreate the stream. After the initial failure occurs, we've commonly seen that creations of a few additional delivery streams via Terraform with different names may fail with the same error. We've assumed that this may be due to some intermittent sluggishness in AWS, where 20 minutes doesn't seem to be a long enough wait timeout in some cases.
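For the import recovery path, the delivery stream is imported using its ARN; a sketch with placeholder account, region, and stream-name values:

```sh
# Placeholder ARN; substitute your account ID, region, and stream name.
terraform import aws_kinesis_firehose_delivery_stream.example \
  arn:aws:firehose:us-east-1:111122223333:deliverystream/example-stream
```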
We don't have a good feel at this point as to what the timeout would need to be increased to in order to more reliably avoid the issue. Ideally, it would be nice to be able to experiment with setting longer `create` and `delete` operation timeouts in the `aws_kinesis_firehose_delivery_stream` resources to see what would work best. It does not appear, however, that the `aws_kinesis_firehose_delivery_stream` resource currently supports setting operation timeouts; instead, these timeouts are hardcoded to 20 minutes. If increasing the default timeout would not make sense for most users, we'd like to at least have the flexibility to set operation timeouts.

Steps to Reproduce
1. `terraform apply`
Important Factoids
We've also intermittently seen `terraform destroy` commands for the `aws_kinesis_firehose_delivery_stream` resources fail for a similar reason. In these cases, we have seen from the AWS console that the destroy does eventually succeed even after the `terraform` command has failed.

References