service/ec2: Additional error handling for VPC Endpoint and VPC Endpoint Service deletion, sweeper fixes for Route Tables, VPC Endpoints, and VPC Endpoint Services #16656
The `DeleteVpcEndpoints` and `DeleteVpcEndpointServiceConfigurations` APIs will sometimes return failures in an `Unsuccessful` array in the response, instead of a normal error. Previously the resource and sweeper did not account for this type of error response and would time out on deletion without ever reporting the underlying issue:

```
2020/12/08 18:43:52 Sweeper Tests ran unsuccessfully:
...
  - aws_vpc_endpoint_service: error waiting for VPC Endpoint Service (vpce-svc-0c300eaebde5aec19) to delete: timeout while waiting for state to become 'Deleted' (last state: 'Available', timeout: 10m0s)
...
  - aws_vpc_endpoint: error waiting for VPC Endpoint (vpce-0395ac1f6cc86b11a) to delete: timeout while waiting for state to become 'deleted' (last state: 'available', timeout: 10m0s)
```

Now the resource handles this response type, the VPC Endpoint sweepers have been refactored to use the resource deletion function, and the VPC Endpoint sweepers correctly show the unsuccessful deletions while immediately continuing on to the next item:

```
2020/12/08 20:46:59 Sweeper Tests ran unsuccessfully:
  - aws_vpc_endpoint_service: 1 error occurred:
    * error deleting EC2 VPC Endpoint Service (vpce-svc-0c300eaebde5aec19): error deleting EC2 VPC Endpoint Service (vpce-svc-0c300eaebde5aec19): 1 error occurred:
    * vpce-svc-0c300eaebde5aec19: ExistingVpcEndpointConnections: Service has existing active VPC Endpoint connections!
...
  - aws_vpc_endpoint: 1 error occurred:
    * error deleting EC2 VPC Endpoint (vpce-0395ac1f6cc86b11a): error deleting EC2 VPC Endpoint (vpce-0395ac1f6cc86b11a): 1 error occurred:
    * vpce-0395ac1f6cc86b11a: InvalidParameter: Endpoint must be removed from route table before deletion
```

To fix the underlying cause of these errors, the Route Table sweeper needed to be added as a VPC Endpoint dependency, and the Route Table sweeper needed to delete non-local/non-public-IGW routes if the Route Table was the main route table for the VPC (as main Route Tables cannot be deleted):

```
2020/12/08 21:12:50 [DEBUG] Running Sweepers for region (us-west-2):
2020/12/08 21:12:50 [DEBUG] Running Sweeper (aws_route_table) in region (us-west-2)
2020/12/08 21:12:50 [INFO] AWS Auth provider used: "SharedCredentialsProvider"
2020/12/08 21:12:50 [DEBUG] Trying to get account information via sts:GetCallerIdentity
2020/12/08 21:12:50 [DEBUG] Trying to get account information via sts:GetCallerIdentity
2020/12/08 21:12:52 [DEBUG] Deleting EC2 Route Table (rtb-09af9318dcc5ccaf9) Route
2020/12/08 21:12:52 [DEBUG] Sweeper (aws_vpc_endpoint_service) has dependency (aws_vpc_endpoint), running..
2020/12/08 21:12:52 [DEBUG] Sweeper (aws_vpc_endpoint) has dependency (aws_route_table), running..
2020/12/08 21:12:52 [DEBUG] Sweeper (aws_route_table) already ran in region (us-west-2)
2020/12/08 21:12:52 [DEBUG] Running Sweeper (aws_vpc_endpoint) in region (us-west-2)
2020/12/08 21:12:53 [INFO] Deleting EC2 VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:12:53 [DEBUG] Waiting for state to become: [deleted]
2020/12/08 21:12:58 [DEBUG] Reading VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:12:59 [TRACE] Waiting 5s before next try
2020/12/08 21:13:04 [DEBUG] Reading VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:13:04 [TRACE] Waiting 10s before next try
2020/12/08 21:13:14 [DEBUG] Reading VPC Endpoint: vpce-0395ac1f6cc86b11a
2020/12/08 21:13:15 [DEBUG] Running Sweeper (aws_vpc_endpoint_service) in region (us-west-2)
2020/12/08 21:13:15 [INFO] Deleting EC2 VPC Endpoint Service: vpce-svc-0c300eaebde5aec19
2020/12/08 21:13:16 [DEBUG] Waiting for state to become: [Deleted]
2020/12/08 21:13:21 [DEBUG] Reading VPC Endpoint Service Configuration: vpce-svc-0c300eaebde5aec19
2020/12/08 21:13:21 [DEBUG] Sweeper (aws_vpc_endpoint) has dependency (aws_route_table), running..
2020/12/08 21:13:21 [DEBUG] Sweeper (aws_route_table) already ran in region (us-west-2)
2020/12/08 21:13:21 [DEBUG] Sweeper (aws_vpc_endpoint) already ran in region (us-west-2)
2020/12/08 21:13:21 Sweeper Tests ran successfully:
  - aws_vpc_endpoint_service
  - aws_route_table
  - aws_vpc_endpoint
ok  github.com/terraform-providers/terraform-provider-aws/aws 33.689s
```

Output from acceptance testing:

```
--- PASS: TestAccAWSVpcEndpoint_disappears (37.60s)
--- PASS: TestAccAWSVpcEndpoint_gatewayBasic (38.82s)
--- PASS: TestAccAWSVpcEndpoint_gatewayPolicy (72.11s)
--- PASS: TestAccAWSVpcEndpoint_gatewayWithRouteTableAndPolicy (87.14s)
--- PASS: TestAccAWSVpcEndpoint_interfaceBasic (78.17s)
--- PASS: TestAccAWSVpcEndpoint_interfaceNonAWSServiceAcceptOnCreate (276.65s)
--- PASS: TestAccAWSVpcEndpoint_interfaceNonAWSServiceAcceptOnUpdate (333.65s)
--- PASS: TestAccAWSVpcEndpoint_interfaceWithSubnetAndSecurityGroup (448.87s)
--- PASS: TestAccAWSVpcEndpoint_tags (89.87s)
--- PASS: TestAccAWSVpcEndpoint_VpcEndpointType_GatewayLoadBalancer (274.15s)

--- PASS: TestAccAWSVpcEndpointService_AllowedPrincipals (280.60s)
--- PASS: TestAccAWSVpcEndpointService_basic (252.94s)
--- PASS: TestAccAWSVpcEndpointService_disappears (258.46s)
--- PASS: TestAccAWSVpcEndpointService_GatewayLoadBalancerArns (208.91s)
--- PASS: TestAccAWSVpcEndpointService_tags (288.75s)
```

Note: When working with assume role credentials, some of these test configurations can error due to the STS `GetCallerIdentity` ARN:

```
=== CONT TestAccAWSVpcEndpoint_VpcEndpointType_GatewayLoadBalancer
resource_aws_vpc_endpoint_test.go:519: Step 1/2 error: Error running apply:
Error: error adding VPC Endpoint Service permissions: InvalidPrincipal: Invalid Principal: 'arn:aws:sts::--OMITTED--:assumed-role/terraform_team1_dev-admin/--OMITTED--'
status code: 400, request id: 375c4645-3761-49b1-9758-3c9b5a51c115

=== CONT TestAccAWSVpcEndpointService_AllowedPrincipals
resource_aws_vpc_endpoint_service_test.go:125: Step 1/3 error: Error running apply:
Error: error adding VPC Endpoint Service permissions: InvalidPrincipal: Invalid Principal: 'arn:aws:sts::--OMITTED--:assumed-role/terraform_team1_dev-admin/--OMITTED--'
status code: 400, request id: f3e9a77f-3c7d-4acc-9127-f931c4ffbb37
```

I will create a follow-up issue for that problem.
awesome! nice catch of those sneaky delete errors 👍 👍
Reverified after rebase to fix merge conflict:
This has been released in version 3.27.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading. For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template for triage. Thanks!
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!
Reference: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DeleteVpcEndpoints.html
Reference: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DeleteVpcEndpointServiceConfigurations.html
Release note for CHANGELOG: