
Spot fleet request doesn't kill spawned instances on a destroy #10083

Closed
chriskinsman opened this issue Sep 12, 2019 · 9 comments · Fixed by #17268
Labels
bug Addresses a defect in current functionality. service/ec2 Issues and PRs that pertain to the ec2 service.
Milestone
v4.12.0

Comments

@chriskinsman

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

0.12.8
aws provider version 2.23

Affected Resource(s)

  • aws_spot_fleet_request

Terraform Configuration Files

resource "aws_spot_fleet_request" "spot_fleet" {
  iam_fleet_role                      = "${var.spot_fleet_role}"
  spot_price                          = "1.4"
  target_capacity                     = 1
  valid_until                         = "2036-01-01T00:00:01Z"
  excess_capacity_termination_policy  = "Default"
  replace_unhealthy_instances         = true
  terminate_instances_with_expiration = true

  lifecycle {
    create_before_destroy = true
    ignore_changes = ["target_capacity"]
  }

  dynamic "launch_specification" {
      for_each = local.launch_specs

      content {
            instance_type = launch_specification.value[0]
            ami           = "${lookup(local.amis, data.aws_region.current.name)}"

            root_block_device {
                volume_size = "30"
                volume_type = "gp2"
            }

            ebs_block_device {
                volume_size = "50"
                volume_type = "gp2"
                device_name = "/dev/xvdcz"
            }

            key_name = "Default"

            user_data = <<USER_DATA
#!/bin/bash
echo ECS_CLUSTER='${aws_ecs_cluster.ecs-cluster.name}' > /etc/ecs/ecs.config
USER_DATA

            subnet_id                   = launch_specification.value[1]
            iam_instance_profile        = "${var.iam_instance_profile}"
            associate_public_ip_address = true
            vpc_security_group_ids      = "${var.security_groups}"

            tags = {
                type    = "spot"
                service = "${var.cluster_name}"
            }
      }
  }
}

Expected Behavior

Terraform destroy completes and all instances spawned by spot fleet request are terminated

Actual Behavior

Spot fleet request is cancelled but instances remain

Steps to Reproduce

  1. terraform apply
  2. terraform destroy
@ghost ghost added the service/ec2 Issues and PRs that pertain to the ec2 service. label Sep 12, 2019
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Sep 12, 2019
@chriskinsman
Author

I see in the source where terminate_instances_with_expiration is used to decide whether to terminate the instances, and I have it set.

For some reason the instances aren't being killed. I wonder if it could be related to my lifecycle {} settings.

@Bryksin

Bryksin commented Dec 6, 2019

I can confirm the same issue; my Terraform code snippet is provided in another open issue, #11161.
The destroy command gets stuck in an infinite loop attempting to delete a security group that is still in use by a Spot Fleet instance that was never terminated, even though the Spot Fleet state changed to:
fleetRequestChange | cancelled_running

I'm not even sure whether this is a Terraform provider issue or AWS not killing the instances when a Spot Fleet request is cancelled.

Update:
I noticed that when I cancel a Spot Fleet in the AWS console, a pop-up follows with "terminate instances" ticked by default. From that I suspect that cancelling the Spot Fleet and terminating its instances are two separate API calls, and it feels like Terraform is not making the second call for termination.

Unfortunately, I didn't find any property in the documentation that controls what happens during a destroy. The only one that might fit this context is instance_interruption_behaviour, but it is optional and defaults to terminate.

Any suggestions? This is a really big problem, as the termination cannot be automated in the pipeline; otherwise I will have to hook up a custom Python script to kill the instances before running terraform destroy.
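For anyone hitting this before a provider-side fix, below is a minimal sketch of the kind of destroy-time hook described above, assuming the AWS CLI is available on the machine running Terraform; the null_resource name and wiring are illustrative, not part of the original report. It cancels the fleet with the TerminateInstances flag set, which is the second call the comment suspects Terraform is not making.

resource "null_resource" "spot_fleet_cleanup" {
  # Capture the fleet ID in triggers so the destroy-time provisioner can read
  # it via self (destroy provisioners cannot reference other resources).
  triggers = {
    fleet_id = aws_spot_fleet_request.spot_fleet.id
  }

  # Runs on destroy, before Terraform cancels the fleet resource itself, and
  # explicitly asks AWS to terminate the fleet's instances along with the
  # cancellation.
  provisioner "local-exec" {
    when    = destroy
    command = "aws ec2 cancel-spot-fleet-requests --spot-fleet-request-ids ${self.triggers.fleet_id} --terminate-instances"
  }
}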

@Bryksin

Bryksin commented Dec 11, 2019

Found a solution:
terminate_instances_with_expiration = true
The ticket can be closed.

@totomz

totomz commented Jan 25, 2020

terminate_instances_with_expiration = true is a working workaround, but it does not resolve this issue.

TerminateInstancesWithExpiration indicates whether running Spot Instances are terminated when the Spot Fleet request expires, not when it is cancelled, as per the AWS docs.

Also, the Terraform docs for iam_fleet_role describe the correct behaviour: Spot Instances should be terminated if the fleet request is cancelled (which is what happens with terraform destroy), or when the Spot Fleet request expires if you set terminateInstancesWithExpiration.

AWS exposes a different parameter, TerminateInstances, to terminate the running Spot Instances when cancelling a Spot Fleet request.

@matthewfranglen

Is there any progress on this issue?

@antonbormotov

I can confirm: Terraform 0.12.24 cancels the Spot Fleet and terminates the instances when terminate_instances_with_expiration = true is set explicitly, without valid_until.
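
For reference, the reported workaround boils down to setting the flag explicitly on the request. A trimmed-down sketch is below; the resource name, instance type, and AMI are placeholders, not taken from the original report.

resource "aws_spot_fleet_request" "example" {
  iam_fleet_role  = var.spot_fleet_role
  target_capacity = 1

  # Reported workaround: with this set explicitly (and, per the comment above,
  # no valid_until), the instances were terminated on destroy.
  terminate_instances_with_expiration = true

  launch_specification {
    instance_type = "m5.large"              # placeholder
    ami           = "ami-0123456789abcdef0" # placeholder
  }
}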

@pmalek
Contributor

pmalek commented Jan 24, 2021

I have submitted a small change that should rectify this: #17268

This basically adds terminate_instances in the same way as the ec2 fleet resource defines it (registry link, source link).

I'd need to work on tests for this PR, but the gist of the change is there.
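
For comparison, this is roughly how the existing aws_ec2_fleet resource exposes the behaviour that the PR mirrors; the launch template reference and resource names are assumed for illustration.

resource "aws_ec2_fleet" "example" {
  launch_template_config {
    launch_template_specification {
      launch_template_id = aws_launch_template.example.id # assumed to exist
      version            = aws_launch_template.example.latest_version
    }
  }

  target_capacity_specification {
    default_target_capacity_type = "spot"
    total_target_capacity        = 1
  }

  # On aws_ec2_fleet this flag already controls whether the fleet's instances
  # are terminated when the fleet resource is deleted; the linked PR proposes
  # the analogous behaviour for aws_spot_fleet_request.
  terminate_instances = true
}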

@justinretzolk justinretzolk added bug Addresses a defect in current functionality. and removed needs-triage Waiting for first response or review from a maintainer. labels Dec 9, 2021
@github-actions github-actions bot added this to the v4.12.0 milestone Apr 25, 2022
@github-actions

This functionality has been released in v4.12.0 of the Terraform AWS Provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

For further feature requests or bug reports with this functionality, please create a new GitHub issue following the template. Thank you!

@github-actions

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 29, 2022