
Terraform destroy fails if an EC2 instance fails to boot #894

Closed
ggreer opened this issue May 26, 2017 · 4 comments

ggreer (Contributor) commented May 26, 2017

Sometimes, an instance fails to come up, causing terraform apply to fail:

module.workers.aws_launch_configuration.worker_conf: Creation complete (ID: tf-aws-pr-891-1-worker-0029e03c8ba18038fd03677b55)
module.workers.aws_autoscaling_group.workers: Creating...
  arn:                            "" => "<computed>"
  availability_zones.#:           "" => "<computed>"
  default_cooldown:               "" => "<computed>"
  desired_capacity:               "" => "4"
  force_delete:                   "" => "false"
  health_check_grace_period:      "" => "300"
  health_check_type:              "" => "<computed>"
  launch_configuration:           "" => "tf-aws-pr-891-1-worker-0029e03c8ba18038fd03677b55"
  load_balancers.#:               "" => "<computed>"
  max_size:                       "" => "12"
  metrics_granularity:
Error applying plan:

1 error(s) occurred:

* module.etcd.aws_instance.etcd_node[0]: 1 error(s) occurred:

* aws_instance.etcd_node.0: Error waiting for instance (i-09d8c734c50c966f2) to become ready: Failed to reach target state. Reason: Server.InternalError: Internal error on launch

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.
make: *** [apply] Error 1

If this happens, terraform destroy fails with an index error:

template_dir.tectonic: Refreshing state... (ID: d74b955d7329b0349ff0be4e2445a65c99b6f293)
Error refreshing state: 1 error(s) occurred:

* module.etcd.aws_route53_record.etc_a_nodes: 1 error(s) occurred:

* module.etcd.aws_route53_record.etc_a_nodes[2]: index 2 out of range for list aws_instance.etcd_node.*.private_ip (max 2) in:

${aws_instance.etcd_node.*.private_ip[count.index]}
make: *** [destroy] Error 1
script returned exit code 2

Resources aren't destroyed and have to be cleaned up manually. :(
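
For context, the failing interpolation boils down to a pattern like the one below. This is a hypothetical minimal sketch, not the installer's actual config (var.ami and var.zone_id are assumed inputs): the record count assumes all etcd instances exist, so indexing into the splat list blows up during the refresh that runs before destroy. In Terraform 0.9.x, element() wraps its index modulo the list length, which is the usual way to keep refresh and destroy working:

# Hypothetical minimal reproduction of the failing pattern.
resource "aws_instance" "etcd_node" {
  count         = 3
  ami           = "${var.ami}"     # assumed variable
  instance_type = "t2.medium"
}

resource "aws_route53_record" "etc_a_nodes" {
  count   = 3
  zone_id = "${var.zone_id}"       # assumed variable
  name    = "etcd-${count.index}"
  type    = "A"
  ttl     = "300"

  # Fails with "index N out of range" once an instance is missing,
  # because the splat list is shorter than count:
  records = ["${aws_instance.etcd_node.*.private_ip[count.index]}"]

  # Workaround: element() wraps the index modulo the list length:
  # records = ["${element(aws_instance.etcd_node.*.private_ip, count.index)}"]
}

With element(), record 2 in the failure above would resolve to instance 0's IP, which is wrong data but at least lets terraform destroy refresh state and tear everything down.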

If you have access to Jenkins, you can see more logs of this failure at https://jenkins-tectonic-installer.prod.coreos.systems/blue/organizations/jenkins/coreos%20-%20tectonic-installer%2Ftectonic-installer/detail/PR-891/1/pipeline/

alexsomesan (Contributor) commented

@ggreer Can we still get the state file from this incident, so we can dissect what's in there?

ggreer (Contributor, Author) commented May 26, 2017

Looks like the assets were deleted.

lblackstone (Contributor) commented May 26, 2017

I've seen the same error with the OpenStack provider. It looked to me like the DNS logic wasn't handling the mismatch between tectonic_master_count and count.index when an instance was deleted or failed to spawn (see the sketch after these links):

https://github.com/coreos/tectonic-installer/blob/master/platforms/openstack/nova/dns.tf#L27
https://github.com/coreos/tectonic-installer/blob/master/platforms/openstack/nova/dns.tf#L32
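
Roughly, the shape of the problem at those lines is the pattern below. This is a hypothetical paraphrase with illustrative resource names, not the exact file contents: both counts come from tectonic_master_count, but the address lookup indexes into the list of instances that actually exist, and the two can disagree after a failure.

# Hypothetical sketch of the mismatch described above.
resource "openstack_compute_instance_v2" "master_node" {
  count = "${var.tectonic_master_count}"
  # ... image, flavor, network omitted
}

resource "openstack_dns_recordset_v2" "master_nodes" {
  count = "${var.tectonic_master_count}"
  # Errors out when fewer than tectonic_master_count instances exist,
  # since count.index can exceed the bounds of the splat list:
  records = ["${openstack_compute_instance_v2.master_node.*.access_ip_v4[count.index]}"]
  # ... zone_id, name, type omitted
}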

ggreer mentioned this issue on Jun 14, 2017
s-urbaniak (Contributor) commented

Closing in favor of #1246.
