Provisioning with SSH on AWS times out #17140

Closed
vdemario opened this issue Jan 18, 2018 · 14 comments
Labels
bug provisioner/remote-exec v0.11 Issues (primarily bugs) reported against v0.11 releases

Comments

@vdemario

vdemario commented Jan 18, 2018

We've been provisioning machines over SSH with Terraform for several months now. Today, for the first time, SSH timed out. At first I thought something was wrong with the network or security group configuration on Amazon, or that our IP had changed even though it was supposed to be fixed.

None of that happened, and I can successfully connect via SSH to the instances created by Terraform. In fact, not only can I connect, but if I run the inline commands manually that way, everything works as expected.

Terraform Version

Terraform v0.11.2

I started experiencing this problem with v0.11.1 and upgraded hoping it would fix it. The first attempt at terraform apply with v0.11.2 succeeded. Later I changed the instance_count and the problem happened again.

Terraform Configuration Files

provisioner "remote-exec" {
  connection {
    type        = "ssh"
    agent       = false
    user        = "ubuntu"
    private_key = "${file("~/.ssh/private_key.pem")}"
  }

  inline = [
    # several removed commands
  ]
}

Expected Behavior

aws_spot_instance_request.abcd_worker (remote-exec): Login Succeeded and then the inline commands.

Actual Behavior

SSH connection never happens and the following log messages loop until the apply command runs for 5 minutes and times out.

aws_spot_instance_request.abcd_worker[1] (remote-exec): Connecting to remote host via SSH...
aws_spot_instance_request.abcd_worker[1] (remote-exec):   Host:
aws_spot_instance_request.abcd_worker[1] (remote-exec):   User: ubuntu
aws_spot_instance_request.abcd_worker[1] (remote-exec):   Password: false
aws_spot_instance_request.abcd_worker[1] (remote-exec):   Private key: true
aws_spot_instance_request.abcd_worker[1] (remote-exec):   SSH Agent: false

Steps to Reproduce

terraform apply with the remote-exec/ssh provisioner is all we're doing.

References

I suspected my issue was related to #17117 since it mentions AWS security groups as well but it might not have any connection. I started having these problems on v0.11.1 and the security groups on my terraform configuration are being applied correctly.

@vdemario
Author

More info: I ran terraform again on the same configuration, and it was supposed to replace 3 instances. The provisioning worked on 2 of them but failed on the third.

It's possible there's something wrong on my end, perhaps network related. However, I can't explain being able to connect directly but not through terraform.

@jbardin jbardin added the waiting-response and provisioner/remote-exec labels Jan 19, 2018
@jbardin
Member

jbardin commented Jan 19, 2018

Hi @vdemario,

Sorry you're having an issue here. Is that the extent of the output that you see which is being repeated? Can you get the trace log output from terraform when this happens?

The only situation I can think of is that terraform got the wrong (or empty) information from the provider, but I'm not sure that could happen yet.

@vdemario
Author

Hi @jbardin, I do have a trace log at https://drive.google.com/file/d/1OxOND5AOvJnZJG-K2Mouqf38m5PY7piu/view?usp=sharing. I edited a little bit of it at the beginning because I was concerned about credentials, then gave up on editing and encrypted it with the HashiCorp security public key.

It was generated at the time of my last comment, when provisioning worked for 2 instances and failed on 1.

@jbardin
Member

jbardin commented Jan 19, 2018

Thanks for the logs! Unfortunately, they weren't very enlightening. What I didn't notice at first glance, though, is that the Host in the CLI output is empty. I'm not sure how that's happening yet, but it at least gives me something to pursue.

@jbardin jbardin added the bug label and removed the waiting-response label Jan 19, 2018
@riserrad

@vdemario I experienced the same problem. In my case, the security group applied to the instance did not allow SSH connections. Once I allowed SSH in the SG, the connection worked.
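
For anyone checking this on their own setup, a security group that permits inbound SSH might look roughly like the sketch below. The resource name, group name, and CIDR range are placeholders, not values from this thread.

```hcl
# Sketch only: names and the CIDR block are hypothetical placeholders.
resource "aws_security_group" "ssh_access" {
  name        = "ssh-access"
  description = "Allow inbound SSH for provisioning"

  # Allow TCP port 22 from the address Terraform runs from.
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"] # placeholder; replace with your own IP
  }
}
```

The group then has to be attached to the instance or spot request for the rule to take effect.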

@vdemario
Author

That's not the case @ricardoserradas. I could connect directly outside terraform and these are very old security groups that only exist specifically for SSH.

@marcosinger

I had the same issue today, so let me add more context on how I solved it:

I changed from spot instances to a normal one (using aws_instance instead of aws_spot_instance_request) and it was deployed correctly. I don't know if changing the instance type is a false positive.

Let me know if there is anything I can provide, such as logs, to help with this case.

@vdemario
Author

BTW, @marcosinger works with me so he's talking about the same terraform configuration as me.

@jbardin this reminds me: the empty Host is probably because of the spot instance request. Tags get applied to the spot request instead of the instance so the name is lost when we use spot instances. See #3263.

@ofrzeta

ofrzeta commented Mar 9, 2018

I have the same problem. I had a working configuration with regular instances and changed to spot instances. Now Terraform times out trying to connect via SSH, while logging in with SSH from another terminal works fine. Running with TF_LOG=DEBUG shows the following:

2018-03-09T14:32:24.421+0100 [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2018/03/09 14:32:24 handshake error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
2018-03-09T14:32:24.421+0100 [DEBUG] plugin.terraform: remote-exec-provisioner (internal) 2018/03/09 14:32:24 [WARN] retryable error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain

@ofrzeta

ofrzeta commented Mar 9, 2018

I found the solution now. For spot instances you need to add "wait_for_fulfillment = true" to make SSH remote exec provisioning work.
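
A minimal sketch of what that could look like, in the 0.11-style syntax used earlier in this issue. The AMI ID, instance type, spot price, and key name are placeholders I made up for illustration. Presumably the setting helps because Terraform then waits for the request to be fulfilled before running provisioners, but that reasoning is my assumption.

```hcl
# Sketch only: ami, instance_type, spot_price, and key_name are hypothetical.
resource "aws_spot_instance_request" "abcd_worker" {
  ami           = "ami-00000000" # placeholder AMI ID
  instance_type = "t2.micro"     # placeholder
  spot_price    = "0.01"         # placeholder
  key_name      = "example-key"  # placeholder

  # Wait for the spot request to be fulfilled before continuing.
  wait_for_fulfillment = true

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      agent       = false
      user        = "ubuntu"
      private_key = "${file("~/.ssh/private_key.pem")}"
    }

    inline = [
      "echo connected", # stand-in for the real commands
    ]
  }
}
```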

@marcosinger

@ofrzeta I did the same here and it works. I also added associate_public_ip_address = true which is optional, I don't know why.

@Michael-McD

Looks like using the file provisioner can cause Terraform's "aws_instance" creation to time out. Interestingly, this only happens on some of our AWS environments.

After removing the file provisioner block, the EC2 instance is created in a few seconds and is accessible using ssh.

@hashibot hashibot added the v0.11 label Aug 29, 2019
@jbardin
Member

jbardin commented Dec 10, 2019

The original issue here was caused by the host field being left unset by the provider.
This is no longer a concern, because connection blocks require a host to be explicitly defined in the configuration.

Since there is no further development of 0.11 happening, we can close this out.
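
As an illustration of the explicit host requirement in later versions, a connection block could be written along these lines. This is a sketch in 0.12-style syntax; the AMI ID, instance type, and spot price are placeholders, and using the resource's public_ip attribute assumes the instance is reachable on a public address.

```hcl
# Sketch only: ami, instance_type, and spot_price are hypothetical.
resource "aws_spot_instance_request" "abcd_worker" {
  ami                  = "ami-00000000" # placeholder AMI ID
  instance_type        = "t2.micro"     # placeholder
  spot_price           = "0.01"         # placeholder
  wait_for_fulfillment = true

  provisioner "remote-exec" {
    connection {
      type        = "ssh"
      host        = self.public_ip # set explicitly; assumes a public address
      agent       = false
      user        = "ubuntu"
      private_key = file("~/.ssh/private_key.pem")
    }

    inline = [
      "echo connected", # stand-in for the real commands
    ]
  }
}
```

If the instance is only reachable privately, self.private_ip would be the analogous choice.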

@jbardin jbardin closed this as completed Dec 10, 2019
@ghost

ghost commented Mar 28, 2020

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@ghost ghost locked and limited conversation to collaborators Mar 28, 2020
7 participants