Can no longer reboot and continue. #17844
Comments
Hi @AndrewSav! In Terraform 0.11.4 there was a change to try to make Terraform detect and report certain error conditions, rather than retrying indefinitely. Unfortunately this change was found to be a little too sensitive, so e.g. if In 0.11.6 (#17744) this behavior was refined to treat authentication errors as retryable to support situations where |
I just encountered similar problem. Reboot needs to be triggered during initial setup of EC2 instance. To do that I'm using
Terraform fails with the following message:
It's definitely not related to Terraform version:
|
It looks like in cfa299d we upgraded our vendored version of the Go SSH library to a newer version that added that error message, but that went out in v0.8.5 (over a year ago) and so cannot be the culprit for a recently-introduced issue. The error seems to indicate that the SSH server closed the connection without reporting the result of the command, as described in RFC 4254 section 6.10, which I suppose could make sense if the
The tricky thing here is that arguably the new behavior is more correct since the SSH execution is failing (it's not completing fully) and so therefore Terraform should not proceed and assume the instance is fully provisioned in this case... there are other reasons why the connection might be shut down that would not be safe to continue.
Perhaps we can make a compromise here and add an option to the provisioner to treat this particular situation as a success, for situations where either the SSH server is being restarted or the system itself is being shut down. I'm not sure what is the best way to describe that situation to make an intuitive option, though:
provisioner "remote-exec" {
  inline = [
    "sudo yum update -y",
    "sudo reboot",
  ]
  # sshd may exit before "sudo reboot" completes, preventing it from
  # returning the script's exit status.
  allow_missing_exit_status = true
} |
Adding a |
The Terraform team at HashiCorp won't be able to work on this in the near future due to our focus being elsewhere, but we'd be happy to review a pull request if someone has the time and motivation to implement it. Otherwise, we should be able to take a look at it once we've completed some other work in progress on the configuration language, which is likely to be at least a few months away. I'm sorry for this unintended change in behavior. As an alternative to staying on 0.11.3, it might be possible to arrange for a necessary reboot operation to happen asynchronously so that the provisioner is able to complete successfully before it begins. For example, perhaps using the |
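A minimal sketch of the asynchronous approach suggested above, assuming it refers to the time argument of `shutdown` (which a later comment mentions). The resource name, the `instance_ip` variable, and the login user are hypothetical. Note that this only lets the provisioner finish cleanly; it does nothing to make a following provisioner wait for the reboot.

```hcl
# Hypothetical input; not part of the thread.
variable "instance_ip" {}

resource "null_resource" "scheduled_reboot" {
  connection {
    type = "ssh"
    host = "${var.instance_ip}"
    user = "ec2-user" # assumed login user
  }

  # "shutdown -r +1" schedules the reboot one minute out and returns at once,
  # so sshd is still running when the script reports its exit status.
  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo shutdown -r +1",
    ]
  }
}
```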
@apparentlymart apologies, I'm on holiday until 26th of April and don't have access to the required infrastructure to test this until then. I'll make sure to test and report back when I've returned from holiday. |
I also encountered this problem when I wanted to trigger a reboot in a
provisioner "remote-exec" {
  inline = [
    "sudo reboot &",
  ]
}
Not completely verified that it works all the time but so far it has. |
@haxorof probably depends on the flavor of Linux. For whatever reason it did not work from Terraform with RancherOS for me (it did not cause a reboot), although from the command line it of course works. So I still think it's affected by the Terraform interaction. |
I think the My thought about using the |
Just a followup, we implemented the suggestion from @haxorof (reboot &) and it's worked perfectly on ubuntu 16.04 so far. I was going to use |
@AndrewSav : Yes you are right. I tested on an Ubuntu 17.10 and now tried it on a FreeBSD. It seems that the |
Rather than using a time argument to shutdown, you could delay the reboot in a subshell.
|
@apparentlymart do you think a "remote-reboot" provisioner is appropriate? Guys would you like me to close the issue? |
I think since this seems to be a common enough issue for users, we should consider making it part of the provisioner itself. I don't think we need another provisioner altogether, since this is just a special case of remote-exec. Having a special field like |
Hitting this issue as well. As a temporary workaround this seems to work for me (as mentioned earlier by others): (sleep 5 && reboot)& |
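In a full configuration, the delayed-subshell workaround might look like the sketch below. The resource name, the `instance_ip` variable, and the login user are hypothetical, and `sudo` is added on the assumption that the login user is not root. As other comments in this thread point out, this is timing-dependent and has not worked on every distribution.

```hcl
# Hypothetical input; not part of the thread.
variable "instance_ip" {}

resource "null_resource" "delayed_reboot" {
  connection {
    type = "ssh"
    host = "${var.instance_ip}"
    user = "ubuntu" # assumed login user
  }

  # The subshell is backgrounded, so the script returns immediately and can
  # report exit status 0; the reboot itself starts about five seconds later.
  provisioner "remote-exec" {
    inline = [
      "(sleep 5 && sudo reboot) &",
    ]
  }
}
```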
The above background reboots don't appear to be working for me on Ubuntu 18.04. Any news on this as a provisioner feature, similar to Packer's windows restart? https://www.packer.io/docs/provisioners/windows-restart.html EDIT: Using the following workaround (a
|
A similar issue exists on Windows with WinRM. A workaround that works for us is a remote-exec provisioner like this:
The first command schedules the reboot a few seconds later. This prevents the shutdown from sometimes killing the |
This did not work for me with So instead I tried this and it is working fine and would of course work for any OS.
|
@chakatz Nice workaround, though Terraform should be able to handle a reboot in the middle of a run. Terraform team, please provide a solution for this at the earliest. |
Alternative workaround: |
@frafra did you try it yourself? Because that's exactly what's not working. |
@AndrewSav yes, sure, but this is a different syntax, and it works just fine for me, while with Here is my script: https://github.com/frafra/fedora-atomic-hetzner/blob/master/fedora-atomic-hetzner.tf |
Hi all, after some frustration, it seems I'm able to run with Terraform 0.11.11, but it definitely feels hacky though; having 1 null_resource with 3 provisioners (FYI: Windows instance provisioning):
More advanced testing still in progress, but initial tests seem fine... I guess in an ideal scenario, I'd like the Chef run to exit with code 35 or 37, but then the Terraform Chef provisioner to allow that to happen, reconnect and then pick up and complete the provisioning. Happy to get stuck in with a few more pointers on the Terraform internals - thanks in advance for your feedback! |
Changing 'sudo reboot' to 'sudo shutdown -r +0' to address exit status issue encountered after Terraform 0.11.3, see hashicorp/terraform#17844
provisioner "remote-exec" {
when = "create"
inline = [
"sudo shutdown -r +60",
"echo 0",
]
} |
If anyone is fighting with this on Linux (connection actively refused error) I've written a little PowerShell/Bash combo that should cover Terraform running on both Windows and Linux: https://gist.github.com/janoszen/9df88ba0b906af1c18c0812a7128af7a |
@frafra hm... there is no mention of |
Provisioner waits for exit code from shutdown command and fails because reboot is performed too fast. Fixes bsc#1135937 Upstream issue: hashicorp/terraform#17844
I moved the commands in a shell scripts that gets executed by TF; it is in the same repository :-) |
Shameless plug here, but maybe it actually helps someone get a reasonable workaround for this issue. I created a TF provider which is able to execute the command but ignore the result, and I don't have any problems with reboots now. The configuration is limited, but can be easily extended. Also, Windows is not supported. https://github.com/invidian/terraform-provider-sshcommand |
I have done a quick implementation of allow_missing_exit_status at #22180, as described by @apparentlymart, to handle this case, tested on both Linux and Windows systems. I'm not totally sold on this versus something more general such as "ignore_errors" that would allow more use cases and weird stuff. |
Today got panic in terraform on VM reboot (Terraform version: 0.12.3)
|
@AndrewSav it does not look related, but could you try the change in #22180? |
New proposed solution:
provisioner "remote-exec" {
  inline = [ "reboot" ]
  on_failure = "continue"
  connection { host = self.ipv4_address }
} |
@frafra for what it's worth, I'm still getting connection errors intermittently even with |
I found |
The problem is that it's a race. So you change something, timing slightly changes and it works once and you think you fixed it, but it intermittently keeps failing. |
Is this available for terraform 0.12.24 ? I am running into issue : An argument named "allow_missing_exit_status" is not expected here. I am using the provider null 2.1.2. |
@roshanp85 no. |
Hey folks, just to confirm that rebooting with |
@mysticaltech as I explained above, this is a race, sometimes it works, sometimes it does not. We need a stable solution that always works. |
Thanks for the clarification @AndrewSav, so at least it seems to be winning that race more often than not. But maybe giving it some buffer time, to allow the node to "calm down", would improve the odds of winning. So adding a And IMHO, it should not be within the remote-exec provisioner, but be a different one, specifically built to handle that scenario and the remote behavior that results from it (it would be a lot simpler to achieve that way, as the intent and expected outcome would be clear). Ansible does it well; Terraform should already have a solution for that, it's long overdue! Of course instances need to reboot, especially after upgrades! I wish I knew Golang; it should be pretty quick, just copying the remote-exec provisioner code and modifying it a little, I would imagine. |
Are there any updates on this? It feels like something that Terraform should have. |
Yes, not only that, it doesn't seem like sorcery to implement! |
@mysticaltech cool, I'm glad it looks easy for you, I hope to see your PR implementing it soon! |
@AndrewSav I'm like a true TF newbie, so probably not suited for this. But I am sure the folks at @hashicorp-cloud can make this happen in the blink of an eye! |
Bump |
Refer to this solved issue Bash script1,
Bash script2,
Bash script3,
With a normal way of running these provisioners remote-exec in aws instance, it will return To solve this, you should change settings in
This allows your system to keep pinging your ec2 instance until ssh is connected again after 120s x 720 times. Hope this works again. |
On my end, I circumvented the problem using this method: Issue a reboot command and wait for MicroOS to reboot and be ready
|
I posted on the Google Group and did not get any response. The gitter chat is full of questions and no answers.
In the absence of another avenue to get a question answered, I'm posting it here.
How do I reboot-and-continue with terraform? In version 0.11.3 it was possible to issue the reboot command in the shell provisioner, and when the machine comes out of the reboot, the next provisioner in the file would re-connect and continue.
Since 0.11.4 this no longer works. When the machine goes down for the reboot, Terraform errors out and provisioning stops.
How is this supposed to work when set up correctly?
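For reference, the reboot-and-continue pattern described in this question can be sketched roughly as follows. This is an illustration only: the resource name, the `instance_ip` variable, the login user, and the commands in the second provisioner are all hypothetical.

```hcl
# Hypothetical input; not part of the thread.
variable "instance_ip" {}

resource "null_resource" "reboot_and_continue" {
  connection {
    type = "ssh"
    host = "${var.instance_ip}"
    user = "ec2-user" # assumed login user
  }

  # Step 1: update and reboot. The SSH session drops when the machine goes
  # down; since 0.11.4 Terraform reports that as an error and stops here.
  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo reboot",
    ]
  }

  # Step 2: under 0.11.3 this provisioner reconnected once the machine was
  # back up and provisioning continued from here.
  provisioner "remote-exec" {
    inline = [
      "echo 'continuing after reboot'",
    ]
  }
}
```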