
Can no longer reboot and continue. #17844

Open

AndrewSav opened this issue Apr 11, 2018 · 47 comments

Comments

@AndrewSav

AndrewSav commented Apr 11, 2018

Hi there,
Thank you for opening an issue. Please note that we try to keep the Terraform issue tracker reserved for bug reports and feature requests. For general usage questions, please see: https://www.terraform.io/community.html.

I posted on the Google Group and did not get any response. The gitter chat is full of questions and no answers.

In the absence of another avenue to get a question answered, I'm posting it here.

How do I reboot-and-continue with Terraform? In version 0.11.3 it was possible to issue a reboot command in the shell provisioner, and when the machine came back up from the reboot, the next provisioner in the file would reconnect and continue.

Since 0.11.4 this no longer works. When the machine starts to reboot, Terraform errors out and provisioning stops.

How is this supposed to work when set up correctly?

@apparentlymart
Contributor

Hi @AndrewSav!

In Terraform 0.11.4 there was a change to try to make Terraform detect and report certain error conditions, rather than retrying indefinitely. Unfortunately this change was found to be a little too sensitive, so e.g. if sshd starts up before the authorized_keys file has been populated by cloud-init then Terraform would fail with an authentication error, rather than retrying. I think this may be the root cause of your problem here.

In 0.11.6 (#17744) this behavior was refined to treat authentication errors as retryable to support situations where sshd is running before credentials are fully populated. Could you try this with version 0.11.6 or later and see if that fixes the problem for you?

@jwadolowski

jwadolowski commented Apr 17, 2018

I just encountered a similar problem. A reboot needs to be triggered during the initial setup of an EC2 instance. To do that I'm using remote-exec inside a null_resource:

resource "null_resource" "yum-update" {
  triggers {
    instance_id = "${aws_instance.webapp.id}"
  }

  connection = {
    type         = "ssh"
    user         = "${var.ssh_user}"
    host         = "${aws_instance.webapp.private_ip}"
    private_key  = "${file(var.ssh_key_path)}"
    bastion_host = "${var.ssh_use_bastion == true ? var.ssh_bastion_host : ""}"
  }

  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo reboot",
    ]
  }

  depends_on = [
    "aws_volume_attachment.webapp-ebs-att",
  ]
}

Terraform fails with the following message:

Error: Error applying plan:

1 error(s) occurred:

* module.xyz.null_resource.yum-update: error executing "/tmp/terraform_1226926016.sh": wait: remote command exited without exit status or exit signal

It's definitely not related to the authorized_keys race condition, as yum update -y executed without issues. Exactly the same code worked just fine with previous Terraform versions.

Terraform version:

$ terraform -v
Terraform v0.11.7
+ provider.aws v1.14.1
+ provider.null v1.0.0
+ provider.template v1.0.0

@apparentlymart
Contributor

It looks like in cfa299d we upgraded our vendored version of the Go SSH library to a newer version that added that error message, but that went out in v0.8.5 (over a year ago) and so cannot be the culprit for a recently-introduced issue.

The error seems to indicate that the SSH server closed the connection without reporting the result of the command, as described in RFC 4254 section 6.10, which I suppose could make sense if the sshd process were killed before reboot returned. I assume that prior to Terraform v0.11.4 this error was still occurring but being silently ignored.

The tricky thing here is that arguably the new behavior is more correct since the SSH execution is failing (it's not completing fully) and so therefore Terraform should not proceed and assume the instance is fully provisioned in this case... there are other reasons why the connection might be shut down that would not be safe to continue.

Perhaps we can make a compromise here and add an option to the provisioner to treat this particular situation as a success, for situations where either the SSH server is being restarted or the system itself is being shut down. I'm not sure what the best way to describe that situation is to make an intuitive option, though: allow_missing_exit_status is the most directly descriptive, but it doesn't really get at the intent, so if we went with that option I suppose configuration authors would need to annotate it with a comment explaining why:

  provisioner "remote-exec" {
    inline = [
      "sudo yum update -y",
      "sudo reboot",
    ]

    # sshd may exit before "sudo reboot" completes, preventing it from
    # returning the script's exit status.
    allow_missing_exit_status = true
  }

@lamont

lamont commented Apr 17, 2018

Adding an allow_missing_exit_status = true feature would work for me. I'm perfectly prepared to admit that rebooting during provisioning is weird and to call it out with a flag and a comment. As it is now, I'm falling back to tf 0.11.3 to keep working because some of my fleet depends on the reboot before the next provisioner can continue. Thanks for looking at it.

@apparentlymart
Contributor

The Terraform team at HashiCorp won't be able to work on this in the near future due to our focus being elsewhere, but we'd be happy to review a pull request if someone has the time and motivation to implement it.

Otherwise, we should be able to take a look at it once we've completed some other work in progress on the configuration language, which is likely to be at least a few months away.

I'm sorry for this unintended change in behavior. As an alternative to staying on 0.11.3, it might be possible to arrange for a necessary reboot operation to happen asynchronously so that the provisioner is able to complete successfully before it begins. For example, perhaps using the shutdown command with a non-now time would do the trick. If there are other subsequent provisioning steps it may be necessary to take some additional steps to ensure that the next provisioner won't connect before the reboot begins, such as revoking the authorized SSH key with some mechanism to re-install it after the reboot has completed.
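
For illustration only, a rough sketch of that idea (the one-minute delay and the follow-up sleep are arbitrary values I've picked here, not a tested recipe):

  provisioner "remote-exec" {
    inline = [
      # schedule the reboot a minute out so this command can still
      # return an exit status before sshd goes away
      "sudo shutdown -r +1",
    ]
  }

  # crude buffer so a later provisioner doesn't reconnect before the
  # reboot has actually started
  provisioner "local-exec" {
    command = "sleep 90"
  }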

@AndrewSav
Author

@apparentlymart apologies, I'm on holiday until 26th of April and don't have access to the required infrastructure to test this until then. I'll make sure to test and report back when I've returned from holiday.

@haxorof

haxorof commented Apr 18, 2018

I also encountered this problem when I wanted to trigger a reboot in a null_resource. It helped to just add &, so now it looks like this:

provisioner "remote-exec" {
  inline = [
    "sudo reboot &",
  ]
}

I haven't completely verified that it works all the time, but so far it has.

@AndrewSav
Author

@haxorof it probably depends on the flavor of Linux. For whatever reason it did not work from Terraform with RancherOS for me (it did not cause a reboot), although from the command line it of course works. So I still think it's affected by the Terraform interaction.

@apparentlymart
Contributor

I think the & solution for backgrounding might be a little tricky because the sudo process still remains attached to the shell while it's running, and so sshd shutting down may also send a signal to sudo, and thus in turn to reboot, and so kill it before it gets a chance to complete.

My thought about using the shutdown command above is that it's implemented in a way where the actual shutdown is managed by a background process, and so the shutdown command completes immediately, allowing the shell to exit before the shutdown begins. In the case of a systemd system, for example, I believe (IIRC) that a timed shutdown is handled by sending a message to logind, which then itself coordinates the shutdown. Since logind is a system daemon, it is not connected to your SSH session.

@lamont

lamont commented Apr 24, 2018

Just a followup: we implemented the suggestion from @haxorof (reboot &) and it has worked perfectly on Ubuntu 16.04 so far. I was going to use shutdown -r +1 plus a local-exec sleep 60 but was bummed that I'd be adding a minute to every instance creation. If I could pass a sub-minute timeout to shutdown I'd have done that, but until then I'll stick with the backgrounded reboot until we run into issues with it.

@haxorof

haxorof commented Apr 24, 2018

@AndrewSav: Yes, you are right. I tested on Ubuntu 17.10 and have now tried it on FreeBSD. It seems that the reboot & workaround does not work with the FreeBSD version I tried.

@jbardin
Member

jbardin commented Apr 26, 2018

Rather than using a time argument to shutdown, you could delay the reboot in a subshell.

(sleep 2 && reboot)&
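
Inside a provisioner that might look like this (just a sketch; the two-second delay is arbitrary, and sudo is an assumption depending on your connection user):

provisioner "remote-exec" {
  inline = [
    # run the reboot in a detached, delayed subshell so the inline
    # command itself exits cleanly and reports an exit status
    "(sleep 2 && sudo reboot) &",
  ]
}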

@andrewsav-bt

andrewsav-bt commented May 3, 2018

@apparentlymart do you think a "remote-reboot" provisioner is appropriate?
@jbardin - wow, thank you so much! That actually worked for me! I'm guessing that in the presence of a workable workaround this is less of an issue now.

Guys would you like me to close the issue?

@jbardin
Member

jbardin commented May 4, 2018

I think since this seems to be a common enough issue for users, we should consider making it part of the provisioner itself. I don't think we need another provisioner altogether, since this is just a special case of remote-exec. Having a special field like shutdown_command would be fairly easy to add, and that command could just ignore a connection failure after execution.
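
Hypothetically (shutdown_command does not exist today; this is only a sketch of the idea), that might look like:

provisioner "remote-exec" {
  inline = [
    "sudo yum update -y",
  ]

  # hypothetical argument: run last, and treat a dropped connection or
  # missing exit status as success, since the command is expected to
  # take the SSH server down
  shutdown_command = "sudo reboot"
}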

@pasikarkkainen

Hitting this issue as well. As a temporary workaround, this seems to work for me (as mentioned earlier by others):

(sleep 5 && reboot)&

@karl-barbour

karl-barbour commented Aug 23, 2018

The above background reboots don't appear to be working for me on Ubuntu 18.04.

Any news on this as a provisioner feature, similar to Packer's windows-restart provisioner? https://www.packer.io/docs/provisioners/windows-restart.html

EDIT:

I'm using the following workaround (a local-exec provisioner):

  provisioner "local-exec" {
    command = "ssh -o 'StrictHostKeyChecking no' -i ${var.pem_file_path} root@${digitalocean_droplet.web.ipv4_address} '(sleep 2; reboot)&'"
  }

@GMZwinge

A similar issue exists on Windows with WinRM. A workaround that works for us is a remote-exec provisioner like this:

  provisioner "remote-exec" {
    inline = [
      "shutdown /r /t 5",
      "net stop WinRM",
    ]
    ...
  }

The first command schedules the reboot a few seconds later. That avoids the shutdown sometimes killing the net stop WinRM command. The second command makes sure that the next provisioner doesn't connect while the machine is shutting down and then fail; that can happen sometimes even without a shutdown delay (shutdown /r /t 0). A separate remote-exec provisioner ensures that the output of the previous remote-exec provisioner is flushed.
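
For example, the split into separate provisioners could look roughly like this (a sketch only; connection details are omitted and the earlier command is a placeholder):

  # main provisioning commands in their own provisioner so their
  # output is fully flushed before the reboot provisioner runs
  provisioner "remote-exec" {
    inline = [
      "echo placeholder-for-real-provisioning-steps",
    ]
  }

  provisioner "remote-exec" {
    inline = [
      "shutdown /r /t 5",
      "net stop WinRM",
    ]
  }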

@chakatz

chakatz commented Dec 12, 2018

This did not work for me with remote-exec:
"(sleep 2 && sudo reboot)&",
It didn't cause an error but it also didn't actually do a reboot.

So instead I tried this and it is working fine and would of course work for any OS.

  provisioner "local-exec" {
    command = "aws ec2 reboot-instances --instance-ids ${self.id}"
  }

@mohamaa

mohamaa commented Dec 13, 2018

@chakatz Nice workaround, though Terraform should handle any reboot in the middle of a Terraform run.
I am using v0.11.10 now and still see the same issue.

Terraform team, please provide a solution to this as soon as possible.

@frafra

frafra commented Dec 15, 2018

Alternative workaround: shutdown -r +0
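
In a provisioner that would look like this (a minimal sketch, assuming a root connection user; add sudo otherwise):

provisioner "remote-exec" {
  inline = [
    # hands the reboot off to the init system and returns immediately,
    # so the command still reports an exit status to Terraform
    "shutdown -r +0",
  ]
}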

@AndrewSav
Author

@frafra did you try it yourself? Because that's exactly what's not working.

@frafra

frafra commented Dec 15, 2018

@AndrewSav yes, sure, but this is a different syntax, and it works just fine for me, while reboot, systemctl reboot, and (sleep 3 && reboot) & do not. shutdown -r +0 still exits before restarting, so Terraform does not halt.

Here is my script: https://github.com/frafra/fedora-atomic-hetzner/blob/master/fedora-atomic-hetzner.tf

@djoos
Contributor

djoos commented Jan 9, 2019

Hi all,

after some frustration, it seems I'm able to run with Terraform 0.11.11, but it definitely feels hacky; I have 1 null_resource with 3 provisioners (FYI: Windows instance provisioning):

provisioner "chef"  {
  # handles pre-reboot config mngmt; completes cleanly; schedules a delayed reboot
}

# see https://github.com/hashicorp/terraform/issues/17844#issuecomment-422960337 (above)
# `[remote-exec]: error during provision, continue requested` (see "on_failure" below)
provisioner "remote-exec" {
  inline = [
    "shutdown /r /f /t 5 /c "forced reboot",
    "net stop WinRM"
  ]
  # Terraform > v0.11.3 will fail if the provisioner doesn't report the exit status, but here we'll explicitly allow failure
  on_failure = "continue"
}

provisioner "chef"  {
  # handles post-reboot config mngmt
}

More advanced testing still in progress, but initial tests seem fine...

I guess in an ideal scenario, I'd like the Chef run to exit with code 35 or 37, and the Terraform Chef provisioner to allow that to happen, reconnect, and then pick up and complete the provisioning.
Perhaps not dissimilar to kitchen, which uses retry_on_exit_code (an array of exit codes indicating that kitchen should retry the converge command) and max_retries (the number of times to retry the converge before passing along the failed status).
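
Purely as a sketch of the idea (neither argument exists on the Terraform chef provisioner today; the values are illustrative):

provisioner "chef" {
  # ... existing chef provisioner arguments ...

  # hypothetical: reconnect and re-run the converge when Chef signals
  # a reboot-and-continue via these exit codes
  retry_on_exit_code = [35, 37]
  max_retries        = 2
}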

Happy to get stuck in with a few more pointers on the Terraform internals - thanks in advance for your feedback!

clayshek added a commit to clayshek/terraform-raspberrypi-bootstrap that referenced this issue Feb 8, 2019
Changing 'sudo reboot' to 'sudo shutdown -r +0' to address exit status issue encountered after Terraform 0.11.3, see hashicorp/terraform#17844
@palfaiate

  provisioner "remote-exec" {
    when = "create"

    inline = [
      "sudo shutdown -r +60",
      "echo 0",
    ]
  }

@ghost

ghost commented Apr 30, 2019

If anyone is fighting with this on Linux (connection actively refused error) I've written a little PowerShell/Bash combo that should cover Terraform running on both Windows and Linux: https://gist.github.com/janoszen/9df88ba0b906af1c18c0812a7128af7a

@andrewsav-bt

@frafra hm... there is no mention of shutdown in that script you linked.

ereslibre pushed a commit to SUSE/skuba that referenced this issue Jun 13, 2019
Provisioner waits for exit code from shutdown command and fails because
reboot is performed too fast.

Fixes bsc#1135937
Upstream issue: hashicorp/terraform#17844
@frafra

frafra commented Jun 13, 2019

@frafra hm... there is no mention of shutdown in that script you linked.

I moved the commands into a shell script that gets executed by TF; it is in the same repository :-)

@invidian
Contributor

Shameless plug here, but maybe it actually helps someone get a reasonable workaround for this issue. I created a TF provider which is able to execute a command but ignore the result for this purpose, and I don't have any problems with reboots now. The configuration is limited, but it can be easily extended. Also, Windows is not supported.

https://github.com/invidian/terraform-provider-sshcommand

@brunotm

brunotm commented Jul 23, 2019

I have done a quick implementation of allow_missing_exit_status at #22180, as described by @apparentlymart, to handle this case, tested on both Linux and Windows systems.

I'm not totally sold on this versus something more general like "ignore_errors" that would allow more use cases and weird stuff.

@AndrewSav
Author

Today I got a panic in Terraform on VM reboot (Terraform version: 0.12.3)

2019-07-26T07:06:02.722+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [ERROR] scp stderr: "Sink: C0644 32 terraform_1671735816.sh\n"
2019-07-26T07:06:02.722+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] opening new ssh session
2019-07-26T07:06:02.725+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] starting remote command: chmod 0777 /tmp/terraform_1671735816.sh
2019-07-26T07:06:02.731+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] remote command exited with '0': chmod 0777 /tmp/terraform_1671735816.sh
2019-07-26T07:06:02.732+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] opening new ssh session
2019-07-26T07:06:02.734+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] starting remote command: /tmp/terraform_1671735816.sh
2019-07-26T07:06:02.759+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] remote command exited with '0': /tmp/terraform_1671735816.sh
2019-07-26T07:06:02.760+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] opening new ssh session
2019-07-26T07:06:02.760+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] Starting remote scp process:  scp -vt /tmp
2019-07-26T07:06:02.763+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] Started SCP session, beginning transfers...
2019-07-26T07:06:02.763+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] Copying input data into temporary file so we can read the length
2019-07-26T07:06:02.764+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] Beginning file upload...
2019-07-26T07:06:02.768+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] SCP session complete, closing stdin pipe.
2019-07-26T07:06:02.768+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] Waiting for SSH session to complete.
2019-07-26T07:06:02.769+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [ERROR] scp stderr: "Sink: C0644 0 terraform_1671735816.sh\n"
2019/07/26 07:06:02 [TRACE] EvalApplyProvisioners: provisioning module.node.vsphere_virtual_machine.machine with "remote-exec"
2019/07/26 07:06:02 [TRACE] GetResourceInstance: vsphere_virtual_machine.machine is a single instance
2019-07-26T07:06:02.771+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] connecting to TCP connection for SSH
2019-07-26T07:06:02.772+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] handshaking with SSH
2019-07-26T07:06:02.849+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] starting ssh KeepAlives
2019-07-26T07:06:02.849+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:02 [DEBUG] opening new ssh session
2019-07-26T07:06:03.137+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:03 [WARN] ssh session open error: 'ssh: unexpected packet in response to channel open: <nil>', attempting reconnect
2019-07-26T07:06:03.137+1200 [DEBUG] plugin.terraform.exe: remote-exec-provisioner (internal) 2019/07/26 07:06:03 [DEBUG] connecting to TCP connection for SSH
2019-07-26T07:06:04.853+1200 [DEBUG] plugin.terraform.exe: panic: runtime error: invalid memory address or nil pointer dereference
2019-07-26T07:06:04.853+1200 [DEBUG] plugin.terraform.exe: [signal 0xc0000005 code=0x0 addr=0x0 pc=0x17a8b7c]
2019-07-26T07:06:04.853+1200 [DEBUG] plugin.terraform.exe: 
2019-07-26T07:06:04.853+1200 [DEBUG] plugin.terraform.exe: goroutine 258 [running]:
2019-07-26T07:06:04.854+1200 [DEBUG] plugin.terraform.exe: github.com/hashicorp/terraform/communicator/ssh.(*Communicator).Connect.func1(0xc000180b40, 0x223fe40, 0xc000519300)
2019-07-26T07:06:04.854+1200 [DEBUG] plugin.terraform.exe: 	/opt/teamcity-agent/work/9e329aa031982669/src/github.com/hashicorp/terraform/communicator/ssh/communicator.go:235 +0x12c
2019-07-26T07:06:04.854+1200 [DEBUG] plugin.terraform.exe: created by github.com/hashicorp/terraform/communicator/ssh.(*Communicator).Connect
2019-07-26T07:06:04.854+1200 [DEBUG] plugin.terraform.exe: 	/opt/teamcity-agent/work/9e329aa031982669/src/github.com/hashicorp/terraform/communicator/ssh/communicator.go:227 +0x519
2019/07/26 07:06:04 [WARN] Errors while provisioning vsphere_virtual_machine.machine with "remote-exec", so aborting
2019/07/26 07:06:04 [TRACE] EvalApplyProvisioners: module.node.vsphere_virtual_machine.machine provisioning failed, but we will continue anyway at the caller's request
2019/07/26 07:06:04 [TRACE] module.node: eval: *terraform.EvalMaybeTainted
2019/07/26 07:06:04 [TRACE] EvalMaybeTainted: module.node.vsphere_virtual_machine.machine encountered an error during creation, so it is now marked as tainted
2019/07/26 07:06:04 [TRACE] module.node: eval: *terraform.EvalWriteState
2019/07/26 07:06:04 [TRACE] EvalWriteState: writing current state object for module.node.vsphere_virtual_machine.machine
2019/07/26 07:06:04 [TRACE] module.node: eval: *terraform.EvalIf
2019/07/26 07:06:04 [TRACE] module.node: eval: *terraform.EvalIf
2019/07/26 07:06:04 [TRACE] module.node: eval: *terraform.EvalWriteDiff
2019/07/26 07:06:04 [TRACE] module.node: eval: *terraform.EvalApplyPost
2019/07/26 07:06:04 [ERROR] module.node: eval: *terraform.EvalApplyPost, err: 1 error occurred:
	* rpc error: code = Unavailable desc = transport is closing

2019/07/26 07:06:04 [ERROR] module.node: eval: *terraform.EvalSequence, err: rpc error: code = Unavailable desc = transport is closing
2019/07/26 07:06:04 [TRACE] [walkApply] Exiting eval tree: module.node.vsphere_virtual_machine.machine
2019/07/26 07:06:04 [TRACE] vertex "module.node.vsphere_virtual_machine.machine": visit complete
2019/07/26 07:06:04 [TRACE] dag/walk: upstream of "provisioner.file (close)" errored, so skipping
2019/07/26 07:06:04 [TRACE] dag/walk: upstream of "meta.count-boundary (EachMode fixup)" errored, so skipping
2019/07/26 07:06:04 [TRACE] dag/walk: upstream of "provider.vsphere (close)" errored, so skipping
2019/07/26 07:06:04 [TRACE] dag/walk: upstream of "provisioner.remote-exec (close)" errored, so skipping
2019/07/26 07:06:04 [TRACE] dag/walk: upstream of "root" errored, so skipping
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: reading latest snapshot from terraform.tfstate
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: snapshot file has nil snapshot, but that's okay
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: read nil snapshot
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: no original state snapshot to back up
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: state has changed since last snapshot, so incrementing serial to 1
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
2019/07/26 07:06:04 [TRACE] statemgr.Filesystem: unlocked by closing terraform.tfstate
2019-07-26T07:06:04.870+1200 [DEBUG] plugin: plugin process exited: path=C:\Users\asavinykh\scoop\apps\terraform\current\terraform.exe pid=20112 error="exit status 2"
2019-07-26T07:06:04.870+1200 [DEBUG] plugin: plugin exited
2019-07-26T07:06:04.887+1200 [DEBUG] plugin: plugin process exited: path=C:\Users\asavinykh\scoop\apps\terraform\current\terraform.exe pid=25320
2019-07-26T07:06:04.887+1200 [DEBUG] plugin: plugin process exited: path=C:\Users\asavinykh\scoop\apps\terraform\current\terraform.exe pid=24520
2019-07-26T07:06:04.887+1200 [DEBUG] plugin: plugin exited
2019-07-26T07:06:04.887+1200 [DEBUG] plugin: plugin exited
2019-07-26T07:06:04.889+1200 [DEBUG] plugin: plugin process exited: path=C:\Users\asavinykh\scoop\apps\terraform\current\terraform.exe pid=19932
2019-07-26T07:06:04.889+1200 [DEBUG] plugin: plugin exited
2019-07-26T07:06:04.891+1200 [DEBUG] plugin: plugin process exited: path=C:\Users\asavinykh\scoop\apps\terraform\current\terraform.exe pid=18572
2019-07-26T07:06:04.891+1200 [DEBUG] plugin: plugin exited
2019-07-26T07:06:04.892+1200 [DEBUG] plugin: plugin process exited: path=C:\Users\asavinykh\scoop\apps\terraform\current\terraform.exe pid=16888
2019-07-26T07:06:04.892+1200 [DEBUG] plugin: plugin exited
2019-07-26T07:06:04.893+1200 [DEBUG] plugin: plugin process exited: path=E:\Sources\docker_ops\terraform\instances\t-ap-test-01\.terraform\plugins\windows_amd64\terraform-provider-vsphere_v1.12.0_x4.exe pid=27036
2019-07-26T07:06:04.893+1200 [DEBUG] plugin: plugin exited

@brunotm

brunotm commented Jul 25, 2019

Today I got a panic in Terraform on VM reboot (Terraform version: 0.12.3)

@AndrewSav it does not look related, but could you try the change in #22180?

@frafra

frafra commented Oct 9, 2019

New proposed solution:

  provisioner "remote-exec" {
    inline = [ "reboot" ]
    on_failure = "continue"
    connection { host = self.ipv4_address }
  }

alekseytols90 pushed a commit to alekseytols90/skuba that referenced this issue Mar 31, 2020
Provisioner waits for exit code from shutdown command and fails because
reboot is performed too fast.

Fixes bsc#1135937
Upstream issue: hashicorp/terraform#17844
@AndrewSav
Author

@frafra for what it's worth, I'm still getting connection errors intermittently even with on_failure = "continue", with the next provisioner not being able to execute.

@moqmar

moqmar commented Jun 1, 2020

I found systemctl reboot to work fine, while reboot throws an error.

@AndrewSav
Author

The problem is that it's a race. You change something, the timing shifts slightly, it works once, and you think you fixed it, but it keeps failing intermittently.

@roshanp85

allow_missing_exit_status

Is this available in Terraform 0.12.24? I am running into this error: An argument named "allow_missing_exit_status" is not expected here. I am using provider null 2.1.2.

@AndrewSav
Author

@roshanp85 no.

@mysticaltech

Hey folks, just to confirm that rebooting with shutdown -r +0 at the end of a remote-exec block does work! Look no further, that is your solution! Thanks again @frafra 🙏

@AndrewSav
Author

@mysticaltech as I explained above, this is a race, sometimes it works, sometimes it does not. We need a stable solution that always works.

@mysticaltech

mysticaltech commented Jul 10, 2021

Thanks for the clarification @AndrewSav, so at least it seems to be winning that race more often than not. But maybe giving it some buffer time, to allow the node to "calm down", would maximize the chances of winning, for instance adding a sleep 10 before it. Either way, as you said, we need a stable solution.
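
Something like this, I imagine (just a sketch; the 10-second value is a guess):

  provisioner "remote-exec" {
    inline = [
      # give the node a moment to settle before asking for the reboot
      "sleep 10",
      "sudo shutdown -r +0",
    ]
  }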

And IMHO, it should not be within the remote-exec provisioner, but be a different one, specifically built to handle that scenario and the remote behavior that results from it (it would be a lot simpler to achieve that way, as the intent and expected outcome would be clear).

Ansible does it well, Terraform should already have a solution for that, it's long overdue! Of course, instances need to reboot, especially after upgrades! I wish I knew Golang, it should be pretty quick, just copying the remote-exec provisioner code and modifying it a little, I would imagine.

@MJSanfelippo

Are there any updates on this? It feels like something that Terraform should have.

@mysticaltech

Yes, not only that, it doesn't seem like sorcery to implement!

@AndrewSav
Author

@mysticaltech cool, I'm glad it looks easy for you, I hope to see your PR implementing it soon!

@mysticaltech

@AndrewSav I'm like a true TF newbie, so probably not suited for this. But I am sure the folks at @hashicorp-cloud can make this happen in the blink of an eye!

@tomchomiak

Bump

@zcemycl

zcemycl commented May 24, 2022

Refer to this solved issue https://github.com/hashicorp/terraform/issues/18517#issue-343471291: you have to change your SSH settings to allow this. For example, say there are 3 bash scripts with 2 reboots in between.

Bash script1,

sudo apt update
sudo apt -y upgrade
do_something()
sudo shutdown -r now

Bash script2,

do_something()
sudo shutdown -r now

Bash script3,

do_something()

Running these as remote-exec provisioners against an AWS instance the normal way will return wait: remote command exited without exit status or exit signal.

To solve this, change these settings in /etc/ssh/sshd_config:

...
ClientAliveInterval 120
ClientAliveCountMax 720
...

This keeps pinging your EC2 instance (every 120s, up to 720 times) until SSH is connected again.
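
If those settings aren't baked into the image, one way to apply them is a remote-exec step before the reboot (a rough sketch, assuming passwordless sudo; the sshd service name varies by distro):

provisioner "remote-exec" {
  inline = [
    # append the keepalive settings and restart sshd so new sessions pick them up
    "echo 'ClientAliveInterval 120' | sudo tee -a /etc/ssh/sshd_config",
    "echo 'ClientAliveCountMax 720' | sudo tee -a /etc/ssh/sshd_config",
    "sudo systemctl restart sshd || sudo systemctl restart ssh",
  ]
}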

Hope this works for you too.

@mysticaltech

On my end, I circumvented the problem using this method:

Issue a reboot command and wait for MicroOS to reboot and be ready

  provisioner "local-exec" {
    command = <<-EOT
      ssh ${local.ssh_args} root@${self.ipv4_address} '(sleep 2; reboot)&'; sleep 3
      until ssh ${local.ssh_args} -o ConnectTimeout=2 root@${self.ipv4_address} true 2> /dev/null
      do
        echo "Waiting for OS to reboot and become available..."
        sleep 3
      done
    EOT
  }
