waiting for the machine to finish rebooting does not work when the machine reboots too quickly #856
I can trigger this race condition reliably by inserting a sleep into `reboot_sync`:

```python
def reboot_sync(self, hard=False):
    """Reboot this machine and wait until it's up again."""
    self.reboot(hard=hard)
    # SLEEP INSERTED HERE
    import time
    time.sleep(20)
    self.log_start("waiting for the machine to finish rebooting...")
    nixops.util.wait_for_tcp_port(self.get_ssh_name(), self.ssh_port, open=False, callback=lambda: self.log_continue("."))
```

So indeed, when the race condition triggers in my issue, the machine has already rebooted by the time we call `wait_for_tcp_port`.
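The race comes from the shape of the wait itself. A minimal sketch of a `wait_for_tcp_port`-style poll (not the nixops implementation; `tcp_port_open` and `wait_for_port_state` are illustrative names) shows why "wait for closed, then wait for open" can miss a fast reboot: if the machine goes down and comes back between two polls, the port is never observed closed.

```python
import socket
import time

def tcp_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def wait_for_port_state(host, port, want_open, poll_interval=1.0, max_tries=60):
    """Poll until the port's open/closed state matches want_open.

    Racy for reboot detection: a reboot faster than poll_interval is
    invisible, because every poll sees the port open.
    """
    for _ in range(max_tries):
        if tcp_port_open(host, port) == want_open:
            return True
        time.sleep(poll_interval)
    return False
```

The 20-second sleep in the snippet above simply guarantees the machine is already back up before the first poll, making the race deterministic.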
nh2 added a commit to nh2/nixops that referenced this issue on Jan 27, 2018:

…OS#856. The old approach, waiting for the machine's port to close and then waiting for it to open again, was insufficient because of a race condition: the machine could reboot so quickly that the port was immediately open again, without nixops noticing that it had gone down. I experienced this on a Hetzner cloud server. The new approach waits for the output of `last reboot` on the remote side to change, which is not racy.
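The fix's idea can be sketched as follows. Instead of watching the TCP port go down and up, record a per-boot identifier before rebooting and poll until it changes; a fast reboot cannot be missed, because the identifier differs no matter how quickly the machine comes back. The commit checks the output of `last reboot`; this hypothetical sketch reads the kernel's `/proc/sys/kernel/random/boot_id` instead, which serves the same purpose. `run` and `reboot` are assumed callables: `run` executes a shell command on the remote machine and returns its stdout (raising `OSError` when the connection fails), and `reboot` triggers the reboot.

```python
import time

# The kernel generates a fresh UUID here on every boot.
BOOT_ID_CMD = "cat /proc/sys/kernel/random/boot_id"

def reboot_and_wait(run, reboot, poll_interval=2.0, max_tries=120):
    """Reboot and wait until the remote boot id differs from the old one."""
    old_boot_id = run(BOOT_ID_CMD).strip()
    reboot()
    for _ in range(max_tries):
        try:
            if run(BOOT_ID_CMD).strip() != old_boot_id:
                return True  # boot id changed: the machine really rebooted
        except OSError:
            pass  # remote not reachable yet; keep polling
        time.sleep(poll_interval)
    return False
```

Comparing a monotonic per-boot token avoids the down/up ordering problem entirely: there is no window in which the old state is indistinguishable from the new one.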
Using `nixops deploy --force-reboot`.

Good case:

Bad case:

Note the `packet_write_wait` in the above. The machine is perfectly up and running in that case; running `nixops ssh` at this point works.

I suspect there is a race condition: if the machine shuts down so quickly that `packet_write_wait` appears before `waiting for the machine to finish rebooting` appears, then the reboot detection doesn't work.