-
Notifications
You must be signed in to change notification settings - Fork 324
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wait for SSH Connectivity before delegating to SSH Launcher. #150
Conversation
@@ -57,8 +56,6 @@ private static SSHLauncher getSSHLauncher(InspectContainerResponse detail, Docke | |||
String host = hostUrl.getHost(); | |||
|
|||
LOGGER.log(Level.INFO, "Creating slave SSH launcher for " + host + ":" + port); | |||
|
|||
PortUtils.waitForPort(host, port); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear why this is any better than what's already there :
PortUtils.waitForPort is already attempting to wait for the remote SSH to be available before continuing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Our jenkins server and docker server is on the same box. As soon as the docker container is started, the container ssh port forward on 127.0.0.1, and the port, is already up. But the SSH daemon may not have finished starting up. When I tested this scenario, PortUtils.waitForPort does not block, since the port is already up. It only does a port is open check. The new change tries to do an ssh connection, which fails if the SSH daemon is not ready.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also had issues that jenkins slave failed to connect to ssh. I think it was because of ssh wasn't started. Then slave and image waited timeouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So did you get the slave node showing as offline/suspended, and when you look at the slave log, you get a connection terminated on Ssh Slave plugin 1.6, or IllegalStateException on Ssh Slave plugin 1.9?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had no chance to collect info, but sounds like your described. I also tried press manually launch slave and it connected successfully, but jobs and slave state wasn't provisioned further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMHO ssh connection itself should be more responsive and should have timeout and retry that we should be able to handle
It's true that SSH seems to be a PITA! I think that, by rights, SSHLauncher ought to be dealing with waiting for SSH ports to become available / responsive -- it isn't currently but might one day as I can't believe docker-cloud is the only plugin to have this as an issue. I think it'd be the best way to 'hide' the waitForPort (which seems to work for many people) and 'ssh actually responding' bits in a DelegatingComputerLauncher (maybe rename RetryingComputerLauncher to something like WaitForSSHComputerLauncher) |
As temp solution i thought about some delay before opening connection after port becomes available. But i think somebody should check with ssh-slaves devs how we can handle this retries or why they doesn't work now. |
I did explore upgrading to the latest ssh-slave and delegating the retry logic to it. See #149. However, it didn't fix the issue I was experiencing because of it only retries if the ssh port is closed. |
I checked in the refactored version. |
Closing...The fix for this issue was incorporated into #230 |
No description provided.