[JENKINS-46515] exit the Launcher process on 4XX errors #193
I have some concerns about this PR, but I need to do a deeper review of the code. It will be delayed a bit due to the conferences.
@jenkinsci/code-reviewers, some feedback would be really useful.
@@ -539,6 +543,11 @@ private SSLSocketFactory getSSLSocketFactory()
throw x;
} else
throw e;
} catch (FileNotFoundException e) {
System.err.println("Failing to obtain "+slaveJnlpURL);
Yeah, there is direct usage of System I/O below as well :( Maybe needs refactoring (not in this PR)
System.err.println("Failing to obtain "+slaveJnlpURL);
e.printStackTrace(System.err);
System.err.println("Will silently exit without errors");
System.exit(0);
I am not sure that exiting without errors and with a zero exit code is fine here. To be reviewed.
The idea behind this zero exit code is to take advantage of KostyaSha/yet-another-docker-plugin#186: defining a docker template with a restart policy (on-failure) would keep the agent alive while it is registered at the Jenkins master.
Once (if ever) it gets de-registered, this modified code would stop retrying and let the container die (it won't be deleted).
Nevertheless, if anyone has an objection, the exit code could be set via a new flag. I would keep in mind that this PR tries to cover a case where agents are erased after a Jenkins restart while the agent itself is, or will be, back alive (leaving zombie agents trying to reconnect). As the real issue is almost unreproducible, this PR is about defensive behaviour.
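To make the "exit code via a new flag" idea concrete, here is a minimal sketch of how such a flag could be read. It is an illustration only: the class `ExitCodeConfig`, the method `resolveExitCode`, and the property name `example.launcher.exitCodeOnClientError` are all hypothetical and are not part of Remoting. The default stays at 0, so a container under an on-failure restart policy is not restarted unless an operator opts in to a non-zero code.

```java
// Hypothetical sketch: none of these names exist in Jenkins Remoting.
final class ExitCodeConfig {
    // Assumed property name, chosen for illustration only.
    static final String PROPERTY = "example.launcher.exitCodeOnClientError";

    /**
     * Parses the configured exit code.
     * Falls back to 0 (the behaviour proposed in this PR) when the value
     * is missing, not a number, or outside the 0..255 range.
     */
    static int resolveExitCode(String raw) {
        if (raw == null || raw.trim().isEmpty()) {
            return 0;
        }
        try {
            int code = Integer.parseInt(raw.trim());
            return (code >= 0 && code <= 255) ? code : 0;
        } catch (NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        // Reads the JVM system property, e.g. -Dexample.launcher.exitCodeOnClientError=1
        int code = resolveExitCode(System.getProperty(PROPERTY));
        System.out.println("Would exit with code " + code);
    }
}
```

With a switch like this, the zero exit code would remain the default while deployments that prefer a failing status could request one explicitly.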
I share Oleg's concern here. It is weird to exit without warnings and with a zero exit code. This may help in some limited situations, but it introduces unusual and dramatic behavior in all situations. As this is intended to introduce a defensive behavior for certain hard-to-reproduce situations, the proposed fix seems too broad. I would need to better understand the situations and how this proposal resolves them before approving this approach. I would also want to see alternatives.
Forgot about this PR... @jeffret-b: true, it is weird to exit with a zero exit code, but the current situation means retrying the remoting connection forever if the Computer at the Jenkins master has been erased. Stopping the reconnection attempts on a 404 seems safe to me (regardless of the exit status value).
I'm inclined to accept this PR. It looks like it may help in the specific scenario intended and probably won't harm in others.
I'd like to get a second approval before proceeding, though. @oleg-nenashev or @jvz, what do you think?
@oleg-nenashev @jvz What do you think?
No strong opinion from me on this. I'd probably be more at ease with a feature flag added.
Noting that this introduced a regression in a cloud plugin.
Reverting in #328, with the intention of creating Remoting 3.31 without it.
while continuing to retry the connection on 5XX.
Manually tested by deleting a nodes/agentId folder in a Jenkins installation and restarting the Jenkins service: the JNLP process stops.
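The policy described in this PR (stop on 4XX, keep retrying on 5XX) can be sketched as a small decision function. This is a simplified illustration, not the code in the PR: the class `JnlpRetryPolicy`, the enum `Action`, and the method `decide` are hypothetical names, and treating every status outside 2XX and 4XX as retryable is an assumption made for the example.

```java
import java.net.HttpURLConnection;

// Hypothetical sketch of the 4XX/5XX policy; not the Remoting implementation.
final class JnlpRetryPolicy {
    /** What the launcher would do after requesting the JNLP file. */
    enum Action { PROCEED, EXIT, RETRY }

    static Action decide(int status) {
        if (status >= 200 && status < 300) {
            return Action.PROCEED; // the request succeeded
        }
        if (status >= 400 && status < 500) {
            return Action.EXIT;    // e.g. 404: the agent no longer exists on the master
        }
        return Action.RETRY;       // 5XX and anything else: assumed to be transient
    }

    public static void main(String[] args) {
        int[] samples = {
            HttpURLConnection.HTTP_OK,          // 200
            HttpURLConnection.HTTP_NOT_FOUND,   // 404
            HttpURLConnection.HTTP_UNAVAILABLE  // 503
        };
        for (int status : samples) {
            System.out.println(status + " -> " + decide(status));
        }
    }
}
```

Keeping the decision separate from `System.exit` makes the policy testable on its own and leaves the choice of exit code to the caller.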