Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-39835] - Be extra defensive about Errors and Exceptions #133

Merged
merged 1 commit into from
Dec 22, 2016

Conversation

jtnord
Copy link
Member

@jtnord jtnord commented Nov 17, 2016

JENKINS-39835 Be even more defensive then against leaving connections dangling.

NB: no testing performed of this fix - it is a direct port from the internal code that was submitted to make the JNLP v4 protocol.
@reviewbybees esp @stephenc @oleg-nenashev

[JENKINS-39835] Be even more defensive then against leaving connections dangling.
@ghost
Copy link

ghost commented Nov 17, 2016

This pull request originates from a CloudBees employee. At CloudBees, we require that all pull requests be reviewed by other CloudBees employees before we seek to have the change accepted. If you want to learn more about our process please see this explanation.

@stephenc
Copy link
Member

🐝

} finally {
// incase this was an OOMErr and logging caused another OOMErr
recvKey.cancel();
onRecvClosed();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this code should rethrow errors afterwards

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not doing it before the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was not catching Errors before the change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it does that you could kill the thread / leave the system in a bad state.

@amuniz
Copy link
Member

amuniz commented Nov 18, 2016

🐝

Copy link
Member

@oleg-nenashev oleg-nenashev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not feel suppressing Errors is a good idea here. After closing the connection, we should try to rethrow it. 🐛

@oleg-nenashev oleg-nenashev changed the title Be extra defensive about Errors and Exceptions [JENKINS-39835] - Be extra defensive about Errors and Exceptions Nov 18, 2016
@oleg-nenashev
Copy link
Member

@jtnord Do you think we need a similar fix in stable-2.x?

@jtnord
Copy link
Member Author

jtnord commented Nov 18, 2016

@oleg-nenashev if you throw the error will likely kill the thread and you may not end up in the state you think you want to end up in.

I'm not comfortable re-throwing Errors, so if you would like that then I will leave it up to yourself, cloudbees oss-team or someone else to provide.

@jtnord
Copy link
Member Author

jtnord commented Nov 18, 2016

Do you think we need a similar fix in stable-2.x?

This does not contain the JNLP4v protocol (and hence this code) does it?

@stephenc
Copy link
Member

W.r.t. Killing the thread. Thinking about this more... You would only ever kill a worker thread, which should be ok as they are just assigned to this operation and the threadpool will replace if they die

@oleg-nenashev
Copy link
Member

@jtnord @stephenc Should we proceed with this PR?

@jtnord
Copy link
Member Author

jtnord commented Dec 16, 2016

Yes this is 100% needed - and I feel this is a regression in remoting and the new LTS will be picking this up so we are likely to see reports of this more.

Basically ops read is no longer registered and only adds it later on (when all data has been read), leaving hanging network. Has been observed in the wild on the pre OSS version of the code.

@oleg-nenashev
Copy link
Member

I still do not like the handling of some fatal errors, but I agree to merge it taking the potential impact into account.

@oleg-nenashev
Copy link
Member

BTW I also think that killing the worker thread is fine

@jtnord
Copy link
Member Author

jtnord commented Dec 19, 2016

@oleg-nenashev if the worker thread dies handling a read then whatever is connected is dead as far as OP_READ. as OP_READ is removed when we start the thread and only removed when we have read all of the data available in the callback (ie the thread that dies).

See line 147 in this block

Did you see my proposal on the Jenkins-dev list for handling uncaught exceptions :-)

@oleg-nenashev
Copy link
Member

@jtnord

Did you see my proposal on the Jenkins-dev list for handling uncaught exceptions

Show me the link

LogRecord record = new LogRecord(Level.SEVERE, "[{0}] Uncaught {1}");
record.setThrown(t);
record.setParameters(new Object[]{stack().name(), t.getClass().getSimpleName()});
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole block could be written more simply as

LOGGER.log(Level.SEVERE, "Uncaught error in " + stack().name(), t);

without any real loss.

record.setParameters(new Object[]{stack().name(), t.getClass().getSimpleName()});
}
} finally {
// incase this was an OOMErr and logging caused another OOMErr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easier and probably safer to set a boolean flag at the top of the method indicating whether it completed normally; wrap everything in a try-finally block that closes the channel if not.

@oleg-nenashev
Copy link
Member

OK, let's go forward with this PR as is.
@jtnord , my plan is to have remoting-3.2.1 with this fix only if @olivergondza agrees. Actually #138 may also deserve backporting due to the quite wide exposure of the issue. @olivergondza WDYT?

@olivergondza
Copy link
Member

I am ok with the code as it is - logging the errors is fine. It seems reasonable safe to have it backported into .2.

@oleg-nenashev
Copy link
Member

OK, going forward with the merge. Will be in this weekly if @kohsuke is available

@jeffret-b jeffret-b deleted the jtnord-patch-1 branch January 28, 2020 19:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants