Improve handling of idle timeouts when the handling itself is idle #12301

Merged
gregw merged 10 commits into jetty-12.0.x on Oct 9, 2024

Conversation

@gregw (Contributor) commented Sep 24, 2024

Currently, if a handler is idle (or blocked) when an idle timeout occurs, the timeout is essentially a no-op, unless the handler has registered an idle timeout listener or subsequently reads from the request.
This change ensures that all subsequent writes will also fail.
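
For context, the "idle timeout listener" mentioned above is the opt-in mechanism on the request. A minimal sketch of how a handler might use it, assuming Jetty 12's core Handler/Request API (Request.addIdleTimeoutListener and Request.addFailureListener); the class name IdleAwareHandler and the listener bodies are purely illustrative, not part of this PR:

```java
import org.eclipse.jetty.server.Handler;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Response;
import org.eclipse.jetty.util.Callback;

// Illustrative only: a handler that leaves handling pending for asynchronous
// work, but opts in to idle timeout notifications so a timeout is not
// silently ignored while handling is idle.
public class IdleAwareHandler extends Handler.Abstract
{
    @Override
    public boolean handle(Request request, Response response, Callback callback)
    {
        request.addIdleTimeoutListener(timeout ->
        {
            // Returning true asks the container to treat the idle timeout as
            // a request failure; returning false ignores this occurrence and
            // re-arms the idle timeout.
            return true;
        });

        request.addFailureListener(failure ->
        {
            // When the request is failed (for example by the idle timeout
            // above), complete the handling callback so the response
            // lifecycle can finish.
            callback.failed(failure);
        });

        // Handling stays pending; some asynchronous component is expected to
        // complete (or fail) the callback later.
        return true;
    }
}
```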

@gregw requested review from sbordet and lorban September 24, 2024 04:04
@gregw added the Sponsored (This issue affects a user with a commercial support agreement) label Sep 24, 2024
@lorban (Contributor) left a comment:

This lacks a test to make sure we always record the failure, but this looks like a proper fix for a genuine bug.

@sbordet (Contributor) commented Sep 25, 2024

> This change ensures that all subsequent writes will also fail.

I don't think this would work, because we need to be able to write a response if an idle timeout or a failure happens.

What is the reason behind this change?

@gregw requested a review from lorban October 2, 2024 02:19
@gregw marked this pull request as ready for review October 2, 2024 02:19
@gregw requested a review from lorban October 2, 2024 20:43
@lorban previously approved these changes Oct 3, 2024
@lorban (Contributor) left a comment:

LGTM besides the new method's name.

@lorban previously approved these changes Oct 8, 2024
@lorban (Contributor) left a comment:

LGTM, but @sbordet had concerns.

@@ -1161,17 +1159,17 @@ private boolean lockedIsWriting()
         return _writeCallback != null;
     }

-    private Runnable lockedFailWrite(Throwable x)
+    private Runnable lockedFailWrite(Throwable x, boolean alwaysFailFutureWrites)
A reviewer (Contributor) commented on this diff:

The new parameter is always false, so remove it?

@gregw (Contributor, Author) replied:

I'm not entirely convinced that we should never do this. But I will remove for now.

@sbordet (Contributor) commented Oct 8, 2024

@gregw so with the removal of the parameter alwaysFailFutureWrites (always false), then writes are not failed.

I don't think failing writes is correct, because we want to be able to write an error response in case of failures or idle timeouts.

@gregw (Contributor, Author) commented Oct 8, 2024

@sbordet

> @gregw so with the removal of the parameter alwaysFailFutureWrites (always false), then writes are not failed.
>
> I don't think failing writes is correct, because we want to be able to write an error response in case of failures or idle timeouts.

I'm not understanding your comment. This PR is no longer failing writes after a failure. It did for a while, but after discussions we couldn't convince ourselves that it was right to do so. I'm not entirely convinced it is wrong to do so, but keeping the behaviour the same as it is currently, except ensuring the Runnable is executed, is a good step.

Why is it so important to write a response when there is a failure? Most times when this happens, the response will never make it to the client, as the connection will have failed in some way, hence the failure/timeout. I know we have always strived to do this, but I'm wondering what is the value? The cost is that we can sometimes ignore real errors and not deliver them to an application, which then doesn't see those failures on subsequent writes. Perhaps this PR to improve the invocation of error listeners will avoid the problem, but I'm still not 100% sure.

@gregw merged commit ea258a3 into jetty-12.0.x Oct 9, 2024
10 checks passed
@olamy mentioned this pull request Oct 9, 2024
olamy pushed a commit that referenced this pull request Oct 30, 2024
Improve handling of idle timeouts when the handling itself is idle (#12301)

Always execute (without queuing) the task to call error listeners
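
Per that commit message, the merged change does not fail future writes; it makes sure the Runnable that notifies error listeners is executed unconditionally instead of being queued behind other serialized tasks, where it might never run while handling is idle. A rough sketch of that distinction (purely illustrative Java, not Jetty's actual HttpChannelState code; the class and method names are invented for the example):

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

// Purely illustrative: contrasts queuing the error-listener notification
// behind other serialized tasks with executing it unconditionally, which is
// the behaviour the commit message above describes.
class ErrorNotification
{
    private final Queue<Runnable> serializedTasks = new ArrayDeque<>();
    private final Consumer<Throwable> errorListener;

    ErrorNotification(Consumer<Throwable> errorListener)
    {
        this.errorListener = errorListener;
    }

    // Before: the notification only runs if something later drains the task
    // queue, so a handler that is idle may never be told about the failure.
    void queueFailure(Throwable failure)
    {
        serializedTasks.add(() -> errorListener.accept(failure));
    }

    // After: the notification task is always executed, without queuing.
    void executeFailure(Throwable failure)
    {
        Runnable task = () -> errorListener.accept(failure);
        task.run();
    }
}
```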
Labels
Sponsored This issue affects a user with a commercial support agreement
Projects
Status: ✅ Done
3 participants