[JENKINS-70531] Apply timeout on WebSocket write operations (and simplify AbstractByteBufferCommandTransport) #621
Actually in this case I cannot see where the agent did receive a WebSocket close event but failed to close in response. (For over a week, apparently!) #595 might have fixed this as well; unclear.
Seeing some flakes, but I get some in trunk too, so I am not sure they are related.
Just realized that https://docs.oracle.com/en/java/javase/11/docs/api/java.net.http/java/net/http/WebSocket.html could potentially be used to implement the WebSocket client without needing the Tyrus dep. Worth experimenting with.
FYI this patch has proved to be effective in a production controller.
Filing this as https://issues.jenkins.io/browse/JENKINS-70531.
Not an expert in the field of remoting, but the changes look straightforward.
LGTM
src/main/java/hudson/remoting/AbstractByteBufferCommandTransport.java
…we can see by code inspection it is safe
Downstream passed; releasing.
* Apply timeout on WebSocket write operations * jenkinsci/remoting#621 released
…i#7596) * Apply timeout on WebSocket write operations * jenkinsci/remoting#621 released (cherry picked from commit e0aee59)
A user reported a WebSocket agent hanging indefinitely after a reload of nginx configuration despite PingThread being explicitly enabled on the agent side (contra #85, pending jenkinsci/jenkins#7580). Hypothesis: Tyrus is failing to detect the loss of the connection cleanly, and Future.get() without timeout never returns, and PingThread does not receive a TimeoutException or any other response. If true, we can try to apply a timeout (currently hard-coded to 5m), though this is a bit tricky since the asynch variant of the endpoint does not appear to support transmission of a sequence of buffers as part of a single binary frame.

Corresponding jenkinsci/jenkins#7596 as well, though in this case it is the agent-side write that appears to be the culprit.
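To illustrate the hypothesis, here is a minimal JDK-only sketch of a bounded wait on a write future. It is not the code in this patch: the class WriteTimeoutSketch and the helpers coalesce and awaitWrite are hypothetical names, and a never-completing CompletableFuture stands in for the Future<Void> that an asynchronous WebSocket send would return. Copying the buffers into one is only one possible way around the single-frame limitation, assumed here for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; names are not taken from remoting or Tyrus.
class WriteTimeoutSketch {

    /** Copies several buffers into one so they could be sent as a single binary message. */
    static ByteBuffer coalesce(ByteBuffer... parts) {
        int total = 0;
        for (ByteBuffer part : parts) {
            total += part.remaining();
        }
        ByteBuffer combined = ByteBuffer.allocate(total);
        for (ByteBuffer part : parts) {
            // duplicate() shares the bytes but leaves the caller's position untouched
            combined.put(part.duplicate());
        }
        combined.flip();
        return combined;
    }

    /** Waits for a pending write, turning a stall into an IOException instead of blocking forever. */
    static void awaitWrite(Future<Void> pending, long timeout, TimeUnit unit) throws IOException {
        try {
            pending.get(timeout, unit);
        } catch (TimeoutException e) {
            pending.cancel(true);
            throw new IOException("write did not complete within " + timeout + " " + unit, e);
        } catch (ExecutionException e) {
            throw new IOException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
    }

    public static void main(String[] args) {
        ByteBuffer frame = coalesce(
                ByteBuffer.wrap(new byte[] {0, 3}),
                ByteBuffer.wrap(new byte[] {1, 2, 3}));
        System.out.println("frame bytes: " + frame.remaining());

        // Never completes, like a write on a connection whose loss went undetected.
        Future<Void> stalled = new CompletableFuture<>();
        try {
            awaitWrite(stalled, 100, TimeUnit.MILLISECONDS);
        } catch (IOException e) {
            System.out.println("gave up: " + e.getMessage());
        }
    }
}
```

With an untimed get() the last call would block indefinitely; with the bounded variant the caller gets an exception it can use to tear down the channel.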
Iterative testing via WebSocketAgentsTest:

mvnd -Pquick-build install
mvnd -f ../jenkins -pl core,war -Pquick-build install
mvnd -f ../jenkins -pl test -Dtest=WebSocketAgentsTest
Not tested in any realistic context.