[grid] Distributor retry session when RemoteNode executor shutting down#17109
Signed-off-by: Viet Nguyen Duc <nguyenducviet4496@gmail.com>
PR Type: Bug fix
Pull request overview
This PR fixes an intermittent session creation failure (#17044) that occurs when a Grid Node restarts or drains while a session request is in flight. The JDK HTTP client throws a RejectedExecutionException wrapped in nested exceptions, which was previously surfacing to clients as a cryptic error. The fix detects this specific scenario and signals the distributor to retry on a healthy node instead.
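To see why the root cause ends up buried, the sketch below reproduces the rejection with a plain `ThreadPoolExecutor`. It is a simulation, not Selenium or JDK HTTP client code. Once `shutdown()` has been called, the executor's default `AbortPolicy` throws `RejectedExecutionException` for any new task. The `IOException` and `UncheckedIOException` wrapping is done by hand to mirror the chain reported in #17044, and the class name `RejectedAfterShutdown` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectedAfterShutdown {

  /** Builds, by hand, a three-level chain shaped like the one the Hub sees. */
  static UncheckedIOException simulateChain() {
    // ThreadPoolExecutor's default rejection handler is AbortPolicy, which
    // throws RejectedExecutionException once the pool has been shut down.
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    pool.shutdown();
    try {
      pool.execute(() -> {});
      throw new IllegalStateException("expected the shut-down pool to reject the task");
    } catch (RejectedExecutionException rejected) {
      // Hypothetical wrapping that imitates HttpClientImpl.send() and then
      // JdkHttpClient.execute0() in the reported stack trace.
      IOException io = new IOException(rejected.getMessage(), rejected);
      return new UncheckedIOException(io);
    }
  }

  public static void main(String[] args) {
    // Print the chain from the outermost wrapper down to the root cause.
    for (Throwable t = simulateChain(); t != null; t = t.getCause()) {
      System.out.println(t.getClass().getName());
    }
  }
}
```

Running it prints `java.io.UncheckedIOException`, `java.io.IOException` and `java.util.concurrent.RejectedExecutionException`, in that order. This is the order a cause-chain walk has to traverse to reach the real reason for the failure.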
Changes:
- Added exception handling in `RemoteNode.newSession()` that walks the cause chain to detect `RejectedExecutionException`
- When detected, returns `RetrySessionRequestException` to signal the distributor to retry
- Added tests covering the fix scenario and verifying that other IOExceptions are not incorrectly caught
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| java/src/org/openqa/selenium/grid/node/remote/RemoteNode.java | Added UncheckedIOException catch block that walks exception cause chain to detect RejectedExecutionException and convert to RetrySessionRequestException |
| java/test/org/openqa/selenium/grid/node/RemoteNodeTest.java | Added new test class with two tests: one validating the fix converts executor shutdown to retry, and one ensuring other IOExceptions are not affected |
🔗 Related Issues
Fixes #17044 - intermittently fails to create new sessions (HTTP 500): SequentialScheduler task rejected / ThreadPoolExecutor “Shutting down”
💥 What does this PR do?
When a Node restarts or drains while a newSession request is in flight, the JDK HTTP client on the Hub throws a RejectedExecutionException (executor shutting down), which propagates as:
UncheckedIOException ← JdkHttpClient.execute0()
Caused by: IOException ← HttpClientImpl.send()
Caused by: RejectedExecutionException ← ThreadPoolExecutor$AbortPolicy
This propagated uncaught through RemoteNode.newSession() and was wrapped by LocalDistributor.startSession() into a SessionNotCreatedException with the cryptic internal JDK message surfaced directly to the client:
Update: `RemoteNode.newSession()` now catches `UncheckedIOException` and walks its cause chain. If a `RejectedExecutionException` is found, it returns a `RetrySessionRequestException` with a clear message. This signals the distributor to retry on a healthy node instead of immediately failing the client.
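The following is a minimal sketch of the detection step that uses only the JDK. `causedByExecutorRejection` walks `getCause()` until it finds a `RejectedExecutionException`. `classify` then maps the result to a hypothetical `Outcome` enum that stands in for Grid's retry-versus-fail decision. The actual change returns `RetrySessionRequestException` from `RemoteNode.newSession()`. That type is left out here so the example stays self-contained, and every name below is illustrative rather than Selenium's API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ConnectException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.RejectedExecutionException;

public class ExecutorShutdownDetector {

  /** Hypothetical stand-in for "retry on another node" vs. "fail as before". */
  enum Outcome { RETRY_ON_ANOTHER_NODE, FAIL }

  /** Walks the cause chain looking for RejectedExecutionException, stopping on a cycle. */
  static boolean causedByExecutorRejection(Throwable error) {
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
      if (t instanceof RejectedExecutionException) {
        return true;
      }
    }
    return false;
  }

  /** Only an executor rejection becomes a retry; other I/O failures keep their old handling. */
  static Outcome classify(UncheckedIOException e) {
    return causedByExecutorRejection(e) ? Outcome.RETRY_ON_ANOTHER_NODE : Outcome.FAIL;
  }

  public static void main(String[] args) {
    UncheckedIOException shuttingDown = new UncheckedIOException(
        new IOException("send failed", new RejectedExecutionException("Shutting down")));
    UncheckedIOException refused = new UncheckedIOException(
        new ConnectException("Connection refused"));

    System.out.println(classify(shuttingDown)); // RETRY_ON_ANOTHER_NODE
    System.out.println(classify(refused));      // FAIL
  }
}
```

The identity-based `seen` set guards against a cause cycle, for example two exceptions linked to each other through `initCause`. The `getCause()` method alone does not prevent such a cycle. The `ConnectException` case illustrates the property the PR's second test checks: an ordinary I/O failure is not turned into a retry.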
🔧 Implementation Notes
💡 Additional Considerations
🔄 Types of changes