-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent grid from creating sessions that are about to timeout in queue (corrects issue #11881) #12014
Prevent grid from creating sessions that are about to timeout in queue (corrects issue #11881) #12014
Conversation
This allow the extended waiting to be done without being interrupted by timeoutSessions() method
…uet/selenium into grid_concurrency_corrected
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR!
I think you have a good point, where the current logic allows for a window where the timeout is triggered but still the Distributor can create a session that will end up being orphaned.
However, I do not think the approach in the PR is completely correct, or clean at least. This PR is basically waiting 5 seconds more for the session request to resolve.
A nicer approach, I think, is to put the logic you are putting in isRequestInQueue
as a filter in timeoutSessions
. With that, we timeout only session requests that are present in the queue
.
Something around the lines of:
private void timeoutSessions() {
Instant now = Instant.now();
Lock readLock = lock.readLock();
readLock.lock();
Set<RequestId> ids;
try {
ids =
requests.entrySet().stream()
.filter(
entry ->
queue.stream()
.anyMatch(
sessionRequest ->
sessionRequest.getRequestId().equals(entry.getKey())))
.filter(entry -> isTimedOut(now, entry.getValue()))
.map(Map.Entry::getKey)
.collect(ImmutableSet.toImmutableSet());
} finally {
readLock.unlock();
}
ids.forEach(this::failDueToTimeout);
}
(Might need testing, I just quickly tried this).
What do you think?
Hello I think that my PR needed to be improved but, correct me if I'm wrong, the timeoutSessions method is only called once every 10 seconds by default and would not prevent the current problematic behaviour
|
I changed the code as suggested |
I think you are right to point out that we were timing out session requests that were being processed by the Distributor. Now that this PR is taking that into account, it should not happen anymore. Thank you. |
Codecov ReportPatch and project coverage have no change.
❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more. Additional details and impacted files@@ Coverage Diff @@
## trunk #12014 +/- ##
=======================================
Coverage 54.85% 54.85%
=======================================
Files 86 86
Lines 5728 5728
Branches 233 233
=======================================
Hits 3142 3142
Misses 2353 2353
Partials 233 233 ☔ View full report in Codecov by Sentry. |
Oh wait, somehow when I reviewed the change set I only saw the filter added, not the initial code. I will revert the piece that adds the 5 seconds and go with the filter only. If the bug is still present, we can consider the other part of the PR. |
Well, the tests you added are now failing. So maybe all the code you added is needed 😄 I will leave it as it is, but hopefully you can be around in case bugs are raised. |
Yes, I'll be happy to contribute improve it if necessary |
Description
As stated in issue #11881, when Selenium grid receives more session requests than it can handle, some sessions may be created whereas the client that requests it receives a timeout response.
This bug only shows up when
--session-request-timeout
option is setMotivation and Context
On our grid setup, for some reasons, many grid nodes may become unavailable (upgrade, electrical failure, network failure, ...)
In this case the remaining nodes are not able to handle all the incoming session requests. These requests goes to queue, but we observed that at some point, a browser may be created on node and never used (it remains in this state 5 minutes) before session is killed by grid.
This is due to the fact (details and logs can be found in mentionned issue: #11881 (comment))
Correction takes place in LocalNewSessionQueue. When session request timeout expires, we first check if the session is still in queue or not (session not in queue means distributor is about to create it). In the latter case we wait a bit more so that distributor has time to create the session.
Types of changes
Checklist
Remark: added tests are a bit long to run to test timeout behaviour