-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: linkcheck: add client connection-pool contention coverage #11402
Merged
AA-Turner
merged 10 commits into
sphinx-doc:master
from
jayaddison:issue-11324-prep/linkcheck-connection-contention-test-coverage
Jul 23, 2023
Merged
Changes from 6 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
60765d3
tests: linkcheck: add client connection-pool contention coverage
jayaddison 589bca5
tests: linkcheck: increase workload size in connection-pool contentio…
jayaddison 2ba1856
tests: linkcheck: refactor connection-pool contention test to use int…
jayaddison 0184271
Revert "tests: linkcheck: increase workload size in connection-pool c…
jayaddison 3c0b04a
tests: linkcheck: further excess-workload revert cleanup
jayaddison 8990061
tests: linkcheck: update explanatory comment
jayaddison f10fd14
Merge branch 'master' into issue-11324-prep/linkcheck-connection-cont…
AA-Turner d818ee8
Merge branch 'master' into issue-11324-prep/linkcheck-connection-cont…
AA-Turner 64c50e1
Fix
AA-Turner b04d993
Fix
AA-Turner File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to remove this because I think it has side-effects that outlast the unit test.
It was introduced to avoid indefinitely-blocked socket I/O when the test linkcheck client and server fail to communicate reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this could be a global setting for the test suite, I don’t see why an indefinitely-blocked socket I/O would be reasonable in that context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, that would make sense. It would also be nice if the tests could fail if that timeout limit is ever reached, because that'd almost certainly be a sign of a problem with the application code, given that we now entirely request local, non-Internet resources during tests (I believe that's true?).
Another relevant note is that, in practice, I think we already approximate this behaviour -- perhaps unintentionally -- from the time at which a linkchecker object is first constructed. That's because there is a call to the same
setdefaulttimeout
method here:sphinx/sphinx/builders/linkcheck.py
Lines 119 to 120 in d48cc78
It does probably make sense to have a timeout in production environments, because in most cases it would be bad for a single long-delayed link to block the entire linkcheck process, but at the same time it would be good to confirm why the global timeout is in place given that there is an existing -- and configurable -- linkcheck timeout setting:
sphinx/sphinx/builders/linkcheck.py
Lines 270 to 271 in d48cc78
(I also should explain that it seems OK to me that both timeouts exist: I sometimes have a tendency to remove redundancy in software to such an extent that the resulting code becomes fragile in other ways, and I'm keen that we avoid that risk here. but I think it would be good to understand the reasoning why both exist. I think it's basically to try to avoid the linkchecker ever stalling indefinitely, even in the absence of a configured
linkcheck_timeout
value)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. My reading of the code (and https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-linkcheck_timeout) is:
linkcheck_timeout=None
: make sure linkcheck bails out after a reasonable amout of time (5.0 seconds), sincerequests
won’t handle the timeoutlinkcheck_timeout=3.14
: specify the timeout torequests
So, by default,
linkcheck
will not get stuck. When the default does not fit a project (developers want slower or longer timeouts), they can uselinkcheck_timeout
. I don’t see a way to specify no timeout, but I don’t think that’s an issue. A noteworthy surprise is that settinglinkcheck_timeout=5.0
is different from using the default, as the default only matters for the socket connection, whereasrequests
timeout also considers the time to generate the response on the server.I think the side-effect is not intentional but is pretty nice to have and should not be changed. There aren’t other
socket.setdefaulttimeout()
in the app, so linkcheck sets that default for other extensions (thinking of extlinks). My suggestion is to move thesocket.setdefaulttimeout()
tosphinx.application.Sphinx.__init__()
. That makes the call explicit and should not break BC (since thesocket.setdefaulttimeout()
happened when the extensions were being setup, assetup_extension
imports the module).I believe the test suite could use the same setting as the application, or even a shorter one, considering that waiting 5 seconds is pretty long in that context. Anything above a second would probably indicate an issue IMO.
Coming back to
linkcheck
, we could remove the call tosocket.setdefaulttimeout()
(that has a side effect on the Sphinx application), and set a default of5.0
forlinkcheck_timeout
, but it would break BC:socket.setdefaulttimeout()
will timeout only on whatrequests
calls “connect”, not while reading data from the server. So a user who connected quickly but waited a long time to receive the response would see failures where it previously worked. I’m guessing we could maybe (and I don’t wanna) abuse the current implementation to pass a tuple(5.0, None)
and preserve the current behavior.socket.setdefaulttimeout()
afterSphinx.__init__()
, to use their own values. That would be a bit surprising consideringlinkcheck_timeout
exists, but who knows...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That’s actually not true.
socket.setdefaulttimeout(5.0)
is only called duringCheckExternalLinksBuilder.__init__()
, which happens when the builder is created, not when the module is imported and the builder registered.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure about this - my reading of the documentation:
... is that the unmodified global timeout for sockets is to be expected unless the operator changes the configuration value.
What we have currently is a 5-second timeout set by the application when no configuration is applied. I do think that having a timeout in production makes sense, but either the documentation or the code may need updating to bring them into consistency.
It'll take me a while longer to process the rest of your reply and to think about any further changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll create a separate issue soon to track resolution for this (relocation and/or removal of the
socket.setdefaulttimeout
calls).