linkcheck builder: begin using requests.Session functionality during linkchecking #11324
Comments
I think that the I'm experimenting with it locally at the moment; initially it seems to be good at mocking the client-side sockets, but when mocking the communication between the linkcheck (client) and test HTTP server, something isn't working as expected. cc @mindflayer (I'll likely open an issue report or two for |
Remember that using |
Thanks! The test HTTP server does run on a separate thread, so that's likely the problem. The recording feature looks perfect to determine whether the use of session-based requests has the intended effect. |
That's exactly why I mentioned that. Let me know if you need more support, I am always happy to help. |
Thanks @mindflayer - I'd be grateful for your help. I'd been planning to try switching to Does that sound like a sensible approach, and/or do you have other suggestions? |
You won't need to reach the actual server when introducing |
(a shorter summary of a longer draft message) Are you suggesting that instantiating a server is not necessary during these tests? |
Correct, |
Ok, thanks. If possible I would like to retain the existing client/server communication during the tests, so I'm weighing up the options available. I ran into problems using the recording mode of During that same attempt, there was also a problem with a call to |
I am not sure why would you want to use |
re: socket mocks
Ok, it seems I didn't match the use-case for @mindflayer: I have some ideas about whether it might be able to support
re: connection pool tracing
I have been able to (hackily) confirm the expected result locally by collecting a This did require use of (upgrade to) an HTTP/1.1 protocol server during testing -- HTTP/1.0 doesn't keep connections open by default, so the connection count does not change under HTTP/1.0. |
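To illustrate the HTTP/1.0 versus HTTP/1.1 point above, here is a minimal standard-library sketch. It is not the Sphinx test server or the linkcheck client: `count_connections` is a hypothetical helper, and `http.client.HTTPConnection` stands in for a session-based client. The handler's `setup()` method runs once per accepted TCP connection, so counting those calls shows whether the connection was reused.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def count_connections(protocol_version, request_count=3):
    """Send several requests through one client object and return how many
    TCP connections the server accepted. (Hypothetical helper.)"""
    accepted = []

    class Handler(BaseHTTPRequestHandler):
        def setup(self):
            # setup() runs once per accepted connection, not once per request.
            accepted.append(self.client_address)
            super().setup()

        def do_GET(self):
            body = b"ok"
            self.send_response(200)
            # An explicit length tells a keep-alive client where the body ends.
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep test output quiet

    Handler.protocol_version = protocol_version
    server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        for i in range(request_count):
            conn.request("GET", f"/page{i}")
            conn.getresponse().read()
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return len(accepted)
```

Under these assumptions, an HTTP/1.0 server closes the socket after every response, so the client reconnects for each request; with `protocol_version = "HTTP/1.1"` the same client object keeps a single connection open.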
Feel free to open an issue on |
For the linkcheck builder, since each check is a task performed by a
If there are many domains but only a single link to check on each, the chunking approach does not help, and links should be checked as usual. If there are many links for a single domain, chunking would reduce the number of sessions being created. |
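The grouping idea in this comment could be sketched roughly as follows. This is only an illustration of bucketing links by host so that one session could serve each bucket; `group_links_by_host` is a hypothetical name and is not part of Sphinx or of the branch discussed in this thread.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_links_by_host(urls):
    """Bucket URLs by (scheme, host) so that links sharing a host can be
    checked together. (Hypothetical helper.)"""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        # Host names are case-insensitive, so normalise before grouping.
        groups[(parts.scheme, parts.netloc.lower())].append(url)
    return dict(groups)


links = [
    "https://example.org/a",
    "https://docs.python.org/3/",
    "https://EXAMPLE.org/b",
]
grouped = group_links_by_host(links)
```

When every bucket holds a single link, grouping gives no benefit, which matches the caveat raised in the comment.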
I have a branch in progress at https://github.com/jayaddison/sphinx/tree/issue-11324/linkcheck-sessioned-requests that:
My changes seem naive and don't function particularly well when the tests run. In particular: without adjusting the timeout, many of the tests fail due to timeout errors that appear in the report results:
With the timeouts removed, most of the tests succeed, but they run extremely slowly (20 seconds or so per unit test). Running
Socket-level networking and syscall tracing aren't things I have much expertise with, but It's interesting to me that the tests do eventually pass -- that seems to indicate that both consumer (client) and producer (server) threads are co-operating; my guesses are:
Perhaps there's something unusual going on in the socket block/timeout configuration, and/or polling. It feels like it should be possible to get these two threads to co-operate without these performance penalties, and then to restore the timeout configuration to ensure that the tests run quickly (in terms of elapsed time). |
An additional finding: sending the |
Could it be that the HTTP requests opened in streaming mode ( |
Disabling streaming requests from the HTTP client-side here doesn't appear to affect the duration for completion of the tests. |
Depends on how your server responds. If the server does not close the connection or does not send some EOF sentinel, both connections will hang until timing out (metaphorically speaking, both participants are staring at each other, waiting for the other to do something). Concerning
By the way, since we want to speed up the tests, can you actually profile what is doing most of the work (e.g., using |
Thanks @picnixz - I'll take a look at Introducing Python's
…st_defaults' test case With thanks to @picnixz for mentioning this configuration setting: sphinx-doc#11324 (comment)
FWIW: switching from the |
@picnixz thank you for this suggestion - I've applied it using the I'm a little more reserved/cautious about the sorting and chunking suggestions. Certainly it could make sense in terms of resource usage to group together similar requests and to make them around the same time. I think it should be a separate changeset though. Would you like to add any more detail, or should I open that as a follow-up issue? |
I'll write a separate issue this weekend. I will be offline for a few days. |
Sorry @AA-Turner - #11402 doesn't complete this, could we reopen it? It's my mistake for using the phrase |
Thanks! Most of the groundwork should be in place for this migration/feature now, I think. |
Is your feature request related to a problem? Please describe.
At the moment, the `linkcheck` builder performs individual `requests.get` (or similar HTTP request method) operations during linkchecking, without any explicit connection or session pooling.
This may be inefficient, because it seems likely that for many use cases, linkchecking will make multiple requests to the same host (because documentation references are likely to have host-locality).
Describe the solution you'd like
Confirmation that connection pooling is not currently in use would be a good starting point; in other words: we should confirm that linkchecking of multiple URLs on a single host results in multiple TCP connections. Ideally this should be written as a test case.
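A test along those lines might look like the following sketch. It uses only the standard library, so it is an analogue rather than the real linkcheck code path: `urllib.request` stands in for unpooled per-request calls, and `CountingServer`, `OkHandler` and `connections_for_unpooled_requests` are hypothetical names. The server counts accepted sockets by overriding `get_request()`.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CountingServer(HTTPServer):
    """Hypothetical test helper that counts accepted TCP connections."""

    connection_count = 0

    def get_request(self):
        request = super().get_request()  # wraps socket.accept()
        self.connection_count += 1
        return request


class OkHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep test output quiet


def connections_for_unpooled_requests(paths):
    """Fetch each path with an independent request; return connections seen."""
    server = CountingServer(("127.0.0.1", 0), OkHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler avoids picking up proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        for path in paths:
            url = f"http://127.0.0.1:{server.server_port}{path}"
            with opener.open(url, timeout=5) as response:
                response.read()
    finally:
        server.shutdown()
        server.server_close()
    return server.connection_count
```

Because `urllib.request` sends `Connection: close` with every request, three URLs on the same host produce three TCP connections here. A pooled client would be expected to bring that number down to one.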
If we can confirm that the problem exists, then we may be able to use some of the `Session` object functionality from the `requests` library that's already in use here to enable connection pooling.
Describe alternatives you've considered
None so far, although open to suggestions (and improvements on the definition of this feature request).
Additional context