
tests/test_builders/test_build_linkcheck.py::test_defaults flaky for docutils HEAD CI #12159

Closed
chrisjsewell opened this issue Mar 21, 2024 · 13 comments · Fixed by #12166

@chrisjsewell
Member

chrisjsewell commented Mar 21, 2024

I've just re-run this 5 flipping times on #12153, and @picnixz has run into it as well

tests/test_builders/test_build_latex.py::test_one_parameter_per_line PASSED [ 31%]
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 692, in process_request_thread
    self.finish_request(request, client_address)
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 362, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 761, in __init__
    self.handle()
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/http/server.py", line 436, in handle
    self.handle_one_request()
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/http/server.py", line 424, in handle_one_request
    method()
  File "/home/runner/work/sphinx/sphinx/tests/test_builders/test_build_linkcheck.py", line 62, in do_GET
    self.wfile.write(content)
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 840, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe
----------------------------------------
tests/test_builders/test_build_linkcheck.py::test_defaults FAILED        [ 31%]

It seems the server breaks when exiting test_one_parameter_per_line, which then leads to test_defaults failing with:

links.rst:3: [timeout] http://localhost:7777/: HTTPConnectionPool(host='localhost', port=7777): Read timed out. (read timeout=0.05)
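
For context on how these two failures relate, here is a minimal standalone sketch (illustrative only; it is not the actual test server code, and SlowHandler is a made-up name): a handler that responds more slowly than the client's read timeout makes the client report a read timeout, and the handler's late write then hits a closed socket.

```python
import http.server
import threading
import time

import requests


class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.2)  # respond slower than the client is willing to wait
        self.send_response(200)
        self.end_headers()
        # By now the client has usually timed out and closed its socket, so this
        # write can fail with BrokenPipeError inside the server's handler thread,
        # much like the traceback above.
        self.wfile.write(b"ok")


# 7777 is the port mentioned in the log line above; any free port would do.
server = http.server.ThreadingHTTPServer(("localhost", 7777), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    requests.get("http://localhost:7777/", timeout=0.05)
except requests.exceptions.Timeout as exc:
    # Mirrors the "Read timed out. (read timeout=0.05)" message above.
    print(exc)
finally:
    server.shutdown()
```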

@jayaddison will your PRs fix this?

or should we temporarily disable the test for Docutils HEAD?

@picnixz

This comment was marked as resolved.

@chrisjsewell
Member Author

I think you had a wrong PR reference

Changed, but yeah, it seems once it happens once it just will not fix itself.

@picnixz
Member

picnixz commented Mar 21, 2024

Better to fix it then. Maybe it has something to do with caching actually?

@chrisjsewell
Member Author

And it's only for Docutils HEAD; I don't know if there is a change in dependencies.

@chrisjsewell
Member Author

chrisjsewell commented Mar 21, 2024

> Maybe it has something to do with caching actually?

Probably. It's ludicrous; I literally can't get #12153 to pass anymore, and now #12154 as well 😒

@chrisjsewell
Member Author

chrisjsewell commented Mar 21, 2024

I think it is now failing every time: sphinx-doc/sphinx/actions/runs/8376347449/job/22935843069

@chrisjsewell chrisjsewell changed the title tests/test_builders/test_build_linkcheck.py::test_defaults is flaky with docutils HEAD tests/test_builders/test_build_linkcheck.py::test_defaults fails for docutils HEAD CI Mar 21, 2024
@chrisjsewell chrisjsewell changed the title tests/test_builders/test_build_linkcheck.py::test_defaults fails for docutils HEAD CI tests/test_builders/test_build_linkcheck.py::test_defaults flaky for docutils HEAD CI Mar 21, 2024
@jayaddison
Contributor

> I've just re-run this 5 flipping times on #12153, and @picnixz has run into it as well [...] it seems the server breaks when exiting test_one_parameter_per_line, then this leads to test_defaults not working [...]

> @jayaddison will your PRs fix this?

It seems not, no: I've encountered one of these failures on the test-HTTP-port-isolation branch, the only branch I thought might affect this: https://github.com/sphinx-doc/sphinx/actions/runs/8379421681/job/22946289546?pr=12126

> or should we temporarily disable the test for Docutils HEAD?

Maybe. Has anyone taken a look at the latest changes in docutils to see whether there is something relevant?

Under some circumstances, docutils does make HTTP network requests while processing documents; but I can't initially think of any reason why that would affect test_defaults in particular.

The other possibility is that this is somehow contention-related; we and/or GitHub may be more active than usual, or the runners may be lower-performance or constrained for some reason, producing timeouts. But it would be unusual for that to affect only the Docutils HEAD jobs.

@jayaddison
Contributor

I'm going to reluctantly suggest (with accompanying pull request) that we increase the timeouts for the linkcheck tests.

Although #12126 doesn't fix the timeouts, it does unlock running the linkcheck tests in parallel, and that should reduce the end-to-end time-to-complete for the test suite.
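
For illustration only (this is a sketch, not the actual change; it assumes the pytest.mark.sphinx marker and the linkcheck test roots the suite already uses, and the test name is made up), bumping the timeout for a single test could look something like:

```python
import pytest


@pytest.mark.sphinx(
    'linkcheck',
    testroot='linkcheck',
    # Hypothetical override: give slow CI runners more headroom than the
    # 0.05s read timeout seen in the failing runs above.
    confoverrides={'linkcheck_timeout': 0.25},
)
def test_defaults_with_more_headroom(app):
    # ``app`` comes from sphinx.testing.fixtures (loaded via the suite's
    # conftest); building runs the linkcheck builder against the local server.
    app.build()
```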

@jayaddison
Contributor

FWIW: I really dislike workarounds when it feels possible to resolve a root cause. But something I dislike more is disorder in continuous integration results and the surrounding collaboration problems that causes.

Also, I have to admit that despite working on these tests for a while, I don't feel any closer to having pinpointed the problem. I could make an excuse and say that's due to not having access to profiling/tracing on the runner hosts; it certainly makes things more difficult, but I feel like there would be a way to achieve better network/performance tracing on GitHub Actions with a bit of determined effort.

@jayaddison
Contributor

> Although #12126 doesn't fix the timeouts, it does unlock running the linkcheck tests in parallel, and that should reduce the end-to-end time-to-complete for the test suite.
>
> Also I have to admit that despite working on these tests for a while, I don't feel any closer to having been able to pinpoint the problem. [...]

I wanted to check this claim myself, that parallelism should reduce the linkcheck tests' time-to-completion. It seems not to be true on my machine: running pytest -n auto tests/test_builders/test_build_linkcheck.py with pytest-xdist is slower than a plain pytest tests/test_builders/test_build_linkcheck.py (2s vs 1.5s).

I understand that the parallelism may consume some more resources and require some more setup time, so that may be the reason. In theory it could still be beneficial to allow more of these tests to run in parallel and in isolation, but let's be cautious anyway.

This did help me find the --durations parameter to pytest. It seems that the setup for the first linkcheck test to run takes longer than for the others. When running in serial, that first test is currently test_defaults, the one that generally fails due to timeouts.

$ pytest --durations=0 tests/test_builders/test_build_linkcheck.py  # without pytest-randomly installed
...
============================================= slowest durations ==============================================
0.09s setup    tests/test_builders/test_build_linkcheck.py::test_defaults
0.08s call     tests/test_builders/test_build_linkcheck.py::test_defaults
0.08s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_verify_false
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_requests_env_var
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_cacerts
...
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_requests_timeout
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_linkcheck_request_headers_no_slash
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_cacerts

(44 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================= 39 passed in 1.46s =============================================
$ pytest --durations=0 tests/test_builders/test_build_linkcheck.py  # with pytest-randomly installed
...
============================================= slowest durations ==============================================
0.11s setup    tests/test_builders/test_build_linkcheck.py::test_limit_rate_user_max_delay
0.07s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_verify_false
0.07s call     tests/test_builders/test_build_linkcheck.py::test_defaults
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connection_contention
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_cacerts
...
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_requests_timeout
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_unauthorized_broken
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_nonexistent_cert_file
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_fails

(44 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================= 39 passed in 1.47s =============================================

@picnixz
Member

picnixz commented Mar 22, 2024

Those timings are not really relevant in general, because a 0.02s difference in setup is essentially the time for your OS to set up resources. Once the resources have been initialized, some of them may be cached (e.g., in the L1/L2 or L3 cache).

Also, don't forget that the setup could include import statements, so you'll spend extra time just on imports. Afterwards, the modules are in sys.modules and don't need to be imported again.

You should worry about differences of seconds, but not anything below 1s IMO.
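
As a quick self-contained illustration of the sys.modules point (unrelated to the linkcheck tests themselves; decimal is just an arbitrary stdlib module that is normally not preloaded):

```python
import sys
import time

t0 = time.perf_counter()
import decimal  # first import: find, load and execute the module
t1 = time.perf_counter()
import decimal  # noqa: F811 -- already cached: effectively a sys.modules lookup
t2 = time.perf_counter()

print(f"first import:  {t1 - t0:.6f}s")
print(f"cached import: {t2 - t1:.6f}s")
print('decimal' in sys.modules)  # True
```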

@jayaddison
Contributor

@picnixz there are two lines of exploration here:

  • Isolating tests and parallelizing them - for these, I agree that small time variances are not hugely relevant.
  • Figuring out the cause of timeouts within the linkcheck tests -- where we had 0.05s timeouts, and have increased those to 0.25s.

Although I think the setup phase in each test should occur strictly before the HTTP client begins waiting for the linkcheck_timeout duration, it still seems worth investigating.

@picnixz
Member

picnixz commented Mar 22, 2024

The setup should only concern the pytest fixture initialization.
