
tests/test_builders/test_build_linkcheck.py::test_defaults flaky for docutils HEAD CI #12159

Closed
chrisjsewell opened this issue Mar 21, 2024 · 13 comments · Fixed by #12166

@chrisjsewell
Member

chrisjsewell commented Mar 21, 2024

I've just re-run this 5 flipping times on #12153, and @picnixz has run into it as well

tests/test_builders/test_build_latex.py::test_one_parameter_per_line PASSED [ 31%]
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 692, in process_request_thread
    self.finish_request(request, client_address)
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 362, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 761, in __init__
    self.handle()
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/http/server.py", line 436, in handle
    self.handle_one_request()
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/http/server.py", line 424, in handle_one_request
    method()
  File "/home/runner/work/sphinx/sphinx/tests/test_builders/test_build_linkcheck.py", line 62, in do_GET
    self.wfile.write(content)
  File "/opt/hostedtoolcache/Python/3.12.2/x64/lib/python3.12/socketserver.py", line 840, in write
    self._sock.sendall(b)
BrokenPipeError: [Errno 32] Broken pipe
----------------------------------------
tests/test_builders/test_build_linkcheck.py::test_defaults FAILED        [ 31%]

It seems the server breaks when exiting test_one_parameter_per_line, which then leads to test_defaults failing with:

links.rst:3: [timeout] http://localhost:7777/: HTTPConnectionPool(host='localhost', port=7777): Read timed out. (read timeout=0.05)
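
For context on how these two failures relate, here is a minimal standalone sketch (illustrative only; it is not the actual test server code, and SlowHandler is a made-up name): a handler that responds more slowly than the client's read timeout makes the client report a read timeout, and the handler's late write then hits a closed socket.

```python
import http.server
import threading
import time

import requests


class SlowHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.2)  # respond slower than the client is willing to wait
        self.send_response(200)
        self.end_headers()
        # By now the client has usually timed out and closed its socket, so this
        # write can fail with BrokenPipeError inside the server's handler thread,
        # much like the traceback above.
        self.wfile.write(b"ok")


# 7777 is the port mentioned in the log line above; any free port would do.
server = http.server.ThreadingHTTPServer(("localhost", 7777), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

try:
    requests.get("http://localhost:7777/", timeout=0.05)
except requests.exceptions.Timeout as exc:
    # Mirrors the "Read timed out. (read timeout=0.05)" message above.
    print(exc)
finally:
    server.shutdown()
```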

@jayaddison will your PRs fix this?

or should we temporarily disable the test for Docutils HEAD?

@picnixz

This comment was marked as resolved.

@chrisjsewell
Member Author

I think you had a wrong PR reference

Changed, but yeah, it seems once it happens once it just will not fix itself.

@picnixz
Member

picnixz commented Mar 21, 2024

Better to fix it then. Maybe it has something to do with caching actually?

@chrisjsewell
Member Author

And it's only for Docutils HEAD; I don't know if there is a change in dependencies.

@chrisjsewell
Member Author

chrisjsewell commented Mar 21, 2024

> Maybe it has something to do with caching actually?

Probably. It's ludicrous; I literally can't get #12153 to pass anymore, and now #12154 as well 😒

@chrisjsewell
Member Author

chrisjsewell commented Mar 21, 2024

I think it is now failing every time: sphinx-doc/sphinx/actions/runs/8376347449/job/22935843069

@chrisjsewell chrisjsewell changed the title tests/test_builders/test_build_linkcheck.py::test_defaults is flaky with docutils HEAD tests/test_builders/test_build_linkcheck.py::test_defaults fails for docutils HEAD CI Mar 21, 2024
@chrisjsewell chrisjsewell changed the title tests/test_builders/test_build_linkcheck.py::test_defaults fails for docutils HEAD CI tests/test_builders/test_build_linkcheck.py::test_defaults flaky for docutils HEAD CI Mar 21, 2024
@jayaddison
Contributor

> I've just re-run this 5 flipping times on #12153, and @picnixz has run into it as well [...] it seems the server breaks when exiting test_one_parameter_per_line, then this leads to test_defaults not working [...]

> @jayaddison will your PRs fix this?

It seems not, no: I've encountered one of these failures on the test-HTTP-port-isolation branch, the only branch I thought might affect this: https://github.com/sphinx-doc/sphinx/actions/runs/8379421681/job/22946289546?pr=12126

> or should we temporarily disable the test for Docutils HEAD?

Maybe. Has anyone taken a look at the latest changes in docutils to see whether there is something relevant?

Under some circumstances, docutils does make HTTP network requests while processing documents; but I can't initially think of any reason why that would affect test_defaults in particular.

The other possibility is that this is somehow contention-related; we and/or GitHub may be more active than usual, or the runners may be lower-performance or constrained for some reason, producing timeouts. But it would be unusual for that to affect only the Docutils HEAD jobs.

@jayaddison
Contributor

I'm going to reluctantly suggest (with accompanying pull request) that we increase the timeouts for the linkcheck tests.

Although #12126 doesn't fix the timeouts, it does unlock running the linkcheck tests in parallel, and that should reduce the end-to-end time-to-complete for the test suite.
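
For illustration only (this is a sketch, not the actual change; it assumes the pytest.mark.sphinx marker and the linkcheck test roots the suite already uses, and the test name is made up), bumping the timeout for a single test could look something like:

```python
import pytest


@pytest.mark.sphinx(
    'linkcheck',
    testroot='linkcheck',
    # Hypothetical override: give slow CI runners more headroom than the
    # 0.05s read timeout seen in the failing runs above.
    confoverrides={'linkcheck_timeout': 0.25},
)
def test_defaults_with_more_headroom(app):
    # ``app`` comes from sphinx.testing.fixtures (loaded via the suite's
    # conftest); building runs the linkcheck builder against the local server.
    app.build()
```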

@jayaddison
Contributor

FWIW: I really dislike workarounds when it feels possible to resolve a root cause. But something I dislike more is disorder in continuous integration results and the surrounding collaboration problems that causes.

Also, I have to admit that despite working on these tests for a while, I don't feel any closer to having pinpointed the problem. I could make an excuse and say that's due to not having access to profiling/tracing on the runner hosts; it certainly makes things more difficult, but I feel like there would be a way to achieve better network/performance tracing on GitHub Actions with a bit of determined effort.

@jayaddison
Contributor

> Although #12126 doesn't fix the timeouts, it does unlock running the linkcheck tests in parallel, and that should reduce the end-to-end time-to-complete for the test suite.
>
> Also I have to admit that despite working on these tests for a while, I don't feel any closer to having been able to pinpoint the problem. [...]

I wanted to check this claim myself, that parallelism should reduce the linkcheck tests' time-to-completion. It seems not to be true on my machine: running pytest -n auto tests/test_builders/test_build_linkcheck.py with pytest-xdist is slower than a plain pytest tests/test_builders/test_build_linkcheck.py (2s vs 1.5s).

I understand that the parallelism may consume some more resources and require some more setup time, so that may be the reason. In theory it could still be beneficial to allow more of these tests to run in parallel and in isolation, but let's be cautious anyway.

This did help me find the --durations parameter to pytest. It seems that the setup for the first linkcheck test to run takes longer than for the others. When running in serial, that first test is currently test_defaults, the one that generally fails due to timeouts.

$ pytest --durations=0 tests/test_builders/test_build_linkcheck.py  # without pytest-randomly installed
...
============================================= slowest durations ==============================================
0.09s setup    tests/test_builders/test_build_linkcheck.py::test_defaults
0.08s call     tests/test_builders/test_build_linkcheck.py::test_defaults
0.08s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_verify_false
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_requests_env_var
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_cacerts
...
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_requests_timeout
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_linkcheck_request_headers_no_slash
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_cacerts

(44 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================= 39 passed in 1.46s =============================================
$ pytest --durations=0 tests/test_builders/test_build_linkcheck.py  # with pytest-randomly installed
...
============================================= slowest durations ==============================================
0.11s setup    tests/test_builders/test_build_linkcheck.py::test_limit_rate_user_max_delay
0.07s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_verify_false
0.07s call     tests/test_builders/test_build_linkcheck.py::test_defaults
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connection_contention
0.06s call     tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_with_tls_cacerts
...
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_requests_timeout
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_unauthorized_broken
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_nonexistent_cert_file
0.01s setup    tests/test_builders/test_build_linkcheck.py::test_connect_to_selfsigned_fails

(44 durations < 0.005s hidden.  Use -vv to show these durations.)
============================================= 39 passed in 1.47s =============================================

@picnixz
Member

picnixz commented Mar 22, 2024

Those timings are not really relevant in general, because a 0.02s difference in setup is essentially the time for your OS to set up resources. Once the resources have been initialized, some of them may be cached (e.g., in the L1/L2 or L3 cache).

Also, don't forget that the setup could include import statements, so you'll spend extra time just on imports. Afterwards, the modules are in sys.modules and don't need to be imported again.

You should worry about differences of seconds, but not anything below 1s IMO.
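
As a quick self-contained illustration of the sys.modules point (unrelated to the linkcheck tests themselves; decimal is just an arbitrary stdlib module that is normally not preloaded):

```python
import sys
import time

t0 = time.perf_counter()
import decimal  # first import: find, load and execute the module
t1 = time.perf_counter()
import decimal  # noqa: F811 -- already cached: effectively a sys.modules lookup
t2 = time.perf_counter()

print(f"first import:  {t1 - t0:.6f}s")
print(f"cached import: {t2 - t1:.6f}s")
print('decimal' in sys.modules)  # True
```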

@jayaddison
Contributor

@picnixz there are two lines of exploration here:

  • Isolating tests and parallelizing them - for these, I agree that small time variances are not hugely relevant.
  • Figuring out the cause of timeouts within the linkcheck tests -- where we had 0.05s timeouts, and have increased those to 0.25s.

Although I think the setup phase in each test should occur strictly before the HTTP client begins waiting for the linkcheck_timeout duration, it still seems worth investigating.

@picnixz
Member

picnixz commented Mar 22, 2024

The setup should only concern the pytest fixture initialization.
