Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests: linkcheck: add client connection-pool contention coverage #11402

48 changes: 47 additions & 1 deletion tests/test_build_linkcheck.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,12 @@

import pytest

from sphinx.builders.linkcheck import HyperlinkAvailabilityCheckWorker, RateLimit
from sphinx.builders.linkcheck import (
CheckRequest,
Hyperlink,
HyperlinkAvailabilityCheckWorker,
RateLimit,
)
from sphinx.testing.util import strip_escseq
from sphinx.util.console import strip_colors

Expand Down Expand Up @@ -661,6 +666,47 @@ def test_limit_rate_bails_out_after_waiting_max_time(app):
assert next_check is None


@mock.patch('sphinx.util.requests.requests.Session.get_adapter')
def test_connection_contention(get_adapter, app, capsys):
# Create a shared, but limited-size, connection pool
import requests
get_adapter.return_value = requests.adapters.HTTPAdapter(pool_maxsize=1)

# Set an upper-bound on socket timeouts globally
import socket
socket.setdefaulttimeout(5)
Comment on lines +778 to +780
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to remove this because I think it has side-effects that outlast the unit test.

It was introduced to avoid indefinitely-blocked socket I/O when the test linkcheck client and server fail to communicate reliably.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this could be a global setting for the test suite, I don’t see why an indefinitely-blocked socket I/O would be reasonable in that context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that would make sense. It would also be nice if the tests could fail if that timeout limit is ever reached, because that'd almost certainly be a sign of a problem with the application code, given that we now entirely request local, non-Internet resources during tests (I believe that's true?).

Another relevant note is that, in practice, I think we already approximate this behaviour -- perhaps unintentionally -- from the time at which a linkchecker object is first constructed. That's because there is a call to the same setdefaulttimeout method here:

# set a timeout for non-responding servers
socket.setdefaulttimeout(5.0)

It does probably make sense to have a timeout in production environments, because in most cases it would be bad for a single long-delayed link to block the entire linkcheck process, but at the same time it would be good to confirm why the global timeout is in place given that there is an existing -- and configurable -- linkcheck timeout setting:

if self.config.linkcheck_timeout:
kwargs['timeout'] = self.config.linkcheck_timeout

(I also should explain that it seems OK to me that both timeouts exist: I sometimes have a tendency to remove redundancy in software to such an extent that the resulting code becomes fragile in other ways, and I'm keen that we avoid that risk here. but I think it would be good to understand the reasoning why both exist. I think it's basically to try to avoid the linkchecker ever stalling indefinitely, even in the absence of a configured linkcheck_timeout value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. My reading of the code (and https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-linkcheck_timeout) is:

  1. linkcheck_timeout=None: make sure linkcheck bails out after a reasonable amout of time (5.0 seconds), since requests won’t handle the timeout
  2. linkcheck_timeout=3.14: specify the timeout to requests

So, by default, linkcheck will not get stuck. When the default does not fit a project (developers want slower or longer timeouts), they can use linkcheck_timeout. I don’t see a way to specify no timeout, but I don’t think that’s an issue. A noteworthy surprise is that setting linkcheck_timeout=5.0 is different from using the default, as the default only matters for the socket connection, whereas requests timeout also considers the time to generate the response on the server.

I think the side-effect is not intentional but is pretty nice to have and should not be changed. There aren’t other socket.setdefaulttimeout() in the app, so linkcheck sets that default for other extensions (thinking of extlinks). My suggestion is to move the socket.setdefaulttimeout() to sphinx.application.Sphinx.__init__(). That makes the call explicit and should not break BC (since the socket.setdefaulttimeout() happened when the extensions were being setup, as setup_extension imports the module).

I believe the test suite could use the same setting as the application, or even a shorter one, considering that waiting 5 seconds is pretty long in that context. Anything above a second would probably indicate an issue IMO.

Coming back to linkcheck, we could remove the call to socket.setdefaulttimeout() (that has a side effect on the Sphinx application), and set a default of 5.0 for linkcheck_timeout, but it would break BC:

  • https://docs.python-requests.org/en/latest/user/advanced/#timeouts shows socket.setdefaulttimeout() will timeout only on what requests calls “connect”, not while reading data from the server. So a user who connected quickly but waited a long time to receive the response would see failures where it previously worked. I’m guessing we could maybe (and I don’t wanna) abuse the current implementation to pass a tuple (5.0, None) and preserve the current behavior.
  • users may call socket.setdefaulttimeout() after Sphinx.__init__(), to use their own values. That would be a bit surprising considering linkcheck_timeout exists, but who knows...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes the call explicit and should not break BC (since the socket.setdefaulttimeout() happened when the extensions were being setup, as setup_extension imports the module).

That’s actually not true. socket.setdefaulttimeout(5.0) is only called during CheckExternalLinksBuilder.__init__(), which happens when the builder is created, not when the module is imported and the builder registered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. My reading of the code (and https://www.sphinx-doc.org/en/master/usage/configuration.html#confval-linkcheck_timeout) is:

1. `linkcheck_timeout=None`: make sure linkcheck bails out after a reasonable amout of time (5.0 seconds), since `requests` won’t handle the timeout

I'm not so sure about this - my reading of the documentation:

linkcheck_timeout

A timeout value, in seconds, for the linkcheck builder. The default is to use Python’s global socket timeout.

New in version 1.1.

... is that the unmodified global timeout for sockets is to be expected unless the operator changes the configuration value.

What we have currently is a 5-second timeout set by the application when no configuration is applied. I do think that having a timeout in production makes sense, but either the documentation or the code may need updating to bring them into consistency.

It'll take me a while longer to process the rest of your reply and to think about any further changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll create a separate issue soon to track resolution for this (relocation and/or removal of the socket.setdefaulttimeout calls).


# Place a workload into the linkcheck queue
rqueue, wqueue, link_count = Queue(), Queue(), 10
for _ in range(link_count):
wqueue.put(CheckRequest(0, Hyperlink("http://localhost:7777", "test", 1)))

# Create parallel consumer threads
with http_server(make_redirect_handler(support_head=True)):
begin, threads, checked = time.time(), [], []
threads = [
HyperlinkAvailabilityCheckWorker(
env=app.env,
config=app.config,
rqueue=rqueue,
wqueue=wqueue,
rate_limits={},
)
for _ in range(10)
]
for thread in threads:
thread.start()
while time.time() < begin + 5 and len(checked) < link_count:
checked.append(rqueue.get(timeout=5))
for thread in threads:
thread.join(timeout=0)

# Ensure that all items were consumed within the time limit
_, stderr = capsys.readouterr()
assert len(checked) == link_count
assert "TimeoutError" not in stderr


class ConnectionResetHandler(http.server.BaseHTTPRequestHandler):
def do_HEAD(self):
self.connection.close()
Expand Down