Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PYTHON-3926 Add more information to connection errors and timeouts #1375

Merged
merged 15 commits into from
Oct 30, 2023

Conversation

NoahStapp
Copy link
Contributor

No description provided.

@NoahStapp NoahStapp requested a review from a team as a code owner September 14, 2023 21:40
@NoahStapp NoahStapp requested review from ShaneHarvey and removed request for a team September 14, 2023 21:40
Comment on lines 892 to 897
if "socketTimeout" in details:
result += f" socketTimeout: {details['socketTimeout']}ms,"
if "timeout" in details:
result += f" timeout: {details['timeout']}ms,"
if "connectTimeout" in details:
result += f" connectTimeout: {details['connectTimeout']}ms,"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

Suggested change
if "socketTimeout" in details:
result += f" socketTimeout: {details['socketTimeout']}ms,"
if "timeout" in details:
result += f" timeout: {details['timeout']}ms,"
if "connectTimeout" in details:
result += f" connectTimeout: {details['connectTimeout']}ms,"
for timeout in ["socketTimeout", "timeout", "connectTimeout"]:
if timeout in details:
result += f" {timeout}: {details[timeout]}ms,"

Copy link
Contributor

@Jibola Jibola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, this looks great and I love how concise an improvement it is!

Left some commentary, but only one blocking comment around topology

pymongo/common.py Outdated Show resolved Hide resolved
pymongo/pool.py Outdated Show resolved Hide resolved
@NoahStapp NoahStapp requested a review from Jibola September 18, 2023 15:55
@NoahStapp
Copy link
Contributor Author

NoahStapp commented Sep 18, 2023

I just noticed that this doesn't cover the Connection socket timeouts like

self._raise_connection_failure(error)

and

self._raise_connection_failure(error)

and

self._raise_connection_failure(error)

We already have socket_timeout and connect_timeout in the Connection's PoolOptions, thoughts on adding timeout there as well? That will allow us to use those values without need for a MongoClientErrorHandler. toplogy may require more extensive refactoring if we want to include that always as well.

timeout = client.options.timeout
socket_timeout = client.options.pool_options.socket_timeout
connect_timeout = client.options.pool_options.connect_timeout
topology = client.topology_description.server_descriptions()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be client.topology_description which is what we include in ServerSelectionTimeoutError:

f"{self._error_message(selector)}, Timeout: {timeout}s, Topology Description: {self.description!r}"

connect_timeout = client.options.pool_options.connect_timeout
topology = client.topology_description.server_descriptions()
if timeout:
details["timeout"] = timeout * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make more sense to add _csot.get_timeout() here? That would reflect the last CSOT timeout that was set.

pymongo/pool.py Outdated
@@ -391,13 +396,14 @@ def _raise_connection_failure(
if msg_prefix:
msg = msg_prefix + msg
if isinstance(error, socket.timeout):
raise NetworkTimeout(msg) from error
print(f"DETAILS: {details}")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errant print here.

pymongo/pool.py Outdated
timeout_details = None
if handler:
timeout_details = _get_timeout_details(handler.client)
_raise_connection_failure(self.address, error, details=timeout_details)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm is there a central place we could catch and re-raise these exceptions (rather than sprinkling format_timeout_details in various places)?

Copy link
Contributor Author

@NoahStapp NoahStapp Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we'd be able to access the _MongoClientErrorHandler whenever we catch a timeout exception. Are there existing limitations to our internal network interfaces that prevent us from passing an instance of the handler around more widely? That would allow us to add details to far more timeout errors as well as centralize the logic required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up @ShaneHarvey

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we do already wrap all operations with _MongoClientErrorHandler. Does that answer your question?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_MongoClientErrorHandler is only wrapped around _checkout and _run_operation in MongoClient, are those the only two operations relevant for this issue? If so, we could maybe move the logic in format_timeout_details into the error handler itself to centralize the processing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered in Zoom but summarizing here:
Yes that's true but it's better we add the info at the point the exception is raised to ensure that event publishing and (soon) command logging also include the same info. If we catch and reraise the exception in _MongoClientErrorHandler then the events/logs will see a different error message than the user.

@NoahStapp NoahStapp requested a review from ShaneHarvey October 12, 2023 22:16
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a few tests for this behavior (asserting the error message includes the expected timeout info)?

pymongo/pool.py Outdated
if timeout:
details["timeout"] = timeout * 1000
if socket_timeout:
details["socketTimeout"] = socket_timeout * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest socketTimeoutMS to match the URI option name.

pymongo/pool.py Outdated
if socket_timeout:
details["socketTimeout"] = socket_timeout * 1000
if connect_timeout:
details["connectTimeout"] = connect_timeout * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest connectTimeoutMS to match the URI option name.

pymongo/pool.py Outdated
socket_timeout = options.socket_timeout
connect_timeout = options.connect_timeout
if timeout:
details["timeout"] = timeout * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect "timeout" will be confusing to users. Perhaps "operation timeout"? Also if _csot.get_timeout is enabled then should we avoid adding socketTimeoutMS here since CSOT overrides socketTimeoutMS?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with operationTimeoutMS. Agreed, if CSOT is present then socketTimeoutMS should not be displayed.

@NoahStapp NoahStapp requested a review from ShaneHarvey October 17, 2023 23:22
@@ -1549,7 +1586,8 @@ def connect(self, handler: Optional[_MongoClientErrorHandler] = None) -> Connect
)

if isinstance(error, (IOError, OSError, SSLError)):
_raise_connection_failure(self.address, error)
details = _get_timeout_details(self.opts)
_raise_connection_failure(self.address, error, timeout_details=details)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see _configured_socket() also calls _raise_connection_failure(). Will this then add timeout_details to the error twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I'll add a check in _raise_connection_failure to ensure it doesn't add timeout_details if they're already in the message.


with self.assertRaises(Exception) as error:
client.db.t.find_one({"$where": delay(2)})
self.assertTrue(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line needs to be de-indented otherwise it never runs.


with self.assertRaises(Exception) as error:
client.db.t.find_one({"$where": delay(2)})
self.assertTrue("(configured timeouts: socketTimeoutMS: 1.0ms)" in str(error.exception))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here.

client = rs_or_single_client(timeoutMS=1)

with self.assertRaises(Exception) as error:
client.db.t.find_one({"$where": delay(2)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use pymongo.timeout() where instead of creating a new client.

pymongo/pool.py Outdated
socket_timeout = options.socket_timeout
connect_timeout = options.connect_timeout
if timeout:
details["operationTimeoutMS"] = timeout * 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest remaining this to "timeoutMS" to match the URI option name. operationTimeoutMS doesn't appear anywhere else so this could be confusing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it should match the URI option name. Is timeoutMS generally specific enough for users, or should we consider changing that option name to reflect its functionality better?


with self.assertRaises(Exception) as error:
client.db.t.find_one({"$where": delay(2)})
self.assertTrue(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here.

self.assertTrue("(configured timeouts: socketTimeoutMS: 1.0ms)" in str(error.exception))

def test_connection_timeout_message(self):
client = rs_or_single_client("badhost", connectTimeoutMS=1, serverSelectionTimeoutMS=10)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to use a failpoint to fail connection creation, otherwise this will result in a ServerSelectionTimeoutError, not a pool checkout error.

Copy link
Contributor Author

@NoahStapp NoahStapp Oct 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't failing connection creation result in a ServerSelectionTimeoutError? In this case we want to have the client successfully select the server and then timeout when it tries to execute a command, correct?

EDIT: Actually for this specific test, I believe we want it to timeout when the client tries to connect to the server but after it's selected it.

@NoahStapp NoahStapp requested a review from ShaneHarvey October 20, 2023 21:32
@@ -411,6 +411,41 @@ def find_one():
# maxConnecting = unbounded: 30+ connections in ~0.140+ seconds
print(len(pool.conns))

def test_csot_timeout_message(self):
with self.assertRaises(Exception) as error:
with timeout(0.001):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to increase this to 500ms to make sure there's enough time to actually run the command (instead of timing out in server selection for example).

self.assertTrue("(configured timeouts: timeoutMS: 1.0ms" in str(error.exception))

def test_socket_timeout_message(self):
client = rs_or_single_client(socketTimeoutMS=0.01)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

___________________ TestPooling.test_socket_timeout_message ____________________

self = <test.test_pooling.TestPooling testMethod=test_socket_timeout_message>

    def test_socket_timeout_message(self):
        client = rs_or_single_client(socketTimeoutMS=0.01)
    
>       with self.assertRaises(Exception) as error:
E       AssertionError: Exception not ### raised

Not sure how to explain the above error but could you increase the timeout to 500ms to make sure there's enough time to actually run the command (instead of timing out in server selection for example).

test/test_pooling.py Show resolved Hide resolved
def test_csot_timeout_message(self):
with self.assertRaises(Exception) as error:
with timeout(0.5):
self.client.db.t.find_one({"$where": delay(2)})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is failing because the query ($where) will only be evaluated on non-empty collections. On missing/empty collections the server returns an empty result without executing the query. Same for test_socket_timeout_message below.

@NoahStapp NoahStapp requested a review from ShaneHarvey October 30, 2023 20:45
Copy link
Member

@ShaneHarvey ShaneHarvey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests!

@NoahStapp NoahStapp merged commit 06653c9 into mongodb:master Oct 30, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants