-
Notifications
You must be signed in to change notification settings - Fork 1.1k
PYTHON-3926 Add more information to connection errors and timeouts #1375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pymongo/common.py
Outdated
if "socketTimeout" in details: | ||
result += f" socketTimeout: {details['socketTimeout']}ms," | ||
if "timeout" in details: | ||
result += f" timeout: {details['timeout']}ms," | ||
if "connectTimeout" in details: | ||
result += f" connectTimeout: {details['connectTimeout']}ms," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT:
if "socketTimeout" in details: | |
result += f" socketTimeout: {details['socketTimeout']}ms," | |
if "timeout" in details: | |
result += f" timeout: {details['timeout']}ms," | |
if "connectTimeout" in details: | |
result += f" connectTimeout: {details['connectTimeout']}ms," | |
for timeout in ["socketTimeout", "timeout", "connectTimeout"]: | |
if timeout in details: | |
result += f" {timeout}: {details[timeout]}ms," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, this looks great and I love how concise an improvement it is!
Left some commentary, but only one blocking comment around topology
I just noticed that this doesn't cover the Connection socket timeouts like mongo-python-driver/pymongo/pool.py Line 956 in db526f7
and mongo-python-driver/pymongo/pool.py Line 972 in db526f7
and mongo-python-driver/pymongo/pool.py Line 982 in db526f7
We already have |
pymongo/common.py
Outdated
timeout = client.options.timeout | ||
socket_timeout = client.options.pool_options.socket_timeout | ||
connect_timeout = client.options.pool_options.connect_timeout | ||
topology = client.topology_description.server_descriptions() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be client.topology_description
which is what we include in ServerSelectionTimeoutError:
mongo-python-driver/pymongo/topology.py
Line 259 in 6f4e617
f"{self._error_message(selector)}, Timeout: {timeout}s, Topology Description: {self.description!r}" |
pymongo/common.py
Outdated
connect_timeout = client.options.pool_options.connect_timeout | ||
topology = client.topology_description.server_descriptions() | ||
if timeout: | ||
details["timeout"] = timeout * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make more sense to add _csot.get_timeout() here? That would reflect the last CSOT timeout that was set.
pymongo/pool.py
Outdated
msg = msg_prefix + msg | ||
if isinstance(error, socket.timeout): | ||
raise NetworkTimeout(msg) from error | ||
print(f"DETAILS: {details}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
errant print here.
pymongo/pool.py
Outdated
timeout_details = None | ||
if handler: | ||
timeout_details = _get_timeout_details(handler.client) | ||
_raise_connection_failure(self.address, error, details=timeout_details) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm is there a central place we could catch and re-raise these exceptions (rather than sprinkling format_timeout_details in various places)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we'd be able to access the _MongoClientErrorHandler
whenever we catch a timeout exception. Are there existing limitations to our internal network interfaces that prevent us from passing an instance of the handler around more widely? That would allow us to add details to far more timeout errors as well as centralize the logic required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following up @ShaneHarvey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we do already wrap all operations with _MongoClientErrorHandler. Does that answer your question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_MongoClientErrorHandler
is only wrapped around _checkout
and _run_operation
in MongoClient, are those the only two operations relevant for this issue? If so, we could maybe move the logic in format_timeout_details
into the error handler itself to centralize the processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Answered in Zoom but summarizing here:
Yes that's true but it's better we add the info at the point the exception is raised to ensure that event publishing and (soon) command logging also include the same info. If we catch and reraise the exception in _MongoClientErrorHandler then the events/logs will see a different error message than the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a few tests for this behavior (asserting the error message includes the expected timeout info)?
pymongo/pool.py
Outdated
if timeout: | ||
details["timeout"] = timeout * 1000 | ||
if socket_timeout: | ||
details["socketTimeout"] = socket_timeout * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest socketTimeoutMS to match the URI option name.
pymongo/pool.py
Outdated
if socket_timeout: | ||
details["socketTimeout"] = socket_timeout * 1000 | ||
if connect_timeout: | ||
details["connectTimeout"] = connect_timeout * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest connectTimeoutMS to match the URI option name.
pymongo/pool.py
Outdated
socket_timeout = options.socket_timeout | ||
connect_timeout = options.connect_timeout | ||
if timeout: | ||
details["timeout"] = timeout * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect "timeout" will be confusing to users. Perhaps "operation timeout"? Also if _csot.get_timeout is enabled then should we avoid adding socketTimeoutMS here since CSOT overrides socketTimeoutMS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went with operationTimeoutMS
. Agreed, if CSOT is present then socketTimeoutMS should not be displayed.
if isinstance(error, (IOError, OSError, SSLError)): | ||
_raise_connection_failure(self.address, error) | ||
details = _get_timeout_details(self.opts) | ||
_raise_connection_failure(self.address, error, timeout_details=details) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see _configured_socket() also calls _raise_connection_failure(). Will this then add timeout_details to the error twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'll add a check in _raise_connection_failure
to ensure it doesn't add timeout_details
if they're already in the message.
test/test_pooling.py
Outdated
|
||
with self.assertRaises(Exception) as error: | ||
client.db.t.find_one({"$where": delay(2)}) | ||
self.assertTrue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line needs to be de-indented otherwise it never runs.
test/test_pooling.py
Outdated
|
||
with self.assertRaises(Exception) as error: | ||
client.db.t.find_one({"$where": delay(2)}) | ||
self.assertTrue("(configured timeouts: socketTimeoutMS: 1.0ms)" in str(error.exception)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here.
test/test_pooling.py
Outdated
client = rs_or_single_client(timeoutMS=1) | ||
|
||
with self.assertRaises(Exception) as error: | ||
client.db.t.find_one({"$where": delay(2)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use pymongo.timeout() where instead of creating a new client.
pymongo/pool.py
Outdated
socket_timeout = options.socket_timeout | ||
connect_timeout = options.connect_timeout | ||
if timeout: | ||
details["operationTimeoutMS"] = timeout * 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest remaining this to "timeoutMS" to match the URI option name. operationTimeoutMS doesn't appear anywhere else so this could be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed it should match the URI option name. Is timeoutMS
generally specific enough for users, or should we consider changing that option name to reflect its functionality better?
test/test_pooling.py
Outdated
|
||
with self.assertRaises(Exception) as error: | ||
client.db.t.find_one({"$where": delay(2)}) | ||
self.assertTrue( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
test/test_pooling.py
Outdated
self.assertTrue("(configured timeouts: socketTimeoutMS: 1.0ms)" in str(error.exception)) | ||
|
||
def test_connection_timeout_message(self): | ||
client = rs_or_single_client("badhost", connectTimeoutMS=1, serverSelectionTimeoutMS=10) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might need to use a failpoint to fail connection creation, otherwise this will result in a ServerSelectionTimeoutError, not a pool checkout error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't failing connection creation result in a ServerSelectionTimeoutError
? In this case we want to have the client successfully select the server and then timeout when it tries to execute a command, correct?
EDIT: Actually for this specific test, I believe we want it to timeout when the client tries to connect to the server but after it's selected it.
test/test_pooling.py
Outdated
|
||
def test_csot_timeout_message(self): | ||
with self.assertRaises(Exception) as error: | ||
with timeout(0.001): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll need to increase this to 500ms to make sure there's enough time to actually run the command (instead of timing out in server selection for example).
test/test_pooling.py
Outdated
self.assertTrue("(configured timeouts: timeoutMS: 1.0ms" in str(error.exception)) | ||
|
||
def test_socket_timeout_message(self): | ||
client = rs_or_single_client(socketTimeoutMS=0.01) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
___________________ TestPooling.test_socket_timeout_message ____________________
self = <test.test_pooling.TestPooling testMethod=test_socket_timeout_message>
def test_socket_timeout_message(self):
client = rs_or_single_client(socketTimeoutMS=0.01)
> with self.assertRaises(Exception) as error:
E AssertionError: Exception not ### raised
Not sure how to explain the above error but could you increase the timeout to 500ms to make sure there's enough time to actually run the command (instead of timing out in server selection for example).
test/test_pooling.py
Outdated
def test_csot_timeout_message(self): | ||
with self.assertRaises(Exception) as error: | ||
with timeout(0.5): | ||
self.client.db.t.find_one({"$where": delay(2)}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is failing because the query ($where) will only be evaluated on non-empty collections. On missing/empty collections the server returns an empty result without executing the query. Same for test_socket_timeout_message below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests!
No description provided.