-
-
Notifications
You must be signed in to change notification settings - Fork 31.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115514: Fix incomplete writes after close in asyncio._SelectorSocketTransport #128037
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
fe8b79c
to
113440a
Compare
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
9ec4d34
to
f55fcb2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing! Some high-level triage comments, and then otherwise LGTM.
Misc/NEWS.d/next/Library/2024-12-17-16-48-02.gh-issue-115514.1yOJ7T.rst
Outdated
Show resolved
Hide resolved
ConnectionError("Connection is closed by peer")) | ||
try: | ||
super()._call_connection_lost(exc) | ||
finally: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good idea to add a comment explaining why we need this (and link the issue number as well).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean the try/finally? It calls user-supplied callback internally, I'm just doing the same thing the parent SelectorTransport
is:
cpython/Lib/asyncio/selector_events.py
Lines 904 to 916 in d0ecbdd
def _call_connection_lost(self, exc): | |
try: | |
if self._protocol_connected: | |
self._protocol.connection_lost(exc) | |
finally: | |
self._sock.close() | |
self._sock = None | |
self._protocol = None | |
self._loop = None | |
server = self._server | |
if server is not None: | |
server._detach(self) | |
self._server = None |
Do you think I should write a comment here about that?
# not called, we still have data in write buffer | ||
self.assertFalse(self.protocol.connection_lost.called) | ||
|
||
self.loop.writers[7]._run() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handlers should not be run manually, please change test to not rely on it
Also I think this test should really be in ssl tests not here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, could you please suggest on how to do that? The TestLoop
used here does not process the writers when it is ran, so for example self.loop.run_until_complete(asyncio.sleep(0))
does not work.
Also I think this test should really be in ssl tests not here
I don't think so, this is behavior of the SocketTransport, SSL just happens to be the one prevalent user that relies on this behavior.
There already happens to be a more real-world-like test in SSL that almost triggers the bad path here, but I would somehow have to make socket.send
not consume the whole data
to actually trigger this bug. IMHO its nicer to have the more artificial test here that is easier to understand and triggers the bugged path reliably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already happens to be a more real-world-like test in SSL that almost triggers the bad path here, but I would somehow have to make socket.send not consume the whole data to actually trigger this bug. IMHO its nicer to have the more artificial test here that is easier to understand and triggers the bugged path reliably.
You can use mock for socket.send
to trigger the bug right? That would be a more realistic test than this one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, the socket still needs to work because it sends data through it. I can shove this there however, artificially limit the socket buffer size which triggers the problem on my computer.
I'm not convinced this is really better, but I can do that in addition to or instead of the mock-based test, what do you think?
index 125a6c35793..b694b4d38db 100644
--- a/Lib/test/test_asyncio/test_ssl.py
+++ b/Lib/test/test_asyncio/test_ssl.py
@@ -12,6 +12,7 @@
import tempfile
import threading
import time
+import unittest.mock
import weakref
import unittest
@@ -1410,10 +1411,22 @@ async def client(addr):
except (BrokenPipeError, ConnectionResetError):
pass
- await future
+ socket_transport = writer.transport._ssl_protocol._transport
- writer.close()
- await self.wait_closed(writer)
+ def _shrink_sock_buffer(data):
+ if socket_transport._read_ready_cb is None:
+ socket_transport._sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1)
+ return unittest.mock.DEFAULT
+
+ with unittest.mock.patch.object(
+ socket_transport, "write",
+ wraps=socket_transport.write,
+ side_effect=_shrink_sock_buffer
+ ):
+ await future
+
+ writer.close()
+ await self.wait_closed(writer)
def run(meth):
def wrapper(sock):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think limiting the buffer with SO_SNDBUF
makes sense, you can perhaps access the socket using get_extra_info
method as well instead of private attrs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in latest commit. Still kept the original mocked test I added - the interaction from SSLProtocol
is really hard to track down, and this benefits from having a more straightforward illustration of what is expected from SocketTransport, even if it is all mocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test isn't failing for me even if I remove your fix in ssl test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On what platform, and what exactly are you changing? On MacOS,while on the branch from this PR, I only have to roll back the fix in selector_events.py
like this to make both new tests fail:
EDIT: ha, just tested on Linux and it does succeed there anyway - probably the socket behaves differently. Will debug it and let you know.
diff --git a/Lib/asyncio/selector_events.py b/Lib/asyncio/selector_events.py
index 22147451fa7..60bef420331 100644
--- a/Lib/asyncio/selector_events.py
+++ b/Lib/asyncio/selector_events.py
@@ -1188,7 +1188,7 @@ def _call_connection_lost(self, exc):
try:
super()._call_connection_lost(exc)
finally:
- self._write_ready = None
+ #self._write_ready = None
if self._empty_waiter is not None:
self._empty_waiter.set_exception(
ConnectionError("Connection is closed by peer"))
@@ -1206,6 +1206,7 @@ def _reset_empty_waiter(self):
def close(self):
self._read_ready_cb = None
+ self._write_ready = None
super().close()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, it's fixed now - problem was that on Linux, the minimum write buffer size is 1024, and it can only be set before the socket is connected anyway.
I switched it to use a wrapper class that fakes the full buffer by only writing half the data.
On linux, the ssl test was also not deterministic, it was not sending enough data and they actually got through before close
call most of the time, so I doubled the size and now I can reliably trigger the bug with them on both Linux and MacOS.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Change introduced in pythongh-106503 cause the transport to miss writes scheduled after it is closed. Signed-off-by: Vojtěch Boček <vbocek@gmail.com>
f55fcb2
to
9a7be93
Compare
…ing_data_on_slow_socket
I have made the requested changes; please review again |
Thanks for making the requested changes! @kumaraditya303: please review the changes made to this pull request. |
Thanks @Tasssadar for the PR, and @kumaraditya303 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13. |
… asyncio(pythonGH-128037) (cherry picked from commit 4e38eea) Co-authored-by: Vojtěch Boček <vbocek@gmail.com> Co-authored-by: Kumar Aditya <kumaraditya@python.org>
GH-129581 is a backport of this pull request to the 3.13 branch. |
… asyncio(pythonGH-128037) (cherry picked from commit 4e38eea) Co-authored-by: Vojtěch Boček <vbocek@gmail.com> Co-authored-by: Kumar Aditya <kumaraditya@python.org>
GH-129582 is a backport of this pull request to the 3.12 branch. |
… asyncio(python#128037) Co-authored-by: Kumar Aditya <kumaraditya@python.org>
was this fixed in python 3.12.9 ? I have the feeling I am still experiencing the issue in high concurrency scenario with redis client |
Yes, this was fixed in 3.12.9. Your stack trace is different though, this looks like more like "writing after close" scenario rather than bug in Python. |
Change introduced in gh-106503 cause the transport to miss writes scheduled after it is closed.
Issue: #115514
Please backport this to 3.12 and 3.13 branch as well.