Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-115514: Fix incomplete writes after close in asyncio._SelectorSocketTransport #128037

Merged
merged 7 commits into from
Feb 2, 2025

Conversation

Tasssadar
Copy link
Contributor

@Tasssadar Tasssadar commented Dec 17, 2024

Change introduced in gh-106503 cause the transport to miss writes scheduled after it is closed.

Issue: #115514

Please backport this to 3.12 and 3.13 branch as well.

Copy link

cpython-cla-bot bot commented Dec 17, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link

bedevere-app bot commented Dec 17, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@bedevere-app
Copy link

bedevere-app bot commented Dec 17, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@Tasssadar Tasssadar force-pushed the fix-socket-writes branch 2 times, most recently from 9ec4d34 to f55fcb2 Compare December 18, 2024 09:13
@ZeroIntensity ZeroIntensity added topic-asyncio needs backport to 3.12 bug and security fixes needs backport to 3.13 bugs and security fixes labels Dec 23, 2024
@ZeroIntensity ZeroIntensity self-requested a review December 26, 2024 08:53
Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing! Some high-level triage comments, and then otherwise LGTM.

ConnectionError("Connection is closed by peer"))
try:
super()._call_connection_lost(exc)
finally:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good idea to add a comment explaining why we need this (and link the issue number as well).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean the try/finally? It calls user-supplied callback internally, I'm just doing the same thing the parent SelectorTransport is:

def _call_connection_lost(self, exc):
try:
if self._protocol_connected:
self._protocol.connection_lost(exc)
finally:
self._sock.close()
self._sock = None
self._protocol = None
self._loop = None
server = self._server
if server is not None:
server._detach(self)
self._server = None

Do you think I should write a comment here about that?

# not called, we still have data in write buffer
self.assertFalse(self.protocol.connection_lost.called)

self.loop.writers[7]._run()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handlers should not be run manually, please change test to not rely on it

Also I think this test should really be in ssl tests not here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, could you please suggest on how to do that? The TestLoop used here does not process the writers when it is ran, so for example self.loop.run_until_complete(asyncio.sleep(0)) does not work.

Also I think this test should really be in ssl tests not here

I don't think so, this is behavior of the SocketTransport, SSL just happens to be the one prevalent user that relies on this behavior.

There already happens to be a more real-world-like test in SSL that almost triggers the bad path here, but I would somehow have to make socket.send not consume the whole data to actually trigger this bug. IMHO its nicer to have the more artificial test here that is easier to understand and triggers the bugged path reliably.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already happens to be a more real-world-like test in SSL that almost triggers the bad path here, but I would somehow have to make socket.send not consume the whole data to actually trigger this bug. IMHO its nicer to have the more artificial test here that is easier to understand and triggers the bugged path reliably.

You can use mock for socket.send to trigger the bug right? That would be a more realistic test than this one

Copy link
Contributor Author

@Tasssadar Tasssadar Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, the socket still needs to work because it sends data through it. I can shove this there however, artificially limit the socket buffer size which triggers the problem on my computer.

I'm not convinced this is really better, but I can do that in addition to or instead of the mock-based test, what do you think?

index 125a6c35793..b694b4d38db 100644
--- a/Lib/test/test_asyncio/test_ssl.py
+++ b/Lib/test/test_asyncio/test_ssl.py
@@ -12,6 +12,7 @@
 import tempfile
 import threading
 import time
+import unittest.mock
 import weakref
 import unittest

@@ -1410,10 +1411,22 @@ async def client(addr):
             except (BrokenPipeError, ConnectionResetError):
                 pass

-            await future
+            socket_transport = writer.transport._ssl_protocol._transport

-            writer.close()
-            await self.wait_closed(writer)
+            def _shrink_sock_buffer(data):
+                if socket_transport._read_ready_cb is None:
+                    socket_transport._sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1)
+                return unittest.mock.DEFAULT
+
+            with unittest.mock.patch.object(
+                socket_transport, "write",
+                wraps=socket_transport.write,
+                side_effect=_shrink_sock_buffer
+            ):
+                await future
+
+                writer.close()
+                await self.wait_closed(writer)

         def run(meth):
             def wrapper(sock):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think limiting the buffer with SO_SNDBUF makes sense, you can perhaps access the socket using get_extra_info method as well instead of private attrs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in latest commit. Still kept the original mocked test I added - the interaction from SSLProtocol is really hard to track down, and this benefits from having a more straightforward illustration of what is expected from SocketTransport, even if it is all mocks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test isn't failing for me even if I remove your fix in ssl test

Copy link
Contributor Author

@Tasssadar Tasssadar Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On what platform, and what exactly are you changing? On MacOS,while on the branch from this PR, I only have to roll back the fix in selector_events.py like this to make both new tests fail:

EDIT: ha, just tested on Linux and it does succeed there anyway - probably the socket behaves differently. Will debug it and let you know.

diff --git a/Lib/asyncio/selector_events.py b/Lib/asyncio/selector_events.py
index 22147451fa7..60bef420331 100644
--- a/Lib/asyncio/selector_events.py
+++ b/Lib/asyncio/selector_events.py
@@ -1188,7 +1188,7 @@ def _call_connection_lost(self, exc):
         try:
             super()._call_connection_lost(exc)
         finally:
-            self._write_ready = None
+            #self._write_ready = None
             if self._empty_waiter is not None:
                 self._empty_waiter.set_exception(
                     ConnectionError("Connection is closed by peer"))
@@ -1206,6 +1206,7 @@ def _reset_empty_waiter(self):

     def close(self):
         self._read_ready_cb = None
+        self._write_ready = None
         super().close()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, it's fixed now - problem was that on Linux, the minimum write buffer size is 1024, and it can only be set before the socket is connected anyway.

I switched it to use a wrapper class that fakes the full buffer by only writing half the data.

On linux, the ssl test was also not deterministic, it was not sending enough data and they actually got through before close call most of the time, so I doubled the size and now I can reliably trigger the bug with them on both Linux and MacOS.

@bedevere-app
Copy link

bedevere-app bot commented Jan 10, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Change introduced in pythongh-106503 cause the transport to miss writes
scheduled after it is closed.

Signed-off-by: Vojtěch Boček <vbocek@gmail.com>
@Tasssadar
Copy link
Contributor Author

I have made the requested changes; please review again

@bedevere-app
Copy link

bedevere-app bot commented Jan 27, 2025

Thanks for making the requested changes!

@kumaraditya303: please review the changes made to this pull request.

@bedevere-app bedevere-app bot requested a review from kumaraditya303 January 27, 2025 12:54
@kumaraditya303 kumaraditya303 enabled auto-merge (squash) February 2, 2025 14:11
@kumaraditya303 kumaraditya303 merged commit 4e38eea into python:main Feb 2, 2025
39 checks passed
@miss-islington-app
Copy link

Thanks @Tasssadar for the PR, and @kumaraditya303 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.12, 3.13.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 2, 2025
… asyncio(pythonGH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@bedevere-app
Copy link

bedevere-app bot commented Feb 2, 2025

GH-129581 is a backport of this pull request to the 3.13 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.13 bugs and security fixes label Feb 2, 2025
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 2, 2025
… asyncio(pythonGH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@bedevere-app
Copy link

bedevere-app bot commented Feb 2, 2025

GH-129582 is a backport of this pull request to the 3.12 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.12 bug and security fixes label Feb 2, 2025
kumaraditya303 added a commit that referenced this pull request Feb 2, 2025
…n asyncio(GH-128037) (#129582)

gh-115514: Fix incomplete writes after close while using ssl in asyncio(GH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
kumaraditya303 added a commit that referenced this pull request Feb 2, 2025
…n asyncio(GH-128037) (#129581)

gh-115514: Fix incomplete writes after close while using ssl in asyncio(GH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Feb 7, 2025
… asyncio(python#128037)

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@aelamran
Copy link

aelamran commented Mar 6, 2025

 Traceback (most recent call last):
  File "/opt/venv/lib/python3.12/site-packages/redis/asyncio/client.py", line 616, in execute_command
    return await conn.retry.call_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
    return await do()
           ^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/redis/asyncio/client.py", line 589, in _send_command_parse_response
    await conn.send_command(*args)
  File "/opt/venv/lib/python3.12/site-packages/redis/asyncio/connection.py", line 505, in send_command
    await self.send_packed_command(
  File "/opt/venv/lib/python3.12/site-packages/redis/asyncio/connection.py", line 480, in send_packed_command
    self._writer.writelines(command)
  File "/usr/local/lib/python3.12/asyncio/streams.py", line 349, in writelines
    self._transport.writelines(data)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 1182, in writelines
    self._write_ready()
TypeError: 'NoneType' object is not callable 

was this fixed in python 3.12.9 ? I have the feeling I am still experiencing the issue in high concurrency scenario with redis client

@Tasssadar
Copy link
Contributor Author

Yes, this was fixed in 3.12.9. Your stack trace is different though, this looks like more like "writing after close" scenario rather than bug in Python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants