Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spurious "Exception in callback None()" after upgrading to 3.12 #115514

Closed
Tasssadar opened this issue Feb 15, 2024 · 9 comments
Closed

Spurious "Exception in callback None()" after upgrading to 3.12 #115514

Tasssadar opened this issue Feb 15, 2024 · 9 comments
Labels
stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error

Comments

@Tasssadar
Copy link
Contributor

Tasssadar commented Feb 15, 2024

Bug report

Bug description:

Hello, after upgrading our application to 3.12.1 that uses aiohttp to download stuff via HTTPS as well as connects to some RabbitMQ servers via ssl, we get this spurious exception to stderr, dozens-hundres of them in close succession.

They happen after several minutes of running (so after dozens of ssl connections are used).

Unfortunately, I can't tell what is the culprit, the trace contains nothing other than the runtime :( I can provide more info if you give me some pointers on where to look. It looks like write-after-close, but the "writing" is done by internal asyncio mechanisms.

Exception in callback None()
handle: <Handle created at /usr/local/lib/python3.12/asyncio/selector_events.py:313>
source_traceback: Object created at (most recent call last):
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 671, in run_until_complete
    self.run_forever()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 638, in run_forever
    self._run_once()
  File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1963, in _run_once
    handle._run()
  File "/usr/local/lib/python3.12/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 1111, in _write_sendmsg
    self._maybe_resume_protocol()  # May append to buffer.
  File "/usr/local/lib/python3.12/asyncio/transports.py", line 300, in _maybe_resume_protocol
    self._protocol.resume_writing()
  File "/usr/local/lib/python3.12/asyncio/sslproto.py", line 907, in resume_writing
    self._process_outgoing()
  File "/usr/local/lib/python3.12/asyncio/sslproto.py", line 715, in _process_outgoing
    self._transport.write(data)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 1084, in write
    self._loop._add_writer(self._sock_fd, self._write_ready)
  File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 313, in _add_writer
    handle = events.Handle(callback, args, self, None)

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Linked PRs

@Tasssadar Tasssadar added the type-bug An unexpected behavior, bug, or error label Feb 15, 2024
@terryjreedy
Copy link
Member

3.12.2 was released a week ago. Try that?

Perhaps you should ask on https://discuss.python.org/c/users/7 as it seems at least half likely that the bug is in aiohttp or your code as in asyncio.

@gvanrossum
Copy link
Member

There is nothing I can do for you unless you debug this further, sorry. Above suggestions are right.

@gvanrossum gvanrossum added the pending The issue will be closed if no feedback is provided label Feb 16, 2024
@kumaraditya303 kumaraditya303 closed this as not planned Won't fix, can't repro, duplicate, stale May 27, 2024
@github-project-automation github-project-automation bot moved this from Todo to Done in asyncio May 27, 2024
@kumaraditya303
Copy link
Contributor

Closing because of no further comments and reproducer.

@Tasssadar
Copy link
Contributor Author

Tasssadar commented Dec 17, 2024

Hello, sorry for very late response - unfortunately, this is very hard to reproduce reliably because it only happens in high-traffic situations with complex, closed-source apps.

I now think that this is caused by 3.12 switch to optional sendmsg in asyncio and #106504

The code in

def close(self):
if self._closing:
return
self._closing = True
self._loop._remove_reader(self._sock_fd)
if not self._buffer:
self._conn_lost += 1
self._loop._remove_writer(self._sock_fd)
self._loop.call_soon(self._call_connection_lost, None)
clearly suggests that if close is called while the buffer is not empty, we should keep writing - but the PR breaks that, preventing the buffer to be written immeaditelly after close is called.

I think the reference to _write_ready should be erased in _call_connection_lost instead, where the connection is truly closed.

Now, I have no idea how is it possible that other people are not running into this. Perhaps it is caused by our internal company network environment or something, but the behavior here clearly changed in 3.12.

I will prepare a PR with fix and a test, can you please re-open this issue?

@gvanrossum
Copy link
Member

I have no opinion on this, because how do I know that the bug isn’t in your code? But I will reopen the issue — clearly we need to consider that it is on our side.

@gvanrossum gvanrossum reopened this Dec 17, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in asyncio Dec 17, 2024
@picnixz picnixz added the stdlib Python modules in the Lib dir label Dec 17, 2024
@Tasssadar
Copy link
Contributor Author

Tasssadar commented Dec 18, 2024

I spent a few hours trying to prepare some reasonable reproduction code, but it does not seem possible - the corner case depends on particular network state. I did try at least outlining the state in the test in the PR though.

But I think it boils down to a simple question - should Transport.write() after Transport.close() throw an exception? If not, as I'm convinced after looking at the code already present, then I think my fix is correct, and it does not really matter what my code is doing.

@asvetlov
Copy link
Contributor

asvetlov commented Dec 18, 2024

Yes, Transport.write() should raise RuntimeError if it is called after Transport.close().
Currently, Transport.write() raises RuntimeError after Transport.write_eof(), why is Transport.close() different?
In other words, why is half-closing stricter than full-closing?

@Tasssadar
Copy link
Contributor Author

Tasssadar commented Dec 18, 2024

The .close() documentation does say that The transport should not be used once it is closed..

However, there is pretty complex interaction with SSLProtocol, where writes are scheduled just before its shutdown, and eventually the one originating from SSLProtocol._process_outgoing is where I'm getting these exceptions from.

I've dug even more, and it looks like the interaction is this:

  1. We open a HTTPS connection to a server
  2. Start uploading file using standard PUT request
  3. The server decides based on headers that it does not want this request, sends response and request TCP connection close (sends FIN)
  4. SSLProtocol enters shutdown state, actual trace of that looks like [1]
  5. We are in the middle of uploading the file, meaning the SSLProtocol has data in flight in the ssl encryption and SocketTransport write buffer is full. SSLProtocol is also paused due to flow control.
  6. SocketTransport buffer is written out, SSLProtocol.resume_writing -> SSLProtocol._process_outgoing -> SocketTransport.write is called

Indeed it does not seem to make sense to keep writing SSLProtocol buffer out after it has been shutdown, so perhaps more sensible fix is this in SSLProtocol:

diff --git a/Lib/asyncio/sslproto.py b/Lib/asyncio/sslproto.py
index 74c5f0d5ca0..1c05bb3e47a 100644
--- a/Lib/asyncio/sslproto.py
+++ b/Lib/asyncio/sslproto.py
@@ -716,6 +716,9 @@ def _do_write(self):
         self._process_outgoing()

     def _process_outgoing(self):
+        if self._transport.is_closing():
+            return
+
         if not self._ssl_writing_paused:
             data = self._outgoing.read()
             if len(data):

EDIT: except there is a test that checks exactly this, it verifies that whole SSLProtocol buffer is sent after shutdown:

def test_remote_shutdown_receives_trailing_data(self):
CHUNK = 1024 * 128
SIZE = 32
sslctx = self._create_server_ssl_context(
test_utils.ONLYCERT,
test_utils.ONLYKEY
)
client_sslctx = self._create_client_ssl_context()
future = None
def server(sock):
incoming = ssl.MemoryBIO()
outgoing = ssl.MemoryBIO()
sslobj = sslctx.wrap_bio(incoming, outgoing, server_side=True)
while True:
try:
sslobj.do_handshake()
except ssl.SSLWantReadError:
if outgoing.pending:
sock.send(outgoing.read())
incoming.write(sock.recv(16384))
else:
if outgoing.pending:
sock.send(outgoing.read())
break
while True:
try:
data = sslobj.read(4)
except ssl.SSLWantReadError:
incoming.write(sock.recv(16384))
else:
break
self.assertEqual(data, b'ping')
sslobj.write(b'pong')
sock.send(outgoing.read())
time.sleep(0.2) # wait for the peer to fill its backlog
# send close_notify but don't wait for response
with self.assertRaises(ssl.SSLWantReadError):
sslobj.unwrap()
sock.send(outgoing.read())
# should receive all data
data_len = 0
while True:
try:
chunk = len(sslobj.read(16384))
data_len += chunk
except ssl.SSLWantReadError:
incoming.write(sock.recv(16384))
except ssl.SSLZeroReturnError:
break
self.assertEqual(data_len, CHUNK * SIZE)
# verify that close_notify is received
sslobj.unwrap()
sock.close()
def eof_server(sock):
sock.starttls(sslctx, server_side=True)
self.assertEqual(sock.recv_all(4), b'ping')
sock.send(b'pong')
time.sleep(0.2) # wait for the peer to fill its backlog
# send EOF
sock.shutdown(socket.SHUT_WR)
# should receive all data
data = sock.recv_all(CHUNK * SIZE)
self.assertEqual(len(data), CHUNK * SIZE)
sock.close()
async def client(addr):
nonlocal future
future = self.loop.create_future()
reader, writer = await asyncio.open_connection(
*addr,
ssl=client_sslctx,
server_hostname='')
writer.write(b'ping')
data = await reader.readexactly(4)
self.assertEqual(data, b'pong')
# fill write backlog in a hacky way - renegotiation won't help
for _ in range(SIZE):
writer.transport._test__append_write_backlog(b'x' * CHUNK)
try:
data = await reader.read()
self.assertEqual(data, b'')
except (BrokenPipeError, ConnectionResetError):
pass
await future
writer.close()
await self.wait_closed(writer)
def run(meth):
def wrapper(sock):
try:
meth(sock)
except Exception as ex:
self.loop.call_soon_threadsafe(future.set_exception, ex)
else:
self.loop.call_soon_threadsafe(future.set_result, None)
return wrapper
with self.tcp_server(run(server)) as srv:
self.loop.run_until_complete(client(srv.addr))
with self.tcp_server(run(eof_server)) as srv:
self.loop.run_until_complete(client(srv.addr))

For this test to pass, I think write after close has to work. This test actually mimics my scenario except the final write fits into kernel socket buffer, so the self._loop._add_writer is not called.

Currently, Transport.write() raises RuntimeError after Transport.write_eof(), why is Transport.close() different?

I think the difference is that with write_eof you're intentionally closing the connection from your end, whereas close gets called during close initiated by the other side - in which case, you want to still send your buffers.

[1]:

``` File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) File "/usr/local/lib/python3.12/asyncio/base_events.py", line 674, in run_until_complete self.run_forever() File "/usr/local/lib/python3.12/asyncio/base_events.py", line 641, in run_forever self._run_once() File "/usr/local/lib/python3.12/asyncio/base_events.py", line 1978, in _run_once handle._run() File "/usr/local/lib/python3.12/asyncio/events.py", line 88, in _run self._context.run(self._callback, *self._args) File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 970, in _read_ready self._read_ready_cb() File "/usr/local/lib/python3.12/asyncio/selector_events.py", line 1002, in _read_ready__get_buffer self._protocol.buffer_updated(nbytes) File "/usr/local/lib/python3.12/asyncio/sslproto.py", line 442, in buffer_updated self._do_read() File "/usr/local/lib/python3.12/asyncio/sslproto.py", line 737, in _do_read self._do_read__copied() File "/usr/local/lib/python3.12/asyncio/sslproto.py", line 803, in _do_read__copied self._start_shutdown() File "/usr/local/lib/python3.12/asyncio/sslproto.py", line 628, in _start_shutdown self._do_flush() File "/usr/local/lib/python3.12/asyncio/sslproto.py", line 642, in _do_flush self._set_state(SSLProtocolState.SHUTDOWN) ```

@stalkerg
Copy link

I think we have the same issue on Python 3.12.2, during heavy upload to AWS S3 by aiobotocore which use aiohttp client with SSL. We are using short live sessions and I believe S3 time to time close TCP connections to update session. I will try to find a way to test a PR from @Tasssadar

@picnixz picnixz removed the pending The issue will be closed if no feedback is provided label Dec 27, 2024
Tasssadar added a commit to Tasssadar/cpython that referenced this issue Jan 20, 2025
kumaraditya303 added a commit that referenced this issue Feb 2, 2025
#128037)

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 2, 2025
… asyncio(pythonGH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 2, 2025
… asyncio(pythonGH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
kumaraditya303 added a commit that referenced this issue Feb 2, 2025
…n asyncio(GH-128037) (#129582)

gh-115514: Fix incomplete writes after close while using ssl in asyncio(GH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
kumaraditya303 added a commit that referenced this issue Feb 2, 2025
…n asyncio(GH-128037) (#129581)

gh-115514: Fix incomplete writes after close while using ssl in asyncio(GH-128037)

(cherry picked from commit 4e38eea)

Co-authored-by: Vojtěch Boček <vbocek@gmail.com>
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
@github-project-automation github-project-automation bot moved this from In Progress to Done in asyncio Feb 2, 2025
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this issue Feb 7, 2025
… asyncio(python#128037)

Co-authored-by: Kumar Aditya <kumaraditya@python.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
Status: Done
Development

No branches or pull requests

8 participants