open_websocket context manager doesn't exit if there are pending receives #74
Is this behaving as intended? (It's surprising to me, and I couldn't find reference to this behavior in the docs.)
Hmm, I don't know why it's doing that. It definitely should not hang there. I will look into it further.
In the test run at #77, I'm observing that the server exits its aclose() first, and the client aclose() blocks indefinitely on […].
debug log
The reader task is processing handlers sequentially. Since the TextReceived handler is waiting for the echoed message to be consumed, the reader task never gets the ConnectionClosed event. Besides fixing that, it seems like aclose should have some timeout when waiting for ConnectionClosed.
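To make the blocking mechanics concrete, here is a minimal sketch (plain Trio, not trio-websocket internals; the task and message names are illustrative) of how a zero-capacity memory channel stalls its producer until the consumer reads, which is the position the reader task ends up in:

```python
import trio

async def main():
    # Capacity 0: send() blocks until another task calls receive().
    send_channel, receive_channel = trio.open_memory_channel(0)

    async def reader_task():
        # Stands in for the library's reader task: it cannot move on to later
        # events (such as the one signalling ConnectionClosed) until this
        # send completes.
        await send_channel.send('echoed message')
        print('reader task delivered the message')

    async with trio.open_nursery() as nursery:
        nursery.start_soon(reader_task)
        await trio.sleep(1)                       # the application is busy...
        print(await receive_channel.receive())    # ...and only now consumes it

trio.run(main)
```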
Thanks for debugging this! We can alleviate this by increasing the size of the memory channel, but it will always be possible for the reader task to get blocked by unread messages. I suggest that we make the size of the memory channel a configurable setting and add a maximum size per message. I think Nathaniel already suggested this somewhere in another open issue. It's also how Augustin's websockets library handles userspace buffering.
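A rough sketch of what that could look like; `message_queue_size` and `max_message_size` are hypothetical parameter names chosen for illustration, not taken from trio-websocket's current API:

```python
import trio

class WebSocketConnectionSketch:
    def __init__(self, message_queue_size=0, max_message_size=2 ** 20):
        # How many complete messages the reader task may buffer before it
        # blocks waiting for the application to call get_message().
        self._send_channel, self._recv_channel = trio.open_memory_channel(
            message_queue_size)
        # Largest message the reader task accepts before failing the
        # connection (mirrors the max_size idea in the websockets library).
        self._max_message_size = max_message_size
```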
The only change I would make to this is to remove […]. We should also add a note in the documentation that highlights this issue, because it's a surprise to people who don't frequently write networking code.
This is a good idea. Timeouts in general have not been adequately addressed in this library to date. I opened up #64 to work on timeouts throughout the library.
In fact Trio channels support […]
It seems like it might be possible to unblock it when shutting down, though? Note that if you close a channel handle, then any tasks that are blocked on that handle immediately resume with a ClosedResourceError.
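A self-contained demonstration of that behavior with plain Trio channels (nothing trio-websocket-specific):

```python
import trio

async def main():
    send_channel, receive_channel = trio.open_memory_channel(0)

    async def blocked_receiver():
        try:
            await receive_channel.receive()    # blocks: nothing is ever sent
        except trio.ClosedResourceError:
            print('receive() resumed with ClosedResourceError')

    async with trio.open_nursery() as nursery:
        nursery.start_soon(blocked_receiver)
        await trio.sleep(0.1)                  # let the receiver block
        # Closing the handle that the task is blocked on wakes it immediately.
        # (A task blocked on the *other* handle would get BrokenResourceError.)
        await receive_channel.aclose()

trio.run(main)
```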
Between the two (buffering and timeout control), the latter is higher priority for me -- specifically the guarantee that aclose() doesn't take an inordinate amount of time. This issue is blocking migration of my app's ws client to trio-websocket. (I would not want to configure a large queue, and given reasonable queue sizes there is always the case that it will get filled.) Nathaniel's suggestion sounds promising. I think it means that aclose() should close the receive channel used by get_message() before waiting on the close handshake. Either the data-received handler or reader task would then need to deal with ClosedResourceError if we want to advance to the ConnectionClosed event -- I'm not sure how that should look.
As described in the issue, get_message() was raising connection closed even if there were pending messages. Per Nathaniel's suggestion, the proper behavior is this:

1. If the remote endpoint closed the connection, then the local endpoint may continue reading all messages sent prior to closing.
2. If the local endpoint closed the connection, then the local endpoint may not read any more messages.

I added tests for these two conditions and implemented the behavior by closing the ReceiveChannel inside the connection's `aclose()`. This requires a bit of additional exception handling inside `get_message()` and inside the reader task. One slight surprise is that the test can't be written due to the bug in #74! The client would hang because the reader task is blocked by the unconsumed messages. So I changed the channel size to 32, which allows this test to work, and I will replace this hard-coded value when I fix #74.
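A condensed sketch of how those two rules might translate into channel handling -- an illustration only, not the actual diff in #79; the class and the `ConnectionClosed` placeholder are invented for the example:

```python
import trio

class ConnectionClosed(Exception):
    '''Placeholder standing in for trio-websocket's ConnectionClosed.'''

class ConnectionSketch:
    def __init__(self):
        # The reader task owns _send_channel; get_message() reads _recv_channel.
        self._send_channel, self._recv_channel = trio.open_memory_channel(0)

    async def get_message(self):
        try:
            return await self._recv_channel.receive()
        except trio.EndOfChannel:
            # Remote close: the reader task closed its send handle and every
            # buffered message has been drained (rule 1).
            raise ConnectionClosed from None
        except trio.ClosedResourceError:
            # Local close: our own aclose() closed the receive handle (rule 2).
            raise ConnectionClosed from None

    async def aclose(self):
        # Closing the receive handle enforces rule 2 and also unblocks a
        # reader task stuck in send() (its send raises BrokenResourceError).
        await self._recv_channel.aclose()
        # ...the closing handshake with the remote endpoint would follow here...
```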
This is now done in PR #79.
I'd like to keep this ticket dedicated to configuring the message buffer and address timeouts in #64. Can you please take a look at #64 and share your opinion on the various options presented there?
Nathaniel was implying that this would unblock the reader task and allow it to receive the ConnectionClosed, resolving the original issue. I'd like to understand this better.
Please organize things as you see fit, but I'd like to reiterate that this bug is about indefinite blocking of the context manager, and it can't be fully addressed by offering buffer configuration. I don't want to use a buffer size of more than 0 or 1, and even reasonable sizes beyond that can trivially become filled. (E.g. an app might do some blocking work between the last ws read and exiting the cm scope, during which time a lot of messages could have been received.) The priorities I see, in order, are:

1. Confirm whether Nathaniel's suggestion can allow normal shutdown, at least in the case where the server is being responsive.
2. Add some configurable timeout to waiting for the close handshake from the server (see the sketch after this list).
3. Provide buffering configuration.
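On point 2, until the library grows a built-in disconnect timeout, a caller-side workaround might look like the sketch below (it assumes `open_websocket_url` from trio_websocket and that `aclose()` does not shield itself from cancellation; the URL and the 10-second deadline are placeholders):

```python
import trio
from trio_websocket import open_websocket_url

async def fetch_one_message():
    with trio.move_on_after(10) as cancel_scope:
        # The cancel scope also covers the close handshake performed when the
        # context manager exits, so a stalled peer can't hang us forever.
        async with open_websocket_url('wss://example.com/ws') as ws:
            return await ws.get_message()
    if cancel_scope.cancelled_caught:
        return None    # timed out, possibly during the close handshake

trio.run(fetch_one_message)
```

The obvious drawback is that the deadline covers the whole `async with` body rather than just the close handshake, which is why a dedicated, configurable timeout inside `aclose()` (per #64) still seems preferable.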
With PR #79, but buffer size reverted to 0, indeed the test I wrote in #77 for this bug now passes. Regarding test_read_messages_after_remote_close in #79, in the absence of this bug it does still hang, but that's due to a deadlock in the test as it's written. With a buffer size of 0, the server handler can never send its 2nd message and set the event, because the client is waiting for the event to consume the messages. So the buffer workaround in #79 should be removed, and the test deadlock fixed. That's enough to close this bug, and the timeout / buffer enhancements are future work. (Timeout still sounds higher priority.)
The server sends all of its messages (they end up in some kernel buffer, I guess), but its context manager cannot exit because it's waiting for the client to handshake. The client won't handshake because the reader task is blocked trying to deliver the first message. I don't think this test deadlock can be "fixed", because this is exactly the behavior that we want to test: the client has done the closing handshake but the caller can still get pending messages. I have disabled the test for the time being and will re-enable it when the buffer size can be configured.
This is not what I observed in the debugger. With a buffer size of 0, the server handler actually blocked on the very first send_message(). This was very surprising -- I'd appreciate it if you would confirm and perhaps make sense of it.
I haven't tried in the debugger yet. I added a bunch of logging statements and a trio timeout to trace the code:

```python
async def test_read_messages_after_remote_close(nursery):
    '''
    When the remote endpoint closes, the local endpoint can still read all
    of the messages sent prior to closing. Any attempt to read beyond that will
    raise ConnectionClosed.
    '''
    server_closed = trio.Event()
    import logging
    with trio.fail_after(2):
        async def handler(request):
            server = await request.accept()
            async with server:
                logging.debug('server A')
                await server.send_message('1')
                logging.debug('server B')
                await server.send_message('2')
                logging.debug('server C')
                server_closed.set()
                logging.debug('server D')

        server = await nursery.start(
            partial(serve_websocket, handler, HOST, 0, ssl_context=None))
        async with open_websocket(HOST, server.port, '/', use_ssl=False) as client:
            logging.debug('client A')
            await server_closed.wait()
            logging.debug('client B')
            assert await client.get_message() == '1'
            logging.debug('client C')
            assert await client.get_message() == '2'
            logging.debug('client D')
            with pytest.raises(ConnectionClosed):
                logging.debug('client E')
                await client.get_message()
            logging.debug('client F')
```

Running this produces the following logs:
The client prints "A" and then hangs: it is waiting for the `server_closed` event. The server prints "A", "B", "C", and then hangs: it must be stuck in the CM exit. The logs also indicate that the client […]. I'll try this out in the debugger next.
Trying this out in the debugger. I reverted the logging code from the previous post and added […]
This looks like it sends two messages.
I'm debugging in PyCharm (which uses pdb, I believe). I tried again and it's matching what you observed now. Sorry for taking your time!