-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task pool connection is not dropped in case of error #2661
Comments
If I recall correctly, the reasoning why
What do you mean by leaked? The task is terminating given the rust-libp2p/swarm/src/connection/pool/task.rs Line 249 in ef2afcd
The connection should be owned by the task and thus dropped as well. Am I missing something @Pj50? |
If you don't close the connection how can a transport layer be notified in case of error ? I mean the TCP connection will be eventually closed on timeout or by the kernel but what if the underlying transport does not support it (in our example UDP) ? You're right, the connection should be dropped however in our UDP transport layer the drop on the connection is never called, we don't understand why and are still investigating. |
In this case it is the transport layer notifying the muxer that an error occured, right? If one ignores the layers, there would be no need to notify the notifier, right?
Thank you! |
Tested with QUIC v4 attempt. In case of a connection error the muxer is not awaited thus the poll_close StreamMuxer trait function of QuicMuxer is never called. Isn't it an issue ? The QUIC connection is not closed by libp2p but will eventually be closed later by Quinn on timeout. |
Thanks for the ping. Until now it was also handled by a timeout, i.e. the endpoint was not able do forward it to the connection and thus no ack was sent to the remote, eventually leading to a timeout. But I changed it now so that when the connection was dropped, the endpoint gets directly injected a |
@elenaf9 sorry but the patches don't apply neither on v3 or v4 branches. |
Thanks for testing @Pj50! Could you please share the code with which you are testing, and the debug output that shows this? |
@elenaf9 I cannot share the code I am working on, but you can reproduce the issue with two instances of libp2p using Quic, one as a listener and the other as a dialer. I use ping + identify as behaviour. Just kill the dialer, and the you will see that the connection is not closed on the swarm side, but the connection timeouts in Quinn. The following log should trigger on timeout:
|
There might be a misunderstanding here. What is implemented (in quic-v3) is that if a connection is dropped on one side we send a RESET frame to the remote for that connection. That's possible because our quic endpoint is still polled after an individual connection was dropped and can continue sending packets to the remote (e.g. the RESET for the dropped connection) and handle / ack incoming ones. If the whole dialer is killed then not only the connection is dropped also the endpoint, so we can not sent a RESET to the remote or ACK any packets anymore. Thus this then results at a timeout at the remote, which imo makes sense. |
Related: #1682 |
I think this is now fixed with the new connection management. |
Summary
In swarm/src/connection/pool/task.rs:240 the call to the future closing_muxer is missing, thus the connection is never dropped and finally leaked.
Expected behaviour
The closing_muxer future should be awaited.
Actual behaviour
The closing_muxer future is not awaited. However the bug is not triggered with the usual transport layers but I am able to reproduce it while using my under development UDP transport layer.
Possible Solution
But don't really know which error should be passed to EstablishedConnectionEvent::Closed below.
Version
Master
Would you like to work on fixing this bug?
The bug report should be enough.
The text was updated successfully, but these errors were encountered: