-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
webrtc: test pion fixes for state change callbacks ordering #2732
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
sukunrt
commented
Mar 12, 2024
Comment on lines
78
to
82
const ( | ||
DefaultDisconnectedTimeout = 20 * time.Second | ||
DefaultFailedTimeout = 30 * time.Second | ||
DefaultKeepaliveTimeout = 15 * time.Second | ||
DefaultKeepaliveTimeout = 5 * time.Second | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to have 4 keep alives before timing out than just 1 as before.
sukunrt
force-pushed
the
webrtc-flaky-fix
branch
8 times, most recently
from
March 12, 2024 17:37
8bb55dc
to
8ce861f
Compare
sukunrt
force-pushed
the
webrtc-flaky-fix
branch
7 times, most recently
from
March 13, 2024 06:31
123ec1f
to
4bd7d63
Compare
sukunrt
force-pushed
the
webrtc-flaky-fix
branch
from
March 13, 2024 06:31
4bd7d63
to
8f1fc36
Compare
sukunrt
force-pushed
the
webrtc-flaky-fix
branch
10 times, most recently
from
March 13, 2024 11:15
4a46d71
to
96620ae
Compare
sukunrt
force-pushed
the
webrtc-flaky-fix
branch
6 times, most recently
from
March 13, 2024 13:19
fa9396a
to
4d76e74
Compare
sukunrt
force-pushed
the
webrtc-flaky-fix
branch
from
March 13, 2024 13:39
4d76e74
to
e1837e7
Compare
sukunrt
changed the title
webrtc: make context timeout consistent with pion timeout
webrtc: test pion fixes for state change callbacks ordering
Mar 13, 2024
the webrtc fixes have landed and will be included in v0.34 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes: #2614
This is a problem with pion/webrtc's OnConnectionStateChange handler. It should wait for the handler to finish. Right now connection notifications might get reordered leading to listener thinking the connection has established and the dialer thinks connection hasn't established yet.
The fix in pion/webrtc is here:
https://github.com/pion/webrtc/pull/2702/files
The same error in pion/ice
https://github.com/pion/ice/pull/656/files
Update: the text below helped me explains how I went about debugging this
I think this fixes the problemThis might help debug the problem of tests taking 10m. My theory is that the ICE connection succeeds on dialer and times out on listener(20s). So the listener removes the ufrag from its udp mux. After keepalive time, the dialer sends a STUN Request and initiates a new connection on the listener. This of course will fail because the dialer has exited the ice connection establishment mode and is simply sending STUN packets to keep the connection alive.If we look at the failure here: https://github.com/libp2p/go-libp2p/actions/runs/8244362080/job/22546464639
listener is stuck on
2024-03-12T06:24:47.2310488Z /home/runner/work/go-libp2p/go-libp2p/p2p/transport/webrtc/listener.go:234 +0x886
This context expires after 20 seconds.
And all the goroutines on the dialer side are stuck for 9 minutes. And all the goroutines on the listener side are stuck for < minute(which is why that's not shown in the logs)
The logs also show that the dialer believes that ice connection on its side has completed.
Unfortunately this is very difficult to test. This has not flaked in the 16 tries I have done after that.
We still have to explain why the dialer doesn't exit on DTLS failure. DTLS failure should reflect in the OnConnectionStateChange callback.