Avoid infinite loop in CreateOffer #1657
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #1657      +/-   ##
==========================================
- Coverage   75.18%   75.04%   -0.15%
==========================================
  Files          79       79
  Lines        5694     5698       +4
==========================================
- Hits         4281     4276       -5
- Misses       1035     1041       +6
- Partials      378      381       +3
```
Flags with carried forward coverage won't be shown.
If the local description keeps getting changed, or in case of a bug in Pion, CreateOffer never terminates, which could cause client software to hang. Set an arbitrary bound on the number of iterations.
Branch force-pushed from a41a54e to 9a44cf9.
🔥 LGTM! I will cherry-pick and include a test. I would also like to figure out the root cause.
Absolutely. With this patch, Galène no longer hangs, but it still fails streams when
```diff
@@ -576,6 +577,8 @@ func (pc *PeerConnection) hasLocalDescriptionChanged(desc *SessionDescription) b
 	return false
 }
 
+var errExcessiveRetries = errors.New("excessive retries in CreateOffer")
```
Can we give more context here? This message may not be helpful. Maybe we could say that transceivers were modified during negotiation, or something else that helps users realize they have hit a race condition.
This should not happen in normal usage: if transceivers are being modified, we'll only loop once or twice, while the error only appears after 128 iterations. If this message appears, it indicates a bug, either in the client application or in Pion.
In other words, this allows the application to recover after it hits a bug, rather than hanging and needing to be restarted.
So perhaps we could add something like "this is probably a bug somewhere", but I feel it's overkill.
I have never seen this error in ion-sfu, but once, while I was trying to reuse a transceiver, I saw it happen, so I thought it was a race related to mids that could not be found.
The race I described only happens when the client is slow to send its answer, so it's the kind of bug that you typically only see in production.
Related to #1656.