[BUG] "Cannot create so many PeerConnections" on Chrome after 500 accepted calls #145

Closed
7 of 8 tasks
asmeikal opened this issue Feb 24, 2023 · 6 comments
Labels
bug Something isn't working jira added

Comments

@asmeikal

  • I have verified that the issue occurs with the latest twilio.js release and is not marked as a known issue in the CHANGELOG.md.
  • I reviewed the Common Issues and open GitHub issues and verified that this report represents a potentially new issue.
  • I verified that the Quickstart application works in my environment.
  • I am not sharing any Personally Identifiable Information (PII) or sensitive account information (API keys, credentials, etc.) when reporting this issue.

Code to reproduce the issue:

An example is available here: https://github.com/asmeikal/voice-javascript-sdk-quickstart-node

After starting up the device, click on the "Start loop" button. The script will connect and then disconnect calls in a loop, keeping a counter.
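The loop itself lives in the linked repository. As a rough illustration only, a connect/disconnect loop of this kind could be sketched as below. `runCallLoop` and `onCount` are hypothetical names and not taken from the repository; the sketch assumes a Device-like object whose `connect()` returns a promise for a Call that emits "accept" and "disconnect" events.

```javascript
// Sketch of a connect/disconnect loop with a counter (hypothetical helper).
// `device` is assumed to behave like a Voice SDK Device: connect() returns a
// promise for a Call-like event emitter with once() and disconnect().
async function runCallLoop(device, maxCalls, onCount = () => {}) {
  let count = 0;
  while (count < maxCalls) {
    let call;
    try {
      call = await device.connect();
    } catch (error) {
      // Stop at the first failed connect and report how far the loop got.
      return { count, error };
    }
    // Wait until the call is accepted before hanging up.
    await new Promise((resolve) => call.once('accept', resolve));
    // Register the listener before disconnecting so the event is not missed.
    const disconnected = new Promise((resolve) => call.once('disconnect', resolve));
    call.disconnect();
    await disconnected;
    count += 1;
    onCount(count);
  }
  return { count, error: null };
}
```

With a real device this could be started as `runCallLoop(device, Infinity, (n) => console.log(n))`; the returned `count` is the number of calls completed before the first failure.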

Expected behavior:

An unlimited number of calls can be made.

Actual behavior:

As soon as the counter reaches 500, no more calls can be made. The stack trace of the error is the following:

twilio.min.js:1 Uncaught (in promise) DOMException: Failed to construct 'RTCPeerConnection': Cannot create so many PeerConnections
    at RTCPC.create (http://localhost:3000/twilio.min.js:1:184344)
    at PeerConnection._setupPeerConnection (http://localhost:3000/twilio.min.js:1:171217)
    at PeerConnection._initializeMediaStream (http://localhost:3000/twilio.min.js:1:173968)
    at PeerConnection.makeOutgoingCall (http://localhost:3000/twilio.min.js:1:176510)
    at connect (http://localhost:3000/twilio.min.js:1:36367)
    at http://localhost:3000/twilio.min.js:1:36907

The root of the problem is this Chromium bug, which is still open: https://bugs.chromium.org/p/chromium/issues/detail?id=825576
Chrome does not garbage collect RTCPeerConnections quickly enough, but workarounds are available, as shown in this comment: https://bugs.chromium.org/p/chromium/issues/detail?id=825576#c40
Forcing garbage collection through the workaround in that comment, or by relying directly on the garbage collector in Electron, reclaims some of the closed RTCPeerConnections, so that new RTCPeerConnections can be created.
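To check the limit independently of the SDK, something like the following probe could be used. This is only a sketch: `probePeerConnectionLimit` is a hypothetical helper, and the constructor is passed in as a parameter so that the same function can be exercised with a stand-in class outside a browser.

```javascript
// Sketch: create and immediately close peer connections until construction
// fails, then report how many were created (hypothetical helper).
function probePeerConnectionLimit(PeerConnectionCtor, maxAttempts = 1000) {
  let created = 0;
  for (let i = 0; i < maxAttempts; i += 1) {
    let pc;
    try {
      pc = new PeerConnectionCtor();
    } catch (error) {
      // In Chrome this is where "Cannot create so many PeerConnections" surfaces.
      return { created, error };
    }
    // Close right away; no reference to `pc` is kept after this iteration.
    pc.close();
    created += 1;
  }
  return { created, error: null };
}
```

In a browser console this would be called as `probePeerConnectionLimit(RTCPeerConnection)`. Calling it again after forcing garbage collection gives a rough idea of how many closed connections were reclaimed.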

However, the workarounds do not work when using the Twilio JS Voice SDK, even after waiting up to an hour and after recreating the Twilio device.
This may be caused by references to the RTCPeerConnection being leaked by the Twilio JS Voice SDK, as pointed out in some of the comments: https://bugs.chromium.org/p/chromium/issues/detail?id=825576#c39
This prevents even a manual trigger of the garbage collection from cleaning up the closed RTCPeerConnections.

We detected this problem in our application, where call center operators usually reach 500 calls after roughly 6 hours.

Software versions:

  • Browser(s): Chromium 108, Electron 18
  • Operating System: Windows 10, 11
  • twilio.js: 2.1.2, 2.2.0
  • Third-party libraries (e.g., Angular, React, etc.):

asmeikal added the bug label on Feb 24, 2023

@asmeikal
Author

As requested by the Twilio support team, here is a Call SID for which we detected the problem: CA72b33d8da7bdd27d331f4f6283910143.

The Twilio logs for this call show only the following entry:

Canceled: CA72b33d8da7bdd27d331f4f6283910143

Our application received this call after it had already handled 500 calls. What happened was the following:

  1. The Device received the "incoming" event
  2. We programmatically called the "accept" method on the Call object inside the "incoming" event
  3. We detected the error shown in the first message
  4. After 5-10 seconds, the Call object received the "cancel" event

@charliesantos
Collaborator

@asmeikal, thanks for submitting! Looking at webrtc-internals, I can see that we are indeed closing the peer connections, but they don't go away if you keep monitoring the webrtc-internals tab. However, I see the same behavior when I load the WebRTC example applications (https://webrtc.github.io/samples/). There, too, the peer connections get closed but they don't go away. I believe this is still an issue in Chrome. To make sure, I will submit an internal ticket to track this down, since there may be references that we are not cleaning up. For now, whenever this issue happens, can you programmatically reload the page?

@asmeikal
Author

@charliesantos the example application you linked is actually leaking references to the connection. Using the "Basic peer connection demo in a single tab", I also see the number of peer connections grow without limit when clicking the "Call" and "Hang Up" buttons, and forcing the garbage collector to run does nothing to free the closed connections. However, references to the connections can be found in the console (as the "currentTarget" of some of the events logged by the example application). Clearing the console and then forcing the garbage collector to run again frees the old connections. See this comment on the Chromium issue for reference: https://bugs.chromium.org/p/chromium/issues/detail?id=825576#c38

Since garbage collection works when no references to the peer connections are present, I believe the problem we are currently experiencing is caused by similar reference leaks inside the Twilio JS Voice SDK. Unfortunately I still haven't had the chance to go over the SDK code to try and identify the leak.
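One way to tell whether closed connections are still reachable is to hold them only through weak references and count how many survive a forced garbage collection. The following is a sketch under that assumption. `ConnectionTracker` is a hypothetical name, and how the SDK's internal RTCPeerConnection objects would be handed to `track()` is left open, since it does not reflect the SDK internals.

```javascript
// Sketch: track objects through WeakRef so the tracker itself never keeps
// them alive (hypothetical diagnostic, not part of the SDK).
class ConnectionTracker {
  constructor() {
    this.refs = [];
    this.trackedTotal = 0;
  }

  // Register a connection without holding a strong reference to it.
  track(connection) {
    this.refs.push(new WeakRef(connection));
    this.trackedTotal += 1;
  }

  // Number of tracked connections that have not been collected yet.
  liveCount() {
    this.refs = this.refs.filter((ref) => ref.deref() !== undefined);
    return this.refs.length;
  }
}
```

A non-zero `liveCount()` right after closing a call is expected, because collection is not immediate. A count that stays high after forcing garbage collection would suggest that something still holds references to the closed connections.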

As for the solution you proposed (programmatically reloading the page), it's something we are using and that mitigates the problem (as all connections made before the reload are garbage collected), but it's quite disruptive for the user flow due to other features of our application, and as such it's only used when we are sure that the user is inactive. The problem still arises when the user makes 500 calls in a single session, without giving us the chance to reload the application.
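For reference, the mitigation described above could be sketched roughly as follows. All names are hypothetical (`createReloadGuard`, `recordCall`, `maybeReload`, `margin`), the limit of 500 is the one observed in this issue, and the reload itself is injected so the logic does not depend on a browser.

```javascript
// Sketch: count calls since page load and reload only when the user is idle
// and the count is close to the limit (hypothetical helper).
function createReloadGuard({ limit = 500, margin = 50, reload }) {
  let callsSinceLoad = 0;
  return {
    // Call once for every call that created a peer connection.
    recordCall() {
      callsSinceLoad += 1;
    },
    // Call when idleness is evaluated; reloads only if it is safe and needed.
    maybeReload(userIsIdle) {
      if (userIsIdle && callsSinceLoad >= limit - margin) {
        reload();
        callsSinceLoad = 0;
        return true;
      }
      return false;
    },
    // Calls left before the observed limit would be hit.
    remaining() {
      return Math.max(0, limit - callsSinceLoad);
    },
  };
}
```

In a page this would be created with `reload: () => window.location.reload()`. It does not help when a user makes 500 calls without ever being idle, which is the case described above.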

@charliesantos
Collaborator

Thanks for the additional information, @asmeikal. We'll keep this in mind once we start looking into this more.

@charliesantos
Collaborator

Related to #87
We will add the fix in the next release.

@charliesantos
Collaborator

Should be fixed in 2.4.0
