Videoroom - subscriber unable to switch feeds after publisher's peer connection goes down, even though close_pc is set to false on the subscriber #1761
Comments
Very likely this is the cause: https://github.com/meetecho/janus-gateway/blob/master/plugins/janus_videoroom.c#L4753 You can try commenting out those two lines to see if it works as expected for you. I'm not sure if that will introduce leaks, or other issues, so please let me know. |
UPDATE: |
I won't be able to help until you provide libasan dumps or gdb stacktraces. |
In the unified-plan branch, renegotiations are much more frequent, and in control of the application. |
Here's a link to a tgz containing stacktraces, core files of crashes as well as our modified videoroom.c file (containing the minor code change from above) and our Janus binary. We used the most recent Janus master from today. We captured a total of 5 crashes, 2 of which contain the same stacktraces. In the tgz there is a crash.number file for each crash and inside of it there is a stacktrace and a core file name reference (core.). This is the stacktrace that we have two of:
core.22409 Please let us know what you find, thank you for the help. |
Please don't forget to put code snippets in quotes, or the gdb trace links will be misread as references to existing github issues. A libasan dump would help as well (probably more, if it is a memory issue). |
Sorry about that, thanks for fixing it. https://pastebin.com/dzM1utWm |
This seems to confirm it's a memory/reference counter issue. The plugin is trying to access a subscriber after it's been freed because the count of references dropped to 0. Next step would be uncommenting |
That said, please do it on master and not on a modified/old version. |
We uncommented |
This is not master, lines don't match. What commit do the logs refer to? |
Anyway, you probably want to remove/comment the That should solve that specific crash, but you may want to be sure there aren't any leaks now: keep the refcount-debugging active, and if it's not crashing anymore, wait until all users have left to cleanly close Janus. This will tell you if there are resources that haven't been properly deallocated. If they have, we can be confident that this fix can land in master. |
Just to confirm, this is the code that we have changed in janus_videoroom.c:
We commented out the room dereference and the refcount decrease. This change fixes the "No such room" error and also keeps Janus from crashing, so it seems to be the correct fix. What do you mean by "cleanly close Janus"? How do we shut down Janus in a way that makes it print the objects that are still left? Is there a way to match up the refcount increments and decrements to determine whether objects are still leaking? |
CTRL+C or a SIGINT
If refcount debugging is enabled, a summary will be printed on the console/logs before the application exits. |
The commit that this scenario refers to is 7df23ef. There are definitely some memory leaks caused by this fix. We verified this by first running Janus with the original janus_videoroom.c code from the above commit; the only objects remaining in memory were valid, as they referenced the transport, which was still in use. Then we ran the modified janus_videoroom.c with the lines above commented out: there were no crashes, but 56 objects were leaked when we closed Janus. Attached is a zip containing the Janus log, which shows the leaks at the very end of the file, as well as the janus_videoroom.c file that we are running. What are the best steps for troubleshooting this further? |
We have come up with a solution for this; the updated janus_videoroom.c file can be seen in the pastebin below: The changes were made in janus_videoroom_hangup_subscriber, starting on line 4775, and in janus_videoroom_destroy_session, lines 2442 to 2446. This change stops the memory leaks reported in the posts above. In janus_videoroom_hangup_subscriber, we now dereference the room and decrease the refcount only if close_pc is true. Otherwise, we keep the reference to the room so that we can still switch the subscriber to a new publisher even if the original publisher gets destroyed. Once janus_videoroom_destroy_session gets called, we then properly clean up the lingering subscriber references if close_pc is false. Please let us know if this is an appropriate fix for this problem. |
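For readers following along, the logic described in the comment above can be modeled with a minimal, self-contained C sketch. This is not the Janus code: the `room` and `subscriber` structs, the plain integer refcount, and every function name below are hypothetical stand-ins chosen for illustration, and the real plugin uses its own structures and reference-counting helpers. The sketch only assumes what the comment states: on hangup the room reference is dropped only when close_pc is true, and a later teardown step releases any reference that is still held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real Janus structs. */
typedef struct room {
    int refcount;      /* number of holders still referencing the room */
} room;

typedef struct subscriber {
    room *room;        /* NULL once the subscriber has released its room */
    int close_pc;      /* nonzero: tear everything down when the publisher leaves */
} subscriber;

void room_ref(room *r)   { r->refcount++; }
void room_unref(room *r) { assert(r->refcount > 0); r->refcount--; }

/* The subscriber takes one reference on the room when it starts subscribing. */
void subscriber_join(subscriber *s, room *r, int close_pc) {
    room_ref(r);
    s->room = r;
    s->close_pc = close_pc;
}

/* Publisher went away: release the room only if close_pc is set. */
void hangup_subscriber(subscriber *s) {
    if (s->close_pc && s->room != NULL) {
        room_unref(s->room);
        s->room = NULL;
    }
}

/* A switch needs a room; returns 0 on success, -1 for the "No such room" case. */
int subscriber_switch(const subscriber *s) {
    return s->room != NULL ? 0 : -1;
}

/* Teardown: release whatever reference is still held, exactly once. */
void destroy_session(subscriber *s) {
    if (s->room != NULL) {
        room_unref(s->room);
        s->room = NULL;
    }
}
```

In this model, a subscriber created with close_pc set to 0 can still switch after `hangup_subscriber`, and the room's count returns to its starting value after `destroy_session`, so nothing is leaked. A subscriber created with close_pc set to 1 releases the room at hangup, cannot switch afterwards, and `destroy_session` has nothing left to release, so the count is never decremented twice.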
The 2442-2446 lines seem overkill to me. There already is a check for |
You are correct, those lines were overkill. We tried it with your suggestion and it still worked properly, no crashes and no leaks. We tested with subscribers having close_pc set to true and false to make sure it works properly for both cases. Below are the two functions we have modified in janus_videoroom.c. If these look good to you, could you please commit them to master?
|
Any chance you can prepare a diff patch, or a pull request? Would make it easier to identify the changes... Thanks! |
Here's a patch file: This patch file is relative to the most recent commit 38d29d0 |
Done, thanks! |
NOTE: I posted about this issue in the google group, but I believe there is a bug so I am posting again here with more specifics as to what is happening.
Overview
My team and I are using the videoroom plugin to create a basic video conferencing demo. Each participant has a publisher handle and multiple subscriber handles subscribed to feeds that should be displayed on their screen. Every subscriber has close_pc = false when they start subscribing. When we swap the feed we want to show on someone's screen, we do a switch request to the new feed. Everything works fine with normal use cases and we are able to switch feeds properly.
Bug
The problem arises when a publisher's peer connection gets destroyed and we then try to do a switch request from one of the destroyed publisher's subscribers to another active feed. When we make this request after the publisher is gone, we get an error back from Janus saying "No such room". I have confirmed with "list" and "listparticipants" requests to the plugin that the room does exist and that the publisher I am trying to switch to also exists, so the "No such room" error is incorrect. I have also confirmed that my subscriber peer connection is still around after the publisher goes down, which is expected with close_pc = false. I believe there is a bug here, since I have specified close_pc = false and should therefore still be able to switch my subscriber to an active feed. There may be something that I am doing wrong or misunderstanding about switching and the close_pc field, but at the very least the "No such room" error is not correct in this scenario.