[1.x] segmentation faults on videoroom hangup #3154
The current version of Janus master is 1.1.3; please try with that one too.
Okay, we will install it and I will come back with logs from it.
The same happened on master.
@spscream thanks for checking, would you please share an ASan stack trace from the current master?
@spscream if you also have ways that we can use to consistently replicate it ourselves, that would help too!
I don't know how to reproduce this; it happens only in one environment.
Maybe this is related to some buggy participant that calls unpublish twice in parallel; we are trying to figure it out.
We got some more crashes, in the same place as the one I provided before. I can't provide any additional logs at the moment. I'm trying to reproduce the crash myself with integration tests, but with no success so far.
Since I suspect it's an issue with reference counting, where something is freed sooner than it should be, one thing that might help is uncommenting this line and recompiling Janus. It will make the logs explode in size, since it will track all refs and unrefs, but should the problem occur again it would give us much more info to track where and why the affected resource was freed before its time. You may want to do that only in your own local deployment rather than in production: your choice! But it would definitely help, especially if you're able to replicate it eventually.
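To illustrate why tracing every ref and unref helps, here is a minimal, hypothetical atomic reference counter (C11 `stdatomic.h`) with optional logging. It is not Janus's own refcount implementation; `demo_obj`, `demo_ref`, `demo_unref` and `DEMO_REFCOUNT_DEBUG` are made-up names. With tracing on, a premature free shows up in the log as an unref that drops the count to 0 while some other code path still expects to hold a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch: 1 logs every ref/unref to stderr. */
#define DEMO_REFCOUNT_DEBUG 1

typedef struct demo_obj {
    atomic_int refs;
} demo_obj;

static int demo_frees = 0; /* number of objects actually freed */

static demo_obj *demo_obj_new(void) {
    demo_obj *o = malloc(sizeof *o);
    if (o == NULL)
        return NULL;
    atomic_init(&o->refs, 1); /* the creator holds the first reference */
    return o;
}

static void demo_ref(demo_obj *o, const char *where) {
    int before = atomic_fetch_add(&o->refs, 1);
    assert(before > 0); /* reviving an object that hit 0 is already a bug */
#if DEMO_REFCOUNT_DEBUG
    fprintf(stderr, "[ref]   %p %d -> %d (%s)\n", (void *)o, before, before + 1, where);
#else
    (void)where;
#endif
}

static void demo_unref(demo_obj *o, const char *where) {
    int before = atomic_fetch_sub(&o->refs, 1);
    assert(before > 0); /* underflow: one unref too many */
#if DEMO_REFCOUNT_DEBUG
    fprintf(stderr, "[unref] %p %d -> %d (%s)\n", (void *)o, before, before - 1, where);
#else
    (void)where;
#endif
    if (before == 1) { /* last reference dropped: free exactly once */
        demo_frees++;
        free(o);
    }
}
```

The `where` string (for example `__func__`) is what makes such a log useful: when the last line for a pointer is an unref from an unexpected function, that is the caller that released a reference it did not own.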
Hi! janus.log contains the stdout of Janus; you can find the exact moment of each crash by the version banner (Janus version: 1103 (1.1.3)).
is not a valid commit hash; I guess we should refer to the current master, since the version mentions 1.1.3?
https://github.com/spscream/janus-gateway: it refers to my fork with refcount debugging enabled.
@spscream are you running Janus on a 32-bit architecture?
No, we are running on 64-bit.
Do you have any clue what is happening, or how I can help debug it?
Still investigating.
@spscream please test the PR above.
Thank you! I built an image with this PR and gave it to our customer.
It crashed again on 2 different sites, 2 or 3 times.
@spscream thanks for reporting, the patch has been updated with another fix.
@spscream have you had the chance to test the updated patch?
Yes. We are waiting for a new crash log from the customer. It crashed again.
We got the crash and the log: https://drive.google.com/drive/folders/1ATMh-cWhvcVNOJ22El_bFk8DudlsKLdI?usp=sharing
@spscream thanks for the logs!
The reference that made it crash is not present in the log file, probably because it was removed before the point where the file starts.
No, the customer has their logs limited by size and this is all we have. We will wait for another crash.
@atoppi I added an archive with the full log to the same folder.
@spscream patch updated according to the latest crash data, please test and report.
Two days running on the latest patch and no crashes; we will run it for a few more days.
It crashed again twice yesterday; do you need any additional info?
We're busy with other activities these days, sorry. We'll have a look as soon as possible.
@spscream please share the core dumps and logs.
Unfortunately, we didn't get crash logs for yesterday's crashes. If we have any new crashes, I will share the logs for them.
I got a crash from 17.03: https://drive.google.com/drive/folders/1qD3T4n77sz5e8--TGbSrL26WY2Aats-O?usp=sharing
This one happened in the core:
The thread is trying to read the atomic
This is the trace of the involved reference:
The session refcount goes to 0 when handling a … (lines 1685 to 1693 in 6ddaaa9).
We are facing several different race conditions here, but each of them is related to VideoRoom session teardown.
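One common way to guard against this class of teardown race is an "increment only if not zero" reference grab: a thread that looks up a session refuses to take a reference once the count has already reached 0, instead of blindly bumping it. Below is a minimal C11 sketch with hypothetical names (`demo_session`, `demo_session_tryref`, `demo_session_unref`); it is not the fix that went into Janus. It only helps if the lookup happens under the same lock that removes the session from its table before the final unref; otherwise the struct itself may already be freed and even reading `refs` is a use-after-free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical session: refs == 0 means teardown has begun. */
typedef struct demo_session {
    atomic_int refs;
} demo_session;

/* Take a reference only while the session is still alive.
 * A plain fetch_add could resurrect a count that already hit 0,
 * handing the caller an object another thread is freeing. */
static bool demo_session_tryref(demo_session *s) {
    int cur = atomic_load(&s->refs);
    while (cur > 0) {
        /* On failure (contention or spurious), cur is reloaded with
         * the current value and the loop re-checks it. */
        if (atomic_compare_exchange_weak(&s->refs, &cur, cur + 1))
            return true;
    }
    return false; /* already dying: caller must not use the session */
}

/* Returns true when the caller dropped the last reference and is
 * therefore responsible for the actual teardown. */
static bool demo_session_unref(demo_session *s) {
    int before = atomic_fetch_sub(&s->refs, 1);
    assert(before > 0); /* underflow means an unbalanced unref */
    return before == 1;
}
```

The design choice is that a count of 0 is terminal: once teardown has started, every later lookup fails cleanly instead of racing with the free.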
As anticipated, I merged #3167, so when we have a possible fix for these other issues we'll open a new PR.
Please test the PR above.
I'll merge and close then.
What version of Janus is this happening on?
1.1.0, 1.1.1
Have you tested a more recent version of Janus too?
The problem first occurred on 1.1.0; we upgraded to 1.1.1 and it still appears.
Was this working before?
This is occurring only on one of our customers' servers, and not on others running the same versions (1.1.0 and 1.1.1).
Is there a gdb or libasan trace of the issue?
https://gist.github.com/spscream/6ee8265dbc7d0c8f7fa24c2c9c0f8eef
Additional context
We mainly use the videoroom plugin, and this segmentation fault appears for only one customer.
This customer is also having some random issues with TURN/STUN servers and connectivity between Janus and WebRTC clients,
so this may not be directly related to the crash, but maybe it leads to this segmentation fault.
Maybe this duplicates #3124.