[1.x] PC not closing server side on normal hangup #3430

Closed
adnanel opened this issue Sep 19, 2024 · 5 comments
Labels
multistream Related to Janus 1.x

Comments

@adnanel
Contributor

adnanel commented Sep 19, 2024

What version of Janus is this happening on?
Newest master, e.g. 504daf5aef333d6f37e41c30b00be24cfb6c83bf

Have you tested a more recent version of Janus too?
Yes, master branch is affected.

Was this working before?
Yes, this was broken with the change in this commit:
0f32c32

Additional context
Given a session with the Janus SIP plugin:

  • We send "hangup" request, but don't close the PC on client side.
  • Janus sets the session as closing
  • The SIP Relay thread loop terminates, and calls janus_sip_media_cleanup
  • Inside this method, has_audio and has_video are both set to FALSE
  • After this, janus_sip_sofia_callback is invoked with an nua_i_state event, but the PC doesn't close because has_audio and has_video are both FALSE

Result: the PC remains open. After a while we receive a DTLS alert, which finally causes the PC to close.
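The ordering described above can be replayed with a small stand-alone model. This is only a sketch: the `demo_session` struct and the `demo_*` functions are hypothetical stand-ins invented for illustration, not the real Janus types or functions, and the two flags `pc_open` and `local_cleaned` merely record which branch was taken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the plugin session (NOT the real Janus struct) */
typedef struct demo_session {
	bool established;   /* a call was set up */
	bool has_audio;     /* media flags negotiated via SDP */
	bool has_video;
	bool pc_open;       /* assumption: models the PeerConnection still alive in the core */
	bool local_cleaned; /* assumption: models the plugin-only cleanup having run */
} demo_session;

/* Models the relay thread exiting: the media flags are reset */
static void demo_media_cleanup(demo_session *s) {
	assert(s != NULL);
	s->has_audio = false;
	s->has_video = false;
}

/* Models the pre-fix branch taken when the nua_i_state event arrives */
static void demo_on_terminated_old(demo_session *s) {
	assert(s != NULL);
	if(!s->established)
		return;
	if(s->has_audio || s->has_video) {
		/* Stands in for gateway->close_pc() */
		s->pc_open = false;
	} else {
		/* Stands in for janus_sip_hangup_media_internal() */
		s->local_cleaned = true;
	}
}

/* Replays the reported order of events; returns true if the PC is left open */
static bool demo_replay_hangup(void) {
	demo_session s = {
		.established = true,
		.has_audio = true,
		.has_video = false,
		.pc_open = true,
		.local_cleaned = false
	};
	demo_media_cleanup(&s);     /* relay thread terminates first */
	demo_on_terminated_old(&s); /* state event is handled afterwards */
	return s.pc_open;
}
```

In this model, because the flags are reset before the branch is evaluated, only the local cleanup path runs and `demo_replay_hangup()` reports the PC as still open.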

@adnanel adnanel added the multistream Related to Janus 1.x label Sep 19, 2024
@lminiero
Member

  • We send "hangup" request, but don't close the PC on client side

Yours is a good analysis, but why aren't you closing the PC on the client side too when sending the "hangup"? Our SIP demo does, and I would have expected everyone to do the same.

@adnanel
Contributor Author

adnanel commented Sep 19, 2024

That's an option, I guess. If you think this behaviour is fine, we can close the peer connections on the client side sooner than before.

Currently we do our cleanup once we receive the hangup event back from Janus (the "janus" : "hangup" event, not the SIP plugin hangup). I agree that there is no need for this to be done sequentially, but I'm not a big fan of relying on the client side for this flow to finish as expected.

@lminiero
Member

lminiero commented Sep 19, 2024

Makes sense. Starting from the assumption that I'm not going to revert the PR/commit you mentioned (which had a much more serious impact on the status of sessions), I think the main issue here is timing and the order in which things happen: in this specific case, they lead to an internal cleanup in the plugin (janus_sip_hangup_media_internal) but not to the cleanup in the core (close_pc) that would also be needed.

Rather than overcomplicate things, maybe there's an easier fix: in your last bullet point, always call both close_pc and janus_sip_hangup_media_internal. In cases where a PC was available, close_pc will schedule a call to hangup_media on the plugin, which in turn will call janus_sip_hangup_media_internal. If there was no PC, it won't, and so we have to do it ourselves (the main reason why we made that patch you mentioned). Considering that janus_sip_hangup_media only calls janus_sip_hangup_media_internal protected by a mutex, calling it twice shouldn't be an issue: the first call (whether it's our own internal call, or the one scheduled by close_pc) will clean things up internally, and the second will do nothing, since the state will already have been changed by the previous call.

Can you try changing this block here:

if(g_atomic_int_get(&session->establishing) || g_atomic_int_get(&session->established)) {
	if(session->media.has_audio || session->media.has_video) {
		/* Get rid of the PeerConnection in the core */
		gateway->close_pc(session->handle);
	} else {
		/* No SDP was exchanged, just clean up locally */
		janus_sip_hangup_media_internal(session->handle);
	}
}

to something like this instead

if(g_atomic_int_get(&session->establishing) || g_atomic_int_get(&session->established)) {
	/* Get rid of the PeerConnection in the core */
	gateway->close_pc(session->handle);
	/* Also clean up locally, in case there was no PC */
	janus_sip_hangup_media_internal(session->handle);
}

and let me know how that works for you? If I'm right, it should address your issue and at the same time not introduce any regression (due to the idempotent nature of janus_sip_hangup_media_internal), but it's a good idea to check whether I'm missing anything off the top of my head.
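To illustrate why calling the cleanup twice should be harmless, here is a minimal sketch of the idempotency argument. Everything in it is an assumption made for illustration: `demo_handle`, the `demo_*` functions, the `hangup_scheduled` flag and the plain pthread mutex are hypothetical stand-ins, not the actual Janus structures or the way the core schedules hangup_media.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a handle (NOT the real Janus types) */
typedef struct demo_handle {
	pthread_mutex_t mutex;
	bool pc_open;          /* a PeerConnection exists in the core */
	bool media_active;     /* plugin-side media state still to be cleaned */
	bool hangup_scheduled; /* assumption: models close_pc queueing hangup_media */
	int cleanups_done;     /* how many times real cleanup work was performed */
} demo_handle;

/* Idempotent cleanup: does work only while media_active is still set */
static void demo_hangup_media_internal(demo_handle *h) {
	assert(h != NULL);
	if(!h->media_active)
		return; /* state already changed by a previous call */
	h->media_active = false;
	h->cleanups_done++;
}

/* Mutex-protected wrapper, as the plugin callback would be */
static void demo_hangup_media(demo_handle *h) {
	pthread_mutex_lock(&h->mutex);
	demo_hangup_media_internal(h);
	pthread_mutex_unlock(&h->mutex);
}

/* Models close_pc: only schedules the plugin callback if a PC exists */
static void demo_close_pc(demo_handle *h) {
	if(h->pc_open) {
		h->pc_open = false;
		h->hangup_scheduled = true;
	}
}

/* Models the proposed handler: always call both */
static void demo_on_terminated_new(demo_handle *h) {
	demo_close_pc(h);
	pthread_mutex_lock(&h->mutex);
	demo_hangup_media_internal(h);
	pthread_mutex_unlock(&h->mutex);
}

/* Runs the handler plus any scheduled callback; returns the cleanup count */
static int demo_count_cleanups(bool had_pc) {
	demo_handle h = { .pc_open = had_pc, .media_active = true,
		.hangup_scheduled = false, .cleanups_done = 0 };
	pthread_mutex_init(&h.mutex, NULL);
	demo_on_terminated_new(&h);
	if(h.hangup_scheduled) {
		h.hangup_scheduled = false;
		demo_hangup_media(&h); /* second call is a no-op */
	}
	pthread_mutex_destroy(&h.mutex);
	return h.cleanups_done;
}
```

In this model the cleanup work runs exactly once whether or not a PC existed. In the real plugin the scheduled call may well run before the direct one, but under the same guard the outcome would be the same.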

@adnanel
Contributor Author

adnanel commented Sep 19, 2024

That seems to have fixed our problem. I ran a few tests myself and kept a single Janus instance running automated tests for the past ~8 hours, and no other problems were observed.

Do you want to commit that to master directly or should I create a PR?

@lminiero
Member

No need, I'll push the commit myself to both master and 0.x shortly.
Thanks for the feedback!
