Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Various changes of RTP timeout handling to handle timeouts on single streams #1403

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

arnd-s
Copy link

@arnd-s arnd-s commented Nov 22, 2021

  • introduced timeout-mode parameter to control mode of operation off/any/all during session
    off - don't monitor timeouts / turn monitoring off
    any - react if a timeout occur on any RTP stream
    all - react if timeout occur on all streams (old behavior)
  • Don't monitor RTCP streams for timeout (to prevent false alarms if RTP timeout is very low)

This patch introduces primary the possibility to react on RTP timeouts of a single stream.

During call establishment / early-media, there is often only one stream direction active. To prevent false alarms
for this case, the timeout-mode parameter for the call-interface was added. The logic is, that a call starts with
a disabled timeout monitoring and when the session is established in both directions (for example 200OK Reply for
an answer) the timeout will be activated.

When activated, timeout_activated element of struct call will be set to earliest possible timeout occurence
(activation time + timeout time). This should prevent false alarms immediatliy after session establishment

There is still an unresolved problem at the moment. Even timeout_activated is used, it sometimes happens
that on first check a timeout is detected. This occurs even, when timeout value (and timeout_activated
in succession) is increased.
To circumvent this, missed_packet_counter of struct packet_stream is used, so that the checks fails on the third
missed package in a row.

Signed-off-by: Arnd Schmitter arnd@lgcm.de

- introduced timeout-mode parameter to control mode of operation off/any/all during session
  off - don't monitor timeouts / turn monitoring off
  any - react if a timeout occur on any RTP stream
  all - react if timeout occur on all streams (old behavior)
- Don't monitor RTCP streams for timeout (to prevent false alarms if RTP timeout is very low)

This patch introduces primary the possibility to react on RTP timeouts of a single stream.

During call establishment / early-media, there is often only one stream direction active. To prevent false alarms
for this case, the timeout-mode parameter for the call-interface was added. The logic is, that a call starts with
a disabled timeout monitoring and when the session is established in both directions (for example 200OK Reply for
an answer) the timeout will be activated.

When activated, timeout_activated element of struct call will be set to earliest possible timeout occurence
(activation time + timeout time). This should prevent false alarms immediatliy after session establishment

There is still an unresolved problem at the moment. Even timeout_activated is used, it sometimes happens
that on first check a timeout is detected. This occurs even, when timeout value (and timeout_activated
in succession) is increased.
To circumvent this, missed_packet_counter of struct packet_stream is used, so that the checks fails on the third
missed package in a row.

Signed-off-by: Arnd Schmitter <arnd@lgcm.de>
@rfuchs
Copy link
Member

rfuchs commented Nov 23, 2021

I'm not sure about your exact use case, but I'm wondering if this can be solved in a more automatic way? (Not opposed to having this as a settable value, but we could do both...)

Since rtpengine has a vague concept of which sides/parties to expect RTP from, and so I'm wondering if the timeout check can simply be made conditional on that? (There's some exceptions here, e.g. there's no distinction between an answer from a 18x and an answer from a 200, so that could be something else that could be added new)

There is still an unresolved problem at the moment. Even timeout_activated is used, it sometimes happens that on first check a timeout is detected. This occurs even, when timeout value (and timeout_activated in succession) is increased. To circumvent this, missed_packet_counter of struct packet_stream is used, so that the checks fails on the third missed package in a row.

I did come across this issue in the past and thought that resetting the last_packet timestamp to the current time during a signalling event could solve this. Or perhaps taking the last_signal timestamp into account somehow. What do you think?

@@ -206,6 +213,16 @@ static void call_timer_iterator(struct call *c, struct iterator_helper *hlp) {

/* valid stream */

// ignore RTCP Streams
if (PS_ISSET(ps, RTCP))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be more appropriate to use if (!PS_ISSET(ps, RTP)) here? (Streams can be both RTP and RTCP in case of RTCP-mux)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this sounds better. I hadn't RTCP-mux in mind, because it's not used in our case.

@@ -4022,6 +4047,7 @@ int call_delete_branch(const str *callid, const str *branch,
"(via-branch '" STR_FORMAT_M "') in %d seconds",
STR_FMT_M(&ml->tag), STR_FMT0_M(branch), delete_delay);
ml->deleted = rtpe_now.tv_sec + delete_delay;
ml->mark_deleted = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure, but could ml->deleted != 0 be used as a test for this instead of adding a new flag?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as i remember i first tried it this way but there was an issue with it but i currently don't know what it was.
I need to make some tests and see, if i can reproduce the issues i had.

@arnd-s
Copy link
Author

arnd-s commented Nov 23, 2021

Since rtpengine has a vague concept of which sides/parties to expect RTP from, and so I'm wondering if the timeout check can simply be made conditional on that? (There's some exceptions here, e.g. there's no distinction between an answer from a 18x and an answer from a 200, so that could be something else that could be added new)

In our use case, UAC often don't send any RTP traffic until the receive of a 200 answer. So in case of a sdp-reply via 18x a timeout will occur.
It's also in my experience feasible to have more control in case of call forking scenarios with multiple early media streams or to disable timeouts during call hold.

There is still an unresolved problem at the moment. Even timeout_activated is used, it sometimes happens that on first check a timeout is detected. This occurs even, when timeout value (and timeout_activated in succession) is increased. To circumvent this, missed_packet_counter of struct packet_stream is used, so that the checks fails on the third missed package in a row.

I did come across this issue in the past and thought that resetting the last_packet timestamp to the current time during a signalling event could solve this. Or perhaps taking the last_signal timestamp into account somehow. What do you think?

I'll look into it and try if i find a better solution.

Please be aware, that this PR should in my opinion get extensive testing from users with other use cases before merging. Especially with forking scenarios or more than one stream in each direction. I also don't tested it in webrtc scenarios.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants