Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Videoroom: Fix RID memory leak #2995

Merged
merged 6 commits into from
Jun 27, 2022
Merged

Conversation

OxleyS
Copy link
Contributor

@OxleyS OxleyS commented Jun 7, 2022

When processing an offer from a publisher using RID-based simulcast, Videoroom calls janus_rtp_simulcast_prepare() for the stream associated with every m-line in the offer, even if that m-line had already appeared in a previous offer. This means the RID array is repopulated every time, but the existing values were not being freed before being overwritten. This leads to a memory leak for every successive renegotiation.

Here I just unconditionally free the previous values beforehand, assuming they need to be completely repopulated anyway. While we're at it here, should we also clear rid_ext_id or the SSRC array somehow? (Although they're not leaking memory, obviously)

…tream appeared in successive Videoroom publisher offers.
Copy link
Member

@lminiero lminiero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that's something that should be done there: janus_rtp_simulcasting_prepare doesn't own the rid array, which is passed to it by someone who calls the method (e.g., plugins). For all the method knows, those might not be malloc-ed, but allocated in the heap, which as a result would crash Janus. On the other end, the method does do a g_strdup when a rid is found, meaning it's allocating stuff on behalf of the caller, so it might indeed make sense.

I've added a small editorial note inline. On your question, I guess that if we reset rids, it does indeed make sense to reset rid_ext_id to -1 as well.

src/rtp.c Outdated
g_free(rids[i]);
rids[i] = NULL;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please get rid of both empty lines, before and after, since as it is the method is written in a compact way.

src/rtp.c Outdated
@@ -1020,11 +1020,18 @@ void janus_rtp_simulcasting_context_reset(janus_rtp_simulcasting_context *contex
void janus_rtp_simulcasting_prepare(json_t *simulcast, int *rid_ext_id, uint32_t *ssrcs, char **rids) {
if(simulcast == NULL)
return;
/* Clear existing RIDs in case this array was reused from a previous call */
size_t i = 0;
for(i=0; i<3; i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your new commit made me realize that you should have an if(rids) for this block as well, since it may be NULL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right, sorry, good catch.

@OxleyS
Copy link
Contributor Author

OxleyS commented Jun 8, 2022

I agree that there is some ambiguity here with regards to who owns the memory, as currently the caller is also responsible for freeing the RID array contents when e.g. the session is destroyed. Probably the most unambiguous and least code-duplicating resolution would to have the janus_rtp_simulcast function family handle the entire lifetime of both the RID array and its contents, but that would be a much larger change.

@lminiero
Copy link
Member

lminiero commented Jun 8, 2022

I think there may be race conditions with this change, since if we free those strings while we call janus_rtp_header_extension_parse_rid (which would happen in a different thread) this may crash. This wasn't a problem before since even during a change the method would still be using a pointer that still existed (and leaked). We'll probably need a mutex at the plugins level any time janus_rtp_simulcasting_prepare and janus_rtp_header_extension_parse_rid are called (or any time those rids are modified in the plugin not through those methods). Not sure if this should be done as part of this PR or after it's merged, but it's definitely needed.

@OxleyS
Copy link
Contributor Author

OxleyS commented Jun 8, 2022

Hmm, that is indeed a possible race. We could mitigate/maybe-solve the issue by not touching the output RIDs array in janus_rtp_simulcasting_prepare if the set of RIDs has not changed between calls. As I understand it, both libwebrtc and browser APIs will not let you change the RIDs nor simulcast layer count for a transceiver after they have been set anyway, so this may be a relatively safe assumption to make (we could even make it an error case?).

@lminiero
Copy link
Member

lminiero commented Jun 8, 2022

I'm not sure we can ensure that: rids can be added (because you add substreams), and if you add a lower substream than the ones you had before, the existing rids would be shifted. Besides, m-lines can be recycled (simulcast video published, then video removed, then another simulcast video published), which means there can be bigger changes.

@OxleyS
Copy link
Contributor Author

OxleyS commented Jun 8, 2022

In the case of Videoroom specifically, I see three points of potential RID array racing:

  1. janus_videoroom_incoming_rtp, reads when a packet on an unknown SSRC comes in
  2. janus_rtp_simulcasting_context_process_rtp, also called within janus_videoroom_incoming_rtp, reads for the same reason
  3. janus_rtp_simulcasting_prepare, writes when handling an offer

The second one is tricky because if we simply protect all access to the RID array via a plugin-level mutex, then we're forced to either take the mutex on every incoming simulcast RTP packet, which would hurt performance, or pass that mutex into janus_rtp_simulcasting_context_process_rtp and have it lock it only when needed, which is not pretty.

I want to propose wrapping the RID array with ref-counting so (3) can just make an entirely new array instead, but implementing that looks like a complicated endeavor.

Besides, m-lines can be recycled

They can - a different video stream could be put on a previously used m-line - but as I understand it, that cannot change the set of RIDs, at least as far as WebRTC APIs are concerned. Note that RIDs are a readonly property of encodings, and that setParameters cannot change the number of encodings.

rids can be added (because you add substreams), and if you add a lower substream than the ones you had before, the existing rids would be shifted

I couldn't find an instance of where this could happen in the code, could you point one out? Or do you mean a subsequent offer adding a new RID? My impression is that if this was the case, it would interfere with the SSRC<->substream mapping cache we maintain in places (such as vssrc on publisher streams). If that is the case, that would mean that the current implementation does not support such a scenario anyway.

@lminiero lminiero added the multistream Related to Janus 1.x label Jun 17, 2022
@lminiero
Copy link
Member

The second one is tricky because if we simply protect all access to the RID array via a plugin-level mutex, then we're forced to either take the mutex on every incoming simulcast RTP packet, which would hurt performance, or pass that mutex into janus_rtp_simulcasting_context_process_rtp and have it lock it only when needed, which is not pretty.

I agree it's not pretty but I think it's not that ugly either... and would ensure we only use the mutex when we really have to, which wouldn't be often. It would indeed mean a change in the signature of the function, though, so a breaking change for custom plugins that rely on it, but if we update all our plugins to use it I don't particularly care. It would be important to bump the plugin version in plugin.h, though, so that older plugins have to be recompiled and so need to be aware of the change.

@OxleyS
Copy link
Contributor Author

OxleyS commented Jun 21, 2022

Alright, I'll give that a shot.

- Moved the freeing of existing RIDs from janus_rtp_simulcasting_prepare() to Videoroom.
- janus_rtp_simulcasting_context_process_rtp() now takes an optional argument to a mutex to lock when reading the RIDs array.
- Updated all builtin plugins to be compatible with this change, and bumped the plugin version.
@OxleyS
Copy link
Contributor Author

OxleyS commented Jun 22, 2022

I've been looking through the other plugins, and it seems the assumption that the RIDs array can be safely read from unprotected is fairly deep-set into them. I don't have confidence that this is something I can address without either being lock-happy and hurting performance, or risking breaking something. Some plugins (namely, Duktape and Lua) I don't have the means to properly test in any case.

In light of this, I believe it's best for me to back out of freeing the RIDs in janus_rtp_simulcasting_prepare(), and to instead do so at the plugin level just before calling this function. This means the memory leak will be fixed in Videoroom but remain in other plugins, which will have to be fixed further down the road.

I have, however, implemented the change to janus_rtp_simulcasting_context_process_rtp() as discussed, where it takes a new mutex argument that it locks when it has to read from the RIDs array. I've accordingly bumped the plugin version and updated non-Videoroom plugins to just pass NULL for the argument, preserving existing behavior.

@lminiero
Copy link
Member

Looking at the patch this seems to be making things a bit more complicated than I expected. Maybe an easier way to address this might be to change the janus_rtp_simulcasting_context struct so that it hosts the SSRC and rid array itself, and so the mutex as well, using it automatically when needed. This may make it more transparent to plugins, no matter which it is, and better contain the simulcast properties where they belong. Of course I'm not asking you to look into this: I'll try to have a stab at it later today, to see if it could indeed be simpler or not, and in case propose an alternative PR we can discuss.

@lminiero
Copy link
Member

Maybe an easier way to address this might be to change the janus_rtp_simulcasting_context struct so that it hosts the SSRC and rid array itself

Mh I don't think that will work, mostly because the ssrc and rid array have nothing to do with the janus_rtp_simulcasting_context struct: the fact the function is called janus_rtp_simulcasting_prepare is just because the "topic" is the same, but they're not related at all. This becomes evident when the same simulcast source is shared among multiple recipients (e.g., VideoRoom), and so where the same arrays are used for all recipients, each with their own janus_rtp_simulcasting_context struct, while the publisher doesn't have one.

I'll add a few notes inline to your latest changes shortly.

Copy link
Member

@lminiero lminiero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a few notes, mostly editorial, but also on where I don't think we need a mutex. I'll have another look at the finalized changes after that, so that I can check what I'll need to change in the other plugins. I think I'll probably add a helper for cleaning up rids, as it feels like something we can make a function of (a bit like the prepare one).

@@ -2142,6 +2142,7 @@ typedef struct janus_videoroom_publisher_stream {
gboolean opusstereo; /* Whether this publisher is doing stereo Opus */
gboolean simulcast, svc; /* Whether this stream uses simulcast or VP9 SVC */
uint32_t vssrc[3]; /* Only needed in case VP8 (or H.264) simulcasting is involved */
janus_mutex rid_mutex; /* Protects access to the rid array and the extmap ID */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like there's spaces before the comment? We use tabs to (mostly) align comments in structs.

forward->sim_context.rid_ext_id = ps->rid_extmap_id;
janus_mutex_unlock(&ps->rid_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need a lock here, it feels overkill to me.

stream->sim_context.rid_ext_id = ps->rid_extmap_id;
janus_mutex_unlock(&ps->rid_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

stream->sim_context.rid_ext_id = ps->rid_extmap_id;
janus_mutex_unlock(&ps->rid_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@@ -6991,21 +7001,25 @@ void janus_videoroom_incoming_rtp(janus_plugin_session *handle, janus_plugin_rtp
sc = 1;
else if(ssrc == ps->vssrc[2])
sc = 2;
else if(ps->rid_extmap_id > 0) {
else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the reason above, I don't think you need to split the if that way to protect the ID, it's enough to just protect the "parse rid" method.

src/rtp.c Outdated
@@ -1057,7 +1049,7 @@ void janus_rtp_simulcasting_prepare(json_t *simulcast, int *rid_ext_id, uint32_t

gboolean janus_rtp_simulcasting_context_process_rtp(janus_rtp_simulcasting_context *context,
char *buf, int len, uint32_t *ssrcs, char **rids,
janus_videocodec vcodec, janus_rtp_switching_context *sc) {
janus_videocodec vcodec, janus_rtp_switching_context *sc, janus_mutex* rid_mutex) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per our code style, the * should be next to the variable, not the type.

src/rtp.h Outdated
* @returns TRUE if the packet should be relayed, FALSE if it should be dropped instead */
gboolean janus_rtp_simulcasting_context_process_rtp(janus_rtp_simulcasting_context *context,
char *buf, int len, uint32_t *ssrcs, char **rids,
janus_videocodec vcodec, janus_rtp_switching_context *sc);
janus_videocodec vcodec, janus_rtp_switching_context *sc, janus_mutex* rid_mutex);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

@OxleyS
Copy link
Contributor Author

OxleyS commented Jun 27, 2022

Thanks for looking at this again, I've pushed a commit that addresses all the points above.

@lminiero
Copy link
Member

Thanks! This looks good to me: I'll merge and start updating the other plugins too then.

@lminiero lminiero merged commit 16ced28 into meetecho:master Jun 27, 2022
@lminiero
Copy link
Member

@OxleyS I should have done everything in the above commit, which now includes a new janus_rtp_simulcasting_cleanup function to selectively reset some properties, with or without a mutex. If you can give it a glance just to make sure I didn't mess anything up it would be really appreciated!

@OxleyS
Copy link
Contributor Author

OxleyS commented Jun 27, 2022

Thanks! Just took a look at the new commit, looks good to me.

@OxleyS OxleyS deleted the fix-rid-memory-leak branch June 27, 2022 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multistream Related to Janus 1.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants