
Support multi-variant HLS streams #311

Open: olafal0 wants to merge 2 commits into main

olafal0 commented Oct 22, 2024

Fixes #310. Makes multi-variant HLS streams work for URL ingresses by fixing two issues:

1. decodebin3 handles variant selection automatically, but emits the notify::caps signal when the source resolution changes. This caused the pipeline to try to create a new video output bin and link it to the input bin's video src pad. That link fails, and the failure takes down the whole pipeline.
2. WebRTCSink.AddTrack creates a capsfilter for each layer with caps fmt.Sprintf("video/x-raw,width=%d,height=%d", layer.Width, layer.Height). The width and height used for these layers come from the video's initial resolution, which can be very low if playback starts on a low-bitrate variant. So even when a higher-resolution variant is selected later, it is scaled back down to the initial resolution.

Changes:

  • In Pipelines, atomically track whether video and audio output bins have already been created. If they have, skip adding and linking them.
  • In WebRTCSink, use the high layer resolution from the video encoding options instead of the source resolution when that width or height is greater than the source width or height.
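The first change boils down to a "create once" guard on the caps callback. Below is a minimal sketch of that idea using sync/atomic (atomic.Bool needs Go 1.19+). The outputBins type, the onVideoCaps method and the videoBuilds counter are hypothetical stand-ins, not the PR's actual code; the real pipeline would create and link the GStreamer bin where the placeholder increment is.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// outputBins is a hypothetical stand-in for the pipeline state that records
// whether the video and audio output bins have already been created.
type outputBins struct {
	videoCreated atomic.Bool
	audioCreated atomic.Bool
	videoBuilds  int // counts how often the (placeholder) build step ran
}

// onVideoCaps would be called from the decodebin3 notify::caps handler.
// Only the first caller wins the CompareAndSwap and builds/links the bin;
// later caps changes (for example an HLS variant switch) are skipped instead
// of trying to link a second bin to the already-linked video src pad.
func (p *outputBins) onVideoCaps(caps string) {
	if !p.videoCreated.CompareAndSwap(false, true) {
		fmt.Println("video output bin already exists, skipping:", caps)
		return
	}
	p.videoBuilds++ // placeholder for creating the bin and linking the pad
	fmt.Println("creating video output bin for:", caps)
}

func main() {
	var p outputBins
	p.onVideoCaps("video/x-raw,width=320,height=180")
	p.onVideoCaps("video/x-raw,width=1280,height=720") // variant switch
	fmt.Println("video bin builds:", p.videoBuilds)
}
```

CompareAndSwap makes the check and the set a single atomic step, so two callbacks racing on different streaming threads cannot both decide they are first.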

I've tested with this HLS file: https://devstreaming-cdn.apple.com/videos/streaming/examples/img_bipbop_adv_example_fmp4/master.m3u8. Note that this may still fail on main: the stream contains subtitle tracks, which fail with unsupported mime type (application/x-subtitle-vtt) for the source media. As a workaround, adding application/x-subtitle-vtt to supportedMimeTypes in pkg/media/urlpull/source.go fixes this, and variant selection then works correctly.

olafal0 requested a review from a team as a code owner on October 22, 2024 16:15
CLAassistant commented Oct 22, 2024

CLA assistant check
All committers have signed the CLA.

biglittlebigben (Contributor)

Thanks for submitting this. Glad to see there is a way to make this work with decodebin3. However, the added logic that upscales the video to the largest layer would break existing functionality: we drop layers that are bigger than the source, and match the biggest layer's size to the source when the source is smaller. You can see this logic in the filterAndSortLayersByQuality function.
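To make the described behavior concrete, here is a minimal sketch of "drop layers bigger than the source, and match the top layer to the source". It is an illustration of the description above, not the real filterAndSortLayersByQuality: the Layer type and filterLayers function are hypothetical, and it assumes the source and the layers share an aspect ratio.

```go
package main

import (
	"fmt"
	"sort"
)

// Layer is a hypothetical simulcast layer description.
type Layer struct {
	Name          string
	Width, Height int
}

// filterLayers sorts layers from lowest to highest resolution, keeps those
// that fit inside the source, and replaces the first oversized layer with one
// at the source resolution, so the top layer matches the source instead of
// being upscaled.
func filterLayers(layers []Layer, srcW, srcH int) []Layer {
	sorted := append([]Layer(nil), layers...)
	sort.Slice(sorted, func(i, j int) bool {
		return sorted[i].Width*sorted[i].Height < sorted[j].Width*sorted[j].Height
	})

	var out []Layer
	for _, l := range sorted {
		if l.Width <= srcW && l.Height <= srcH {
			out = append(out, l) // fits: keep the nominal size
			continue
		}
		// First layer larger than the source: shrink it to the source size,
		// unless the previous layer already matches the source exactly.
		if n := len(out); n == 0 || out[n-1].Width != srcW || out[n-1].Height != srcH {
			out = append(out, Layer{Name: l.Name, Width: srcW, Height: srcH})
		}
		break // with a common aspect ratio, every remaining layer is larger still
	}
	return out
}

func main() {
	layers := []Layer{{"high", 1280, 720}, {"low", 480, 270}, {"medium", 960, 540}}
	fmt.Println(filterLayers(layers, 320, 180))   // [{low 320 180}]
	fmt.Println(filterLayers(layers, 640, 360))   // [{low 480 270} {medium 640 360}]
	fmt.Println(filterLayers(layers, 1920, 1080)) // [{low 480 270} {medium 960 540} {high 1280 720}]
}
```

With a 320x180 source this publishes a single layer, which is exactly why the approach conflicts with a source that may later switch to a higher-resolution variant.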

It is important not to lose this functionality in most cases: upscaling is wasteful and can degrade quality, because it reduces the number of bits available per macroblock while adding no extra detail to encode.
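A quick back-of-the-envelope check of the macroblock argument, assuming a hypothetical 1.7 Mbps budget at 30 fps for the top layer (these numbers are illustrative, not LiveKit defaults). H.264 codes frames in 16x16 macroblocks, padding partial blocks, so a 320x180 frame needs 240 macroblocks and a 1280x720 frame needs 3600.

```go
package main

import "fmt"

// macroblocks returns the number of 16x16 macroblocks covering a frame,
// rounding partial blocks up (H.264 pads the coded frame to a multiple of 16).
func macroblocks(width, height int) int {
	return ((width + 15) / 16) * ((height + 15) / 16)
}

// bitsPerMacroblock is the average bit budget per macroblock per frame
// for a target bitrate (bits/s) and frame rate.
func bitsPerMacroblock(bitrate, fps float64, width, height int) float64 {
	return bitrate / fps / float64(macroblocks(width, height))
}

func main() {
	const bitrate = 1_700_000 // hypothetical top-layer bitrate in bits/s
	const fps = 30.0
	native := bitsPerMacroblock(bitrate, fps, 320, 180)
	upscaled := bitsPerMacroblock(bitrate, fps, 1280, 720)
	fmt.Printf("320x180:  %4d MBs, %.1f bits/MB/frame\n", macroblocks(320, 180), native)
	fmt.Printf("1280x720: %4d MBs, %.1f bits/MB/frame\n", macroblocks(1280, 720), upscaled)
	fmt.Printf("upscaling spreads the same bits over %.0fx more macroblocks\n", native/upscaled)
}
```

Upscaling 320x180 to 1280x720 at the same bitrate divides the per-macroblock budget by 15, and the extra macroblocks only hold interpolated pixels.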

Does GStreamer with decodebin3 provide any way to get the expected list of variants from the manifest, anywhere in the pipeline? If not, we may want to make sure that the upscaling code is only triggered for multi-variant sources.

olafal0 (Author) commented Oct 23, 2024

Ah, makes sense. Unfortunately, I didn't find a way to access the list of variants. The manifest is clearly being parsed, and I can see the variants in the GStreamer logs, but STREAM_COLLECTION messages didn't contain the other variants in my tests. The same goes for the decodebin3 select-stream signal.

We could potentially recalculate the layer sizes and update the caps property of the capsfilter when decodebin3 switches variants, since we definitely have that information. The layer sizing logic could then stay the same, just re-run whenever the source resolution changes. I'm not sure how the downstream elements will handle that, but I'll try it out.

* Store original width and height in layers
* When output bins receive new video caps, change capsfilter caps to
  use new resolution, without upscaling
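The commit above keeps each layer's original (nominal) size and recomputes the encoded size whenever new caps arrive. A minimal sketch of that recomputation, assuming the source and layers share an aspect ratio: resizeForSource and capsFor are hypothetical helpers, and the nominal sizes are the LOW/MEDIUM/HIGH values from the example in the next comment. Applying the result would mean setting the capsfilter's caps property to the new string.

```go
package main

import "fmt"

// Layer holds a simulcast layer's nominal (originally configured) size.
type Layer struct {
	Name          string
	Width, Height int
}

// resizeForSource returns the size each layer should be encoded at for the
// current source resolution: the nominal size if the source is large enough,
// otherwise the source size, so a layer is never upscaled. The nominal slice
// is left untouched so it can be reused on the next caps change.
func resizeForSource(nominal []Layer, srcW, srcH int) []Layer {
	out := make([]Layer, len(nominal))
	for i, l := range nominal {
		out[i] = l
		if l.Width > srcW || l.Height > srcH {
			out[i].Width, out[i].Height = srcW, srcH
		}
	}
	return out
}

// capsFor builds a capsfilter caps string in the same form AddTrack uses.
func capsFor(l Layer) string {
	return fmt.Sprintf("video/x-raw,width=%d,height=%d", l.Width, l.Height)
}

func main() {
	nominal := []Layer{{"LOW", 480, 270}, {"MEDIUM", 960, 540}, {"HIGH", 1280, 720}}
	for _, src := range [][2]int{{320, 180}, {1280, 720}} {
		fmt.Printf("source %dx%d:\n", src[0], src[1])
		for _, l := range resizeForSource(nominal, src[0], src[1]) {
			fmt.Printf("  %s: %s\n", l.Name, capsFor(l))
		}
	}
}
```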
olafal0 (Author) commented Oct 24, 2024

Update: changing the caps on the capsfilter does work, and avoids upscaling. This does introduce a separate issue, however: we can't discard layers when the source is too small, since those layers need to exist for use later. As an example: an HLS stream is started, and defaults to 320x180. We then use the layers:

LOW: 320x180
MEDIUM: 320x180
HIGH: 320x180

Later, hlsdemux2 selects a higher-resolution stream, the queue in the video output bin is notified of the new caps, we recalculate layer sizes, and then change the caps of the capsfilter. Now, the layers are:

LOW: 480x270
MEDIUM: 960x540
HIGH: 1280x720

The downside is, of course, that if the video remains 320x180, then we're pushing 3 duplicate streams for the lifetime of the input.

It would be best if we could skip creating lower layers when they're duplicates, and then create them when needed. I'll look into that next. Maybe we could block output bins that aren't needed yet?

biglittlebigben (Contributor)


Thanks for looking into this further. The LiveKit protocol doesn't allow changing the layers after the initial publication. However, it is possible to:

  • Send video smaller than the nominal layer size (with the caveat that some stream-level APIs will report the wrong dimensions)
  • Pause sending media on the largest layers. The SFU should handle this properly, provided there is media coming on the smaller layers.

So, indeed, one approach would be to block the output of the layers that are duplicates of the smaller ones, and change the dimensions of layers dynamically as needed.
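The decision of which layers to block can be kept separate from the GStreamer mechanics. Here is a minimal sketch, assuming the layer list is fixed at publication and only the encoded sizes change: activeLayers is a hypothetical helper that keeps the lowest layer sending and pauses any higher layer that would only duplicate the one below it. The actual blocking could be done, for example, with a valve element's drop property in each layer's branch, but that wiring is not shown or verified here.

```go
package main

import "fmt"

// Layer is a hypothetical layer description: a name plus the size it is
// currently encoded at (possibly smaller than its nominal size).
type Layer struct {
	Name          string
	Width, Height int
}

// activeLayers reports, for layers ordered from lowest to highest quality,
// which ones should keep sending media. The lowest layer always sends; a
// higher layer is paused while its size equals the layer below it. The set
// of layers never changes, only whether each one is sending.
func activeLayers(current []Layer) []bool {
	active := make([]bool, len(current))
	for i, l := range current {
		if i == 0 {
			active[i] = true
			continue
		}
		prev := current[i-1]
		active[i] = l.Width != prev.Width || l.Height != prev.Height
	}
	return active
}

func main() {
	// Source stuck at 320x180: every layer is clamped to the same size.
	fmt.Println(activeLayers([]Layer{{"LOW", 320, 180}, {"MEDIUM", 320, 180}, {"HIGH", 320, 180}})) // [true false false]
	// Source at 640x360: HIGH would duplicate MEDIUM.
	fmt.Println(activeLayers([]Layer{{"LOW", 480, 270}, {"MEDIUM", 640, 360}, {"HIGH", 640, 360}})) // [true true false]
}
```

Only the smaller layers keep sending while the source stays small, which fits the SFU behavior described above; when the source resolution grows, the paused layers can be unblocked without republishing.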

I'm also curious: what is the behavior of the x264enc GStreamer element when the caps change on its sink pad? The underlying x264 encoding library doesn't support changing the video size on the fly. Does the GStreamer element recreate the encoder context as needed on a caps change?


Successfully merging this pull request may close these issues.

Ingress of HLS playlists with multiple variants fail