Add support for extracting subtitles from MPD manifests #24517

Open
wants to merge 2 commits into
base: master
Lukas0907
Contributor

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

This PR adds support for extracting subtitles from MPD manifests. The added code is similar to _parse_mpd_formats() but only handles subtitles.

@dstftw
Collaborator

dstftw commented Mar 31, 2020

  1. No code copypasting.
  2. Both must be extracted via the same routine with a single pass.
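
To illustrate what a single-pass routine could look like, here is a minimal sketch that walks an MPD document once and sorts each Representation into either formats or subtitles. It is not youtube-dl's implementation: the function name `parse_mpd_formats_and_subtitles`, the sample manifest, and the reduced set of output fields are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by DASH MPD documents.
DASH_NS = {'mpd': 'urn:mpeg:dash:schema:mpd:2011'}

# Hypothetical, heavily reduced manifest used only for this sketch.
SAMPLE_MPD = '''<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="720p" bandwidth="1712128" width="1280" height="720"/>
    </AdaptationSet>
    <AdaptationSet mimeType="text/vtt" lang="en">
      <Representation id="sub_en" bandwidth="256">
        <BaseURL>https://example.com/streams/1/subtitles/sub_en.vtt</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>'''


def parse_mpd_formats_and_subtitles(mpd_xml):
    """Walk the manifest once and return a (formats, subtitles) pair."""
    root = ET.fromstring(mpd_xml)
    formats, subtitles = [], {}
    for adaptation_set in root.iterfind('mpd:Period/mpd:AdaptationSet', DASH_NS):
        set_mime = adaptation_set.get('mimeType', '')
        for representation in adaptation_set.iterfind('mpd:Representation', DASH_NS):
            # A Representation may override the AdaptationSet's mimeType.
            mime = representation.get('mimeType', set_mime)
            if mime.startswith('text/'):
                # Text tracks go into the subtitles dict, keyed by language.
                lang = adaptation_set.get('lang', 'und')
                subtitles.setdefault(lang, []).append({
                    'ext': mime.split('/')[1],
                    'url': representation.findtext('mpd:BaseURL', namespaces=DASH_NS),
                })
            else:
                # Everything else is treated as an audio/video format.
                formats.append({
                    'format_id': representation.get('id'),
                    'tbr': int(representation.get('bandwidth')) / 1000,
                })
    return formats, subtitles


formats, subtitles = parse_mpd_formats_and_subtitles(SAMPLE_MPD)
```

Because both results come out of the same loop, a formats-only helper can simply discard the second element instead of re-parsing the manifest.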

@Lukas0907
Contributor Author

Thanks for the review. I will fix it.

@Lukas0907
Contributor Author

@dstftw I have refactored the code, written a test, and force-pushed. Please take a look.

@Lukas0907
Contributor Author

@dstftw Hi, any chance to get this merged? Thanks!

def _parse_mpd_formats(self, *args, **kwargs):
    return self._parse_mpd_formats_subtitles(*args, **kwargs)[0]

def _parse_mpd_formats_subtitles(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
Collaborator

Must return info dict.

mpd_doc, mpd_id=mpd_id, mpd_base_url=mpd_base_url,
formats_dict=formats_dict, mpd_url=mpd_url)

def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
def _parse_mpd_formats(self, *args, **kwargs):
    return self._parse_mpd_formats_subtitles(*args, **kwargs)[0]
Collaborator

Breaks.

Comment on lines +966 to 1061
    'subtitles',
    'https://example.com/streams/1/playlist/playlist.mpd',  # mpd_url
    'https://example.com/streams/1/playlist',  # mpd_base_url
    [{'acodec': 'mp4a.40.2',
      'asr': 48000,
      'container': 'm4a_dash',
      'ext': 'm4a',
      'filesize': None,
      'format_id': '131kbps',
      'format_note': 'DASH audio',
      'fps': None,
      'fragment_base_url': 'https://example.com/streams/1/playlist/',
      'fragments': [{'path': '../audio/1_stereo_131072/dash/init.mp4'},
                    {'duration': 3989.0,
                     'path': '../audio/1_stereo_131072/dash/segment_0.m4s'},
                    {'duration': 3989.0,
                     'path': '../audio/1_stereo_131072/dash/segment_1.m4s'}],
      'height': None,
      'language': 'de',
      'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'protocol': 'http_dash_segments',
      'tbr': 131.072,
      'url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'vcodec': 'none',
      'width': None},
     {'acodec': 'mp4a.40.2',
      'asr': 48000,
      'container': 'm4a_dash',
      'ext': 'm4a',
      'filesize': None,
      'format_id': '196kbps',
      'format_note': 'DASH audio',
      'fps': None,
      'fragment_base_url': 'https://example.com/streams/1/playlist/',
      'fragments': [{'path': '../audio/1_stereo_196608/dash/init.mp4'},
                    {'duration': 3989.0,
                     'path': '../audio/1_stereo_196608/dash/segment_0.m4s'},
                    {'duration': 3989.0,
                     'path': '../audio/1_stereo_196608/dash/segment_1.m4s'}],
      'height': None,
      'language': 'de',
      'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'protocol': 'http_dash_segments',
      'tbr': 196.608,
      'url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'vcodec': 'none',
      'width': None},
     {'acodec': 'none',
      'asr': None,
      'container': 'mp4_dash',
      'ext': 'mp4',
      'filesize': None,
      'format_id': '720p 1712kbps',
      'format_note': 'DASH video',
      'fps': 25,
      'fragment_base_url': 'https://example.com/streams/1/playlist/',
      'fragments': [{'path': '../video/720_1712128/dash/init.mp4'},
                    {'duration': 4000.0,
                     'path': '../video/720_1712128/dash/segment_0.m4s'},
                    {'duration': 4000.0,
                     'path': '../video/720_1712128/dash/segment_1.m4s'}],
      'height': 720,
      'language': None,
      'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'protocol': 'http_dash_segments',
      'tbr': 1712.128,
      'url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'vcodec': 'avc1.42c00d',
      'width': 1280},
     {'acodec': 'none',
      'asr': None,
      'container': 'mp4_dash',
      'ext': 'mp4',
      'filesize': None,
      'format_id': '1080p 4669kbps',
      'format_note': 'DASH video',
      'fps': 25,
      'fragment_base_url': 'https://example.com/streams/1/playlist/',
      'fragments': [{'path': '../video/1080_4669440/dash/init.mp4'},
                    {'duration': 4000.0,
                     'path': '../video/1080_4669440/dash/segment_0.m4s'},
                    {'duration': 4000.0,
                     'path': '../video/1080_4669440/dash/segment_1.m4s'}],
      'height': 1080,
      'language': None,
      'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'protocol': 'http_dash_segments',
      'tbr': 4669.44,
      'url': 'https://example.com/streams/1/playlist/playlist.mpd',
      'vcodec': 'avc1.42c00d',
      'width': 1920}],
    {'en': [{'ext': 'vtt',
             'url': 'https://example.com/streams/1/subtitles/sub_en.vtt'}],
     'fr': [{'ext': 'vtt',
             'url': 'https://example.com/streams/1/subtitles/sub_fr.vtt'}]},
)
Collaborator

Code style.

@Lukas0907
Contributor Author

@dstftw I tried to incorporate your proposed changes in 2508a4b, please have a look.

pukkandan added a commit to yt-dlp/yt-dlp that referenced this pull request Apr 28, 2021
Authored by fstirlitz
Modified from: ytdl-org/youtube-dl#6144

Closes: #73
Fixes:
ytdl-org/youtube-dl#6106
ytdl-org/youtube-dl#14977
ytdl-org/youtube-dl#21438
ytdl-org/youtube-dl#23609
ytdl-org/youtube-dl#28132

Might also fix (untested):
ytdl-org/youtube-dl#15424
ytdl-org/youtube-dl#18267
ytdl-org/youtube-dl#23899
ytdl-org/youtube-dl#24375
ytdl-org/youtube-dl#24595
ytdl-org/youtube-dl#27899

Related:
ytdl-org/youtube-dl#22379
ytdl-org/youtube-dl#24517
ytdl-org/youtube-dl#24886
ytdl-org/youtube-dl#27215

Notes:
* The functions `extractor.common._extract_..._formats` are still kept for compatibility
* Only some extractors have currently been moved to using `_extract_..._formats_and_subtitles`
* Direct subtitle manifests (without a master) are not supported and are wrongly identified as containing video formats
* AES support is untested
* The fragmented TTML subtitles extracted from DASH/ISM are valid, but are unsupported by `ffmpeg` and most video players
    * Their XML fragments can be dumped using `ffmpeg -i in.mp4 -f data -map 0 -c copy out.ttml`.
        Once the unnecessary headers are stripped out of this, it becomes a valid self-contained ttml file
    * The ttml subs downloaded from DASH manifests can also be directly opened with <https://github.com/SubtitleEdit>
* Fragmented WebVTT files extracted from DASH/ISM are also unsupported by most tools
    * Unlike the ttml files, the XML fragments of these cannot be dumped using `ffmpeg`
    * The WebVTT subs extracted from DASH can be parsed by <https://github.com/gpac/gpac>
    * But the validity of those extracted from ISM is untested
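
The first two notes describe a compatibility pattern: the older formats-only helpers stay available while newer code calls the combined variant. A minimal sketch of that pattern follows. The class and both method names (`_extract_demo_formats`, `_extract_demo_formats_and_subtitles`) are hypothetical stand-ins that only mirror the naming scheme quoted above, and the returned data is a placeholder rather than real extractor output.

```python
class ExampleExtractor(object):
    """Hypothetical stand-in for an extractor base class."""

    def _extract_demo_formats_and_subtitles(self, manifest_url, manifest_id=None):
        # Placeholder result; a real implementation would download and
        # parse the manifest at manifest_url here.
        formats = [{'format_id': manifest_id or 'demo', 'url': manifest_url}]
        subtitles = {'en': [{'ext': 'vtt', 'url': manifest_url + '/sub_en.vtt'}]}
        return formats, subtitles

    def _extract_demo_formats(self, manifest_url, manifest_id=None):
        # Kept for compatibility: same explicit signature as the combined
        # method, but only the formats list is returned.
        formats, _subtitles = self._extract_demo_formats_and_subtitles(
            manifest_url, manifest_id=manifest_id)
        return formats


ie = ExampleExtractor()
formats_only = ie._extract_demo_formats('https://example.com/manifest')
```

Existing callers of the formats-only helper keep working unchanged, and extractors can move to the combined method one at a time.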
nixxo pushed a commit to nixxo/yt-dlp that referenced this pull request Nov 22, 2021
2 participants