Add support for extracting subtitles from MPD manifests #24517
base: master
Thanks for the review. I will fix it.
@dstftw I have refactored the code, written a test, and force-pushed. Please take a look.
@dstftw Hi, is there any chance of getting this merged? Thanks!
    def _parse_mpd_formats(self, *args, **kwargs):
        return self._parse_mpd_formats_subtitles(*args, **kwargs)[0]

    def _parse_mpd_formats_subtitles(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
Must return info dict.
youtube_dl/extractor/common.py
Outdated
            mpd_doc, mpd_id=mpd_id, mpd_base_url=mpd_base_url,
            formats_dict=formats_dict, mpd_url=mpd_url)

    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='', formats_dict={}, mpd_url=None):
    def _parse_mpd_formats(self, *args, **kwargs):
        return self._parse_mpd_formats_subtitles(*args, **kwargs)[0]
Breaks.
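The review comment does not say what breaks. One possible reading (an assumption, not confirmed in the thread) is that replacing the explicit signature with `*args, **kwargs` hides the parameter names from callers and subclasses. A minimal sketch that keeps the original parameter list while still delegating to the combined method could look like this. `ExtractorSketch` and the placeholder body are hypothetical, and `formats_dict=None` is used here instead of the `{}` default shown in the diff.

```python
class ExtractorSketch:
    """Hypothetical stand-in for the extractor base class, for illustration only."""

    def _parse_mpd_formats_subtitles(self, mpd_doc, mpd_id=None, mpd_base_url='',
                                     formats_dict=None, mpd_url=None):
        # Placeholder body: the real method parses the manifest.
        # Here we only return a (formats, subtitles) pair of the expected shape.
        formats = [{'format_id': mpd_id or 'dash', 'manifest_url': mpd_url}]
        subtitles = {}
        return formats, subtitles

    def _parse_mpd_formats(self, mpd_doc, mpd_id=None, mpd_base_url='',
                           formats_dict=None, mpd_url=None):
        # Same parameters as before the change; only the formats are returned.
        formats, _subtitles = self._parse_mpd_formats_subtitles(
            mpd_doc, mpd_id=mpd_id, mpd_base_url=mpd_base_url,
            formats_dict=formats_dict, mpd_url=mpd_url)
        return formats
```

Existing callers of `_parse_mpd_formats` would then keep receiving a plain list of formats, whether they pass arguments positionally or by keyword.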
'subtitles',
'https://example.com/streams/1/playlist/playlist.mpd', # mpd_url
'https://example.com/streams/1/playlist', # mpd_base_url
[{'acodec': 'mp4a.40.2',
  'asr': 48000,
  'container': 'm4a_dash',
  'ext': 'm4a',
  'filesize': None,
  'format_id': '131kbps',
  'format_note': 'DASH audio',
  'fps': None,
  'fragment_base_url': 'https://example.com/streams/1/playlist/',
  'fragments': [{'path': '../audio/1_stereo_131072/dash/init.mp4'},
                {'duration': 3989.0,
                 'path': '../audio/1_stereo_131072/dash/segment_0.m4s'},
                {'duration': 3989.0,
                 'path': '../audio/1_stereo_131072/dash/segment_1.m4s'}],
  'height': None,
  'language': 'de',
  'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'protocol': 'http_dash_segments',
  'tbr': 131.072,
  'url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'vcodec': 'none',
  'width': None},
 {'acodec': 'mp4a.40.2',
  'asr': 48000,
  'container': 'm4a_dash',
  'ext': 'm4a',
  'filesize': None,
  'format_id': '196kbps',
  'format_note': 'DASH audio',
  'fps': None,
  'fragment_base_url': 'https://example.com/streams/1/playlist/',
  'fragments': [{'path': '../audio/1_stereo_196608/dash/init.mp4'},
                {'duration': 3989.0,
                 'path': '../audio/1_stereo_196608/dash/segment_0.m4s'},
                {'duration': 3989.0,
                 'path': '../audio/1_stereo_196608/dash/segment_1.m4s'}],
  'height': None,
  'language': 'de',
  'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'protocol': 'http_dash_segments',
  'tbr': 196.608,
  'url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'vcodec': 'none',
  'width': None},
 {'acodec': 'none',
  'asr': None,
  'container': 'mp4_dash',
  'ext': 'mp4',
  'filesize': None,
  'format_id': '720p 1712kbps',
  'format_note': 'DASH video',
  'fps': 25,
  'fragment_base_url': 'https://example.com/streams/1/playlist/',
  'fragments': [{'path': '../video/720_1712128/dash/init.mp4'},
                {'duration': 4000.0,
                 'path': '../video/720_1712128/dash/segment_0.m4s'},
                {'duration': 4000.0,
                 'path': '../video/720_1712128/dash/segment_1.m4s'}],
  'height': 720,
  'language': None,
  'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'protocol': 'http_dash_segments',
  'tbr': 1712.128,
  'url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'vcodec': 'avc1.42c00d',
  'width': 1280},
 {'acodec': 'none',
  'asr': None,
  'container': 'mp4_dash',
  'ext': 'mp4',
  'filesize': None,
  'format_id': '1080p 4669kbps',
  'format_note': 'DASH video',
  'fps': 25,
  'fragment_base_url': 'https://example.com/streams/1/playlist/',
  'fragments': [{'path': '../video/1080_4669440/dash/init.mp4'},
                {'duration': 4000.0,
                 'path': '../video/1080_4669440/dash/segment_0.m4s'},
                {'duration': 4000.0,
                 'path': '../video/1080_4669440/dash/segment_1.m4s'}],
  'height': 1080,
  'language': None,
  'manifest_url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'protocol': 'http_dash_segments',
  'tbr': 4669.44,
  'url': 'https://example.com/streams/1/playlist/playlist.mpd',
  'vcodec': 'avc1.42c00d',
  'width': 1920}],
{'en': [{'ext': 'vtt',
         'url': 'https://example.com/streams/1/subtitles/sub_en.vtt'}],
 'fr': [{'ext': 'vtt',
         'url': 'https://example.com/streams/1/subtitles/sub_fr.vtt'}]},
)
Code style.
Authored by fstirlitz
Modified from: ytdl-org/youtube-dl#6144

Closes: #73
Fixes: ytdl-org/youtube-dl#6106 ytdl-org/youtube-dl#14977 ytdl-org/youtube-dl#21438 ytdl-org/youtube-dl#23609 ytdl-org/youtube-dl#28132
Might also fix (untested): ytdl-org/youtube-dl#15424 ytdl-org/youtube-dl#18267 ytdl-org/youtube-dl#23899 ytdl-org/youtube-dl#24375 ytdl-org/youtube-dl#24595 ytdl-org/youtube-dl#27899
Related: ytdl-org/youtube-dl#22379 ytdl-org/youtube-dl#24517 ytdl-org/youtube-dl#24886 ytdl-org/youtube-dl#27215

Notes:
* The functions `extractor.common._extract_..._formats` are still kept for compatibility
* Only some extractors have currently been moved to using `_extract_..._formats_and_subtitles`
* Direct subtitle manifests (without a master) are not supported and are wrongly identified as containing video formats
* AES support is untested
* The fragmented TTML subtitles extracted from DASH/ISM are valid, but are unsupported by `ffmpeg` and most video players
    * Their XML fragments can be dumped using `ffmpeg -i in.mp4 -f data -map 0 -c copy out.ttml`. Once the unnecessary headers are stripped out of this, it becomes a valid self-contained TTML file
    * The TTML subs downloaded from DASH manifests can also be directly opened with <https://github.com/SubtitleEdit>
* Fragmented WebVTT files extracted from DASH/ISM are also unsupported by most tools
    * Unlike the TTML files, the XML fragments of these cannot be dumped using `ffmpeg`
    * The WebVTT subs extracted from DASH can be parsed by <https://github.com/gpac/gpac>
    * But the validity of those extracted from ISM is untested
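For anyone scripting the TTML dump step from the notes, the `ffmpeg` invocation can be assembled as an argument list. This is only a sketch: `build_ttml_dump_command` is a hypothetical helper name, and the flags are copied from the command quoted in the notes.

```python
import subprocess


def build_ttml_dump_command(src, dst, ffmpeg='ffmpeg'):
    """Return the argv list for: ffmpeg -i SRC -f data -map 0 -c copy DST."""
    return [ffmpeg, '-i', src, '-f', 'data', '-map', '0', '-c', 'copy', dst]


def dump_ttml(src, dst):
    # Runs ffmpeg; raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ttml_dump_command(src, dst), check=True)
```

Calling `dump_ttml('in.mp4', 'out.ttml')` would run the same command as in the notes. As stated there, the output still contains extra headers that need stripping before it is a self-contained TTML file.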
Description of your pull request and other information
This PR adds support for extracting subtitles from MPD manifests. The added code is similar to `_parse_mpd_formats()` but only handles subtitles.
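To illustrate the idea outside the extractor framework, here is a minimal standalone sketch, not the code from this PR. It assumes that subtitle tracks appear as `AdaptationSet` elements with a text MIME type and a single `BaseURL` per `Representation`. The function name, the MIME-to-extension table, and the `'und'` fallback language are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

DASH_NS = {'mpd': 'urn:mpeg:dash:schema:mpd:2011'}

# Assumed mapping for this sketch; the real extractor may derive ext differently.
MIME_TO_EXT = {'text/vtt': 'vtt', 'application/ttml+xml': 'ttml'}


def extract_mpd_subtitles(mpd_xml, mpd_base_url):
    """Collect {language: [{'ext': ..., 'url': ...}]} from text AdaptationSets."""
    root = ET.fromstring(mpd_xml)
    if not mpd_base_url.endswith('/'):
        mpd_base_url += '/'  # so relative BaseURLs resolve against the directory
    subtitles = {}
    for adaptation_set in root.iterfind('mpd:Period/mpd:AdaptationSet', DASH_NS):
        lang = adaptation_set.get('lang') or 'und'
        for representation in adaptation_set.iterfind('mpd:Representation', DASH_NS):
            mime_type = representation.get('mimeType') or adaptation_set.get('mimeType')
            ext = MIME_TO_EXT.get(mime_type)
            base_url = representation.findtext('mpd:BaseURL', namespaces=DASH_NS)
            if ext is None or not base_url:
                continue  # not a subtitle track this sketch understands
            subtitles.setdefault(lang, []).append({
                'ext': ext,
                'url': urljoin(mpd_base_url, base_url.strip()),
            })
    return subtitles


SAMPLE_MPD = '''<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet mimeType="text/vtt" lang="en">
      <Representation id="sub_en">
        <BaseURL>../subtitles/sub_en.vtt</BaseURL>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>'''

print(extract_mpd_subtitles(SAMPLE_MPD, 'https://example.com/streams/1/playlist'))
```

The returned dictionary has the same shape as the subtitles entry in the test fixture of this PR: a language code mapped to a list of `{'ext', 'url'}` entries.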