-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HLS WebVTT subtitle downloading support #6144
Conversation
i want you to know that in this commit avformat/hlsenc: Add WebVtt support in hls has been added to ffmpeg. |
libavcodec still doesn't have any support for Anyway, developers: is this any good? Should I adapt this to take advantage of #6392? @dstftw, @jaimeMF? |
self.to_screen( | ||
'[hlswebvtt] %s: Downloading manifest' % | ||
(info_dict['id'])) | ||
data = self.ydl.urlopen(url).read() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work in case of live HLS WebVTT streams because you constantly get new subtitle segments at the same playlist URL.
It's a shame X-TIMESTAMP-MAP support patch hasn't been merged to ffmpeg yet after 3 years, but in my use-case (vlive.tv) it's not required, so dumping HLS WebVTT via ffmpeg works quite good.
_extract_m3u8_formats is renamed to _extract_m3u8_formats_and_subtitles and extended to properly handle subtitle references; a wrapper with the old name is provided for compatibility. _parse_m3u8_formats is likewise renamed and extended, but without adding the compatibility wrapper; the test suite is adjusted to test the enhanced method instead.
This code is awesome -- I had just started writing something just like this -- is there a reason it's not in master? |
You tell me. |
Authored by fstirlitz Modified from: ytdl-org/youtube-dl#6144 Closes: #73 Fixes: ytdl-org/youtube-dl#6106 ytdl-org/youtube-dl#14977 ytdl-org/youtube-dl#21438 ytdl-org/youtube-dl#23609 ytdl-org/youtube-dl#28132 Might also fix (untested): ytdl-org/youtube-dl#15424 ytdl-org/youtube-dl#18267 ytdl-org/youtube-dl#23899 ytdl-org/youtube-dl#24375 ytdl-org/youtube-dl#24595 ytdl-org/youtube-dl#27899 Related: ytdl-org/youtube-dl#22379 ytdl-org/youtube-dl#24517 ytdl-org/youtube-dl#24886 ytdl-org/youtube-dl#27215 Notes: * The functions `extractor.common._extract_..._formats` are still kept for compatibility * Only some extractors have currently been moved to using `_extract_..._formats_and_subtitles` * Direct subtitle manifests (without a master) are not supported and are wrongly identified as containing video formats * AES support is untested * The fragmented TTML subtitles extracted from DASH/ISM are valid, but are unsupported by `ffmpeg` and most video players * Their XML fragments can be dumped using `ffmpeg -i in.mp4 -f data -map 0 -c copy out.ttml`. Once the unnecessary headers are stripped out of this, it becomes a valid self-contained ttml file * The ttml subs downloaded from DASH manifests can also be directly opened with <https://github.com/SubtitleEdit> * Fragmented WebVTT files extracted from DASH/ISM are also unsupported by most tools * Unlike the ttml files, the XML fragments of these cannot be dumped using `ffmpeg` * The webtt subs extracted from DASH can be parsed by <https://github.com/gpac/gpac> * But validity of the those extracted from ISM are untested
Authored by fstirlitz Modified from: ytdl-org/youtube-dl#6144 Closes: #73 Fixes: ytdl-org/youtube-dl#6106 ytdl-org/youtube-dl#14977 ytdl-org/youtube-dl#21438 ytdl-org/youtube-dl#23609 ytdl-org/youtube-dl#28132 Might also fix (untested): ytdl-org/youtube-dl#15424 ytdl-org/youtube-dl#18267 ytdl-org/youtube-dl#23899 ytdl-org/youtube-dl#24375 ytdl-org/youtube-dl#24595 ytdl-org/youtube-dl#27899 Related: ytdl-org/youtube-dl#22379 ytdl-org/youtube-dl#24517 ytdl-org/youtube-dl#24886 ytdl-org/youtube-dl#27215 Notes: * The functions `extractor.common._extract_..._formats` are still kept for compatibility * Only some extractors have currently been moved to using `_extract_..._formats_and_subtitles` * Direct subtitle manifests (without a master) are not supported and are wrongly identified as containing video formats * AES support is untested * The fragmented TTML subtitles extracted from DASH/ISM are valid, but are unsupported by `ffmpeg` and most video players * Their XML fragments can be dumped using `ffmpeg -i in.mp4 -f data -map 0 -c copy out.ttml`. Once the unnecessary headers are stripped out of this, it becomes a valid self-contained ttml file * The ttml subs downloaded from DASH manifests can also be directly opened with <https://github.com/SubtitleEdit> * Fragmented WebVTT files extracted from DASH/ISM are also unsupported by most tools * Unlike the ttml files, the XML fragments of these cannot be dumped using `ffmpeg` * The webtt subs extracted from DASH can be parsed by <https://github.com/gpac/gpac> * But validity of the those extracted from ISM are untested
This should resolve #6106.
The patches are fairly invasive. Since downloading HLS WebVTT subtitles seems to be relatively heavyweight bandwidth-wise, downloading them right away in the extractor is out of the question; but there was no other way to download these subtitles in an offline-viewable form. Hence the subtitle downloading code was hacked to make use of the downloader infrastructure and a custom HLS WebVTT downloader was written.
The code was tested with Python 3.4.3 and 2.7.10 on a few ComCarCoff videos.