[youtube] fix chapters extractor (fix #24819) #24848

Closed
wants to merge 1 commit into from

@jaimebl jaimebl commented Apr 17, 2020

Before submitting a pull request make sure you have:

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

  • I am the original author of this code and I am willing to release it under Unlicense
  • I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

  • Bug fix
  • Improvement
  • New extractor
  • New feature

Description of your pull request and other information

That's a fix for issue #24819

It modifies the regular expression to capture chapters from the description when they follow this pattern:
00:00 - 09:24 <title>

You can see an example of that here:
https://www.youtube.com/watch?v=gBRKnvK1JUE
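
For illustration only, here is a minimal sketch of a regular expression that accepts both the plain "00:12 <title>" form and the "00:00 - 09:24 <title>" range form. It is not the pattern from this PR's commit; the names `TIMESTAMP`, `CHAPTER_LINE_RE` and `find_chapter_lines` are hypothetical.

```python
import re

# Hypothetical pattern, not the one from the PR: a start timestamp,
# an optional "- end" timestamp, then the chapter title on the same line.
TIMESTAMP = r'(?:\d{1,2}:)?\d{1,2}:\d{2}'
CHAPTER_LINE_RE = re.compile(
    r'^[ \t]*(?P<start>%s)(?:[ \t]*-[ \t]*(?P<end>%s))?[ \t]+(?P<title>\S.*?)[ \t]*$'
    % (TIMESTAMP, TIMESTAMP),
    re.MULTILINE)


def find_chapter_lines(description):
    """Return (start, end, title) string tuples; end is None when absent."""
    return [
        (m.group('start'), m.group('end'), m.group('title'))
        for m in CHAPTER_LINE_RE.finditer(description)
    ]
```

Only spaces and tabs are allowed between the fields, so a timestamp alone on a line does not pick up the following line as its title.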

@jaimebl jaimebl changed the title from "[youtube] fix chapters extractor" to "[youtube] fix chapters extractor (fix #24819)" Apr 17, 2020
@jaimebl jaimebl marked this pull request as draft April 17, 2020 21:51
@jaimebl jaimebl force-pushed the #24819-youtube-chapters-fix branch from 5b0398a to bf07f06 Compare April 19, 2020 18:59
@jaimebl jaimebl force-pushed the #24819-youtube-chapters-fix branch from bf07f06 to 78d7146 Compare April 19, 2020 19:02
@jaimebl jaimebl marked this pull request as ready for review April 19, 2020 19:06
@dstftw
Collaborator

dstftw commented Jun 5, 2020

For this pattern you must extract start and end time from each entry.
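
A rough sketch of what extracting both times from each entry could look like, assuming entries have already been split into (start, end, title) strings. The helper names are hypothetical, and the `start_time`, `end_time` and `title` keys are an assumption about the desired chapter format, not code taken from this PR.

```python
def parse_seconds(timestamp):
    """Convert 'MM:SS' or 'H:MM:SS' into a number of seconds."""
    seconds = 0
    for part in timestamp.split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


def build_chapters(entries, duration):
    """entries: (start, end, title) string tuples; end may be None."""
    chapters = []
    for index, (start, end, title) in enumerate(entries):
        start_time = parse_seconds(start)
        if end is not None:
            # The entry carries its own end time, so use it directly.
            end_time = parse_seconds(end)
        elif index + 1 < len(entries):
            # No explicit end: fall back to the next entry's start.
            end_time = parse_seconds(entries[index + 1][0])
        else:
            # Last entry without an end: run to the end of the video.
            end_time = duration
        chapters.append({
            'start_time': start_time,
            'end_time': end_time,
            'title': title,
        })
    return chapters
```

With an explicit end time per entry, gaps between chapters are preserved instead of being silently filled by the next chapter's start.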

@anabis

anabis commented Jul 14, 2020

Hello, there is another way to get the chapter names and timestamps without hazardously parsing the description: they are now in an inline JSON object in the page. I don't know Python, but you can get the info like so:
curl -s "https://www.youtube.com/watch?v=AK9r27jWVrc" | grep "ytInitialData" | grep -o "{.*}" | jq '.playerOverlays.playerOverlayRenderer.decoratedPlayerBarRenderer.decoratedPlayerBarRenderer.playerBar.chapteredPlayerBarRenderer.chapters'
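
For reference, a rough Python equivalent of the pipeline above might look like the following. It assumes the page HTML is already downloaded into a string and that the first `{` after the `ytInitialData` marker starts the JSON object; the function names are hypothetical, and the key path is copied from the `jq` expression. The chapter entries are returned unmodified, since their internal layout is not described here.

```python
import json

# Key path copied from the jq expression in the comment above.
CHAPTERS_PATH = (
    'playerOverlays', 'playerOverlayRenderer', 'decoratedPlayerBarRenderer',
    'decoratedPlayerBarRenderer', 'playerBar', 'chapteredPlayerBarRenderer',
    'chapters',
)


def extract_initial_data(webpage):
    """Decode the JSON object that follows the 'ytInitialData' marker."""
    marker = webpage.find('ytInitialData')
    if marker == -1:
        return None
    brace = webpage.find('{', marker)
    if brace == -1:
        return None
    try:
        # raw_decode stops at the end of the first complete JSON value,
        # so trailing script text after the object is ignored.
        data, _ = json.JSONDecoder().raw_decode(webpage, brace)
    except ValueError:
        return None
    return data


def extract_raw_chapters(webpage):
    """Walk CHAPTERS_PATH; return the chapters list or None if missing."""
    node = extract_initial_data(webpage)
    for key in CHAPTERS_PATH:
        if not isinstance(node, dict):
            return None
        node = node.get(key)
    return node
```

Returning None on any missing key keeps the description-based parsing available as a fallback for videos without this data.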

@someziggyman

Looks like this solution fixes cases for timecodes formatted like this: "00:12 some title of the chapter"
However, it fails for timecode formats like the one in this video: https://www.youtube.com/watch?v=Le0-eaxQ1_k

And these are a mystery, but for some reason the timecodes are null as well:
https://www.youtube.com/watch?v=YhO_E8xm0aY
https://www.youtube.com/watch?v=7wriyeBB5hY

I guess it's a regex problem, and it may be rather complicated to cover all the formatting cases, but I decided to post this comment here anyway. Thank you for making youtube-dl better!

@dstftw dstftw force-pushed the master branch 2 times, most recently from 5e26784 to da2069f Compare September 13, 2020 13:50
@dirkf dirkf closed this Aug 1, 2023
@dirkf dirkf added the defunct PR source branch is not accessible label Oct 2, 2023
Labels
defunct PR source branch is not accessible pending-fixes