[Invidious] Add new extractor #31426

OverShifted · 2022-12-15T16:41:40Z

Before submitting a pull request make sure you have:

Searched the bugtracker for similar pull requests
Read adding new extractor tutorial
Read youtube-dl coding conventions and adjusted the code to meet them
Covered the code with tests (note that PRs without tests will be REJECTED)
Checked the code with flake8

In order to be accepted and merged into youtube-dl each piece of code must be in public domain or released under Unlicense. Check one of the following options:

I am the original author of this code and I am willing to release it under Unlicense
I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Bug fix
Improvement
New extractor
New feature

Description of your pull request and other information

Add a new extractor that can download from Invidious instances, since the YouTube extractor doesn't download from Invidious correctly.

dirkf · 2022-12-17T00:52:09Z

Thanks, but I doubt that this is a good solution.

The existing YT extractor knows about a whole lot of Invidious instances. I believe that your problem is just that the list of instances in extractor/youtube.py doesn't include the ones you want. Creating a separate extractor with another unmaintainable list will just make it worse. Or is there some way in which the extraction in the YT module is unsatisfactory for the currently supported IV sites?
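To illustrate the point about the instance list: in a design like this, each instance is one regex alternative folded into the extractor's URL pattern, so supporting a new instance means adding one entry. The following is a minimal sketch under that assumption; `INVIDIOUS_SITES`, `VALID_URL` and `match_video_id` are hypothetical, simplified stand-ins, not the actual code in extractor/youtube.py.

```python
import re

# Hypothetical, abbreviated instance list; invidious.example.org is a placeholder.
INVIDIOUS_SITES = (
    r'(?:www\.)?invidious\.example\.org',
    r'(?:www\.)?yt\.artemislena\.eu',
)

# Fold the alternatives into one pattern that captures the 11-character video ID.
VALID_URL = re.compile(
    r'https?://(?:%s)/watch\?(?:.*&)?v=(?P<id>[0-9A-Za-z_-]{11})' % '|'.join(INVIDIOUS_SITES))


def match_video_id(url):
    """Return the video ID if the URL belongs to a listed instance, else None."""
    m = VALID_URL.match(url)
    return m.group('id') if m else None
```

An instance missing from the tuple simply fails to match, which is the situation described above.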

See also #29885. The discussion there (and in the linked PR) is now really only of historical interest, though, because yt-dlp has since implemented a page-based extraction system in the generic extractor to handle these cases (Invidious, PeerTube, etc). yt-dl will eventually pull this in instead of the original PR, so as to maximise commonality and avoid incompatible reinvention.

OverShifted · 2022-12-17T07:03:05Z

It seems like the youtube extractor sends at least one request to youtube.
I've added r'(?:www\.)?yt\.artemislena\.eu' to _INVIDIOUS_SITES.
And also added print("self._downloader.urlopen called with", url_or_request) before this line.
And here is the output:

$ python -m youtube_dl --verbose -F https://yt.artemislena.eu/watch\?v\=BaW_jenozKc
[debug] System config: []
[debug] User config: []
[debug] Custom config: []
[debug] Command-line args: ['--verbose', '-F', 'https://yt.artemislena.eu/watch?v=BaW_jenozKc']
[debug] Encodings: locale UTF-8, fs utf-8, out utf-8, pref UTF-8
[debug] youtube-dl version 2021.12.17
[debug] Git HEAD: a784be739
[debug] Python version 3.10.8 (CPython) - Linux-6.0.12-arch1-1-x86_64-with-glibc2.36
[debug] exe versions: ffmpeg 5.1.2, ffprobe 5.1.2, rtmpdump 2.4
[debug] Proxy map: {}
[youtube] BaW_jenozKc: Downloading webpage
self._downloader.urlopen called with https://www.youtube.com/watch?v=BaW_jenozKc&bpctr=9999999999&has_verified=1

This behavior can be problematic in environments with limited access to youtube itself.

dirkf · 2022-12-17T07:11:23Z

But then won't the download (googlevideo.com) URLs also be inaccessible?

OverShifted · 2022-12-17T07:53:22Z

In my case, no.
That's why I implemented _patch_url.
But even if googlevideo.com were accessible, I wouldn't be able to extract the download link in the first place.
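The PR's _patch_url is not shown in this thread. As a hedged sketch of the idea only, assuming the instance proxies media requests under the same path and query string as googlevideo.com, a format URL could be rewritten to point at the instance host. `patch_url` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit


def patch_url(media_url, instance_url):
    """Hypothetical helper: route a googlevideo.com URL through an Invidious instance.

    Only the scheme and host are swapped; the path and query (which carry the
    stream parameters) are kept, on the assumption that the instance proxies them.
    """
    media = urlsplit(media_url)
    if not media.netloc.endswith('.googlevideo.com'):
        return media_url  # leave non-googlevideo URLs untouched
    instance = urlsplit(instance_url)
    return urlunsplit((instance.scheme, instance.netloc, media.path, media.query, ''))
```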

dirkf · 2022-12-17T13:54:04Z

Normally a benefit of yt-dl vs using the YT web interface is to avoid the odious bloat of the latter while being able to capture a lot of the detailed metadata that comes with it.

If a user who has YT access wants yt-dl to process an Invidious page, going to YT directly can give a better result because (AFAIK) less rich metadata is available on the IV page. The API is another matter, but reproducing the details of the YT extractor using the IV API would be a massive task.

But if YT is blocked for the user, it would plainly be better to use the IV page instead. The problem is how to combine these tactics. For one-off uses, the IV page may have a download function, but yt-dl users are going to want a batchable solution.

And all these considerations apply equally for other YT front-ends, which seem to be proliferating.

OverShifted · 2022-12-17T15:40:31Z

IMHO, when a user gives yt-dl an invidious link, he/she probably wants to download from invidious servers.
Because otherwise, he/she could just "convert" that to a youtube link. (just replace the host with youtube.com)
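The manual conversion mentioned here amounts to a host swap. A minimal sketch, assuming the instance uses the same /watch?v= path layout as YouTube (`to_youtube_url` is a hypothetical helper):

```python
from urllib.parse import urlsplit, urlunsplit


def to_youtube_url(invidious_url):
    """Swap the host of an Invidious watch URL for www.youtube.com, keeping path and query."""
    parts = urlsplit(invidious_url)
    return urlunsplit(('https', 'www.youtube.com', parts.path, parts.query, parts.fragment))
```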

pukkandan · 2022-12-21T13:43:05Z

IMHO, when a user gives yt-dl an invidious link, he/she probably wants to download from invidious servers.
Because otherwise, he/she could just "convert" that to a youtube link. (just replace the host with youtube.com)

If this were a completely new feature, I would agree. But we have been auto-translating invidious links to youtube for a long time. This means many users would be expecting to get all the metadata youtube provides even with an invidious URL. Having the new extractor return less data is a regression. Perhaps an invidious: prefix could be supported, similar to teachable:
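One way such a prefix could work, sketched under assumptions: an explicit invidious: prefix marks the URL as an Invidious page, so any host is accepted without being listed in advance. The pattern and `parse_prefixed_url` are hypothetical and are not taken from the existing teachable: implementation.

```python
import re

# Hypothetical pattern: the "invidious:" prefix, then any host serving a /watch?v= page.
INVIDIOUS_PREFIX_RE = re.compile(
    r'invidious:(?P<url>https?://(?P<host>[^/?#]+)/watch\?(?:.*&)?v=(?P<id>[0-9A-Za-z_-]{11}))')


def parse_prefixed_url(url):
    """Return (instance_url, host, video_id) for an "invidious:"-prefixed URL, else None."""
    m = INVIDIOUS_PREFIX_RE.match(url)
    if not m:
        return None
    return m.group('url'), m.group('host'), m.group('id')
```

Unprefixed URLs fall through (return None), so existing behaviour for known instances would be unaffected.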

dirkf · 2022-12-22T03:43:56Z

Maybe, as there are other front-end sites for which the same issue could arise, we should introduce an option like --[no-]extract-page-only with no- being the default (surely not the best option name). Then an IV extractor could check this and by default punt to self.url_result('https://www.youtube.com/watch?v=' + video_id, ie='Youtube'); or if --extract-page-only it could go ahead and extract the IV page without touching YT.
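A minimal sketch of the proposed dispatch, assuming the option ends up in the downloader's params dict as `extract_page_only` (a hypothetical name, as the option does not exist yet). By default it returns a URL result for the YouTube extractor, with the same `_type`/`url`/`ie_key` fields that `InfoExtractor.url_result()` produces; `extract_invidious` and the `extract_page` callback are hypothetical.

```python
def extract_invidious(url, video_id, params, extract_page):
    """Hypothetical dispatch for an Invidious extractor.

    params stands in for the downloader options; extract_page is a callable
    that parses the Invidious page itself without contacting YouTube.
    """
    if not params.get('extract_page_only'):
        # Default: punt to the YouTube extractor, like self.url_result(..., ie='Youtube').
        return {
            '_type': 'url',
            'url': 'https://www.youtube.com/watch?v=' + video_id,
            'ie_key': 'Youtube',
        }
    # Opt-in: extract from the Invidious page only.
    return extract_page(url, video_id)
```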

This might also apply where a site has links and metadata in the page but could also use some API URL(s) for more metadata and formats, whether to avoid blocked URLs or increase extraction speed.

gamer191 · 2023-03-11T10:11:38Z

This means many users would be expecting to get all the metadata youtube provides even with an invidious URL.

If that were the case, why would they use an invidious url?

Having the new extractor return less data is a regression.

I struggle to see how performing the expected behaviour is a regression. Invidious is always going to be worse than youtube, but that doesn't mean people who pass invidious urls expect their urls to be silently converted to youtube urls

we should introduce an option like --[no-]extract-page-only with no- being the default

That seems reasonable, although I think there should be a warning if someone passes an invidious url with neither option, and people can silence that warning by explicitly using --no-extract-page-only (I don't know if that's actually possible to implement)
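On feasibility: a warning that fires only when neither flag is given needs a three-state option (unset, true, false). Python's optparse, which youtube-dl uses for its command line, supports this by pointing both flags at one dest with a default of None. The flag names follow the proposal above and do not exist in youtube-dl; `resolve` is a hypothetical helper.

```python
import optparse

parser = optparse.OptionParser()
# Both flags write to the same dest; default=None means "the user said nothing".
parser.add_option('--extract-page-only', action='store_true', dest='extract_page_only', default=None)
parser.add_option('--no-extract-page-only', action='store_false', dest='extract_page_only')


def resolve(argv):
    """Return (extract_page_only, warning_or_None) for a hypothetical front-end URL."""
    opts, _ = parser.parse_args(argv)
    if opts.extract_page_only is None:
        # Neither flag given: keep the old behaviour but tell the user how to choose.
        return False, ('Invidious URL resolved via YouTube; pass --extract-page-only to avoid this, '
                       'or --no-extract-page-only to silence this warning')
    return opts.extract_page_only, None
```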

This might also apply where a site has links and metadata in the page but could also use some API URL(s) for more metadata and formats, whether to avoid blocked URLs or increase extraction speed.

Wanting to avoid Google feels like a completely different use case from not wanting to download from the API of the website you're using.

gamer191 · 2023-05-07T10:13:41Z

Having the new extractor return less data is a regression.

If this is an issue (and imo it's not) perhaps the new invidious extractor should be limited to new instances (that aren't in youtube.py)
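A minimal sketch of that restriction, assuming two host patterns: the instances already routed to the YouTube extractor and a broader list of known instances. Both patterns and `use_new_extractor` are hypothetical placeholders.

```python
import re

# Hypothetical stand-ins: hosts already handled by the YouTube extractor,
# and the full set of instances the new extractor knows about.
YOUTUBE_IV_HOSTS = re.compile(r'(?:www\.)?invidious\.example\.org$')
KNOWN_IV_HOSTS = re.compile(r'(?:www\.)?(?:invidious\.example\.org|yt\.artemislena\.eu)$')


def use_new_extractor(host):
    """Claim only instances that the YouTube extractor does not already handle."""
    return bool(KNOWN_IV_HOSTS.match(host)) and not YOUTUBE_IV_HOSTS.match(host)
```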

krasnh · 2023-12-17T10:03:34Z

Having the new extractor return less data is a regression.

Recently I wanted to download a video from one of the Invidious servers. I was very surprised when it redirected to YouTube. :)

This behavior can be problematic in environments with limited access to youtube itself.

absidue · 2024-01-08T05:33:24Z

Having the new extractor return less data is a regression.

If this is an issue (and imo it's not) perhaps the new invidious extractor should be limited to new instances (that aren't in youtube.py)

@gamer191 Invidious will always return less data than YouTube, regardless of which version of Invidious you use. It also doesn't support things like multiple audio tracks and subtitle translation (they have to use the Innertube transcript API endpoint, which doesn't support translation, and convert the response to WebVTT, as the publicly listed instances get rate-limited on YouTube's subtitle endpoint). The format list from -F would also be useless if you built it from the Invidious API, as it returns hardcoded dimensions based on the itag (most noticeable for vertical videos, because the dimensions will be horizontal).
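The WebVTT conversion mentioned above is essentially timestamp formatting. A minimal sketch, assuming the transcript has already been flattened into (start_ms, end_ms, text) tuples; the actual Innertube response format is not shown in this thread and would need its own parsing, and cue-text escaping (for example of "-->") is omitted here.

```python
def ms_to_timestamp(ms):
    """Format milliseconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    hours, rem = divmod(ms, 3600000)
    minutes, rem = divmod(rem, 60000)
    seconds, millis = divmod(rem, 1000)
    return '%02d:%02d:%02d.%03d' % (hours, minutes, seconds, millis)


def segments_to_webvtt(segments):
    """Convert (start_ms, end_ms, text) tuples into a WebVTT document."""
    lines = ['WEBVTT', '']
    for start, end, text in segments:
        lines.append('%s --> %s' % (ms_to_timestamp(start), ms_to_timestamp(end)))
        lines.append(text)
        lines.append('')  # a blank line terminates each cue
    return '\n'.join(lines)
```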

dirkf · 2024-01-08T11:44:10Z

Consider the two use cases:

I want to access YouTube content through Invidious without ever directly interacting with YT servers, because they are unreachable for me, or because I dislike them, or whatever.
I want to access YouTube content through Invidious because I got a link to Invidious and actually had no idea that it was anything to do with YouTube.

Since the second case was trivial, if tiresome, to support, that's what happened.

Arguably the first case should have been given priority, since it would have supported users who need (or want) Invidious to act as a proxy, and so are content with whatever limitations that implies.

absidue · 2024-01-20T08:49:04Z

I do agree that passing an Invidious URL should download from Invidious; I just wanted to point out that there is significantly less usable metadata than you might have thought at first. So you'll either have to decide to show the incorrect metadata that Invidious returns or not show it at all, and in either case you are likely to get user complaints.

My point is that while the change does seem like a good idea, it will be a breaking change, which you'll want to mention clearly in the changelog and potentially even log a warning message about for a while.

Think of it from a user's perspective: if you upgrade youtube-dl and suddenly your format filter/selector no longer works, because height, width and fps are unavailable or completely incorrect, you would want to be clearly informed during the download why that is happening.
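For the "not show it at all" route, here is a minimal sketch of dropping the fields described above as unreliable and warning once, so the user learns why a filter such as height<=720 may stop matching. `sanitize_formats` is hypothetical, and which fields to distrust is an assumption based on the comments above.

```python
UNRELIABLE_FIELDS = ('width', 'height', 'fps')


def sanitize_formats(formats, warn):
    """Drop format fields that an Invidious response may report incorrectly.

    warn is any callable taking a message (e.g. a logger's warning method);
    it is called at most once per call to this function.
    """
    cleaned, dropped = [], False
    for fmt in formats:
        fmt = dict(fmt)  # copy so the caller's data is not mutated
        for key in UNRELIABLE_FIELDS:
            if fmt.pop(key, None) is not None:
                dropped = True
        cleaned.append(fmt)
    if dropped:
        warn('Invidious reported width/height/fps values that may be wrong; they were dropped, '
             'so format filters on these fields may not behave as expected')
    return cleaned
```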

OverShifted added 2 commits December 15, 2022 19:35

[invidious] Add new extractor (991ac05)

[invidious] Add more tests (a784be7)

OverShifted changed the title ~~Invidious~~ [Invidious] Add new extractor Dec 15, 2022

gamer191 mentioned this pull request May 7, 2023: "piped get me The handshake operation timed out" (yt-dlp/yt-dlp#7004, closed)

krasnh mentioned this pull request Jan 8, 2024: "Support invidious natively" (yt-dlp/yt-dlp#8952, closed)

dirkf mentioned this pull request Oct 19, 2024: "Support for piped.simpleprivacy.fr" (yt-dlp/yt-dlp#7537, open)
