New request: create ZIMs of all TEDx talks #1152
Comments
@benoit74 the only problem with this is that we have nearly a hundred thousand TEDx talks in English; if we count other languages, we will reach millions.
Well, I see that TEDx in Portuguese is already near 5,000 videos. Is there a way to find out whether there is some sort of curation by topic that we could also scrape?
I don't have any more ways to find out than you do.
We already agreed in the past that we should avoid having multiple languages per ZIM, so I still think that focusing on one playlist per language is already interesting. By the way, where do you see 5,000 videos in Portuguese? In any case, I don't think 5,000 videos is impossible to scrape, even if it certainly has an associated cost. We could focus on languages that are poorly covered by TED, are common in our known user base, and have a modest number of videos.
They have two playlists for Portuguese ... https://youtube.com/playlist?list=PLsRNoUx8w3rMzRnIIYOsv-oYXbIHqDNk_ Aside from the number of videos, I don't know what the difference is ...
Currently, the TED scraper / Zimfarm configurations only scrape the official TED talks published on the TED website. This amounts to about 6.6K individual videos.
Only a few TEDx talks are included (e.g. 567 videos in https://download.kiwix.org/zim/ted/ted_mul_tedx_2024-08.zim), but this is only a very small fraction of the 221K TEDx videos hosted on the official YouTube channel: https://www.youtube.com/Tedxtalks
AFAIK, the TEDx talks hosted on ted.com are the official ones endorsed by the TED organization, whereas the TEDx talks on the YouTube channel come from conferences organized by independent organizations that only reuse the brand (with permission).
I would like us to provide all these TEDx talks as ZIMs as well. We obviously need to discuss a strategy for creating ZIMs that are practical to handle (in terms of size) and to search for content.
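To make the size discussion a bit more concrete, here is a minimal sketch of one possible way to plan ZIMs: group videos by language, then split each language into batches with a capped number of videos. This is only an illustration, not how the TED scraper or Zimfarm works today. The function name `plan_zims`, the `(video_id, language)` input shape, and the `max_per_zim` cap are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_zims(
    videos: Iterable[Tuple[str, str]], max_per_zim: int
) -> Dict[str, List[List[str]]]:
    """Group (video_id, language) pairs by language, then split each
    language into batches of at most `max_per_zim` videos.

    Hypothetical helper: the input shape and the cap are assumptions,
    not part of the existing scraper.
    """
    if max_per_zim < 1:
        raise ValueError("max_per_zim must be at least 1")

    # Collect video IDs per language, preserving input order.
    by_lang: Dict[str, List[str]] = defaultdict(list)
    for video_id, lang in videos:
        by_lang[lang].append(video_id)

    # Slice each language's list into consecutive batches (one batch per ZIM).
    return {
        lang: [ids[i:i + max_per_zim] for i in range(0, len(ids), max_per_zim)]
        for lang, ids in by_lang.items()
    }
```

With a cap of 2, three Portuguese videos and one English video would yield two Portuguese batches and one English batch. A real plan would more likely cap by estimated size on disk, or split by topic if some curation can be found, rather than by raw video count.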