New request: create ZIMs of all TEDx talks #1152

Open
benoit74 opened this issue Sep 2, 2024 · 6 comments
Labels
Enhancement New feature or request

Comments

@benoit74
Contributor

benoit74 commented Sep 2, 2024

Currently, the TED scraper and Zimfarm configurations only scrape the official TED talks published on the TED website. This amounts to about 6.6K individual videos.

Only a few TEDx talks are included (e.g. 567 videos in https://download.kiwix.org/zim/ted/ted_mul_tedx_2024-08.zim), but this is only a very small fraction of the 221K TEDx videos hosted on the official YouTube channel: https://www.youtube.com/Tedxtalks

AFAIK, the TEDx talks hosted on ted.com are the official ones endorsed by the TED organization, whereas the TEDx talks on the YouTube channel come from conferences organized by independent organizations that only reuse the brand (with permission).

I would like us to provide all these TEDx talks as ZIMs as well. We obviously need to discuss a strategy to create ZIMs that are practical to handle (in terms of size) and to search for content.
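To put the figures above in perspective, here is a minimal arithmetic sketch. It only uses the approximate counts quoted in this comment; the function names are made up for illustration.

```python
# Approximate figures quoted in this issue.
OFFICIAL_TED_VIDEOS = 6_600    # talks published on ted.com
TEDX_IN_CURRENT_ZIM = 567      # videos in ted_mul_tedx_2024-08.zim
TEDX_ON_YOUTUBE = 221_000      # videos on the TEDx Talks YouTube channel


def coverage_percent(included: int, total: int) -> float:
    """Share of `total` that is already included, as a percentage."""
    return 100.0 * included / total


def scale_factor(target: int, baseline: int) -> float:
    """How many times larger `target` is than `baseline`."""
    return target / baseline


if __name__ == "__main__":
    pct = coverage_percent(TEDX_IN_CURRENT_ZIM, TEDX_ON_YOUTUBE)
    factor = scale_factor(TEDX_ON_YOUTUBE, OFFICIAL_TED_VIDEOS)
    print(f"TEDx videos currently in a ZIM: {pct:.2f}% of the channel")
    print(f"Full TEDx channel vs. official TED talks: about {factor:.0f}x")
```

In other words, the current TEDx ZIM covers roughly 0.26% of the channel, and scraping everything would mean handling about 33 times as many videos as the official TED set.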

@RavanJAltaie
Contributor

@benoit74 the only problem with this is that we have nearly a hundred thousand TEDx talks in English; if we count other languages, we will reach millions.
@Popolechien do you think this is feasible?

@RavanJAltaie RavanJAltaie added Enhancement New feature or request labels Sep 19, 2024
@Popolechien
Collaborator

Well, I see that TEDx in Portuguese is already near 5,000 videos. Is there a way to find out if there is some sort of curation by topic that we can also scrape?

@benoit74
Contributor Author

I don't have any more ways to find out than you do.

@benoit74
Contributor Author

We already agreed in the past that we should avoid having multiple languages per ZIM, so I still agree that focusing only on one playlist per language is already interesting.

Btw, where do you see 5000 videos in Portuguese?

I see only 364:
[screenshot]

But anyway, I don't think that 5000 videos is impossible to scrape, even though it certainly has an associated cost.

We could maybe focus on languages that are badly covered by TED, are common in our known userbase, and have a modest number of videos.
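The three criteria above could be sketched as a simple filter. This is only an illustration: the `LanguagePlaylist` type, the two boolean flags, and the `max_videos` threshold are hypothetical, and the flag values below are neutral placeholders rather than real data. Only the video counts come from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguagePlaylist:
    language: str
    video_count: int
    well_covered_by_ted: bool  # hypothetical flag, to be filled in by hand
    common_in_userbase: bool   # hypothetical flag, to be filled in by hand


def pick_candidates(playlists, max_videos):
    """Keep languages that are badly covered by TED, common in the
    userbase, and at most `max_videos` long; smallest playlists first."""
    selected = [
        p for p in playlists
        if not p.well_covered_by_ted
        and p.common_in_userbase
        and p.video_count <= max_videos
    ]
    return sorted(selected, key=lambda p: p.video_count)


if __name__ == "__main__":
    # Video counts as mentioned in this thread; flags are placeholders.
    playlists = [
        LanguagePlaylist("Spanish", 4947, False, True),
        LanguagePlaylist("Hindi", 1987, False, True),
        LanguagePlaylist("Portuguese", 364, False, True),  # smaller playlist
    ]
    # max_videos=2000 is an arbitrary example threshold.
    for p in pick_candidates(playlists, max_videos=2000):
        print(p.language, p.video_count)
```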

@Popolechien
Collaborator

Popolechien commented Sep 19, 2024

Yours says "more! videos in portuguese". But on the front page they have the Portuguese one (as well as Spanish, with 4,947 videos, and Hindi, with 1,987).
[Screenshot 2024-09-19 at 14 53 03]

@benoit74
Contributor Author

They have two playlists for Portuguese ...

https://youtube.com/playlist?list=PLsRNoUx8w3rMzRnIIYOsv-oYXbIHqDNk_
https://youtube.com/playlist?list=PLsRNoUx8w3rOwHx4kVJL5ksS9vTxj5hXn

Aside from the number of videos, I don't know what the difference is ...
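One way to find out how the two playlists differ would be to compare their video IDs. Below is a minimal sketch, not something the scraper does today. `compare_playlists` is plain Python; `fetch_video_ids` is an optional helper that assumes the third-party `yt-dlp` package is installed and that its flat extraction returns an `entries` list with an `id` per video.

```python
def compare_playlists(ids_a, ids_b):
    """Split video IDs into those unique to each playlist and those shared."""
    a, b = set(ids_a), set(ids_b)
    return {
        "only_in_a": a - b,
        "only_in_b": b - a,
        "in_both": a & b,
    }


def fetch_video_ids(playlist_url):
    """Return the video IDs of a playlist (assumes yt-dlp is installed)."""
    import yt_dlp  # third-party; imported lazily so the rest runs without it

    # extract_flat lists the playlist entries without resolving each video.
    opts = {"extract_flat": True, "quiet": True}
    with yt_dlp.YoutubeDL(opts) as ydl:
        info = ydl.extract_info(playlist_url, download=False)
    return [entry["id"] for entry in (info.get("entries") or []) if entry]


if __name__ == "__main__":
    # Toy IDs for illustration; real IDs would come from fetch_video_ids().
    result = compare_playlists(["a", "b", "c"], ["b", "c", "d"])
    for key, ids in result.items():
        print(key, sorted(ids))
```

If one playlist turns out to be a strict subset of the other, `only_in_a` or `only_in_b` will be empty, which would suggest that scraping the larger one is enough.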


3 participants