Support downloading epubs from archive.org #1

Open
MerlijnWajer opened this issue Aug 21, 2020 · 8 comments

Comments

@MerlijnWajer
Contributor

I think this would be a nice feature.

@petterreinholdtsen
Contributor

Which search URL should be used to find epub files in the Internet Archive?

@MerlijnWajer
Contributor Author

Probably either using openlibrary.org, or the advanced search feature (I will need to change the query, but here's an example):

https://archive.org/advancedsearch.php?q=collection%3Ainternetarchivebooks&fl%5B%5D=identifier&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=50&page=1&output=json&callback=callback&save=yes
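A minimal sketch of how a client might assemble such an advanced search URL, in case it helps. The function name `build_search_url` and its defaults are hypothetical. It keeps the `q`, `fl[]`, `rows`, `page` and `output` parameters from the example URL above and drops `callback` and `save`, on the assumption that a non-browser client wants plain JSON rather than JSONP.

```python
from urllib.parse import urlencode

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def build_search_url(query, rows=50, page=1):
    """Build an advanced search URL that asks for identifiers as JSON.

    Hypothetical helper: the parameter names are copied from the example
    URL in this thread; `callback` and `save` are left out on purpose.
    """
    params = [
        ("q", query),             # the search query itself
        ("fl[]", "identifier"),   # only return the item identifier
        ("rows", rows),           # results per page
        ("page", page),           # 1-based page number
        ("output", "json"),
    ]
    # urlencode percent-encodes ':' and the '[]' in 'fl[]'.
    return ADVANCED_SEARCH + "?" + urlencode(params)


url = build_search_url("collection:internetarchivebooks")
print(url)
```

The query string is passed in as-is, so the same helper can be reused once the final query is settled.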

@MerlijnWajer
Contributor Author

MerlijnWajer commented Sep 23, 2020

@MerlijnWajer
Contributor Author

You might want to also add AND NOT noindex:* to remove some of the more bogus results.

@MerlijnWajer
Contributor Author

One more correction: I think AND NOT format:(ACS Encrypted PDF) is required as well. That should make sure we only get free epubs.

@MerlijnWajer
Contributor Author

So this is likely the right query: https://archive.org/search.php?query=NOT%20format%3A%28ACS%20Encrypted%20EPUB%29%20AND%20NOT%20format%3A%28ACS%20Encrypted%20PDF%29%20AND%20scanningcenter%3A%2A%20AND%20mediatype%3Atexts%20AND%20NOT%20noindex%3A%2A%20sherlock%20holmes

NOT format:(ACS Encrypted EPUB) AND NOT format:(ACS Encrypted PDF) AND scanningcenter:* AND mediatype:texts AND NOT noindex:* sherlock holmes
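For reference, here is a rough sketch of how the filter above could be combined with the user's search terms and percent-encoded. The names `FREE_EPUB_FILTER`, `free_epub_query` and `search_page_url` are made up for illustration; the filter text itself is copied from the query in this comment.

```python
from urllib.parse import quote

# Filter clauses copied from the query proposed in this thread.
FREE_EPUB_FILTER = (
    "NOT format:(ACS Encrypted EPUB) "
    "AND NOT format:(ACS Encrypted PDF) "
    "AND scanningcenter:* "
    "AND mediatype:texts "
    "AND NOT noindex:*"
)


def free_epub_query(terms):
    """Append free-text search terms to the filter (hypothetical helper)."""
    return f"{FREE_EPUB_FILTER} {terms}"


def search_page_url(terms):
    """Build the search.php URL for the filtered query (hypothetical helper)."""
    # safe="" makes quote() encode every reserved character, including
    # ':', '(', ')' and '*', and spaces become %20.
    return "https://archive.org/search.php?query=" + quote(free_epub_query(terms), safe="")


print(free_epub_query("sherlock holmes"))
print(search_page_url("sherlock holmes"))
```

With "sherlock holmes" as the terms, this reproduces the query and the encoded URL shown above.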

@MerlijnWajer
Contributor Author

This seems like a good test item, renders fine in Dorian too: https://archive.org/details/masterpiecesofsh02doyl

@MerlijnWajer
Contributor Author

One more thing: the advanced search can also return formats other than JSON, see https://archive.org/advancedsearch.php

In case that might be simpler.

You can also filter by AND title:titlehere or AND creator:authorhere if you want to match Gutenberg.
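A small sketch of how those optional clauses might be appended to an existing query. `with_field_filters` is a hypothetical helper, and wrapping the value in parentheses so that multi-word titles or author names stay grouped is my assumption, not something confirmed in this thread. Values are not escaped here.

```python
def with_field_filters(query, title=None, creator=None):
    """Append optional title/creator clauses to a query string.

    Hypothetical helper. The parentheses around each value are an
    assumption so that multi-word values stay attached to their field.
    """
    clauses = [query]
    if title:
        clauses.append(f"title:({title})")
    if creator:
        clauses.append(f"creator:({creator})")
    return " AND ".join(clauses)


print(with_field_filters("mediatype:texts", title="Hamlet"))
print(with_field_filters("mediatype:texts", creator="Arthur Conan Doyle"))
```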
