iadownloader is a tool for automatically downloading files from the Internet Archive. Given the download URL of an Internet Archive upload, it downloads every file in that upload, either individually or as a single compressed archive, to a configurable location (the current working directory by default). It can also download complete collections and similar groupings by parsing the JSON or CSV files generated by Internet Archive’s advanced search tool.
iadownloader.py [-h] [-c] [-o OUTPUT_DIR] [-t THREADS] [-T] url

positional arguments:
  url                   URL or path to json/csv file

optional arguments:
  -h, --help            show this help message and exit
  -c, --compressed      Get the compressed archive download instead of the individual files
  -o OUTPUT_DIR, --output_dir OUTPUT_DIR
                        Path to output directory
  -t THREADS, --threads THREADS
                        Number of simultaneous downloads (maximum of 10)
  -T, --torrent         Only download the torrent file if available
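For readers who want to see how an interface like this fits together, here is a minimal argparse sketch that mirrors the options listed above. It is an illustration rather than the script's actual source: the helper names `build_parser` and `parse_args` are hypothetical, the default of 4 threads is taken from this README, and clamping an out-of-range thread count (instead of rejecting it) is an assumption.

```python
import argparse

MAX_THREADS = 10      # maximum documented in the help text
DEFAULT_THREADS = 4   # default stated in this README


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="iadownloader.py")
    parser.add_argument("url", help="URL or path to json/csv file")
    parser.add_argument("-c", "--compressed", action="store_true",
                        help="Get the compressed archive download instead "
                             "of the individual files")
    parser.add_argument("-o", "--output_dir", default=".",
                        help="Path to output directory")
    parser.add_argument("-t", "--threads", type=int, default=DEFAULT_THREADS,
                        help="Number of simultaneous downloads (maximum of 10)")
    parser.add_argument("-T", "--torrent", action="store_true",
                        help="Only download the torrent file if available")
    return parser


def parse_args(argv=None):
    """Parse argv and keep the thread count within 1..MAX_THREADS."""
    args = build_parser().parse_args(argv)
    # Assumption: values outside the range are clamped rather than rejected.
    args.threads = max(1, min(args.threads, MAX_THREADS))
    return args


if __name__ == "__main__":
    print(parse_args())
```

Calling `parse_args(["-c", "-t", "8", "https://archive.org/download/<url>"])` would yield a namespace with `compressed=True` and `threads=8`.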
The basic usage is to simply invoke iadownloader with a download URL.
python iadownloader.py https://archive.org/download/<url>
This downloads all the files in the upload to the directory the script was invoked from.
Optionally specify the download location:
python iadownloader.py -o /download/path https://archive.org/download/<url>
To download the compressed archive of the upload, just add the ‘-c’ flag:
python iadownloader.py -c -o /download/path https://archive.org/download/<url>
You can also specify the number of threads (up to 10):
python iadownloader.py -t 8 -o /download/path https://archive.org/download/<url>
It defaults to 4 threads if not specified.
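To illustrate what a bounded number of simultaneous downloads means in practice, the following sketch uses the standard library's `ThreadPoolExecutor`. It is not taken from iadownloader itself: `run_downloads` and the `fetch` callback are hypothetical names, and the real script may organize its threads differently. Only the default of 4 and the maximum of 10 come from this README.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 10      # documented upper limit
DEFAULT_THREADS = 4   # documented default


def run_downloads(urls, fetch, threads=DEFAULT_THREADS):
    """Call fetch(url) for every URL using a bounded pool of worker threads.

    `fetch` is any callable that downloads one URL; it is passed in so the
    sketch stays independent of a particular HTTP library.
    """
    # Keep the worker count inside the documented 1..10 range.
    workers = max(1, min(threads, MAX_THREADS))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(fetch, urls))


if __name__ == "__main__":
    # Stand-in fetch function that only reports what it would download.
    demo = ["https://archive.org/download/<url>/file-%d" % i for i in range(3)]
    for line in run_downloads(demo, lambda u: "would fetch " + u, threads=8):
        print(line)
```

Passing the fetch function as an argument keeps the pooling logic easy to try out without touching the network.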
Don’t confuse a “download URL” with the URL of an individual file. Individual files are trivially downloaded through your web browser. This tool simplifies downloading all the files included in an upload on Internet Archive. Even that can be done quite easily through the web UI; where iadownloader shines is in downloading full collections automatically.
To download a whole collection, all files from a certain author, etc., go to Internet Archive’s advanced search tool and follow these steps:
- Scroll down to “Advanced Search returning JSON, XML, and more”. In the “Query” field enter collection:<name of collection> for collections, creator:<name of creator> for creators, etc. In “Field to return” select “identifier” if not already selected. Select an appropriate “Number of results” depending on the collection.
- Choose either JSON format or CSV format. CSV format is a bit more convenient since it prompts you to download the file immediately, while the JSON format opens a JavaScript page with embedded JSON data. If you choose CSV, save the .csv file to a location of your choice. If you choose JSON, save the page and make sure to save it with the .json ending rather than the suggested .js one.
- Run iadownloader.py like this:
python iadownloader.py -o /download/path /path/to/csv-or-json-file
iadownloader will go through all the downloads of the collection and download them into the download path.
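As a rough illustration of the parsing step, the sketch below reads identifiers from a saved search file and turns them into download URLs. It rests on assumptions about the file layout that this README does not spell out: that the CSV has an `identifier` header column, and that the JSON keeps its results under `response` → `docs`, possibly wrapped in a JavaScript callback. The function names are hypothetical, and iadownloader's own parsing may differ.

```python
import csv
import json
from pathlib import Path

DOWNLOAD_BASE = "https://archive.org/download/"


def read_identifiers(path):
    """Return the item identifiers listed in a saved CSV or JSON search file."""
    path = Path(path)
    # utf-8-sig also accepts plain UTF-8 and drops a leading BOM if present.
    text = path.read_text(encoding="utf-8-sig")
    if path.suffix.lower() == ".csv":
        # Assumed layout: a header row containing an "identifier" column.
        rows = csv.DictReader(text.splitlines())
        return [row["identifier"] for row in rows if row.get("identifier")]
    # Assumed layout: {"response": {"docs": [{"identifier": ...}, ...]}},
    # optionally wrapped as callback(...). Keep only the outermost {...}.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in %s" % path)
    data = json.loads(text[start:end + 1])
    return [doc["identifier"] for doc in data["response"]["docs"]]


def download_urls(path):
    """Map every identifier in the file to its download URL."""
    return [DOWNLOAD_BASE + identifier for identifier in read_identifiers(path)]
```

Each resulting URL has the same `https://archive.org/download/<url>` form that the basic usage takes as its argument.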
iadownloader uses requests, lxml, and tqdm to do its magic. To make sure you have them, install them from the included requirements.txt:
pip install -r requirements.txt
Of course, you need Python and pip as well.