from trafilatura.downloads import add_to_compressed_dict, buffered_downloads, load_download_buffer

# list of URLs
mylist = ['https://www.example.org', 'https://www.httpbin.org/html']
# number of threads to use
threads = 4
# convert the input list to an internal format
url_store = add_to_compressed_dict(mylist)
# processing loop
while url_store.done is False:
    bufferlist, url_store = load_download_buffer(url_store, sleep_time=5)
    # process downloads
    for url, result in buffered_downloads(bufferlist, threads):
        # do something here
        print(url)
        print(result)
I'm not sure how to add DOWNLOAD_TIMEOUT to each connection in this code. It would be great if anyone could help out.
Thanks
Hi @vodkaslime, indeed. It is not currently possible to pass a suitable argument to buffered_downloads; there is a missing link between the config (older) and options (newer) formats.
Both the code and the docs are affected and need to be updated.
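Until that link exists, one workaround is to drive the downloads yourself with a standard thread pool and hand it a fetch function that already carries the timeout. This is only a sketch: the pool logic below is plain stdlib Python, while the commented-out trafilatura wiring (use_config, a config keyword on fetch_url, the DOWNLOAD_TIMEOUT key) is an assumption about the library's API that should be checked against the installed version.

```python
from concurrent.futures import ThreadPoolExecutor

def download_all(urls, fetcher, threads=4):
    """Download every URL with a caller-supplied fetch function.

    Because `fetcher` is any callable taking a single URL, a timeout can
    be baked into it with functools.partial or a closure before the pool
    ever sees it.
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map preserves input order, so zip pairs each URL with its result
        yield from zip(urls, pool.map(fetcher, urls))

# Assumed trafilatura usage (untested sketch, names may differ by version):
# from functools import partial
# from trafilatura import fetch_url
# from trafilatura.settings import use_config
#
# config = use_config()
# config.set("DEFAULT", "DOWNLOAD_TIMEOUT", "10")  # seconds (assumption)
# for url, result in download_all(mylist, partial(fetch_url, config=config)):
#     print(url, result is not None)
```

The trade-off versus buffered_downloads is that this loses the library's politeness features (per-host backoff via load_download_buffer), so it is best reserved for URL lists that do not hammer a single host.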
For context: I'm trying to download multiple URLs with a download timeout. I can download single URLs one by one with fetch_url while setting a download timeout (not sure if setting one is best practice), but when following the tutorial at https://trafilatura.readthedocs.io/en/latest/downloads.html (the code above), I don't see where DOWNLOAD_TIMEOUT fits. It would be great if anyone could help out. Thanks!
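For the single-URL case mentioned above, the timeout is usually expressed through a config object. The sketch below builds one with the stdlib ConfigParser, which is the format trafilatura's settings use; the DOWNLOAD_TIMEOUT key is taken from the library's settings, but the exact way to hand the object to fetch_url (a `config` keyword argument here) is an assumption to verify against your installed version.

```python
from configparser import ConfigParser

# Build a ConfigParser-style settings object carrying the timeout.
# DOWNLOAD_TIMEOUT is in seconds; passing it via fetch_url's `config`
# keyword (commented out below) is an assumed signature -- check the
# downloads documentation for your trafilatura version.
config = ConfigParser()
config["DEFAULT"] = {"DOWNLOAD_TIMEOUT": "10"}

# from trafilatura import fetch_url
# result = fetch_url("https://www.example.org", config=config)  # assumed signature

print(config.getint("DEFAULT", "DOWNLOAD_TIMEOUT"))  # 10
```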