not able to download some files #53
Comments
Hi Franco, sorry for the late reply. What you are experiencing is a classic symptom of a few ESGF datanodes not responding (either temporarily or permanently). This is not the end of the world: luckily, most datasets (and files) are replicated across multiple datanodes, so the steps to follow are:
This behaviour is due to esgpull having no a priori knowledge of the state of the datanodes listed in the ESGF catalogues, so for each file one of the listed datanodes is picked at random. Do let me know if this solves your issue, and sorry again for the delay.
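Since one of the datanodes listed for a file gets picked at random, it can help to see which nodes list a copy of each file before retrying. Below is a minimal sketch, not part of esgpull, assuming the index node exposes the classic ESGF search endpoint at `/esg-search/search` and returns records with `title` and `data_node` fields. The helper names and the sample records are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlencode


def build_search_url(index_node, **facets):
    """Build an ESGF search URL for file records across all nodes."""
    params = {
        "type": "File",                     # search file records, not datasets
        "distrib": "true",                  # query all federated nodes
        "format": "application/solr+json",  # ask for a JSON response
        "limit": 100,
    }
    params.update(facets)
    return f"https://{index_node}/esg-search/search?{urlencode(params)}"


def nodes_per_file(docs):
    """Map each file title to the sorted data nodes that list a copy of it."""
    nodes = defaultdict(set)
    for doc in docs:
        nodes[doc["title"]].add(doc["data_node"])
    return {title: sorted(found) for title, found in nodes.items()}


# Hypothetical records shaped like the "docs" list of a search response.
sample_docs = [
    {"title": "pr_day_example.nc", "data_node": "node-a.example.org"},
    {"title": "pr_day_example.nc", "data_node": "node-b.example.org"},
    {"title": "tasmax_day_example.nc", "data_node": "node-a.example.org"},
]

url = build_search_url("esgf-data.dkrz.de", project="CMIP6", variable_id="pr")
replicas = nodes_per_file(sample_docs)
```

A file that maps to a single node has no alternative source, so retrying can only succeed once that node comes back.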
Hi Atef,
Hello @AtefBN, I have a similar question, but I'm using a Synda selection file for the download. I'm trying to download the data by converting the Synda selection file as below:
However, only 1.3 TiB of 2.0 TiB gets downloaded, even when I run "retry" and "download" again and again. In the error logs, the two errors below occur for many URLs:
I think I could try downloading the replicas of the datasets, but I don't know how to do it. For your information, my configuration file is below.
I wonder whether changing 'replica' in api.default_options enables the downloading of replicas or not (I miss "synda replica next"). Thank you so much for developing a wonderful program!
Hi @hanjunkim0617, it seems that whichever datanodes esgpull randomly selected from the list of those available for these files are behaving badly. I get a few of these with larger queries, and sometimes waiting a few days and then retrying the download works. For the time being you need to:
For 1, just check the log files related to the download job; sometimes 503 errors are harder to debug.
P.S. You can see which datanodes are available for a query by using the --hints flag, for example:
This works on all facets, and it can help you target a specific datanode if you're sure it behaves best and has better performance. Hope this helps.
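To make the log check above less manual, here is a minimal sketch that counts how often each host appears in URLs found in a log file, so the worst-behaving datanodes stand out. It assumes only that failed URLs appear somewhere in the log text. The `failing_hosts` helper and the sample excerpt are hypothetical, and real esgpull logs may be laid out differently.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Matches http(s) URLs up to the next whitespace, quote, or angle bracket.
URL_PATTERN = re.compile(r"https?://[^\s'\"<>]+")


def failing_hosts(log_text):
    """Count how often each host appears in URLs found in the log text."""
    hosts = Counter()
    for url in URL_PATTERN.findall(log_text):
        host = urlparse(url).hostname
        if host:
            hosts[host] += 1
    return hosts


# Hypothetical log excerpt, for illustration only.
sample_log = """
ERROR http://node-a.example.org/thredds/fileServer/a.nc
httpcore.ConnectError: All connection attempts failed
ERROR http://node-a.example.org/thredds/fileServer/b.nc
httpcore.ConnectError: All connection attempts failed
ERROR https://node-b.example.org/thredds/fileServer/c.nc
"""

counts = failing_hosts(sample_log)
for host, n in counts.most_common():
    print(f"{host}: {n}")
```

To use it on a real log, read the file contents into a string and pass that to `failing_hosts`; hosts with the highest counts are candidates to avoid when targeting a specific datanode.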
Dear @AtefBN, thanks so much for the quick reply! I have two additional questions.
Thanks again for all of your help!!
Hi,
I have created the following query:
esgpull add project:CMIP6 experiment_id:historical,ssp126,ssp245,ssp585 member_id:r1i1p1f1 source_id:EC-Earth3,MPI-ESM1-2-HR table_id:day variable_id:pr,tasmax,tasmin --track
then:
esgpull update
esgpull download
Some files were downloaded successfully while others were not. I've tried many times with:
esgpull retry
esgpull download
but for 69 files I always end up with the following error (from the log file):
httpcore.ConnectError: All connection attempts failed
My configuration is the following:
[download]
chunk_size = 67108864
http_timeout = 20
max_concurrent = 5
disable_ssl = false
disable_checksum = false
[api]
index_node = "esgf-data.dkrz.de"
http_timeout = 20
max_concurrent = 5
page_limit = 50
[api.default_options]
distrib = "true"
latest = "none"
replica = "none"
retracted = "false"
I've tried switching to different index nodes, but it did not help.
The corresponding files are actually available on the nodes, since I can get them from the ESGF web interface through HTTP Download.
Any suggestions?
Thanks
Franco