Description of the bug
It looks like #102 may have had some unintended side effects, although I am not 100% sure I'm interpreting the situation correctly.
Basically, I came back to the pipeline after the last release and was getting a lot of HTTP 429 errors at the stage where SRR accessions are resolved to SRX accessions. I think this is most likely because the pipeline exceeds the eutils rate limits (3 requests per second without an API key, 10 requests per second with one, per https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/).
Happily, I was able to work around it by getting the SRX accessions from the SRA Run metadata download and supplying those directly. I assume this simply avoids calling eutils, but I guess this isn't how the pipeline is expected to be used.
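For anyone who wants to reproduce the workaround above without eutils, here is a minimal Python sketch that reads a run table exported from the SRA Run Selector and prints the unique SRX accessions. It assumes the export is comma-separated and has `Run` and `Experiment` columns; the `srr_to_srx` helper, the `run_table.csv` filename and the sample rows are all hypothetical, so check the header of your own download first.

```python
import csv


def srr_to_srx(lines, delimiter=","):
    """Map run accessions (SRR...) to experiment accessions (SRX...).

    Assumes the table has 'Run' and 'Experiment' columns (adjust if yours differs).
    `lines` can be an open file or any iterable of CSV lines.
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    return {row["Run"]: row["Experiment"] for row in reader}


# Real use (hypothetical filename):
#   with open("run_table.csv", newline="") as fh:
#       mapping = srr_to_srx(fh)
# Placeholder rows; two runs share one experiment to show deduplication.
example = """Run,Experiment,Organism
SRR_EXAMPLE_1,SRX_EXAMPLE_1,Medicago sativa
SRR_EXAMPLE_2,SRX_EXAMPLE_1,Medicago sativa
SRR_EXAMPLE_3,SRX_EXAMPLE_2,Medicago sativa
"""
mapping = srr_to_srx(example.splitlines())

# One SRX accession per line, deduplicated, suitable for an --input list.
srx_ids = sorted(set(mapping.values()))
print("\n".join(srx_ids))
```

Deduplicating matters because several runs can belong to the same experiment, and passing the same SRX twice would presumably just repeat work.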
Adding support for eutils API keys could be one approach, but it probably wouldn't be hard to exceed 10 requests per second either, so some other strategy for throttling or batching the requests would likely be better. Maybe this step shouldn't be parallelized at all, and the accessions could instead be resolved in a single batch request? It's kind of funny that the parallel download of FASTQs is not an issue while the parallel ID conversion seems to trigger this (at least, if I'm interpreting it correctly).
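A sketch of the throttling idea, assuming all eutils calls are made sequentially from one Python process: a minimal limiter that keeps requests at least 1/rate seconds apart, using the 3 or 10 requests per second quoted above. `Throttle` and `eutils_rate` are hypothetical names, not part of fetchngs, and the fake clock exists only so the demo runs instantly.

```python
import time


def eutils_rate(api_key=None):
    """Requests per second allowed by eutils: 3 without an API key, 10 with one."""
    return 10.0 if api_key else 3.0


class Throttle:
    """Client-side limiter that keeps successive calls at least 1/rate seconds apart.

    Only effective if every request goes through the same Throttle object,
    i.e. from a single process.
    """

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = float("-inf")  # the first call never waits

    def wait(self):
        """Block until the next request is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


class FakeClock:
    """Demo-only clock: sleeping just advances the time counter."""

    def __init__(self):
        self.t = 0.0

    def now(self):
        return self.t

    def sleep(self, seconds):
        self.t += seconds


fake = FakeClock()
throttle = Throttle(eutils_rate(api_key=None), clock=fake.now, sleep=fake.sleep)
stamps = []
for accession in ["SRR_A", "SRR_B", "SRR_C", "SRR_D"]:  # hypothetical IDs
    throttle.wait()
    stamps.append(fake.t)  # the request for `accession` would be sent here
print(stamps)
```

A fixed minimum interval is the simplest scheme that respects a per-second cap; a token bucket would allow short bursts, but with a cap this small it buys little.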
Command used and terminal output
nextflow run nf-core/fetchngs --force_sratools_download --input alfalfa_gene_index_acclist.txt --nf_core_pipeline rnaseq --outdir alfalfa_gene_index -profile singularity
...
Error executing process > 'NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SRR1820232)'

Caused by:
  Process `NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO (SRR1820232)` terminated with an error exit status (1)

Command executed:

  echo SRR1820232 > id.txt
  sra_ids_to_runinfo.py \
      id.txt \
      SRR1820232.runinfo.tsv \

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_FETCHNGS:SRA:SRA_IDS_TO_RUNINFO":
      python: $(python --version | sed 's/Python //g')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  WARNING: DEPRECATED USAGE: Forwarding SINGULARITYENV_NXF_DEBUG as environment variable will not be supported in the future, use APPTAINERENV_NXF_DEBUG instea
  WARNING: While bind mounting '/erdos/adf/nf-core/fetchngs/work/7a/d13ccf59fee2fb454b5ea67d464909:/erdos/adf/nf-core/fetchngs/work/7a/d13ccf59fee2fb454b5ea67d464909': destination is already in the mount point list
  [ERROR] The server couldn't fulfill the request.
  [ERROR] Status: 429 Too Many Requests

Work dir:
  /erdos/adf/nf-core/fetchngs/work/7a/d13ccf59fee2fb454b5ea67d464909

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
Yeah, fair point. I would count this as another argument for using ffq (#110). If you have the time and it's important to you right now, introducing ffq should be quite straightforward.
Thanks @Midnighter, looks like @drpatelh is already on top of it (#100)! I haven't looked closely at that, but I'll just add that simply swapping ffq calls into the current parallel structure would probably hit the same rate limit (see failure mode number 4 at https://github.com/pachterlab/ffq#failure-modes). So batching the metadata calls is probably worth doing in any case?
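Batching could look roughly like this: instead of one request per accession, build one ESearch request per group of accessions by joining them with `OR` in the Entrez query. This is a hypothetical sketch, not how `sra_ids_to_runinfo.py` is written; the helper names and the batch size of 100 are arbitrary assumptions, and a follow-up fetch would still be needed to turn the returned UIDs into run info.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_search_urls(accessions, batch_size=100, api_key=None):
    """Build one ESearch URL per batch of accessions instead of one per accession."""
    urls = []
    for batch in chunked(list(accessions), batch_size):
        params = {
            "db": "sra",
            "term": " OR ".join(batch),  # Entrez boolean query over the whole batch
            "retmax": len(batch),  # raise the default of 20 so no hits are cut off
        }
        if api_key:
            params["api_key"] = api_key  # allows 10 instead of 3 requests per second
        urls.append(ESEARCH_URL + "?" + urlencode(params))
    return urls


for url in batch_search_urls(["SRR_A", "SRR_B", "SRR_C"], batch_size=2):
    print(url)
```

Each URL would then still be fetched through a single rate-limited loop. With 100 accessions per batch, a list of 1,000 accessions needs 10 requests rather than 1,000; very long batches may be better sent as an HTTP POST to stay within URL length limits.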
Still haven't had time to look into adding ffq properly, I'm afraid 😏. Going to get a maintenance release out in the next couple of days. We can maybe assess this after we have added ffq support.
Btw, the only way to avoid this is to perform the run info resolution in a single process that can control the request rate. As soon as multiple processes are started in parallel, all bets are off.
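To make the point about parallel processes concrete, here is a small offline simulation (a sketch; all names are hypothetical). In the log above, each `SRA_IDS_TO_RUNINFO` task handles a single accession. Even if every such task throttled itself to 3 requests per second, the tasks share no limiter, so their combined rate grows with the number of tasks running at once. For simplicity it assumes all processes start at the same moment and times are in whole milliseconds.

```python
import math


def throttled_timestamps(rate_per_s, n_requests, start_ms=0):
    """Request times (ms) for one process that spaces its own calls to stay under rate_per_s."""
    step = math.ceil(1000 / rate_per_s)  # round up so the process never exceeds the rate
    return [start_ms + k * step for k in range(n_requests)]


def max_requests_in_window(timestamps_ms, window_ms=1000):
    """Largest number of requests falling in any window (t - window_ms, t]."""
    ts = sorted(timestamps_ms)
    best, left = 0, 0
    for right, t in enumerate(ts):
        while ts[left] <= t - window_ms:  # drop requests older than the window
            left += 1
        best = max(best, right - left + 1)
    return best


RATE = 3  # eutils limit without an API key

# One process resolving all accessions sequentially.
single = throttled_timestamps(RATE, n_requests=30)

# Eight independent processes, each obeying the limit on its own
# but unaware of the others.
parallel = [t for _ in range(8) for t in throttled_timestamps(RATE, n_requests=30)]

print(max_requests_in_window(single))    # 3
print(max_requests_in_window(parallel))  # 24
```

A single process stays at the limit, while eight well-behaved processes together send eight times as many requests per second, which is why the rate has to be controlled in one place (or the number of concurrent tasks capped).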
Relevant files
This file triggered the error:
SRR_Acc_List.txt
This file has the equivalent SRX accessions and worked fine:
SRX_Acc_List.txt
System information
nextflow version 21.10.6.5660
nf-core/fetchngs v1.7