Action to estimate get-sequences space requirement #60

Open
adamovanja opened this issue Dec 14, 2021 · 3 comments
Labels
enhancement New feature or request shelved To be reconsidered at a later time

Comments

@adamovanja
Contributor

As a plugin user,
I would like q2fondue to tell me roughly how much space is needed to download the sequences with get-sequences for the accession IDs in my --m-accession-ids-file, so that I can compare it with the space available in my Q2 TMPDIR.

Suggested implementation approach:

  • Can be a separate action estimate-space-req that uses vdb-dump --info from sra-tools, sums the sizes reported for all run IDs, and multiplies the total by a factor of 8 or 10 (since according to sra-tools' wiki “As a rule of thumb you should have about 8x … 10x the size of the accession available on your filesystem.”)
  • estimate-space-req could be integrated into pipeline get-all before get-sequences is run.
  • Readme and/or tutorial could include a note suggesting that users point TMPDIR to a location with more space if the current location is too small.
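
The arithmetic behind the first bullet could be sketched roughly as follows. This is only an illustration, not q2fondue code: the function names are hypothetical, and the parser assumes that `vdb-dump --info` prints a line of the form `size   : 932,308,473` (bytes with thousands separators), which should be checked against the installed sra-tools version.

```python
import re
import subprocess

# Assumption: the --info output contains a line such as
# "size   : 932,308,473" (bytes, with thousands separators).
SIZE_LINE = re.compile(r"^size[ \t]*:[ \t]*([\d,]+)[ \t]*$", re.MULTILINE)


def fetch_info(run_id: str) -> str:
    """Return the raw text printed by `vdb-dump --info <run_id>`."""
    result = subprocess.run(
        ["vdb-dump", "--info", run_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_size_bytes(info_text: str) -> int:
    """Extract the accession size in bytes from the --info text."""
    match = SIZE_LINE.search(info_text)
    if match is None:
        raise ValueError("no 'size' line found in vdb-dump output")
    return int(match.group(1).replace(",", ""))


def estimate_space_req(info_texts, factor: int = 10) -> int:
    """Sum the sizes of all runs and scale by the rule-of-thumb factor."""
    return factor * sum(parse_size_bytes(text) for text in info_texts)
```

With the run IDs read from the accession IDs file, something like `estimate_space_req(fetch_info(r) for r in run_ids)` would give an upper-bound estimate in bytes. It costs one `vdb-dump` call per run.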
@adamovanja adamovanja added the enhancement New feature or request label Dec 14, 2021
@misialq
Collaborator

misialq commented Dec 15, 2021

Hmmm, that's an interesting feature. I'm not sure, however, I'd go for having this as a separate action. Actions need to produce artifacts and this would not necessarily need to produce anything - it just needs to warn the user about the space requirement. And yes, we could introduce yet another fake type and produce some sort of an artifact but here it would really be an abuse. If there really is a need for something like that, maybe it could just be a part of get-sequences and warn the user before we start fetching anything?
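
A warning inside get-sequences, as suggested above, might look something like this minimal sketch. The function name and the idea of passing in a precomputed estimate are assumptions made for illustration; the only real behaviour relied on is that Python's `tempfile` module honours the `TMPDIR` environment variable.

```python
import shutil
import tempfile
import warnings
from typing import Optional


def warn_if_insufficient_space(
    required_bytes: int, tmp_dir: Optional[str] = None
) -> bool:
    """Warn (and return True) if tmp_dir has less free space than required."""
    # tempfile.gettempdir() respects TMPDIR, so this defaults to the
    # directory that temporary downloads would be written to.
    tmp_dir = tmp_dir or tempfile.gettempdir()
    free_bytes = shutil.disk_usage(tmp_dir).free
    if required_bytes > free_bytes:
        warnings.warn(
            f"Estimated space requirement ({required_bytes} bytes) exceeds "
            f"the free space in {tmp_dir} ({free_bytes} bytes). Consider "
            f"pointing TMPDIR to a location with more space."
        )
        return True
    return False
```

Because this only warns and produces nothing, it needs no artifact or extra semantic type.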

Having said that, if I have a project with 500 runs I would need to make 500 more requests to NCBI to find their sizes... It's much simpler to just go to the SRA Run Selector, put in the project ID, and immediately see the total size of all the files included. And even if one needs to do it for a couple of projects, I'm not sure automating that is worth the effort...

I definitely think, though, that we should include a section in the readme to make users aware of the space requirements and of how they can change tmpdirs. We can also show how to quickly get some estimates using the SRA web interface.

@adamovanja
Contributor Author

I agree with you that the suggested command should not return anything and hence is not suitable as an action.

How about, for now, we include the section on space requirements in the readme and keep this suggested command as a potential future feature, in case we later find a more efficient way of doing it or decide it is a crucial addition after all?

@misialq
Collaborator

misialq commented Dec 15, 2021

Yup, that sounds great to me! 👍

@misialq misialq added the shelved To be reconsidered at a later time label Jan 12, 2022