add workflow to download multiple SRA accessions to multiple bams: fetch_multiple_sra_to_bams #537

Open
wants to merge 12 commits into base: master

Conversation

tomkinsc
Member

@tomkinsc tomkinsc commented May 3, 2024

add workflow to download multiple SRA accessions to multiple bams: fetch_multiple_sra_to_bams

This is useful when a sample is associated with multiple sequencing runs (i.e. more than one SRR###). It also adjusts the Fetch_SRA_to_BAM task to find the metadata for the requested run when multi-run metadata is returned for a given accession.
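
As a rough illustration of the run-selection idea described above (not the actual Fetch_SRA_to_BAM implementation, which is not shown in this conversation), the sketch below picks the record for the requested run out of a multi-run metadata response. The `run_accession` field name, the `select_run_metadata` helper, and the placeholder accessions are all assumptions made for the example.

```python
from typing import Optional


def select_run_metadata(records: list[dict], run_accession: str) -> Optional[dict]:
    """Return the metadata record for the requested run, or None if absent.

    `run_accession` is a hypothetical field name used for illustration only.
    """
    for record in records:
        if record.get("run_accession") == run_accession:
            return record
    return None


# Placeholder records standing in for a multi-run metadata response.
records = [
    {"run_accession": "SRR0000001", "biosample_accession": "SAMN00000001"},
    {"run_accession": "SRR0000002", "biosample_accession": "SAMN00000001"},
]

match = select_run_metadata(records, "SRR0000002")
missing = select_run_metadata(records, "SRR9999999")
```

Returning `None` for an unknown run leaves it to the caller to decide whether a missing record should be treated as an error.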

tomkinsc added 4 commits May 3, 2024 19:04
…tch_multiple_sra_to_bams

add workflow to download multiple SRA accessions to multiple bams: fetch_multiple_sra_to_bams; this is useful in the event a sample is associated with multiple sequencing runs (i.e. more than one SRR###). It also adjusts the Fetch_SRA_to_BAM task to find metadata for the requested run, in the event multi-run metadata is returned for a given accession
…) function not available until WDL >= 1.1)

name the output tsv file using the first specified ID since we are operating under WDL 1.0 (the `sep()` function is not available to join arrays of strings until WDL >= 1.1, and `~{sep="_" variable}` seemingly does not work outside a command block)
…dinstitute/viral-pipelines into ct-pluralize-fetch-sra-to-bam
@tomkinsc tomkinsc requested a review from dpark01 May 3, 2024 23:34
Member

@dpark01 dpark01 left a comment

Thanks -- since I missed the use case motivating this I just have some questions:

  1. Would it be useful to emit a merged bam File output at the workflow level (either instead of or in addition to the Array[File], again depending on the use case motivating this)?
  2. I might feel better if we threw in a check / assertion that there is one and only one unique biosample_accession value across all the results. If that's not in conflict with the use case, maybe we can do that at the end of the workflow?
  3. And if we can assume that, it might also be nice if the workflow could emit an output like Map[String,String] biosample_metadata (which of course should be identical for all elements of the scatter so this just presents the deduplicated map). This would just contain the keys that start with sample_ but not the other ones that are tied to the SRA entry.
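
To make points 2 and 3 concrete, here is a minimal Python sketch of the logic being proposed, assuming each scattered task call yields a flat string-to-string metadata map. It is not WDL and not the workflow's code. The `biosample_accession` key and the `sample_` prefix come from the comment above; the function names and the other keys in the sample data are hypothetical.

```python
def unique_biosample_accession(results: list[dict[str, str]]) -> str:
    """Assert that all per-run results share exactly one biosample accession."""
    accessions = {r["biosample_accession"] for r in results}
    if len(accessions) != 1:
        raise ValueError(f"expected one biosample accession, got {sorted(accessions)}")
    return next(iter(accessions))


def biosample_metadata(results: list[dict[str, str]]) -> dict[str, str]:
    """Collapse per-run maps into one map holding only the sample_* keys."""
    merged: dict[str, str] = {}
    for result in results:
        for key, value in result.items():
            if not key.startswith("sample_"):
                continue  # skip keys tied to the individual SRA entry
            if key in merged and merged[key] != value:
                raise ValueError(f"conflicting values for {key!r}")
            merged[key] = value
    return merged


# Placeholder per-run results; all keys except biosample_accession are hypothetical.
results = [
    {"biosample_accession": "SAMN00000001", "sample_collection_date": "2020-01-01", "run_date": "2020-02-01"},
    {"biosample_accession": "SAMN00000001", "sample_collection_date": "2020-01-01", "run_date": "2020-03-01"},
]

accession = unique_biosample_accession(results)
sample_map = biosample_metadata(results)
```

Raising on conflicting `sample_` values is one possible design choice. It surfaces the case where the "identical for all elements" assumption does not hold, rather than silently keeping the last value seen.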


2 participants