Support for subsampling alignment to uniform coverage #17

Closed
IsmailM opened this issue Jan 9, 2020 · 5 comments · Fixed by #67
Labels
enhancement New feature or request

Comments

@IsmailM

IsmailM commented Jan 9, 2020

Hey,

Great tool.

Are there any plans to support BAM files (which would then ideally output a downsampled BAM file)?

At the moment, if I want to do this, I would need to:

  1. convert BAM to fasta (using samtools fasta -F 4)
  2. downsample with Rasusa
  3. use the read ids in the downsampled fasta to filter my BAM

As such, it would be a lot easier if Rasusa could support BAM files :)
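
For anyone wanting to try step 3 of the workaround above, here is a minimal Python sketch of the filtering logic. It is only an illustration and not part of Rasusa: the function names `fasta_read_ids` and `filter_sam_by_ids` are made up for this example. It assumes the alignments are available as SAM text lines and that the read IDs in the downsampled fasta match the SAM QNAME field exactly. Because it filters by read name, every alignment entry of a kept read is retained.

```python
def fasta_read_ids(fasta_lines):
    """Collect read IDs: the first whitespace-delimited token after '>'."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                ids.add(fields[0])
    return ids


def filter_sam_by_ids(sam_lines, keep_ids):
    """Yield header lines ('@...') and alignments whose QNAME is in keep_ids."""
    for line in sam_lines:
        if line.startswith("@") or line.split("\t", 1)[0] in keep_ids:
            yield line


# Toy data (hypothetical): read1 has a primary and a secondary (flag 256) entry.
downsampled_fasta = [">read1 some description", "ACGT", ">read3", "GGCC"]
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t0\tchr1\t5\t60\t4M\t*\t0\t0\tTTAA\t*",
    "read1\t256\tchr1\t90\t0\t4M\t*\t0\t0\t*\t*",
    "read3\t16\tchr1\t20\t60\t4M\t*\t0\t0\tGGCC\t*",
]

kept = list(filter_sam_by_ids(sam, fasta_read_ids(downsampled_fasta)))
for line in kept:
    print(line)
```

In this toy input, read2 is dropped and both entries for read1 are kept. For a real BAM you would first need to convert to and from SAM text with your tool of choice, which this sketch does not cover.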

@mbhall88 added the enhancement label Jan 13, 2020
@mbhall88
Owner

Hi @IsmailM

I'm glad you're finding the tool useful.

Great question. I have some reservations about supporting BAM files, as they are not quite as straightforward as fastq/a. For instance, a read can have multiple entries in a BAM file if it has secondary/supplementary alignments. That is, if the random subsample chooses a secondary alignment entry, should it also keep the primary alignment entry?
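
To make the question concrete, one way to sidestep it is to sample read names instead of individual entries, so that all entries for a chosen read travel together. The following is a rough sketch of that idea only, not how rasusa works; `subsample_reads` and the toy records are hypothetical, and it operates on SAM text lines for simplicity.

```python
import random


def subsample_reads(sam_lines, n_reads, seed=None):
    """Randomly pick n_reads read names and keep ALL of their alignment entries."""
    rng = random.Random(seed)
    header = [line for line in sam_lines if line.startswith("@")]
    records = [line for line in sam_lines if not line.startswith("@")]
    # Sort the unique names so a given seed always selects the same reads.
    names = sorted({line.split("\t", 1)[0] for line in records})
    chosen = set(rng.sample(names, min(n_reads, len(names))))
    return header + [line for line in records if line.split("\t", 1)[0] in chosen]


# Toy data: readA has a supplementary entry (flag 2048), readC a secondary (flag 256).
example_sam = [
    "@HD\tVN:1.6",
    "readA\t0\tchr1\t1\t60\t4M\t*\t0\t0\tACGT\t*",
    "readA\t2048\tchr1\t500\t60\t4M\t*\t0\t0\tACGT\t*",
    "readB\t0\tchr1\t10\t60\t4M\t*\t0\t0\tTTAA\t*",
    "readC\t0\tchr1\t20\t60\t4M\t*\t0\t0\tGGCC\t*",
    "readC\t256\tchr1\t900\t0\t4M\t*\t0\t0\t*\t*",
]

subsample = subsample_reads(example_sam, n_reads=2, seed=1)
for line in subsample:
    print(line)
```

The trade-off is that the unit being sampled is the read rather than the alignment entry, so the number of entries in the output varies with how many secondary/supplementary alignments the chosen reads have.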

In the meantime, as you say, your workaround would be the best solution. The added benefit of your solution is that you can apply filtering via samtools prior to feeding into rasusa. As I have mentioned elsewhere, it is not my intention to introduce any kind of filtering options for filetypes in rasusa. The reason for this is that the tool would not strictly be taking a random subsample then. As such, even if I were to implement BAM support you would likely still end up needing to do at least steps 1 and 2 from your current workflow.

Thank you for the feature request nonetheless. If, after discussions, we decide BAM support is not going to happen, I would still very much appreciate input on a code snippet I could add to the README for others trying to do the same thing as you.

@mbhall88 reopened this Jun 6, 2020
@mbhall88 changed the title from "Support for BAM file?" to "Support for subsampling alignment to uniform coverage" Jun 6, 2020
@mbhall88 added this to the Release 0.3.0 milestone Jun 6, 2020
@mbhall88
Owner

@IsmailM I just came across VariantBam, which I think does what you're after.

@eesiribloom

I would also appreciate input on the code snippet for how to downsample from a bam and end up with a bam again (without re-aligning) :)

@mbhall88
Owner

Coincidentally, I have been thinking about this feature lately. Depending on how I go over the next few weeks I may look at implementing this feature.

@mbhall88
Owner

Okay, this is implemented in v1.0.0 as the subcommand aln. Please try it out and report any issues.

3 participants