[MRG] remove abundance trimming from the default genome-grist workflow #199

Merged
merged 9 commits into from
Sep 26, 2022

@ctb ctb commented Aug 27, 2022

This PR removes the abundtrim/trim-low-abund steps from the default genome-grist workflow, per #197.

Specifically,

  • replaces /abundtrim/{sample}.abundtrim.fq.gz with /trim/{sample}.trim.fq.gz in most rule inputs
  • adds a new top-level rule, abundtrim_reads, that produces /abundtrim/{sample}.abundtrim.fq.gz for all configured samples
  • removes khmer from the trim.yml conda environment and moves it to a new conda environment, abundtrim.yml
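
As a rough illustration of the path change above (not the actual Snakefile code), here is a minimal Python sketch. The function names and the `outdir` and `samples` parameters are hypothetical. Only the `/trim/{sample}.trim.fq.gz` and `/abundtrim/{sample}.abundtrim.fq.gz` patterns come from this PR.

```python
from pathlib import PurePosixPath


def trimmed_reads(outdir, sample):
    """Path that most rules would now take as input (hypothetical helper)."""
    return str(PurePosixPath(outdir) / "trim" / f"{sample}.trim.fq.gz")


def abundtrim_targets(outdir, samples):
    """Files an opt-in, top-level abundtrim target would request,
    one per configured sample (hypothetical helper)."""
    return [
        str(PurePosixPath(outdir) / "abundtrim" / f"{sample}.abundtrim.fq.gz")
        for sample in samples
    ]
```

The idea is that the default workflow never asks for the abundtrim files, so they are only built when the top-level target is requested explicitly.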

Benefits -

  • the abundtrim step is one of the slowest and most memory-intensive steps when working with large data sets, so removing it makes genome-grist's default workflow much faster.
  • it removes a confusing and poorly documented step from genome-grist.
  • it takes us one big step closer to removing the khmer package as a required dependency; the only other place khmer is used is in counting k-mers with the unique-kmers script, but we could replace that in the future with some other tool.

Disadvantages -

  • This should have no significant disadvantages for genome-grist itself. The main negative outcome is that the percentage of the metagenome estimated to be "unknown" will increase slightly.
  • The other disadvantage is that abundance trimming is important for graph-based approaches like spacegraphcats, so people wanting to run sgc downstream of genome-grist will need to run

ref discussion in #197

TODO

  • adjust documentation so that it's clear that this trimming step is optional and largely intended for sgc compatibility
  • update docs about putting your own metagenomes in place as /trim/{sample}.trim.fq.gz

@ctb ctb changed the title [EXP] try disabling trim-low-abund step. [WIP] remove abundance trimming from the default genome-grist workflow Sep 26, 2022
@ctb ctb changed the title [WIP] remove abundance trimming from the default genome-grist workflow [MRG] remove abundance trimming from the default genome-grist workflow Sep 26, 2022
@ctb ctb merged commit 8c94649 into latest Sep 26, 2022
@ctb ctb deleted the disable/trim_low_abund branch September 26, 2022 22:46