revisit/discuss genome-grist k-mer trimming #141

ctb · 2022-01-22T14:58:17Z

Over in #107 (comment), @mr-eyes asked:

Wanted to ask, was there a specific criterion for selecting the abundtrim and trimming parameters? I can't imagine how it will biologically affect the results.

Did you take a look at https://peerj.com/preprints/890/?

More: this trimming is not important for either sourmash gather or mapping, which are the two primary read-based analyses that genome-grist does. Read mapping is a different kind of analysis from k-mer-based approaches, and sourmash gather is reference-based and lightweight, so it basically doesn't care if there are lots of erroneous k-mers hanging out in the data set.

However, doing some kind of k-mer abundance trimming is important for cDBG-based approaches like spacegraphcats. This is because every erroneous k-mer fragments the cDBG.
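To make the cost of an error concrete, here is a minimal Python sketch (not spacegraphcats or khmer code). It counts how many k-mers in a read are new relative to the true sequence after a single substitution. The sequences and the helper name `kmer_set` are made up for illustration. Each of those new k-mers is an extra node in the graph, and the points where they diverge from and rejoin the true path are branch points.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


K = 4
# Hypothetical toy sequences: the read differs from the true sequence
# by one substitution (G -> T at 0-based position 9).
true_seq = "ACGTTGCAAGGCTATCCGA"
read_with_error = "ACGTTGCAATGCTATCCGA"

# k-mers present in the erroneous read but absent from the true sequence.
novel = kmer_set(read_with_error, K) - kmer_set(true_seq, K)
print(sorted(novel))
print(len(novel), "novel k-mers from one substitution at k =", K)
```

A substitution away from the read ends touches every k-mer window that overlaps it, so it can add up to k erroneous k-mers; at the k sizes typically used for metagenomes that is dozens of spurious nodes per error.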

So it is nice to have genome-grist download the SRA metagenome and preprocess it for "free".

The default parameters for the trim-low-abund step specify that only reads with an estimated k-mer coverage of 18 or higher will be trimmed (-Z 18 -V), and only at k-mers with an abundance of 2 or lower (-C 3). There should be no "loss" of k-mers from low-abundance reads, which are important to retain for metagenomes.
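The rule described above can be sketched roughly as follows. This is a simplified illustration, not khmer's implementation: it uses exact k-mer counts in a `Counter` and a single pass over already-counted reads, whereas the real script works in a streaming fashion with approximate counts. The function names and the toy reads are assumptions made for the example; `cutoff` and `trim_at_coverage` stand in for `-C` and `-Z`.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """List the k-mers of seq in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Exact k-mer counts over all reads (stand-in for an approximate counter)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def trim_read(read, counts, k, cutoff=3, trim_at_coverage=18):
    """Trim a read at its first low-abundance k-mer, but only if the read
    itself looks high coverage (median k-mer abundance >= trim_at_coverage)."""
    abundances = [counts[km] for km in kmers(read, k)]
    if not abundances or median(abundances) < trim_at_coverage:
        return read  # low-coverage read: keep it untouched
    for i, abundance in enumerate(abundances):
        if abundance < cutoff:
            # Keep everything before the last base of the offending k-mer.
            return read[:i + k - 1]
    return read


# Toy data: 20 copies of one read, one copy with an error in its final base,
# and one unrelated read seen only once.
reads = ["ACGTTGCAAGGCTA"] * 20 + ["ACGTTGCAAGGCTT", "TTTTCCCCGGGGAA"]
counts = count_kmers(reads, 4)
for r in ("ACGTTGCAAGGCTT", "TTTTCCCCGGGGAA"):
    print(r, "->", trim_read(r, counts, 4))
```

In this toy case the high-coverage read loses only its erroneous last base, while the read seen once is left alone even though all of its k-mers have abundance 1. That second behavior is the property that matters for metagenomes.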

We've used these parameters in a lot of publications and they were chosen and evaluated ages ago. I now have a much better intuition (and we have a lot more data and experience!) and I'm not sure there's a strong reason to revisit them now, but I'm game if someone has criteria on which to evaluate them. It'd be reassuring if nothing else ;)

ctb · 2022-09-28T01:56:41Z

#199 removes abundance trimming from the default genome-grist workflow. Leaving this open 'til I integrate it into docs more better.

ctb added the faq FYI/questions you should have asked label Jan 22, 2022

ctb mentioned this issue Sep 18, 2022

should we turn off read trimming/trim-low-abund, or make it optional? #197

Closed

ctb mentioned this issue May 3, 2023

/.singularity.d/runscript: 3: exec: trim-low-abund.py: not found sourmash-bio/sourmash#2606

Closed
