Wanted to ask, was there a specific criterion for selecting the abundtrim and trimming parameters? I can't imagine how it will biologically affect the results.
More - this trimming is not important for either sourmash gather or mapping, which are the two primary read-based analyses that genome-grist does. Read mapping is a different kind of approach from k-mer-based ones, and sourmash gather is reference-based and lightweight, so it basically doesn't care if there are lots of erroneous k-mers hanging out in the data set.
However, doing some kind of k-mer abundance trimming is important for compact de Bruijn graph (cDBG) approaches like spacegraphcats, because every erroneous k-mer fragments the cDBG.
So it is nice to have genome-grist download the SRA metagenome and preprocess it for "free".
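To make the fragmentation point concrete, here is a toy sketch (not spacegraphcats' actual code). It assumes a forward-strand-only de Bruijn graph with a tiny k, and the helper names (`kmer_set`, `count_unitigs`, and so on) are made up for illustration. It counts the unitigs (the nodes a cDBG would have) before and after adding one read that carries a single substitution error.

```python
K = 5  # toy k-mer size, chosen only to keep the example readable


def kmer_set(reads, k=K):
    """Collect the distinct k-mers across all reads (forward strand only)."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def successors(kmer, kmers):
    """K-mers in the set that overlap this one by k-1 bases on the right."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in kmers]


def predecessors(kmer, kmers):
    """K-mers in the set that overlap this one by k-1 bases on the left."""
    return [base + kmer[:-1] for base in "ACGT" if base + kmer[:-1] in kmers]


def count_unitigs(kmers):
    """Count maximal non-branching paths, i.e. the nodes of a compact DBG.

    A k-mer starts a new unitig unless it has exactly one predecessor and
    that predecessor has exactly one successor. Isolated cycles are not
    counted; none occur in this toy data.
    """
    starts = 0
    for kmer in kmers:
        preds = predecessors(kmer, kmers)
        if len(preds) != 1 or len(successors(preds[0], kmers)) != 1:
            starts += 1
    return starts


genome = "ACGTTGCAAGGCTA"
error_read = "ACGTTGCCAGGCTA"  # same sequence with one substitution (A -> C)

clean_unitigs = count_unitigs(kmer_set([genome]))
noisy_unitigs = count_unitigs(kmer_set([genome, error_read]))
new_kmers = kmer_set([error_read]) - kmer_set([genome])

print(clean_unitigs, noisy_unitigs, len(new_kmers))  # 1 4 5
```

In this toy case one substitution adds 5 (= k) erroneous k-mers and turns a single linear path into 4 unitigs, which is the kind of fragmentation that abundance trimming is meant to reduce.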
The default trim-low-abund parameters specify that only reads with an estimated k-mer coverage of 18 or higher will be trimmed (-Z 18 -V), and that those reads are trimmed at k-mers with an abundance of 2 or lower (-C 3). Reads with lower estimated coverage are left untouched, so there should be no "loss" of k-mers from low-abundance reads, which are important to retain for metagenomes.
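As a rough illustration of that logic, here is a simplified in-memory sketch. It is not how trim-low-abund itself is implemented, and it rests on three assumptions: exact k-mer counts held in a dictionary, read coverage estimated as the median k-mer abundance, and trimming that truncates the read at the first low-abundance k-mer. The constants mirror -C 3 and -Z 18, the function names are hypothetical, and k is tiny so the example is easy to follow.

```python
from collections import Counter
from statistics import median

K = 5         # toy k-mer size for the example; real runs use a much larger k
CUTOFF = 3    # mirrors -C 3: k-mers seen fewer than 3 times are low-abundance
TRIM_AT = 18  # mirrors -Z 18: only reads with estimated coverage >= 18 are trimmed


def kmers(seq, k=K):
    """Return the k-mers of a sequence in order (forward strand only)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k=K):
    """Exact k-mer counts across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def trim_read(read, counts, k=K, cutoff=CUTOFF, trim_at=TRIM_AT):
    """Truncate a high-coverage read at its first low-abundance k-mer."""
    abundances = [counts[kmer] for kmer in kmers(read, k)]
    if not abundances:
        return read  # read shorter than k: nothing to evaluate
    if median(abundances) < trim_at:
        return read  # low-coverage read: left untouched (the -V behaviour)
    for i, abundance in enumerate(abundances):
        if abundance < cutoff:
            # keep everything before the last base of the offending k-mer
            return read[:i + k - 1]
    return read


genome = "ACGTTGCAAGGCTA"
error_read = "ACGTTGCAAGGCGA"  # one substitution (T -> G) near the 3' end
rare_read = "TTTTCCCCGGGAA"    # stands in for a read from a low-abundance organism

reads = [genome] * 20 + [error_read, rare_read]
counts = count_kmers(reads)

trimmed_error = trim_read(error_read, counts)
trimmed_rare = trim_read(rare_read, counts)

print(trimmed_error)  # ACGTTGCAAGGC
print(trimmed_rare)   # TTTTCCCCGGGAA
```

The high-coverage read loses only its erroneous tail, while the rare read passes through unchanged even though every one of its k-mers has abundance 1.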
We've used these parameters in a lot of publications and they were chosen and evaluated ages ago. I now have a much better intuition (and we have a lot more data and experience!) and I'm not sure there's a strong reason to revisit them now, but I'm game if someone has criteria on which to evaluate them. It'd be reassuring if nothing else ;)
The question at the top of this issue was asked by @mr-eyes over in #107 (comment).