revisit/discuss genome-grist k-mer trimming #141

Open
ctb opened this issue Jan 22, 2022 · 1 comment
Labels
faq FYI/questions you should have asked

Comments

@ctb
Member

ctb commented Jan 22, 2022

Over in #107 (comment), @mr-eyes asked:

Wanted to ask, was there a specific criterion for selecting the abundtrim and trimming parameters? I can't imagine how it will biologically affect the results.

Did you take a look at https://peerj.com/preprints/890/?

More: this trimming is not important for either sourmash gather or mapping, which are the two primary read-based analyses that genome-grist does. Read mapping isn't a k-mer-based approach, and sourmash gather is reference-based and lightweight, so it basically doesn't care if there are lots of erroneous k-mers hanging out in the data set.

However, doing some kind of k-mer abundance trimming is important for compact De Bruijn graph (cDBG) approaches like spacegraphcats. This is because every erroneous k-mer fragments the cDBG.

So it is nice to have genome-grist download the SRA metagenome and preprocess it for "free".
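
To put a rough number on why erroneous k-mers matter for graph approaches, here is a small sketch showing how a single base error in a read introduces up to k k-mers that are absent from the true sequence. The sequence, the error position, and the helper name `kmer_set` are made up for illustration; this only counts novel k-mers and does not build a cDBG.

```python
def kmer_set(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

k = 5
true_read = "ACGTTGCAAGCTTAGGCATC"  # hypothetical error-free read

# Introduce one substitution in the middle of the read (position 10, C -> A).
error_read = true_read[:10] + "A" + true_read[11:]

# Every k-mer overlapping the substituted base can become a new k-mer
# that does not occur in the true sequence.
novel = kmer_set(error_read, k) - kmer_set(true_read, k)
print(f"{len(novel)} novel k-mers from one substitution (at most k={k})")
```

Each of those novel k-mers would show up as extra nodes or branches in a graph built from the untrimmed reads, which is the fragmentation described above.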

The default parameters passed to trim-low-abund specify that only reads with an estimated k-mer coverage of 18 or higher will be trimmed (-Z 18 -V), and that those reads are trimmed at k-mers with an abundance of 2 or lower (-C 3). So there should be no "loss" of k-mers from low-abundance reads, which are important to retain for metagenomes.
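
The logic of those parameters can be sketched as a toy function. This is a simplified illustration and not khmer's implementation: it uses exact counts in a `Counter` rather than approximate streaming counts, it assumes read coverage is estimated as the median k-mer abundance, and the names `trim_read`, `cutoff`, and `trim_at_coverage` are hypothetical stand-ins for -C and -Z.

```python
from collections import Counter
from statistics import median

def kmers(seq, k):
    """Return the k-mers of seq in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def trim_read(read, counts, k, cutoff=3, trim_at_coverage=18):
    """Toy variable-coverage trimming (assumed behavior, see lead-in)."""
    abunds = [counts[km] for km in kmers(read, k)]
    # Reads below the coverage threshold are left untouched (the -V idea).
    if not abunds or median(abunds) < trim_at_coverage:
        return read
    # High-coverage read: truncate just before the first k-mer whose
    # abundance is below the cutoff (abundance 2 or lower for cutoff=3).
    for i, abund in enumerate(abunds):
        if abund < cutoff:
            return read[:i + k - 1]
    return read

k = 5
true_read = "ACGTTGCAAGCTTAGGCATC"                   # hypothetical genome fragment
error_read = true_read[:15] + "T" + true_read[16:]  # one substitution at position 15
rare_read = "GGGGGCCCCCAAAAATTTTT"                   # hypothetical low-coverage read

# Simulate 20x coverage of the true sequence, plus one erroneous and one rare read.
counts = Counter()
for read in [true_read] * 20 + [error_read, rare_read]:
    counts.update(kmers(read, k))

trimmed_error = trim_read(error_read, counts, k)
trimmed_rare = trim_read(rare_read, counts, k)
print(trimmed_error)
print(trimmed_rare)
```

In this sketch the high-coverage read with an error is truncated before its first low-abundance k-mer, while the rare read passes through unchanged even though all of its k-mers have abundance 1.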

We've used these parameters in a lot of publications and they were chosen and evaluated ages ago. I now have a much better intuition (and we have a lot more data and experience!) and I'm not sure there's a strong reason to revisit them now, but I'm game if someone has criteria on which to evaluate them. It'd be reassuring if nothing else ;)

@ctb ctb added the faq FYI/questions you should have asked label Jan 22, 2022
@ctb
Member Author

ctb commented Sep 28, 2022

#199 removes abundance trimming from the default genome-grist workflow. Leaving this open 'til I integrate it into docs more better.
