Deal with k-mers containing non-ACTG sensibly #394
Comments
A handful of options:
I would recommend deliberately ignoring (failing to hash) all k-mers containing N and throwing an error for IUPAC nucleotide symbols other than N.
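For concreteness, here is a minimal sketch of what that policy could look like in plain Python. It is not khmer code: the function name `hashable_kmers` is hypothetical, and two details are assumptions of mine rather than anything decided in this thread: the input is upper-cased first, and any character outside ACGTN (not only the IUPAC ambiguity codes) raises an error.

```python
VALID = set("ACGTN")  # assumption: everything else is rejected


def hashable_kmers(sequence, k):
    """Yield the k-mers of `sequence` that are safe to hash.

    Hypothetical helper, not part of khmer. K-mers containing N are
    skipped silently; any other non-ACGT symbol raises ValueError.
    """
    seq = sequence.upper()  # assumption: treat acgtn like ACGTN
    unexpected = set(seq) - VALID
    if unexpected:
        raise ValueError(
            "unsupported symbols in sequence: " + "".join(sorted(unexpected))
        )
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" in kmer:
            continue  # deliberately not hashed
        yield kmer


if __name__ == "__main__":
    # The three windows overlapping the N are dropped.
    print(list(hashable_kmers("ACGTNACGT", 3)))
```

Note that a single N removes up to k k-mers from a read, so a caller counting k-mers per read would see fewer than `len(read) - k + 1` of them.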
On Tue, Jul 22, 2014 at 09:48:57AM -0700, Will Trimble wrote:
sounds good to me! this would need to be a 2.0 thing tho.
Conversation continued (with more specificity) in #1036. Closing this 'un; we're not handling IUPAC anytime soon :)
Should we have a separate list of requested but ignored features?
Or there could be a label we could apply 'not-planned'
Surely the search function can be used for this?
-0
In particular, what do we do with 'N's? (See Aaron Liston e-mail to khmer@lists.idyll.org list, 30 Jan 2014). This is a systemic flaw in khmer currently and needs to be addressed at a fairly fundamental level, perhaps in the hash function (which could simply be extended to deal with arbitrary ch, or ACTGactgNn, or...), but it needs to be thought through in terms of implications. Yech.
Definitely a 3.0 kinda issue.
See also #370.