Add get_kmers() and get_kmer_counts() functions #1049
ready for review & merge @luizirber @camillescott
…ounts Conflicts: ChangeLog
std::vector<std::string> &kmers_vec) const
{
    if (s.length() < _ksize) {
        return;
In other cases where the sequence is shorter than K, we raise an exception; is this a case of letting an error pass silently?
I like |
Agree in theory. In practice I've been writing it in Python a lot, so I took the opportunity to implement it here :). Can we make it more independent of the underlying representation (i.e., not a vector of strings)? But either way Python will need the whole copy thing... Ergh. We could have it do the hashing to numbers, maybe? Titus Brown, ctbrown@ucdavis.edu
To clarify: the value of |
(note: that doesn't actually fix the problem for screed; we should add a similar iterator to screed records as well. after all, our speciality is k-mer analysis ;))
Moar: see https://docs.python.org/3.4/c-api/buffer.html#complex-arrays; I believe we can use the strides parameter to avoid copying the underlying data while returning views on the string. Alternatively, I can put away the clippers and just merge it...
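A rough Python-level illustration of the no-copy idea in the comment above. This is not the C-API strides approach itself and not khmer code: it only shows that slicing a `memoryview` over a bytes object yields views that share the underlying buffer. The name `kmer_views` is hypothetical.

```python
def kmer_views(seq, ksize):
    """Yield one memoryview slice per k-mer; slices share seq's buffer.

    Hypothetical helper for illustration only, not part of khmer.
    A sequence shorter than ksize yields nothing.
    """
    view = memoryview(seq)
    for start in range(len(seq) - ksize + 1):
        yield view[start:start + ksize]


seq = b"ATGGACCAGATG"
views = list(kmer_views(seq, 4))

# A copy is made only when a view is explicitly converted to bytes.
first = bytes(views[0])
```

The trade-off is that consumers receive `memoryview` objects rather than `str`, so anything expecting strings still has to convert (and therefore copy) on demand.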
Added get_kmer_hashes(); I think for now we should leave in get_kmers(), and if it becomes a performance issue we can revisit. (I don't like the idea of adding a lot of complexity around something that's new and unused!)
The remaining issue is whether or not get_kmers(), get_kmer_hashes(), and get_kmer_counts() should error out on strings of length < ksize. Since we're returning lists, I think it's OK to just return an empty list, as opposed to raising an error, which is what we should do when the return value would be nonsensical. By this logic, things like 'consume' should not error out, but we can leave that for a different PR.
@camillescott review code & logic? :) @luizirber any comments on the Python 3 implications raised in 6de148b?
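For readers following along, here is a minimal pure-Python sketch of the "return an empty list for short input" behavior argued for above. It is not the khmer implementation: the function signatures, the `table` argument, and `toy_hash` (a plain forward-strand 2-bit encoding) are assumptions made for illustration only.

```python
from collections import Counter

# Toy 2-bit encoding; an assumption for this sketch, not khmer's hash.
_TOY_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def get_kmers(seq, ksize):
    """Return all k-mers of seq in order; [] if len(seq) < ksize."""
    return [seq[i:i + ksize] for i in range(len(seq) - ksize + 1)]


def toy_hash(kmer):
    """Encode a k-mer as a base-4 integer (illustrative only)."""
    value = 0
    for base in kmer:
        value = value * 4 + _TOY_CODE[base]
    return value


def get_kmer_hashes(seq, ksize):
    """Return one integer per k-mer; [] if len(seq) < ksize."""
    return [toy_hash(kmer) for kmer in get_kmers(seq, ksize)]


def get_kmer_counts(seq, ksize, table):
    """Look up each k-mer's count in table; [] if len(seq) < ksize."""
    return [table[kmer] for kmer in get_kmers(seq, ksize)]


# Stand-in for a counting table that has consumed "AAAAAA" with ksize=3.
table = Counter(get_kmers("AAAAAA", 3))
```

In this sketch the empty result falls out of `range()` producing no indices when the sequence is too short, so no explicit length check or exception is needed.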
I had it almost fixed on refactor/py3 branch: 'Almost' because:
I noticed because one of the tests was calling hashtable.get() with I don't remember seeing missing else statements, but might be worth to |
On Fri, Jun 05, 2015 at 07:21:15AM -0700, Luiz Irber wrote:
well, and no test to make sure it didn't happen again!
yep :)
I checked in _khmermodule.cc, nothing else there. |
ping @camillescott ready for review and merge
I still find the implementation very distasteful, but I'll give it an LGTM
...and merge. AND MERGE... pretty please?
I will fix the conflict and merge.
tnx On Tue, Jun 09, 2015 at 07:59:38AM -0700, Luiz Irber wrote:
Add get_kmers() and get_kmer_counts() functions
hi.consume("AAAAAA")
counts = hi.get_kmer_counts("A")
assert len(counts) == 0
@kdmurray91 points out that this appears to be a repeated test.
Closes #1047.