Add hash functions supporting k > 32 to Counttable #1511
Codecov Report

```diff
@@           Coverage Diff            @@
##             master   #1511   +/-   ##
=========================================
  Coverage          ?   69.79%
=========================================
  Files             ?       66
  Lines             ?     8966
  Branches          ?     3059
=========================================
  Hits              ?     6258
  Misses            ?     1025
  Partials          ?     1683
```

Continue to review full report at Codecov.
w00t! This code:
now yields:
so we're finally able to swap out hash functions on Hashtable classes!

(And reversibility is no longer a requirement on Hashtable-derived classes, only on Hashgraph derivatives.)
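One way to see why only the graph classes still need reversibility: graph traversal starts from a hash and must find the neighboring k-mers, which requires recovering the k-mer from its hash. A minimal sketch, assuming a plain 2-bit encoding (A=0, C=1, G=2, T=3) and ignoring khmer's canonical (reverse-complement) k-mer handling; `twobit_hash`, `twobit_unhash`, and `right_neighbors` are hypothetical helpers, not khmer's API.

```python
# Hypothetical helpers illustrating a reversible 2-bit k-mer hash.
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def twobit_hash(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per base (reversible)."""
    h = 0
    for base in kmer:
        h = (h << 2) | ENCODE[base]
    return h


def twobit_unhash(h: int, k: int) -> str:
    """Recover the k-mer from its 2-bit hash by reading 2 bits at a time."""
    bases = []
    for _ in range(k):
        bases.append(DECODE[h & 3])
        h >>= 2
    return "".join(reversed(bases))


def right_neighbors(h: int, k: int) -> list:
    """Return the four k-mers that could follow the k-mer with hash h.

    This only works because the hash can be inverted; a one-way hash
    such as MurmurHash gives no route from h back to the sequence.
    """
    kmer = twobit_unhash(h, k)
    return [kmer[1:] + base for base in "ACGT"]
```

For example, `right_neighbors(twobit_hash("ACGT"), 4)` gives `["CGTA", "CGTC", "CGTG", "CGTT"]`. A count table only ever goes k-mer to hash to counter, so it never needs the reverse step.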
So this now works, too:
What am I missing here?

```python
import khmer

s = "ATGGATATGGAGGACAAGTATATGGAGGACAAGTATATGGAGGACAAGTAT"
a = khmer.Counttable(33, 1e6, 3)
a.get_kmer_hashes(s[:33])  # -> []
a.get_kmer_hashes(s[:34])  # -> [48584371158645721]
```
looks like an off-by-one error to me (i.e., a bug). will investigate.
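For reference, the expected counts: a sequence of length L contains L - k + 1 k-mers (none if L < k), so with k = 33, `s[:33]` should yield one hash and `s[:34]` two. A pure-Python sketch of the intended loop bounds (illustrative only, not khmer's iterator):

```python
def kmers(seq: str, k: int) -> list:
    """Return every k-mer of seq in order.

    The last k-mer starts at index len(seq) - k, so the range must stop
    at len(seq) - k + 1; stopping one earlier drops the final k-mer,
    which is consistent with the off-by-one seen above.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # range() with a non-positive stop is empty, so short sequences give [].
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```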
`lib/hashtable.cc`

```diff
 }

 void Hashtable::get_kmer_counts(const std::string &s,
                                 std::vector<BoundedCounterType> &counts) const
 {
-    KmerIterator kmers(s.c_str(), _ksize);
+    KmerHashIterator * kmers = new_kmer_iterator(s);
```
I would wrap it in `std::unique_ptr<KmerHashIterator> kmers(new_kmer_iterator(s))` and then get rid of the `delete`.
You could probably also change `new_kmer_iterator` to return a `unique_ptr` directly.
`tests/test_cpython_hierarchy.py`

```python
def test_counttable_no_unhash():
    x = khmer.Counttable(4, 21, 3)

    try:
```
Suggest replacing the `try:` block with:

```python
with pytest.raises(ValueError):
    x.reverse_hash(1)
```
```diff
@@ -148,29 +148,9 @@ static bool convert_PyObject_to_HashIntoType(PyObject * value,
 {
```
The comment for this function needs updating (delete the last two lines).
done.
perfect, thanks!
done, much cleaner, thanks!
Question for all you foolish mortals: how should we handle testing of CPython class hierarchies? Do we just test most of the methods on the base types and then test only the derived methods on derived types, and rely on code coverage to tell us when we're missing something? Or do we massively duplicate tests? Or what?
I'd probably go for option (a): test the base types, then only the derived methods on derived types. Another option I like is to devise a set of tests/checks for things that are graph-ish and apply them (via a for loop) to all things that want to be graph-ish (same for table-ish, X-ish). Then you add tests for each type that cover what is special about that type.
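The loop-over-types idea can be sketched as shared check functions applied to every type that claims a capability, plus a separate list for type-specific checks. `ToyCounttable` and `ToyCountgraph` are hypothetical stand-ins, not khmer classes:

```python
# Hypothetical stand-ins for the CPython types under test.
class ToyCounttable:
    def __init__(self, ksize):
        self.ksize, self._counts = ksize, {}

    def add(self, kmer):
        self._counts[kmer] = self._counts.get(kmer, 0) + 1

    def get(self, kmer):
        return self._counts.get(kmer, 0)


class ToyCountgraph(ToyCounttable):
    def neighbors(self, kmer):
        """Right neighbors of kmer that are present in the table."""
        return [kmer[1:] + b for b in "ACGT" if self.get(kmer[1:] + b)]


def check_tableish(cls):
    """Checks every table-like type must pass."""
    t = cls(4)
    assert t.get("ACGT") == 0
    t.add("ACGT")
    assert t.get("ACGT") == 1


def check_graphish(cls):
    """Extra checks only for graph-like types."""
    g = cls(4)
    g.add("ACGT")
    g.add("CGTA")
    assert g.neighbors("ACGT") == ["CGTA"]


TABLE_TYPES = [ToyCounttable, ToyCountgraph]
GRAPH_TYPES = [ToyCountgraph]


def run_all():
    for cls in TABLE_TYPES:
        check_tableish(cls)
    for cls in GRAPH_TYPES:
        check_graphish(cls)
```

With pytest, decorating a test with `@pytest.mark.parametrize("cls", TABLE_TYPES)` expresses the same loop and reports each type as its own test case.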
I can't work out why Travis is ignoring this. According to the logs it is ignoring this "as per configuration", which isn't super helpful for debugging. One thing that might wake Travis up is if you change the target of this PR to
@betatim Consider the following (from our .travis.yml file).
yes! that is almost certainly the problem. will fix.
when I run the coverage tools locally
Looks good!
```diff
+* get_raw_tables
+* do_subset_partition_with_abundance
+
+Nodegraph:
```
Not sure why this is under the "counting types" heading.
fixed 8f9e8b0
```cpp
} catch (khmer_exception &e) {
    PyErr_SetString(PyExc_ValueError, e.what());
    return NULL;
}
```
w00t!
```diff
@@ -0,0 +1,442 @@
+"""
```
Tests are all pretty simple and cover the basic features well.
```diff
@@ -509,6 +507,53 @@ const
     return posns;
 }

+class MurmurKmerHashIterator : public KmerHashIterator
```
shouldn't this be in `lib/kmer_hash.cc` (or `lib/kmer_hash.hh`)?
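The iterator hierarchy under discussion can be sketched in Python: a base class that yields one hash per k-mer position, with a reversible two-bit implementation and a one-way one. This is a hypothetical sketch, not khmer's C++ code; BLAKE2b from `hashlib` stands in for MurmurHash (which the standard library lacks), and the factory `make_kmer_hash_iterator` is invented for illustration.

```python
import hashlib
from abc import ABC, abstractmethod


class KmerHashIterator(ABC):
    """Yields one hash per k-mer position in a sequence."""

    def __init__(self, seq: str, k: int):
        self.seq, self.k = seq, k

    def __iter__(self):
        for i in range(len(self.seq) - self.k + 1):
            yield self.hash_kmer(self.seq[i:i + self.k])

    @abstractmethod
    def hash_kmer(self, kmer: str) -> int:
        """Map one k-mer to a 64-bit integer."""


class TwoBitKmerHashIterator(KmerHashIterator):
    """Reversible 2-bit packing; fits in 64 bits only when k <= 32."""

    CODES = {"A": 0, "C": 1, "G": 2, "T": 3}

    def hash_kmer(self, kmer: str) -> int:
        h = 0
        for base in kmer:
            h = (h << 2) | self.CODES[base]
        return h


class OneWayKmerHashIterator(KmerHashIterator):
    """Stand-in for a MurmurHash-based iterator: any k, not invertible.

    Uses BLAKE2b truncated to 8 bytes in place of MurmurHash.
    """

    def hash_kmer(self, kmer: str) -> int:
        digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
        return int.from_bytes(digest, "little")


def make_kmer_hash_iterator(seq: str, k: int, reversible: bool):
    """Hypothetical factory: graphs ask for reversible hashes, tables need not."""
    if reversible:
        if k > 32:
            raise ValueError("reversible 2-bit hashing supports k <= 32 only")
        return TwoBitKmerHashIterator(seq, k)
    return OneWayKmerHashIterator(seq, k)
```

Callers only see the base interface, which is the property that lets a table swap hash functions without touching the code that consumes the hashes.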
`tests/test_tabletype.py`

```python
    else:
        raise Exception("unknown tabletype")

    z = kh.get('ATGGC')
```
Oops, this should be `loaded.get`, shouldn't it?
Fixed in dc6b287, thanks!
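The point of the fix: a save/load round-trip test has to query the reloaded object, otherwise it passes even when loading is broken. A minimal sketch with a hypothetical `ToyTable` that pickles its counts; khmer's real save/load methods and file format differ.

```python
import os
import pickle
import tempfile


class ToyTable:
    """Hypothetical stand-in for a k-mer table with save/load."""

    def __init__(self):
        self.counts = {}

    def add(self, kmer):
        self.counts[kmer] = self.counts.get(kmer, 0) + 1

    def get(self, kmer):
        return self.counts.get(kmer, 0)

    def save(self, path):
        with open(path, "wb") as fh:
            pickle.dump(self.counts, fh)

    @classmethod
    def load(cls, path):
        table = cls()
        with open(path, "rb") as fh:
            table.counts = pickle.load(fh)
        return table


def test_save_load_roundtrip():
    kh = ToyTable()
    kh.add("ATGGC")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "table.pickle")
        kh.save(path)
        loaded = ToyTable.load(path)
    # Query the *loaded* table; asserting on kh would not exercise load().
    assert loaded.get("ATGGC") == 1
```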
On Mon, Feb 13, 2017 at 03:04:36PM -0800, Daniel Standage wrote:
> Not sure why this is under the "counting types" heading.

Fixed 8f9e8b0
On Mon, Feb 13, 2017 at 07:09:47PM -0800, Luiz Irber wrote:
> shouldn't this be in `lib/kmer_hash.cc` (or `lib/kmer_hash.hh`)?

it should never be used publicly, and I think we should actually be hiding the TwoBit iterator. -0 on moving it for now.

Presumably as part of Cythonization it will need to be exposed?
On February 14, 2017 9:03:56 AM PST, "C. Titus Brown" wrote:
> Presumably as part of Cythonization it will need to be exposed?
It's not about exposing it externally; I would use it for HLL (HyperLogLog).
> Not about exposing it externally, I would use that for HLL

oh! cool. I'd rather get this merged first tho :). is that ok?
On February 14, 2017 9:30:10 AM PST, "C. Titus Brown" wrote:
> oh! cool. I'd rather get this merged first tho :). is that ok?
That's the idea =]
The Mac OS X build is holding this up. Can we merge, pretty please? :)
This follows from #1504 and adds support for irreversible hash functions (supporting k > 32).

Specifically, in this PR we:

* `Counttable` to use murmurhash and thus supported k > 32;
* `_Countgraph`, `_Counttable`, `_SmallCountgraph`, `_SmallCounttable`
* `_Nodegraph`, and `_Nodetable`;

Basically, this is many small changes that are now reasonably well tested and understood, taking advantage of the refactoring done in #1504.
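The k > 32 limit comes from the reversible encoding: with 2 bits per base a k-mer needs 2k bits, so, assuming hashes are stored in a 64-bit unsigned integer (khmer's `HashIntoType`):

$$
2k \le 64 \;\Longleftrightarrow\; k \le 32,
\qquad
4^{33} = 2^{66} > 2^{64}.
$$

Since there are more distinct 33-mers than 64-bit values, no mapping from 33-mers to 64-bit hashes can be one-to-one; for k > 32, any 64-bit hash (MurmurHash included) is necessarily irreversible and admits collisions.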
* `make test` Did it pass the tests?
* `make clean diff-cover` If it introduces new functionality in `scripts/`, is it tested?
* `make format diff_pylint_report cppcheck doc pydocstyle` Is it well formatted?
* without a major version increment. Changing file formats also requires a major version number increment.
* `ChangeLog`? http://en.wikipedia.org/wiki/Changelog#Format
* changes were made?
* tested for streaming IO?)