Extract storage into separate class; split Hashtable into Hashtable and Hashgraph. #1495
@betatim @luizirber I am having linker symbol resolution issues. Any suggestions?
Got it to compile! Comments would be appreciated. Is it worth following through on this and fixing up the load and save code?
All the tests pass! @betatim @luizirber @standage ready for an initial review!
Current coverage is 95.80% (diff: 100%)
Mostly good 😄 One pedantic comment: all the variables that used to be of type … I only spotted one use of …
I made a few minor (also pedantic) comments about the class design. But I must say that being able to make changes of this scale confidently is a HUGE testament to the robustness of the khmer test suite!
std::vector<uint64_t> _tablesizes;
size_t _n_tables;
uint64_t _occupied_bins;
uint64_t _n_unique_kmers;
Why duplicate these class members? Isn't the idea of inheritance that the base class stores the common members and functions?
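As a minimal sketch of the structure this comment suggests: a base class owns the members shared by every table type, and derived classes use the inherited protected members instead of redeclaring them. The class names (TableBase, PresenceTable) and the mark_new_bin method are hypothetical, not khmer's actual classes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical base class: owns the members common to every table type.
class TableBase
{
protected:
    std::vector<uint64_t> _tablesizes;
    uint64_t _occupied_bins = 0;
    uint64_t _n_unique_kmers = 0;

    explicit TableBase(std::vector<uint64_t> sizes)
        : _tablesizes(std::move(sizes)) {}

public:
    virtual ~TableBase() = default;
    uint64_t n_occupied() const { return _occupied_bins; }
    uint64_t n_unique_kmers() const { return _n_unique_kmers; }
    const std::vector<uint64_t>& get_tablesizes() const { return _tablesizes; }
};

// Hypothetical derived class: no duplicate member declarations; it
// works directly with the protected members inherited from TableBase.
class PresenceTable : public TableBase
{
public:
    explicit PresenceTable(std::vector<uint64_t> sizes)
        : TableBase(std::move(sizes)) {}

    // Record that a previously empty bin was filled.
    void mark_new_bin() { ++_occupied_bins; }
};
```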
friend class Hashbits;
protected:
std::vector<uint64_t> _tablesizes;
size_t _n_tables;
Isn't this redundant with _tablesizes.size()? Not a big deal, but why risk the two variables getting out of sync?
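A minimal sketch of the alternative raised here, assuming the table count is exposed through an accessor rather than stored in its own member: n_tables() is computed from _tablesizes.size(), so the two values cannot drift apart. SketchStorage and add_table are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical storage class: the number of tables is never stored
// separately; it is always derived from _tablesizes.
class SketchStorage
{
protected:
    std::vector<uint64_t> _tablesizes;

public:
    explicit SketchStorage(std::vector<uint64_t> sizes)
        : _tablesizes(std::move(sizes)) {}

    std::size_t n_tables() const { return _tablesizes.size(); }

    // Adding a table updates the count automatically.
    void add_table(uint64_t size) { _tablesizes.push_back(size); }
};
```

The cost is a call to size() instead of a member read, which is trivial next to the hashing work done per k-mer.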
On Thu, Nov 03, 2016 at 11:22:01AM -0700, Tim Head wrote:
This should all have been fixed in #1504.
This PR separates the storage details of hashes from the actual hash function. It should support both more types of storage (4-bit and two-byte CountMin sketches? Maybe even exact storage?) and multiple hash functions.
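To illustrate the separation described above, here is a minimal sketch, not the PR's actual code. It assumes a hypothetical abstract Storage interface that only sees 64-bit hash values, a hypothetical one-byte-per-bin ByteStorage in the style of a Count-Min sketch (the same hash taken modulo several table sizes, estimate = minimum), and a Hashtable that combines any hash function with any Storage by composition.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

typedef uint64_t HashIntoType;  // assumption: hashes are 64-bit integers

// Storage only sees hash values; it does not know how they were computed.
class Storage
{
public:
    virtual ~Storage() = default;
    virtual void add(HashIntoType h) = 0;
    virtual unsigned get_count(HashIntoType h) const = 0;
};

// Hypothetical Count-Min-style storage: one byte per bin, several tables,
// each indexed by the hash modulo its own size.
class ByteStorage : public Storage
{
    std::vector<uint64_t> _tablesizes;
    std::vector<std::vector<uint8_t>> _counts;

public:
    explicit ByteStorage(std::vector<uint64_t> sizes)
        : _tablesizes(std::move(sizes))
    {
        for (uint64_t size : _tablesizes) {
            _counts.push_back(std::vector<uint8_t>(size, 0));
        }
    }

    void add(HashIntoType h) override
    {
        for (std::size_t i = 0; i < _tablesizes.size(); ++i) {
            uint8_t& bin = _counts[i][h % _tablesizes[i]];
            if (bin < 255) {  // saturate instead of wrapping around
                ++bin;
            }
        }
    }

    unsigned get_count(HashIntoType h) const override
    {
        // Count-Min estimate: the minimum over all tables.
        unsigned best = 255;
        for (std::size_t i = 0; i < _tablesizes.size(); ++i) {
            best = std::min<unsigned>(best, _counts[i][h % _tablesizes[i]]);
        }
        return best;
    }
};

// The table pairs a hash function with any Storage implementation.
class Hashtable
{
    std::function<HashIntoType(const std::string&)> _hash;
    std::unique_ptr<Storage> _store;

public:
    Hashtable(std::function<HashIntoType(const std::string&)> hash,
              std::unique_ptr<Storage> store)
        : _hash(std::move(hash)), _store(std::move(store)) {}

    void count(const std::string& kmer) { _store->add(_hash(kmer)); }

    unsigned get_count(const std::string& kmer) const
    {
        return _store->get_count(_hash(kmer));
    }
};
```

Swapping in a 4-bit or two-byte storage, or a different hash function, would then not require touching Hashtable itself.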
It also separates out table and graph operations, making it easier to split class definitions between functionality that needs reversible hashing (graph operations) and functionality that doesn't (table operations). Ultimately this can lead to new CPython objects (maybe Nodetable and Counttable?) that support k > 32, while the Nodegraph and Countgraph CPython objects would remain limited to k <= 32.
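Why reversible hashing caps k at 32 can be seen with a simple 2-bit packing: each base takes 2 bits, so a 64-bit integer holds at most 32 bases, and because nothing is lost the k-mer can be decoded from (hash, k). The sketch below is a hypothetical, non-canonical encoding (encode_kmer and decode_kmer are illustrative names), not khmer's exact hash function.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical reversible "hash": pack each base into 2 bits.
// 64 bits / 2 bits per base = at most 32 bases, hence k <= 32.
uint64_t encode_kmer(const std::string& kmer)
{
    if (kmer.size() > 32) {
        throw std::invalid_argument("k > 32 does not fit in 64 bits");
    }
    uint64_t h = 0;
    for (char base : kmer) {
        uint64_t bits;
        switch (base) {
        case 'A': bits = 0; break;
        case 'C': bits = 1; break;
        case 'G': bits = 2; break;
        case 'T': bits = 3; break;
        default: throw std::invalid_argument("non-ACGT base");
        }
        h = (h << 2) | bits;
    }
    return h;
}

// No information is lost, so the k-mer is recovered exactly from (h, k).
// Graph operations that must turn a hash back into a k-mer rely on this.
std::string decode_kmer(uint64_t h, unsigned k)
{
    static const char bases[] = "ACGT";
    std::string kmer(k, 'A');
    for (unsigned i = 0; i < k; ++i) {
        kmer[k - 1 - i] = bases[h & 3];
        h >>= 2;
    }
    return kmer;
}
```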
I tried to use templates instead of composition but couldn't get the various friend & inheritance declarations to work. Suggestions welcome.
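For comparison, a template-based version might look like the minimal sketch below, where the storage type is a compile-time parameter instead of a runtime member. BitStorage, BasicHashtable and BasicHashgraph are hypothetical names. One common C++ pitfall with this design is that when a class template derives from a base that depends on a template parameter, inherited members are not found by unqualified lookup and must be named via this-> or a using-declaration; whether that is what broke here is only a guess.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint64_t HashIntoType;

// Hypothetical presence-only storage: one flag per bin.
class BitStorage
{
    std::vector<bool> _bins;

public:
    explicit BitStorage(uint64_t size) : _bins(size, false) {}
    void add(HashIntoType h) { _bins[h % _bins.size()] = true; }
    unsigned get_count(HashIntoType h) const
    {
        return _bins[h % _bins.size()] ? 1 : 0;
    }
};

// Storage chosen at compile time; no virtual calls.
template <typename StorageT>
class BasicHashtable
{
protected:
    StorageT _store;

public:
    explicit BasicHashtable(StorageT store) : _store(store) {}
    void count(HashIntoType h) { _store.add(h); }
    unsigned get_count(HashIntoType h) const { return _store.get_count(h); }
};

// Members of the dependent base must be reached through this->
// (or a using-declaration); plain _store would not compile here.
template <typename StorageT>
class BasicHashgraph : public BasicHashtable<StorageT>
{
public:
    using BasicHashtable<StorageT>::BasicHashtable;  // inherit constructor

    bool contains(HashIntoType h) const
    {
        return this->_store.get_count(h) > 0;
    }
};
```

The trade-off versus composition is that each storage type produces a distinct class, which is harder to expose as a single CPython type.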
No discernible performance impact was observed in some basic benchmarking.
Built on #1494. Might be a way to do #1490 (irreversible hashing for k > 32).
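A table that never needs to recover k-mers can use a one-way hash of any length. A minimal sketch, using 64-bit FNV-1a purely as a stand-in (not necessarily what #1490 uses); hash_kmer_irreversible is a hypothetical name. Distinct k-mers may collide and the k-mer cannot be decoded, which is why this suits tables but not graphs.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical irreversible hash for k-mers of any length (64-bit FNV-1a).
uint64_t hash_kmer_irreversible(const std::string& kmer)
{
    uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
    for (unsigned char c : kmer) {
        h ^= c;
        h *= 1099511628211ULL;             // FNV-1a 64-bit prime
    }
    return h;
}
```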
- make test: Did it pass the tests?
- make clean diff-cover: If it introduces new functionality in scripts/, is it tested?
- make format diff_pylint_report cppcheck doc pydocstyle: Is it well formatted?
- … without a major version increment. Changing file formats also requires a major version number increment.
- … ChangeLog? http://en.wikipedia.org/wiki/Changelog#Format
- … changes were made?
- … tested for streaming IO?