Extract storage into separate class; split Hashtable into Hashtable and Hashgraph. #1495

ctb · 2016-10-30T12:45:38Z

This PR separates the storage details of hashes from the actual hash function. It should make it possible to support both more types of storage (4-bit and two-byte CountMin sketches? maybe even exact storage?) and multiple hash functions.
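To make the storage/hash-function split concrete, here is a minimal sketch of what a storage abstraction could look like. All names and signatures here (`Storage`, `BitStorage`, `ByteStorage`, `add`, `get_count`) are illustrative assumptions, not the actual contents of `lib/storage.hh`. The storage classes only ever see an already-computed 64-bit hash value, so they do not care which hash function produced it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical interface: storage receives hash values, never k-mers.
class Storage {
public:
    virtual ~Storage() = default;
    virtual void add(uint64_t hash) = 0;
    virtual unsigned get_count(uint64_t hash) const = 0;
};

// One bit per bin (Bloom-filter style): records presence/absence only.
class BitStorage : public Storage {
    std::vector<std::vector<bool>> _tables;
public:
    explicit BitStorage(const std::vector<uint64_t>& sizes) {
        for (uint64_t s : sizes) {
            assert(s > 0);
            _tables.push_back(std::vector<bool>(s, false));
        }
    }
    void add(uint64_t hash) override {
        for (auto& t : _tables) {
            t[hash % t.size()] = true;
        }
    }
    unsigned get_count(uint64_t hash) const override {
        for (const auto& t : _tables) {
            if (!t[hash % t.size()]) {
                return 0;  // a single unset bit means "definitely absent"
            }
        }
        return 1;
    }
};

// One byte per bin (Count-Min style): counters saturate at 255.
class ByteStorage : public Storage {
    std::vector<std::vector<uint8_t>> _tables;
public:
    explicit ByteStorage(const std::vector<uint64_t>& sizes) {
        for (uint64_t s : sizes) {
            assert(s > 0);
            _tables.push_back(std::vector<uint8_t>(s, uint8_t(0)));
        }
    }
    void add(uint64_t hash) override {
        for (auto& t : _tables) {
            uint8_t& bin = t[hash % t.size()];
            if (bin < 255) {
                ++bin;
            }
        }
    }
    unsigned get_count(uint64_t hash) const override {
        unsigned m = 255;  // the minimum over all tables is the estimate
        for (const auto& t : _tables) {
            m = std::min<unsigned>(m, t[hash % t.size()]);
        }
        return m;
    }
};
```

Under this assumed design, a 4-bit or two-byte variant would be one more subclass with a different bin width, with no change to the hashing code.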

It also separates out table and graph operations, making it easier to split class definitions between functionality that needs reversible hashing (graph operations) and functionality that doesn't (table operations). Ultimately this can lead to new CPython objects (maybe Nodetable and Counttable?) that support k > 32, while the Nodegraph and Countgraph CPython objects would remain limited to k <= 32.
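The k <= 32 limit follows from packing each base into 2 bits of a 64-bit integer. The sketch below illustrates the table/graph split under simplifying assumptions: the encoding order (A=0, C=1, G=2, T=3), the `right_degree` helper, and the use of `std::unordered_map` as a stand-in for real storage are all hypothetical, and the canonical forward/reverse-complement handling that a real implementation would need is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Reversible 2-bit encoding: only valid for k <= 32 (32 * 2 = 64 bits).
inline uint64_t encode_kmer(const std::string& kmer) {
    assert(kmer.size() <= 32);
    uint64_t h = 0;
    for (char c : kmer) {
        uint64_t bits = 0;
        switch (c) {
        case 'A': bits = 0; break;
        case 'C': bits = 1; break;
        case 'G': bits = 2; break;
        case 'T': bits = 3; break;
        default: assert(false && "non-ACGT base");
        }
        h = (h << 2) | bits;
    }
    return h;
}

inline std::string decode_kmer(uint64_t h, unsigned k) {
    static const char alphabet[] = "ACGT";
    std::string kmer(k, 'A');
    for (unsigned i = 0; i < k; ++i) {
        kmer[k - 1 - i] = alphabet[h & 3];
        h >>= 2;
    }
    return kmer;
}

// Table operations: any hash function works, so k is unrestricted.
class Hashtable {
protected:
    unsigned _ksize;
    std::unordered_map<uint64_t, unsigned> _counts;  // stand-in for Storage
public:
    explicit Hashtable(unsigned k) : _ksize(k) {}
    virtual ~Hashtable() = default;
    virtual uint64_t hash(const std::string& kmer) const {
        return std::hash<std::string>{}(kmer);  // irreversible
    }
    void count(const std::string& kmer) { ++_counts[hash(kmer)]; }
    unsigned get_count(const std::string& kmer) const {
        auto it = _counts.find(hash(kmer));
        return it == _counts.end() ? 0 : it->second;
    }
};

// Graph operations: neighbors are derived from the hash itself,
// which requires the reversible encoding and therefore k <= 32.
class Hashgraph : public Hashtable {
public:
    explicit Hashgraph(unsigned k) : Hashtable(k) { assert(k <= 32); }
    uint64_t hash(const std::string& kmer) const override {
        return encode_kmer(kmer);
    }
    // Number of stored k-mers that extend `kmer` by one base on the right.
    unsigned right_degree(const std::string& kmer) const {
        const uint64_t mask =
            (_ksize == 32) ? ~0ULL : ((1ULL << (2 * _ksize)) - 1);
        const uint64_t shifted = (encode_kmer(kmer) << 2) & mask;
        unsigned degree = 0;
        for (uint64_t base = 0; base < 4; ++base) {
            if (_counts.count(shifted | base)) {
                ++degree;
            }
        }
        return degree;
    }
};
```

In this sketch a plain `Hashtable` can count 40-mers, while `Hashgraph` refuses k > 32 because its neighbor lookup depends on shifting and masking the encoded value.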

I tried to use templates instead of composition but couldn't get the various friend & inheritance declarations to work. Suggestions welcome.

No discernible performance impact was observed in some basic benchmarking.

Built on #1494. Might be a way to do #1490 (irreversible hashing for k > 32).

- Is it mergeable?
- `make test` Did it pass the tests?
- `make clean diff-cover` If it introduces new functionality in `scripts/` is it tested?
- `make format diff_pylint_report cppcheck doc pydocstyle` Is it well formatted?
- Did it change the command-line interface? Only additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
- Is it documented in the ChangeLog? http://en.wikipedia.org/wiki/Changelog#Format
- Was a spellchecker run on the source code and documentation after changes were made?
- Do the changes respect streaming IO? (Are they tested for streaming IO?)
- Is the Copyright year up to date?

ctb · 2016-10-30T13:44:29Z

@betatim @luizirber I am having linker symbol resolution issues. Any suggestions?

ctb · 2016-10-30T22:26:51Z

Got it to compile!

Comments would be appreciated. Is it worth following through on this and fixing up the load and save code?

ctb · 2016-11-01T04:02:14Z

All the tests pass!

@betatim @luizirber @standage ready for an initial review!

codecov-io · 2016-11-01T04:14:39Z

Current coverage is 95.80% (diff: 100%)

No coverage report found for master at 4e5a234.

Powered by Codecov. Last update 4e5a234...25179ed

ctb · 2016-11-03T12:13:15Z

Ready for review! @betatim @standage

betatim · 2016-11-03T18:22:01Z

Mostly good 😄

One pedantic comment: all the variables that used to be of type Hashtable but are now Hashgraphs are still called hashtable :-/ Once Hashtables are used more, it'll get confusing that a variable called hashtable actually contains a graph...

I only spotted one use of Hashtable directly; everything else goes through Hashgraph. Makes me wonder if the other stop_bf related things should continue to be Hashtables as well. This is where I'll resume the reviewing.

standage

I made a few minor (also pedantic) comments about the class design. But I must say that being able to make changes of this scale confidently is a HUGE testament to the robustness of the khmer test suite!

standage · 2016-11-03T18:39:44Z

lib/storage.hh

+    std::vector<uint64_t> _tablesizes;
+    size_t _n_tables;
+    uint64_t _occupied_bins;
+    uint64_t _n_unique_kmers;


Why duplicate these class members? Isn't the idea of inheritance that the base class stores the common members and functions?
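A minimal sketch of the alternative being suggested, assuming a common base class owns the shared bookkeeping. The member names are taken from the diff above; the class layout, accessors, and the rule that occupancy is tracked on the first table only are assumptions for illustration, not the merged code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// The base class declares the shared members exactly once.
class Storage {
protected:
    std::vector<uint64_t> _tablesizes;
    uint64_t _occupied_bins = 0;
    uint64_t _n_unique_kmers = 0;

    explicit Storage(std::vector<uint64_t> sizes)
        : _tablesizes(std::move(sizes)) {}
public:
    virtual ~Storage() = default;
    const std::vector<uint64_t>& get_tablesizes() const { return _tablesizes; }
    uint64_t n_occupied() const { return _occupied_bins; }
    uint64_t n_unique_kmers() const { return _n_unique_kmers; }
    virtual void add(uint64_t hash) = 0;
};

// The derived class adds only what is specific to it: the bit arrays.
class BitStorage : public Storage {
    std::vector<std::vector<bool>> _bits;
public:
    explicit BitStorage(std::vector<uint64_t> sizes)
        : Storage(std::move(sizes)) {
        for (uint64_t s : _tablesizes) {
            assert(s > 0);
            _bits.push_back(std::vector<bool>(s, false));
        }
    }
    void add(uint64_t hash) override {
        bool is_new = false;
        for (std::size_t i = 0; i < _bits.size(); ++i) {
            const uint64_t bin = hash % _tablesizes[i];
            if (!_bits[i][bin]) {
                _bits[i][bin] = true;
                is_new = true;
                if (i == 0) {
                    ++_occupied_bins;  // assumption: first table only
                }
            }
        }
        if (is_new) {
            ++_n_unique_kmers;
        }
    }
};
```

A byte-based subclass would inherit the same three members and accessors rather than redeclaring them.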

standage · 2016-11-03T18:40:53Z

lib/storage.hh

+friend class Hashbits;
+protected:
+    std::vector<uint64_t> _tablesizes;
+    size_t _n_tables;


Isn't this redundant with _tablesizes.size()? Not a big deal, but why risk the two variables getting out of sync?
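One way to avoid the two values drifting apart is to drop the stored count and derive it on demand. This is a sketch only; the `TableSizes` class and its `n_tables()` and `add_table()` methods are hypothetical names, not khmer's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class TableSizes {
    std::vector<uint64_t> _tablesizes;
public:
    explicit TableSizes(std::vector<uint64_t> sizes)
        : _tablesizes(std::move(sizes)) {}

    // No separate _n_tables member: the vector is the single source of truth.
    std::size_t n_tables() const { return _tablesizes.size(); }

    void add_table(uint64_t size) {
        assert(size > 0);
        _tablesizes.push_back(size);  // n_tables() updates automatically
    }
};
```

Since `std::vector::size()` is constant time, the accessor costs nothing measurable compared with a cached member.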

ctb · 2016-11-03T19:39:07Z

@standage, I'll fix your points on the next PR, which is forthcoming.

@betatim, the inheritance hierarchy will expand and I will do a more thorough review then.

Thanks all :).

ctb · 2016-11-14T13:45:32Z

On Thu, Nov 03, 2016 at 11:22:01AM -0700, Tim Head wrote:

> One pedantic comment: all the variables that used to be of type Hashtable but now are Hashgraphs are still called hashtable :-/ It'll start getting confusing when Hashtables are used more why the variable called hashtable actually contains a graph...

This should all have been fixed in #1504.

ctb added 3 commits October 30, 2016 06:39

split Hashgraph from Hashtable

ef927da

refactored to split storage from main classes

1ebaa68

minor cleanup

2d023b4

ctb added 2 commits October 30, 2016 15:11

clean up

23293b4

properly initialize _max_count, re-enable add fn

1a99727

ctb added 4 commits October 31, 2016 07:13

Merge branch 'mergeXXX' into refactor/storage

a6234d5

remove some debugging prints

b518995

Merge remote-tracking branch 'origin/master' into refactor/storage

7ea8cee

further update from merge

df8fff9

ctb changed the base branch from refactor/extract_hash to master October 31, 2016 14:42

ctb added 12 commits October 31, 2016 07:43

Merge branch 'master' of github.com:dib-lab/khmer into refactor/storage

23e03cd

move storage stuff into single file

168e21f

cleanup & simplification of Storage classes

47373b8

inline more stuff

7fd947e

compiles but crashes. hmm

9374b52

used dynamic_cast on the appropriate object

97d2243

consolidated save/load code; made Hashbits save/load work

adebf02

shifted save functions over to ByteStorage

fd6c42c

fix save/load behavior

b7d2c46

fix bigcount tests

0b45cf8

cleanup

96d8bb2

remove commented out code

7fd08f8

added missing storage.cc

e48256a

This was referenced Nov 2, 2016

Hash Function Arch Refactoring #1450

Closed

Make Read accessible from python #1492

Merged

add copyright headers

7971dbf

ctb added 4 commits November 3, 2016 02:51

Merge branch 'master' into refactor/storage

f69ad78

updated ChangeLog

bcf4752

updated copyright

7f27d99

some more cleanup

8c121ba

update lib/Makefile with storage.*

25179ed

standage approved these changes Nov 3, 2016


ctb merged commit 34711f6 into master Nov 3, 2016

ctb deleted the refactor/storage branch November 3, 2016 19:39
