More consolidation of Hashtable derived types. #1504

ctb · 2016-11-03T19:41:12Z

Briefly, this PR:

* interposes a new Hashgraph CPython class that inherits from Hashtable and becomes a new base for Countgraph and Nodegraph (in the CPython interface).
* articulates the C++ internals so that they use Hashtable and Hashgraph appropriately (as well as Countgraph etc. where absolutely necessary).
* renames the C++ and CPython API to match: Counttable, Nodetable, Countgraph, Nodegraph.
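
The hierarchy described in the three points above can be pictured with a small C++ sketch. It is an illustration only: the class names follow the PR, but the members (`count`, `get_count`, `count_right_neighbors`) and the `std::unordered_map` storage are assumptions made for this example, not khmer's actual API or storage.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical, heavily simplified stand-ins for the classes named in this PR.
// Only the inheritance relationships follow the PR description.
class Hashtable {
public:
    explicit Hashtable(unsigned k) : k_(k) {}
    virtual ~Hashtable() = default;

    unsigned ksize() const { return k_; }

    // Basic counting: available on every table and every graph.
    void count(const std::string& kmer) { ++counts_[kmer]; }

    unsigned get_count(const std::string& kmer) const {
        auto it = counts_.find(kmer);
        return it == counts_.end() ? 0 : it->second;
    }

private:
    unsigned k_;
    std::unordered_map<std::string, unsigned> counts_;  // assumed storage
};

// Tables: counting only, no traversal.
class Counttable : public Hashtable {
public:
    explicit Counttable(unsigned k) : Hashtable(k) {}
};

class Nodetable : public Hashtable {
public:
    explicit Nodetable(unsigned k) : Hashtable(k) {}
};

// Graphs: everything a table does, plus traversal helpers.
class Hashgraph : public Hashtable {
public:
    explicit Hashgraph(unsigned k) : Hashtable(k) {}

    // Toy traversal helper: how many of the four possible right-hand
    // neighbors of `kmer` have been counted at least once.
    unsigned count_right_neighbors(const std::string& kmer) const {
        const std::string bases = "ACGT";
        unsigned n = 0;
        for (char base : bases) {
            if (get_count(kmer.substr(1) + base) > 0) {
                ++n;
            }
        }
        return n;
    }
};

class Countgraph : public Hashgraph {
public:
    explicit Countgraph(unsigned k) : Hashgraph(k) {}
};

class Nodegraph : public Hashgraph {
public:
    explicit Nodegraph(unsigned k) : Hashgraph(k) {}
};
```

With this shape, code that only needs counting can accept a `Hashtable&`, while traversal code asks for a `Hashgraph&`.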

These changes make a clear distinction between 'tables' and 'graphs' - tables have all of
the basic functionality needed for counting, while graphs support various traversal methods.
This paves the way for:

(a) adding new hash functions, including irreversible ones supporting k > 32 for the *table objects; and
(b) building out a new Counttable CPython object that will support the non-graph operations for k > 32.
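
To illustrate why k > 32 calls for an irreversible hash, here is a minimal sketch. It assumes a plain 2-bit-per-base packing into a 64-bit integer (which caps k at 32 and can be inverted) and uses FNV-1a as a stand-in irreversible hash. The function names and the A/C/G/T-to-bits mapping are hypothetical, and FNV-1a is chosen only because it is short; none of this is claimed to be what khmer implements.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Reversible: pack each base into 2 bits. A 64-bit value holds at most
// 32 bases, which is where the k <= 32 limit comes from in this scheme.
std::uint64_t twobit_hash(const std::string& kmer) {
    if (kmer.size() > 32) {
        throw std::length_error("2-bit encoding needs k <= 32");
    }
    std::uint64_t h = 0;
    for (char c : kmer) {
        std::uint64_t code = 0;
        switch (c) {
            case 'A': code = 0; break;
            case 'C': code = 1; break;
            case 'G': code = 2; break;
            case 'T': code = 3; break;
            default: throw std::invalid_argument("not a DNA base");
        }
        h = (h << 2) | code;
    }
    return h;
}

// Inverse of twobit_hash: recover the k-mer from its hash value.
std::string twobit_unhash(std::uint64_t h, unsigned k) {
    static const char bases[] = "ACGT";
    std::string kmer(k, 'A');
    for (unsigned i = k; i-- > 0;) {
        kmer[i] = bases[h & 3];
        h >>= 2;
    }
    return kmer;
}

// Irreversible: 64-bit FNV-1a over the characters. Works for any k, but
// the k-mer cannot be recovered from the hash value.
std::uint64_t fnv1a_hash(const std::string& kmer) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : kmer) {
        h ^= c;
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}
```

Counting only needs a hash value to index into the table, so an irreversible function suffices there. Operations that need to turn a hash value back into a k-mer, as `twobit_unhash` does, depend on the reversible form.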

(I don't like the 'Counttable' name that much but it fits with Countgraph. I suppose we could do Countstable. But that might engender confusion. Thoughts?)

* Is it mergeable?
* make test: Did it pass the tests?
* make clean diff-cover: If it introduces new functionality in scripts/ is it tested?
* make format diff_pylint_report cppcheck doc pydocstyle: Is it well formatted?
* Did it change the command-line interface? Only additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
* Is it documented in the ChangeLog? http://en.wikipedia.org/wiki/Changelog#Format
* Was a spellchecker run on the source code and documentation after changes were made?
* Do the changes respect streaming IO? (Are they tested for streaming IO?)
* Is the Copyright year up to date?

codecov-io · 2016-11-03T21:11:18Z

Current coverage is 95.76% (diff: 100%)

Merging #1504 into master will increase coverage by 0.02%

@@             master      #1504   diff @@
==========================================
  Files            36         36          
  Lines          2938       2952    +14   
  Methods           0          0          
  Messages          0          0          
  Branches        449        449          
==========================================
+ Hits           2813       2827    +14   
  Misses           55         55          
  Partials         70         70

Powered by Codecov. Last update e2d1132...6303cb8

ctb · 2016-11-08T23:39:39Z

Ready for initial review, y'all. @betatim @luizirber @camillescott @standage. Might be easier to go commit by commit :(.

Still have to check out my modifications to the abundance dist functions, and I'm not sure if I should add tests for Counttable and Nodetable on this PR, but I think most of it is done.

betatim · 2016-11-10T15:55:50Z

Prefer Counttable over Countstable. More substantial comments later

ctb · 2016-11-10T23:22:22Z

To do: look into using our saner inheritance hierarchy to simplify hashtable extraction in the CPython code.

betatim · 2016-11-14T17:01:30Z

khmer/_cpy_counttable.hh

+    Counttable * counttable;
+} khmer_KCounttable_Object;
+
+static PyMethodDef khmer_counttable_methods[] = {


Do we need this here if it is just empty?

I went both ways on this. It's nice to have it there if/when we add methods... but yeah, I guess there's no need.

betatim · 2016-11-14T17:02:11Z

khmer/_cpy_hashgraph.hh

+    0,              /*tp_setattro*/
+    0,              /*tp_as_buffer*/
+    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,       /*tp_flags*/
+    "hashgraph object",           /* tp_doc */


Delete all the following lines

betatim · 2016-11-14T17:03:04Z

khmer/_cpy_hashgraph.hh

+    0,                       /* tp_new */
+};
+
+#define is_hashgraph_obj(v)  (Py_TYPE(v) == &khmer_KHashgraph_Type)


betatim · 2016-11-14T17:11:13Z

lib/hashgraph.cc

+                ++since;
+            }
+        }
+#else


Should we keep this around or delete?

I'd like to see these bits of code detritus go, but I think perhaps that should get its own PR -- there's a fair amount of it.

betatim · 2016-11-14T17:34:09Z

lib/hashtable.cc


-    // Iterate through the reads and consume their k-mers.
-    while (!parser->is_complete( )) {
+    BoundedCounterType max_count = 0;


If you move this up to be the first definition in the method you (probably) don't have to pay for a copy when returning it.

OK, but I don't understand why :)

This is return value optimization (https://en.wikipedia.org/wiki/Return_value_optimization). It probably isn't that exciting in this case, but for other, bigger, or expensive-to-copy types it might be more interesting.
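
A minimal sketch of the mechanism under discussion, using a hypothetical `Tracked` type that counts its own copies and moves. The names are made up for this example. Eliding the copy of a named local is permitted but not required by the C++ standard; when the compiler does not elide, returning a local by name falls back to a move rather than a copy, so the copy counter stays at zero either way.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Global counters so the effect of returning by value can be observed.
static int g_copies = 0;
static int g_moves = 0;

// Hypothetical expensive-to-copy type that records copies and moves.
struct Tracked {
    std::vector<int> data;

    Tracked() : data(1000, 0) {}
    Tracked(const Tracked& other) : data(other.data) { ++g_copies; }
    Tracked(Tracked&& other) noexcept : data(std::move(other.data)) { ++g_moves; }
};

// One named local, returned by name on every path: this is the pattern
// that makes named return value optimization possible.
Tracked make_tracked() {
    Tracked result;
    result.data[0] = 42;
    return result;  // elided by most compilers; otherwise moved, never copied
}
```

For a small integer counter such as `max_count` the saving is negligible, which matches the caveat in the comment above; the pattern matters more for types like `Tracked`.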

betatim · 2016-11-14T18:03:31Z

I assume that the substantial bits of code that were moved didn't get modified when you moved them because the tests still pass.

betatim · 2016-11-15T12:44:56Z

khmer/_cpy_counttable.hh

+
+    if (self != NULL) {
+        WordLength k = 0;
+        PyListObject * sizes_list_o = NULL;


Do you know who owns the reference to the list that comes from PyArg_ParseTuple?

PyArg_ParseTuple doesn't increase the reference count, and presumably the ref count can't decrease to zero while we are using an argument to this function.

Yep (or at least not with the GIL held).

ctb · 2016-11-15T13:57:09Z

Let's go ahead and push the magic merge button when the build ends! (But please don't squash commits :)

camillescott · 2016-11-15T18:08:57Z

Doh, I was too late :(

ctb · 2016-11-15T18:10:12Z

wassup @camillescott?

ctb added 6 commits November 3, 2016 14:59

pulled CountingHash methods back to Hashtable; removed counting.*

8295782

removed hashbits.{cc,hh}

9991ecc

pulled most Countgraph functions back to base in cPython interface

bf9f899

cleanup and documentation

7371492

rename CountingHash file class to ByteStorage

7132983

Merge branch 'master' of github.com:dib-lab/khmer into refactor/storage2

a6040c4

ctb added 10 commits November 7, 2016 15:54

Merge branch 'master' into refactor/storage2

30e1ba4

added stub new type derived from Hashbits

d28d38b

interpose Hashgraph type

ac0cc3a

move graph-specific methods into KHashgraph_Type

29e42ed

moved all the graph-specific methods onto hashgraph

22942f6

moved methods back to running on Hashtables where possible

d021c05

properly updated the inheritance hierarchy

a681366

extract tablesizes parsing code, add test for bad tablesizes list

c9fc7c9

introduced Counttable and Nodegraph type data structures

a7213b1

added new Nodetable and Counttable CPython types

c52c88a

ctb mentioned this pull request Nov 8, 2016

Rename CountingHash to Countgraph and Hashbits to Nodegraph #1506

Merged

ctb added 4 commits November 8, 2016 23:02

Rename CountingHash to Countgraph and Hashbits to Nodegraph (#1506)

bc48a6f

* rename CountingHash to Countgraph throughout * rename Hashbits to Nodegraph throughout

rename hashbits and counting objects too

8a171d4

Merge branch 'master' of github.com:dib-lab/khmer into refactor/storage2

5a532e3

updated changelog

4ebc47f

ctb added 6 commits November 8, 2016 23:40

fix ChangeLog indent

6c33e2d

some minimal documentation for hashgraph methods

6096ab5

fixed up ChangeLog a bit more

12fc7b4

added (c) headers to _cpy* include files

95ed9f6

move graph classes into hashgraph.cc,hh

c4e2318

Merge branch 'refactor/storage2' into feature/assembly/junction_count…

a6f896d

…-merge-storage2

This was referenced Nov 10, 2016

A reconciliation branch for the storage/hashgraph refactoring & junction count stuff. #1508

Merged

Junction count assembly & CPython wrappers for assemblers #1503

Merged

ctb added 5 commits November 10, 2016 17:03

Merge branch 'feature/assembly/junction_count' of github.com:dib-lab/…

651519a

…khmer into feature/assembly/junction_count-merge-storage2

Merge pull request #1508 from dib-lab/feature/assembly/junction_count…

07fc51d

…-merge-storage2 A reconciliation branch for the storage/hashgraph refactoring & junction count stuff.

Merge branch 'master' into refactor/storage2

e2f58f1

forgot to update one call to hashing fn in _khmer

ad1d7b9

Merge branch 'refactor/storage2' of github.com:dib-lab/khmer into ref…

b9ff3fc

…actor/storage2

This was referenced Nov 12, 2016

Add hash functions supporting k > 32 to Counttable #1511

Merged

Is differential code coverage still being calculated? #1512

Closed

Extract storage into separate class; split Hashtable into Hashtable and Hashgraph. #1495

Merged

betatim reviewed Nov 14, 2016

khmer/_cpy_hashgraph.hh

+    0,                       /* tp_new */
+};
+
+#define is_hashgraph_obj(v)  (Py_TYPE(v) == &khmer_KHashgraph_Type)

delete me

betatim reviewed Nov 14, 2016

betatim approved these changes Nov 14, 2016

betatim reviewed Nov 15, 2016

ctb added 2 commits November 15, 2016 05:39

minor code cleanup

e112c9e

Merge branch 'master' of github.com:dib-lab/khmer into refactor/storage2

cda6c5f

Merge branch 'master' of github.com:dib-lab/khmer into refactor/storage2

6303cb8

betatim merged commit a375468 into master Nov 15, 2016

standage deleted the refactor/storage2 branch November 15, 2016 18:22

This was referenced Dec 20, 2016

Rename Hashbits and CountingHash at C++ level #1233

Closed

Split _khmer.cc up using #includes #1497

Closed
