Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More consolidation of Hashtable derived types. #1504

Merged
merged 34 commits into from
Nov 15, 2016
Merged

Conversation

ctb
Copy link
Member

@ctb ctb commented Nov 3, 2016

Briefly, this PR:

  • interposes a new Hashgraph CPython class that inherits from Hashtable and becomes a new base for Countgraph and Nodegraph (in the CPython interface).
  • articulates the C++ internals so that they use Hashtable and Hashgraph appropriately (as well as Countgraph etc. where absolutely necessary).
  • renames the C++ and CPython API to match: Counttable, Nodetable, Countgraph, Nodegraph.

These changes make a clear distinction between 'tables' and 'graphs' - tables have all of
the basic functionality needed for counting, while graphs support various traversal methods.
This paves the way for:

(a) adding new hash functions, including irreversible ones supporting k > 32 for the *table objects; and
(b) building out a new Counttable CPython object that will support the non-graph operations for k > 32.

(I don't like the 'Counttable' name that much but it fits with Countgraph. I suppose we could do Countstable. But that might engender confusion. Thoughts?)

  • Is it mergeable?
  • make test Did it pass the tests?
  • make clean diff-cover If it introduces new functionality in
    scripts/ is it tested?
  • make format diff_pylint_report cppcheck doc pydocstyle Is it well
    formatted?
  • Did it change the command-line interface? Only additions are allowed
    without a major version increment. Changing file formats also requires a
    major version number increment.
  • Is it documented in the ChangeLog?
    http://en.wikipedia.org/wiki/Changelog#Format
  • Was a spellchecker run on the source code and documentation after
    changes were made?
  • Do the changes respect streaming IO? (Are they
    tested for streaming IO?)
  • Is the Copyright year up to date?

@codecov-io
Copy link

codecov-io commented Nov 3, 2016

Current coverage is 95.76% (diff: 100%)

Merging #1504 into master will increase coverage by 0.02%

@@             master      #1504   diff @@
==========================================
  Files            36         36          
  Lines          2938       2952    +14   
  Methods           0          0          
  Messages          0          0          
  Branches        449        449          
==========================================
+ Hits           2813       2827    +14   
  Misses           55         55          
  Partials         70         70          

Powered by Codecov. Last update e2d1132...6303cb8

@ctb
Copy link
Member Author

ctb commented Nov 8, 2016

Ready for initial review, y'all. @betatim @luizirber @camillescott @standage. Might be easier to go commit by commit :(.

Still have to check out my modifications to the abundance dist functions, and I'm not sure if I should add tests for Counttable and Nodetable on this PR, but I think most of it is done.

@betatim
Copy link
Member

betatim commented Nov 10, 2016

Prefer Counttable over Countstable. More substantial comments later

@ctb
Copy link
Member Author

ctb commented Nov 10, 2016

  • look into using our saner inheritance hierarchy to simplify hashtable extraction in the CPython code.

Counttable * counttable;
} khmer_KCounttable_Object;

static PyMethodDef khmer_counttable_methods[] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this here if it is just empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went both ways on this - it's nice to have it there for if/when we add methods... but yeah, I guess no need.

0, /*tp_setattro*/
0, /*tp_as_buffer*/
Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /*tp_flags*/
"hashgraph object", /* tp_doc */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete all the following lines

0, /* tp_new */
};

#define is_hashgraph_obj(v) (Py_TYPE(v) == &khmer_KHashgraph_Type)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete me

++since;
}
}
#else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep this around or delete?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see these bits of code detritus go, but I think perhaps that should get its own PR -- there's a fair amount of it.


// Iterate through the reads and consume their k-mers.
while (!parser->is_complete( )) {
BoundedCounterType max_count = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you move this up to be the first definition in the method you (probably) don't have to pay for a copy when returning it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, but I don't understand why :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://en.wikipedia.org/wiki/Return_value_optimization it probably isn't that exciting in this case but for other/bigger/expensive to copy types it might be more interesting

@betatim
Copy link
Member

betatim commented Nov 14, 2016

I assume that the substantial bits of code that were moved didn't get modified when you moved them because the tests still pass.


if (self != NULL) {
WordLength k = 0;
PyListObject * sizes_list_o = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know who owns the reference to the list that comes from PyArg_ParseTuple?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyArg_ParseTuple doesn't increase the reference count, and presumably the ref count can't decrease to zero while we are using an argument to this function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep (or at least not with the GIL held).

@ctb
Copy link
Member Author

ctb commented Nov 15, 2016

Let's ahead and push the magic merge button when the build ends! (but please don't squash commits :)

@betatim betatim merged commit a375468 into master Nov 15, 2016
@camillescott
Copy link
Member

Doh, I was too late :(

@ctb
Copy link
Member Author

ctb commented Nov 15, 2016

wassup @camillescott?

@standage standage deleted the refactor/storage2 branch November 15, 2016 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants