[MRG] fix bug in LCA_Database.downsample
#2117
@ccbaumler can you take on reviewing this? Should be enough to verify that the description of the PR matches the code changes, and that the tests pass. No hurry. Thanks!
Codecov Report
@@            Coverage Diff             @@
##           latest    #2117      +/-   ##
==========================================
+ Coverage   84.30%   91.69%   +7.38%
==========================================
  Files         130       99      -31
  Lines       15280    11004    -4276
  Branches     2171     2171
==========================================
- Hits        12882    10090    -2792
+ Misses       2095      611    -1484
  Partials      303      303
Flags with carried forward coverage won't be shown. Continue to review the full report at Codecov.
I'm on it!
@ctb Two questions:
and they are optimized for membership checks.
and they are not optimized for a membership check, but rather for iteration (
this is hard to respond to comprehensively, because it enters a whole world of testing philosophy, but here are a few thoughts - (1) a good rule for a FIRST test in a bugfix PR is one that replicates the bug precisely. that all having been said, if you have reason to believe that I'm wrong - especially about point (2) - the ideal path forward would be to describe why you think we need additional tests, or -- even better! -- to write a test that breaks something that should work :). tl;dr we can always write more tests, but it's good to have a specific reason for doing so, however nebulous.
(great questions!)
Thanks for the insight! LGTM.
I discovered this bug while working on other things.

In the base `LCA_Database` implementation, a dictionary, `_hashval_to_idx`, tracks which set of signatures each hash value (key) belongs to; individual signatures are referenced by integers, `idx`. The dictionary is created as a `collections.defaultdict(set)`, so that a new key automatically comes with an empty set as its value. This is what lets `LCA_Database.insert(...)` add an `idx` for a hash value it has not seen before.

The bug is that after `LCA_Database.downsample(...)` is called, `_hashval_to_idx` is recreated as a regular dictionary, which causes `insert` to fail, since a regular dictionary does not create the missing set.

This PR adds a new test, `test_lca.py:test_api_create_insert_two_then_scale_then_add`, that does a downsample and then an insert. This test exposes the bug above, which the PR also fixes.
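A minimal sketch of the failure mode, using a hypothetical `ToyLCADatabase` that models only the `_hashval_to_idx` bookkeeping. It is not the sourmash code: the class, `downsample_buggy`, `insert_two_then_scale_then_add`, and the `2**64 // scaled` cutoff are illustrative assumptions. It shows that rebuilding the mapping with a dict comprehension silently drops the `defaultdict(set)` behavior, and that wrapping the result in `defaultdict(set, ...)` restores it.

```python
from collections import defaultdict


# Assumption (illustrative only): hashes live in [0, 2**64) and downsampling
# to `scaled` keeps only hash values below 2**64 // scaled.
def _max_hash(scaled):
    return 2**64 // scaled


class ToyLCADatabase:
    """Hypothetical stand-in for LCA_Database that models only _hashval_to_idx."""

    def __init__(self, scaled):
        self.scaled = scaled
        # defaultdict(set): looking up a missing key creates an empty set for it
        self._hashval_to_idx = defaultdict(set)
        self._next_idx = 0

    def insert(self, hashes):
        """Assign the next idx to a new signature and record its hash values."""
        idx = self._next_idx
        self._next_idx += 1
        max_hash = _max_hash(self.scaled)
        for hashval in hashes:
            if hashval < max_hash:
                # Relies on defaultdict; a plain dict raises KeyError for a new hashval
                self._hashval_to_idx[hashval].add(idx)
        return idx

    def _kept(self, scaled):
        """Return the entries whose hash value survives downsampling to `scaled`."""
        if scaled < self.scaled:
            raise ValueError("can only downsample to a larger scaled value")
        max_hash = _max_hash(scaled)
        return {h: s for h, s in self._hashval_to_idx.items() if h < max_hash}

    def downsample_buggy(self, scaled):
        # BUG: a dict comprehension always yields a plain dict, losing defaultdict(set)
        self._hashval_to_idx = self._kept(scaled)
        self.scaled = scaled

    def downsample(self, scaled):
        # Fix: rewrap the surviving entries so new keys still get an empty set
        self._hashval_to_idx = defaultdict(set, self._kept(scaled))
        self.scaled = scaled


def insert_two_then_scale_then_add(fixed):
    """Sequence suggested by the new test's name: insert, insert, downsample, insert."""
    db = ToyLCADatabase(scaled=1)
    db.insert([1, 2**62])
    db.insert([2, 2**62])
    if fixed:
        db.downsample(8)  # drops 2**62, keeps 1 and 2
    else:
        db.downsample_buggy(8)
    db.insert([3, 2])  # 3 is a hash value the database has never seen
    return db


fixed_db = insert_two_then_scale_then_add(fixed=True)
print(dict(fixed_db._hashval_to_idx))  # {1: {0}, 2: {1, 2}, 3: {2}}
try:
    insert_two_then_scale_then_add(fixed=False)
except KeyError as err:
    print("buggy downsample -> KeyError:", err)  # buggy downsample -> KeyError: 3
```

The design choice in the sketch is to keep the filtering logic in one place (`_kept`) so the only difference between the buggy and fixed paths is whether the result is rewrapped in `defaultdict(set, ...)`.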