-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] add lca DBs as inputs to 'sourmash search' and 'gather' #533
Conversation
Codecov Report
@@ Coverage Diff @@
## master #533 +/- ##
==========================================
- Coverage 88.56% 88.54% -0.03%
==========================================
Files 25 25
Lines 3543 3754 +211
Branches 37 37
==========================================
+ Hits 3138 3324 +186
- Misses 405 430 +25
Continue to review full report at Codecov.
|
* a trial refactoring of the lca db * save and load seem to basically work * got search & gather on LCA working in new framework * allow key error * keep identities with no lineage assigment * add multigather * Hyperlink DOIs to preferred resolver (#562) * fix divide by zero issue in MinHash.contained_by (#572) * fix divide by zero issue in contained_by * remove unused lineages and identifiers * update report output * majority of lca tests passing now! * fix gather? * ...all tests pass?
about the test coverage: we can add a |
assert status != 0 | ||
|
||
|
||
@utils.in_tempdir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this looks like tmpdir
in pytest... https://github.com/dib-lab/sourmash/pull/557/files#diff-9402af88a46ea03a8087cfebdbdb762bR16
and RunnerContext
looks pretty close to a pytext fixture too https://docs.pytest.org/en/latest/fixture.html#fixture
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured they would ;). But I'm happy with the functionality for now and not inclined to fix it; can be adjusted in the future, yah?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, the tmpdir
example I linked was a trial on using it instead of our current approach, good to see another option (and eventually refactor it out =P)
I'm working on regenerating databases. Once that's done and I do some performance benchmarking, I think I'd like to suggest merging this (since it's becoming big). |
sourmash/commands.py
Outdated
query.minhash = query.minhash.downsample_scaled(args.scaled) | ||
|
||
# empty? | ||
if not query.minhash.get_mins(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not query.minhash.get_mins(): | |
if not len(query.minhash): |
I'll review in detail on Tue, but agree on merging after perf bench |
Ready for review & merge! @luizirber the LCA DBs are here on the HPC:
|
Latest databases available here for download: https://osf.io/vk4fa/files/. Will update documentation appropriately. |
try: | ||
record_remnants.remove(ident) | ||
except KeyError: | ||
# @CTB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is this a new pattern? If you pass
an exception, you need to sign it to be made responsible at some point in the future? =]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, but merge at will!
This lets 'search' and 'gather' take LCA DBs as well as SBTs and signatures as input; adds 'multigather' command.
Note that results on SBT and LCA databases for the shakya unassigned contigs are nearly but not completely identical. Trying to decide if this needs to be tracked down...
For merge purposes:
make test
Did it pass the tests?make coverage
Is the new code covered?without a major version increment. Changing file formats also requires a
major version number increment.
changes were made?
Command used to construct LCA DBs from SBT:
New LCA DB size:
Benchmarking: