Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] use picklist to exclude signatures #1623

Merged
merged 7 commits into from
Jun 22, 2021
Merged

[MRG] use picklist to exclude signatures #1623

merged 7 commits into from
Jun 22, 2021

Conversation

bluegenes
Copy link
Contributor

@bluegenes bluegenes commented Jun 22, 2021

Use a picklist to exclude, rather than include, signatures.

Ref #1599

@ctb
Copy link
Contributor

ctb commented Jun 22, 2021

well that is way cleaner syntax than I was thinking about!

(I was noodling on about using ~ somewhere in the picklist args. Don't know why. This is nice!)

@bluegenes bluegenes changed the title [EXP] use picklist to exclude signatures [WIP] use picklist to exclude signatures Jun 22, 2021
src/sourmash/picklist.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 22, 2021

Codecov Report

Merging #1623 (2c2cd15) into latest (0814bcc) will increase coverage by 8.10%.
The diff coverage is 93.10%.

Impacted file tree graph

@@            Coverage Diff             @@
##           latest    #1623      +/-   ##
==========================================
+ Coverage   81.39%   89.49%   +8.10%     
==========================================
  Files         103       76      -27     
  Lines       10549     6864    -3685     
  Branches     1228     1236       +8     
==========================================
- Hits         8586     6143    -2443     
+ Misses       1754      510    -1244     
- Partials      209      211       +2     
Flag Coverage Δ
python 89.49% <93.10%> (+<0.01%) ⬆️
rust ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/sourmash/sourmash_args.py 93.94% <87.50%> (-0.18%) ⬇️
src/sourmash/picklist.py 91.57% <95.23%> (+0.55%) ⬆️
src/core/src/index/sbt/mod.rs
src/core/src/ffi/hyperloglog.rs
src/core/src/sketch/nodegraph.rs
src/core/src/signature.rs
src/core/src/wasm.rs
src/core/src/errors.rs
src/core/src/ffi/nodegraph.rs
src/core/src/ffi/cmd/compute.rs
... and 19 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0814bcc...2c2cd15. Read the comment docs.

@bluegenes
Copy link
Contributor Author

bluegenes commented Jun 22, 2021

The reporting needs a bit of a rethink. For now I have:

def report_picklist(args, picklist):
if picklist.pickstyle == PickStyle.INCLUDE:
notify(f"for given picklist, found {len(picklist.found)} matches to {len(picklist.pickset)} distinct values")
n_missing = len(picklist.pickset - picklist.found)
elif picklist.pickstyle == PickStyle.EXCLUDE:
notify(f"for given picklist, found {len(picklist.found)} matches by excluding {len(picklist.pickset)} distinct values")
n_missing = len(picklist.pickset) - len(picklist.found)
if n_missing:
notify(f"WARNING: {n_missing} missing picklist values.")
if args.picklist_require_all:
error("ERROR: failing because --picklist-require-all was set")
sys.exit(-1)

... but thinking about it, n_missing doesn't really make sense in this context. Plus, because we're excluding rather than including, n_missing can end up negative! However, we do need to make sure picklist_require_all is still considered.

@ctb @luizirber What might be a better reporting strategy for exclude? We're currently not keeping track of which signatures were excluded, just those that we want to keep.

@ctb
Copy link
Contributor

ctb commented Jun 22, 2021

hah! I have no informed idea... I was actually hesitant about the reporting in the first place for related reasons, but it seemed to work out OK and I left it in.

Maybe if it's exclude, omit any n_missing calculations?

@bluegenes
Copy link
Contributor Author

bluegenes commented Jun 22, 2021

This PR enables one way to handle #433: if trying to use gather on a signature in the database and want to ignore the exact match, can exclude any sigs matching the query sig.

@bluegenes bluegenes changed the title [WIP] use picklist to exclude signatures [MRG] use picklist to exclude signatures Jun 22, 2021
@bluegenes
Copy link
Contributor Author

ok, I think this is ready for review! @ctb @luizirber

Copy link
Contributor

@ctb ctb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe one additional test, for an explicit :include?

Copy link
Contributor

@ctb ctb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah! added test in 2c2cd15. Nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants