-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
1798: implementation of function to test if cardinality estimate is accurate wrt scale size #2031
Conversation
…cale and num_sketches
Thanks @dkoslicki!! Quick question -- how does |
@bluegenes It shouldn't impact things at all. As I understand it, sourmash uses the denominator of the fraction (so 100 means 1/100th of the size of the original set), whereas I use the actual fraction (so 1/100). My way can lead to non-int estimates of set sizes (as can the sourmash approach, if it doesn't enforce |
ah, makes sense -- thanks! |
tagging @ctb as requested -- cardinality estimate accuracy eqns (🎉 !) |
Codecov Report
@@ Coverage Diff @@
## latest #2031 +/- ##
==========================================
+ Coverage 84.15% 91.67% +7.52%
==========================================
Files 129 98 -31
Lines 15087 10807 -4280
Branches 2119 2119
==========================================
- Hits 12696 9907 -2789
+ Misses 2095 604 -1491
Partials 296 296
Flags with carried forward coverage won't be shown. Click here to find out more. Continue to review full report at Codecov.
|
hi @bluegenes I kind of like having these in utils as well as in the running code base as in #2032 (comment). So I'm good for merge :). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, thanks @dkoslicki!
Fixes #1798 in part: implements a function that tells you if your sketch size is large enough in order to be able to trust
num_hashes * scale
as an estimate of the number of distinct k-mersThis is placed in the
Utils
folder since @bluegenes will likely want to mover it elsewhere. Tests are in the script itself: run in main.Note that this does not fully fix #1798 since we still need to be sure the de-biasing term is included. A future PR will address this.
If you are a new contributor, please provide
My ORCID: 0000-0002-0640-954X.