Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use exact fractional sequences per group #1599

Merged
merged 2 commits into from
Sep 3, 2024

Conversation

victorlin
Copy link
Member

@victorlin victorlin commented Aug 23, 2024

Description of proposed changes

The previous _calculate_fractional_sequences_per_group() was an approximation of this exact value. The approximation could return a fractional value above 1, which would fail the assertion in get_probabilistic_group_sizes().

Related issue(s)

Checklist

  • Automated checks pass
  • Check if you need to add a changelog message
  • Check if you need to add tests
  • Check if you need to update docs

@victorlin victorlin self-assigned this Aug 23, 2024
@corneliusroemer
Copy link
Member

Thanks! Is the linked issue correct? Shouldn't it close #1598? Rather than #1588

@victorlin
Copy link
Member Author

Thanks, it should close both. I've updated the PR description.

Copy link

codecov bot commented Aug 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.99%. Comparing base (47c83e0) to head (d73dbee).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1599      +/-   ##
==========================================
- Coverage   71.02%   70.99%   -0.04%     
==========================================
  Files          79       79              
  Lines        8256     8247       -9     
  Branches     2003     2001       -2     
==========================================
- Hits         5864     5855       -9     
  Misses       2101     2101              
  Partials      291      291              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@corneliusroemer
Copy link
Member

There's a reason the original calculation is the way it is. I don't think that's a bug or unnecessary.

Imagine there are 90 groups with 1 sequence and 10 groups with 1000.
We want to sample 1000 sequences.

The original calculation would pick around 91 sequences per group.

Yours now picks 10 per group.

The original resulted in around 1000 sampled sequences.

Your calculation now in only 190.

@victorlin
Copy link
Member Author

@corneliusroemer that scenario is not affected by the changes here. I've responded in more detail at #1588 (comment)

@victorlin victorlin marked this pull request as ready for review August 23, 2024 18:13
@victorlin victorlin force-pushed the victorlin/use-exact-fractional-spg branch from f7ec437 to d73dbee Compare August 29, 2024 18:54
The previous `_calculate_fractional_sequences_per_group()` was an
approximation of this exact value. The approximation could return a
fractional value above 1, which would fail the assertion in
`get_probabilistic_group_sizes()`.
@victorlin victorlin force-pushed the victorlin/use-exact-fractional-spg branch from d73dbee to f1e09e1 Compare September 3, 2024 17:59
@victorlin
Copy link
Member Author

I meant to release this as part of 25.4.0 but forgot to merge 🤦 it will come in the next release.

@victorlin victorlin merged commit f1d65fb into master Sep 3, 2024
27 of 28 checks passed
@victorlin victorlin deleted the victorlin/use-exact-fractional-spg branch September 3, 2024 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Subsample has uncaught assertion in ncov Simplify probabilistic sampling calculation
3 participants