Probabilistic sampling is necessary when targeting a fractional number of sequences per group. This is possible in two scenarios:
1. Uniform sampling when the number of groups exceeds the number of total requested sequences
2. Weighted sampling when this expression evaluates to a fractional number:
Each scenario implements its own way of converting the fractional number to a whole number that can be used for sampling.
Poisson sampling method: sample from a Poisson distribution with the mean $\lambda$ being the fractional number of sequences per group, which is constant across all groups (`augur/augur/filter/subsample.py`, line 299 in 47c83e0).
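As a rough illustration of the Poisson method, here is a minimal sketch. It assumes a standard-library-only sampler (Knuth's multiplication algorithm) rather than whatever generator the referenced line actually calls; `poisson_sample` is a hypothetical helper, not a function from augur.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one value from Poisson(lam) using Knuth's algorithm.

    Hypothetical helper for illustration only; suitable for small lam.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    # Multiply uniform draws until the product falls to or below e^(-lam).
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


rng = random.Random(0)
lam = 10 / 12  # fractional number of sequences per group, same for every group
group_sizes = [poisson_sample(lam, rng) for _ in range(12)]
```

Each group gets a non-negative whole number whose expected value is `lam`, but any single group can receive 0, 1, or more than 1 sequence.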
Probabilistic rounding method: round the number probabilistically by adding a random number in $[0, 1)$ and truncating the decimal part (`augur/augur/filter/subsample.py`, line 452 in 47c83e0). The Poisson sampling method does not work for weighted sampling because the fractional number of sequences per group is not guaranteed to be constant across all groups.
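The rounding rule described above can be sketched in a few lines. This is an illustrative version under the assumption of non-negative inputs; `probabilistic_round` is a hypothetical name and not necessarily how the referenced line is written.

```python
import math
import random


def probabilistic_round(x, rng):
    """Round x down or up so that the expected result equals x.

    Adds a uniform random number in [0, 1) and drops the decimal part.
    Hypothetical helper for illustration only.
    """
    return math.floor(x + rng.random())


rng = random.Random(0)
# Per-group targets may differ, as in weighted sampling.
targets = [0.83, 2.4, 3.0]
sizes = [probabilistic_round(t, rng) for t in targets]
```

A target of 2.4 becomes 3 with probability 0.4 and 2 otherwise, and a whole-number target such as 3.0 is always returned unchanged, so the rule works even when targets vary between groups.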
Proposed change
Replace the Poisson sampling method with the probabilistic rounding method. A notable difference: with the Poisson sampling method, there is a slim chance for 2 or more sequences per group. That would not happen with the probabilistic rounding method. I think that is fine and even preferred because it avoids the possibility of under-sampling.
This can be explained through an example: `--group-by month` with 12 months and `--subsample-max-sequences 10`. This means the fractional number of sequences per group should be¹ $\frac{10}{12} \approx 0.83$. For each of the 12 months we need a whole number of sequences per group.

Probabilistic rounding would be: 0.83 has an 83% chance of rounding to 1 and a 17% chance of rounding to 0.
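To make the comparison concrete, the per-group outcome probabilities for the 12-month example can be computed directly from the two definitions. This is a sketch with hypothetical helper names (`poisson_outcomes`, `rounding_outcomes`); it uses only the Poisson probability mass function and the rounding rule as described in this issue.

```python
import math


def poisson_outcomes(lam):
    """Probabilities of drawing 0, 1, or 2+ sequences from Poisson(lam)."""
    p0 = math.exp(-lam)
    p1 = lam * p0
    return {0: p0, 1: p1, "2+": 1.0 - p0 - p1}


def rounding_outcomes(x):
    """Probabilities of each result under probabilistic rounding of x >= 0."""
    base = math.floor(x)
    frac = x - base
    return {base: 1.0 - frac, base + 1: frac}


target = 10 / 12  # --subsample-max-sequences 10 over 12 months
poisson = poisson_outcomes(target)
rounding = rounding_outcomes(target)

for outcome, prob in poisson.items():
    print(f"Poisson  P({outcome}) = {prob:.3f}")
for outcome, prob in rounding.items():
    print(f"Rounding P({outcome}) = {prob:.3f}")
```

Both methods have the same expected group size, so the expected total across the 12 months is 10 either way; they differ only in how the outcomes are spread around that mean.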