Add KS tests for weighted sampling #1530

dhardy · 2024-11-18T19:55:47Z

Added a CHANGELOG.md entry

Motivation

Some of these are non-trivial distributions we didn't really test before.

To validate the solution of #1476.

Details

Single-element weighted sampling is simple enough.

fn choose_two_iterator is also simple enough: there are no weights, so we can just assign each pair of results a unique index in the list of 100 * 99 / 2 possibilities (noting that we sort pairs since the order of chosen elements is not specified).
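A minimal sketch of one way to do this indexing follows. The helper name `pair_index` is hypothetical and not taken from the test code in this PR; it only illustrates mapping a sorted pair to a unique index.

```rust
/// Map an unordered pair of distinct indices from `0..n` to a unique
/// index in `0..n * (n - 1) / 2`. (Hypothetical helper for illustration.)
fn pair_index(a: usize, b: usize, n: usize) -> usize {
    // Sort the pair, since the order of chosen elements is not specified.
    let (lo, hi) = if a < b { (a, b) } else { (b, a) };
    assert!(lo != hi && hi < n);
    // Pairs whose smaller element is below `lo` come first; there are
    // (n - 1) + (n - 2) + ... + (n - lo) of them.
    let before = lo * (2 * n - lo - 1) / 2;
    before + (hi - lo - 1)
}
```

With uniform sampling every index is equally likely, so for n = 100 the expected CDF at index k is (k + 1) / 4950.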

fn choose_two_weighted_indexed gets a bit more complicated; I choose to approach it by building a table for the CDF of size num*num including impossible variants. Most of the tests don't pass, so there must be a mistake here.
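For illustration, here is a sketch of how such a table could be built. It is not the test code from this PR: the name `pair_cdf` is hypothetical, and it assumes draws are sequential, each proportional to the remaining weights, with at least two positive weights.

```rust
/// Build the CDF over sorted pairs (a, b) with a < b, stored in a flat table
/// of size num * num at index a * num + b. Entries with a >= b are the
/// "impossible variants" and contribute zero probability.
/// (Hypothetical helper; assumes at least two positive weights.)
fn pair_cdf(weights: &[f64]) -> Vec<f64> {
    let num = weights.len();
    let total: f64 = weights.iter().sum();
    let mut pmf = vec![0.0; num * num];
    for a in 0..num {
        for b in 0..num {
            if a == b {
                continue;
            }
            // P(a drawn first, then b) = w_a / W * w_b / (W - w_a)
            let p = weights[a] / total * weights[b] / (total - weights[a]);
            // Both draw orders land in the same sorted-pair slot.
            let (lo, hi) = if a < b { (a, b) } else { (b, a) };
            pmf[lo * num + hi] += p;
        }
    }
    // Running sum turns the probability table into a CDF.
    let mut acc = 0.0;
    pmf.iter()
        .map(|p| {
            acc += *p;
            acc
        })
        .collect()
}
```

A sampled pair, once sorted, maps to index lo * num + hi, which a KS test can then compare against this table.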

Aside: using let key = rng.random::<f64>().ln() / weight; (src/seq/index.rs:392) may help with #1476 but does not fix the above.
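A small sketch of why a log-domain key can behave differently with tiny weights. It assumes the alternative is the textbook Efraimidis-Spirakis key u^(1/w); the function names `key_pow` and `key_log` are hypothetical.

```rust
/// Textbook Efraimidis-Spirakis key: u^(1/w). (Hypothetical helper.)
fn key_pow(u: f64, w: f64) -> f64 {
    u.powf(1.0 / w)
}

/// Log-domain key: ln(u) / w. Since ln is monotonic, this orders
/// elements the same way as u^(1/w) whenever the latter is representable.
fn key_log(u: f64, w: f64) -> f64 {
    u.ln() / w
}
```

For example, with u = 0.5 and weights 1e-10 and 2e-10, `key_pow` underflows to 0.0 for both elements, so they can no longer be ranked, while `key_log` yields two distinct finite values with the heavier element ranked higher.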


dhardy · 2024-11-19T09:31:07Z

I can confirm that choose_multiple_weighted has a significant problem, since sampling two elements from 0, 1, 2 with weights 1, 1/2, 1/3 a million times and sorting yields 532298 counts of (0, 1), 338524 counts of (0, 2) and 129178 counts of (1, 2). (Unlike #1476, this example does not require very small weights.)

This is sampling without replacement, so expected samples are:

(0, 1) or (1, 0): 531818
(0, 2) or (2, 0): 339393
(1, 2) or (2, 1): 128788
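For reference, these figures can be derived as follows, assuming the two draws are sequential and each is proportional to the remaining weights. The normalized weights are p = (6/11, 3/11, 2/11):

```latex
\begin{aligned}
P(\{i, j\}) &= p_i \cdot \frac{p_j}{1 - p_i} + p_j \cdot \frac{p_i}{1 - p_j} \\
P(\{0, 1\}) &= \tfrac{6}{11} \cdot \tfrac{3}{5} + \tfrac{3}{11} \cdot \tfrac{6}{8} = \tfrac{117}{220} \approx 0.531818 \\
P(\{0, 2\}) &= \tfrac{6}{11} \cdot \tfrac{2}{5} + \tfrac{2}{11} \cdot \tfrac{6}{9} = \tfrac{56}{165} \approx 0.339394 \\
P(\{1, 2\}) &= \tfrac{3}{11} \cdot \tfrac{2}{8} + \tfrac{2}{11} \cdot \tfrac{3}{9} = \tfrac{17}{132} \approx 0.128788
\end{aligned}
```

The three probabilities sum to 1, and multiplying each by one million trials gives approximately the expected counts listed above.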


dhardy · 2024-11-19T11:04:01Z

I fixed my calculation of the CDF, found a variant which failed like #1476, fixed this by taking the logarithm of keys, and applied some optimisation to the Efraimidis-Spirakis algorithm.
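As a rough illustration of the approach (not the code in src/seq/index.rs, and not algorithm A-ExpJ), here is a dependency-free sketch of Efraimidis-Spirakis sampling with log-domain keys that keeps at most `amount` candidates. `SplitMix64` is a hypothetical stand-in for a real RNG, and `sample_weighted` is an assumed name.

```rust
/// Hypothetical stand-in for a real RNG (SplitMix64), so that the sketch
/// has no dependencies. Not part of rand's API.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform sample in (0, 1], so that `ln` is always finite.
    fn next_open01(&mut self) -> f64 {
        ((self.next_u64() >> 11) + 1) as f64 / (1u64 << 53) as f64
    }
}

/// Sample up to `amount` distinct indices, weighted, without replacement.
fn sample_weighted(rng: &mut SplitMix64, weights: &[f64], amount: usize) -> Vec<usize> {
    // At most `amount` (key, index) candidates are kept at any time.
    let mut candidates: Vec<(f64, usize)> = Vec::with_capacity(amount);
    for (index, &w) in weights.iter().enumerate() {
        if w <= 0.0 {
            continue; // zero-weight elements are never selected
        }
        // Log-domain key: ln(u) / w orders elements as u^(1/w) would.
        let key = rng.next_open01().ln() / w;
        if candidates.len() < amount {
            candidates.push((key, index));
        } else if let Some(min) = candidates
            .iter_mut()
            .min_by(|a, b| a.0.total_cmp(&b.0))
        {
            // Replace the weakest candidate if the new key beats it.
            if key > min.0 {
                *min = (key, index);
            }
        }
    }
    candidates.into_iter().map(|(_, index)| index).collect()
}
```

The linear scan for the weakest candidate keeps the sketch short; a binary heap would be the more usual choice when `amount` is large.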

dhardy added 10 commits November 18, 2024 17:06

Add KS test for WeightedIndex

2e28810

Add KS test for WeightedAliasIndex

eb1836b

Add KS test for WeightedTreeIndex

a8ce256

Add KS test for IndexedRandom::choose_weighted

9e03a15

Add KS test for IndexedRandom::choose_multiple_weighted (one element)

4b0a296

Add KS test for IndexedRandom::choose_multiple_weighted (two elements)

2584212

Some failures

Add KS test for IteratorRandom::choose

336ddbe

Add KS test for IteratorRandom::choose_stable

2ef212b

Add KS test for IteratorRandom::choose_multiple_fill (two elements)

a0908bc

Fix cdf for choose_two_weighted_indexed

865aba2

dhardy requested a review from benjamin-lieser November 18, 2024 19:55

dhardy added 5 commits November 19, 2024 10:09

More complex test for choose_multiple_weighted

16a16c6

Fix calculated CDF in choose_two_weighted_indexed

554d331

Test and fix choose_multiple_weighted with very small probabilities: r…

d645952

…ust-random#1476 Also improves choose_two_weighted_indexed time by 23% (excluding new test)

sample_efraimidis_spirakis: keep at most amount candidates

b806b29

Approx 2% improvement to tests sampling 2 of 100 elements

sample_efraimidis_spirakis: use algorithm A-ExpJ

0f662b1

This results in approx 18% faster tests choosing 2-in-100 items

dhardy marked this pull request as ready for review November 19, 2024 11:02

dhardy added 2 commits November 19, 2024 11:04

Rustfmt

51c6fdd

Clippy

a1f61ae

This was referenced Nov 23, 2024

choose_multiple_weighted returns unexpect probability of result #1476

Open

Prepare 0.9.0-beta.0 #1535

Open
