Compare sampled normal distribution to PDF #1121

vks · 2021-05-07T21:16:37Z

This tests that we are sampling the normal distribution correctly. For now, this is limited to the normal distribution, but it can be extended to other distributions as well. I'm using sparklines to make debugging easier:

Sampled normal distribution:
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▃▃▄▄▅▅▆▆▇▇▇▇█▇▇▇▇▇▆▆▅▅▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Expected normal distribution:
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▃▃▄▄▅▅▆▆▇▇▇▇█▇▇▇▇▇▆▆▅▅▄▄▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
Difference:
▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▁▁▃▅▂▂▄▁▁▆▄▁▃▂▁▁█▃▅▂▂▂▁▂▄▂▃▅▂▁▁▂▂▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
max diff: 0.006058964254376686
max expected_error: 0.0035582208162928813

(This looks a bit better in the terminal.)

Refs #357.

Those will only be used for testing distributions.

`Normal` is sampled many times into a histogram, which is then compared to the expected probability density function. To make debugging easier, sparklines of the expected distribution, the histogram, and their difference are printed. Currently, the test fails if the difference is significantly larger than the expected error of the histogram bin. However, the error estimate does not take the error in the normalization due to the finite width of the histogram into account. This should not be a problem, as long as the distribution is almost zero outside the range covered by the histogram. In principle, this approach can be generalized to other distributions.

dhardy · 2021-05-08T07:29:04Z

Excellent!

One million samples: what is the run-time, and is this accurate enough to be really useful? Should we run it in all test-suite runs or put it behind a feature flag and limit when it runs?

TheIronBorn · 2021-05-08T15:46:35Z

We could verify other qualities as well. Mean, standard deviation, skewness, kurtosis, etc. See https://github.com/miloyip/normaldist-benchmark#procedure

vks · 2021-05-08T20:31:27Z

@dhardy

One million samples: what is the run-time, and is this accurate enough to be really useful?

One million samples was quite accurate already for the chosen bins (as shown above, the absolute maximum error per bin was ca. 0.006 in probability units). I should probably calculate the relative errors though to make sure. 100 000 samples also gives you a perfect sparkline, but for 10 000 samples you get visible differences.

Should we run it in all test-suite runs or put it behind a feature flag and limit when it runs?

On my CPU the runtime was unnoticeable. I don't think it will be a problem, but let me check the runtime on the CI after I fixed the tests.

@TheIronBorn

We could verify other qualities as well. Mean, standard deviation, skewness, kurtosis, etc.

We could, and this is already supported by average, but this is somewhat redundant to testing the histogram and may require even more samples to get a reasonable precision.

vks · 2021-05-09T00:43:00Z

The new tests run in about half a second, which I think is acceptable even if extended to all distributions.

dhardy

Code looks good (some minor comments).

The test runtime sounds very acceptable :+1

rand_distr/tests/pdf.rs

src/distributions/bernoulli.rs

vks · 2021-05-09T17:58:04Z

@dhardy I think I addressed your comments?

dhardy

👍

vks added 4 commits May 7, 2021 17:44

Update dependency

fd50cd4

Implement sparklines

0eb2454

Those will only be used for testing distributions.

Fix tests for old Rust versions

e8329a2

vks added 7 commits May 8, 2021 20:21

More fixes for old Rust versions

da94e87

Another fix for old Rust

70efd2e

rand_distr: Fix clippy warnings

0cde3bf

rand_core: Fix clippy warnings

67f491a

rand_hc: Fix clippy warnings

aee0044

rand: Fix clippy warnings

1b8eaf2

Add missing copyright headers

ab6f3dd

Fix Rust 1.36 compatibility

8dac1dc

dhardy approved these changes May 9, 2021

View reviewed changes

rand_distr/tests/pdf.rs Outdated Show resolved Hide resolved

rand_distr/tests/pdf.rs Outdated Show resolved Hide resolved

src/distributions/bernoulli.rs Show resolved Hide resolved

vks added 2 commits May 9, 2021 14:45

Define constant for histogram size

76a7305

Simplify loop

66da6e6

dhardy approved these changes May 10, 2021

View reviewed changes

vks merged commit 6d236fd into rust-random:master May 12, 2021

vks deleted the distr-test branch May 12, 2021 16:28

vks mentioned this pull request May 15, 2021

Testing of distributions #357

Closed

dhardy mentioned this pull request Jul 22, 2021

Adding skew normal random variable #1149

Closed

saona-raimundo mentioned this pull request Aug 3, 2021

rand_distr: Add Zipf distribution #1136

Merged

saona-raimundo mentioned this pull request Sep 5, 2021

SkewNormal distribution implementation #1174

Merged

vks mentioned this pull request Apr 26, 2024

Tracker: Closing Old Issues #1432

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Compare sampled normal distribution to PDF #1121

Compare sampled normal distribution to PDF #1121

vks commented May 7, 2021

dhardy commented May 8, 2021

TheIronBorn commented May 8, 2021

vks commented May 8, 2021

vks commented May 9, 2021

dhardy left a comment

vks commented May 9, 2021

dhardy left a comment

Compare sampled normal distribution to PDF #1121

Compare sampled normal distribution to PDF #1121

Conversation

vks commented May 7, 2021

dhardy commented May 8, 2021

TheIronBorn commented May 8, 2021

vks commented May 8, 2021

vks commented May 9, 2021

dhardy left a comment

Choose a reason for hiding this comment

vks commented May 9, 2021

dhardy left a comment

Choose a reason for hiding this comment